Syllabus
Unit I
Introduction – Overview of the Data Mining Process – The Steps in Data Mining – Preliminary Steps – Predictive Power and Overfitting – Building a Predictive Model – Data Exploration and Dimension Reduction – Data Visualization – Dimension Reduction – Correlation Analysis – Reducing the Number of Categories in Categorical Variables – Converting a Categorical Variable to a Numerical Variable – Principal Components Analysis – Performance Evaluation – Evaluating Predictive Performance – Judging Classifier Performance.
Unit II
Prediction and Classification Methods – Multiple Linear Regression – Explanatory vs. Predictive Modeling – Estimating the Regression Equation and Prediction – The k-NN Classifier (Categorical Outcome) – The Naive Bayes Classifier – Classification and Regression Trees – Evaluating the Performance of a Classification Tree – Avoiding Overfitting – Logistic Regression – Neural Nets – Fitting a Network to Data – Discriminant Analysis – Classification Performance of Discriminant Analysis – Combining Methods: Ensembles and Uplift Modeling – Association Rules and Collaborative Filtering – Cluster Analysis – Measuring Distance – Hierarchical (Agglomerative) Clustering – The k-Means Algorithm.
Unit III
Forecasting Time Series – Descriptive vs. Predictive Modeling – Popular Forecasting Methods in Business – Regression-Based Forecasting – A Model with Trend – A Model with Seasonality – A Model with Trend and Seasonality – Autocorrelation and ARIMA Models – Smoothing Methods – Introduction – Moving Average – Simple Exponential Smoothing – Data Analytics – Social Network Analytics – Directed vs. Undirected Networks – Visualizing and Analyzing Networks – Using Network Metrics in Prediction and Classification – Text Mining – The Tabular Representation of Text: Term-Document Matrix and “Bag-of-Words” – Bag-of-Words vs. Meaning Extraction at Document Level – Preprocessing the Text – Implementing Data Mining Methods – Case Studies.
Objectives and Outcomes
Course Objectives
- The course presents an applied approach to data mining concepts and methods using Python software for illustration.
- Students will learn how to implement a variety of popular data mining algorithms to tackle business problems and opportunities.
- It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining, and network analysis.
Course Outcomes
CO1: Apply data mining processes, visualize data spread, employ data reduction techniques such as PCA, build predictive models, and evaluate the models
CO2: Apply feature extraction techniques and design a solution for a classification problem employing regression, the Naive Bayes (NB) classifier, and decision trees and their variants
CO3: Apply ARIMA and other forecasting methods in business
CO4: Implement Data Analytics on social networks
CO5: Apply knowledge of text representation for extraction and display of embedded information
CO-PO Mapping
CO \ PO/PSO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2
CO1 | 2 | 3 | 3 | - | 3 | - | 3 | 3 | - | - | 3 | 2
CO2 | 2 | - | 3 | - | - | - | - | - | - | - | 3 | 2
CO3 | 2 | - | 3 | 3 | 2 | - | 3 | 3 | - | - | 3 | 2
CO4 | 2 | 1 | 2 | - | 2 | - | - | - | - | - | 3 | 2
CO5 | 3 | 2 | - | - | - | - | - | - | - | - | 3 | 2
Evaluation Pattern
Evaluation Pattern: 70:30 (Internal : End Semester)
Assessment | Internal | End Semester
Midterm Exam | 20 | -
Continuous Assessment – Theory (*CAT) | 10 | -
Continuous Assessment – Lab (*CAL) | 40 | -
**End Semester | - | 30 (50 marks; 2-hour exam)
*CAT – Can be Quizzes, Assignments, and Reports
*CAL – Can be Lab Assessments, Project, and Report
**End Semester can be a theory examination, a lab-based examination, or a project presentation.
Text Books / References
Textbook(s)
Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl Jr KC. “Data mining for business analytics: concepts, techniques, and applications in R”. John Wiley & Sons; 2017.
Reference(s)
VanderPlas J. “Python data science handbook: essential tools for working with data”. O’Reilly Media, Inc.; 2016.
McKinney W. “Python for data analysis: Data wrangling with Pandas, NumPy, and IPython”. O’Reilly Media, Inc.; 2012.