Syllabus
Unit I
Introduction to Data Mining: Introduction, What is Data Mining, Definition, KDD, Challenges, Data Mining Tasks, Data Preprocessing, Data Cleaning, Missing Data, Dimensionality Reduction, Feature Subset Selection, Discretization and Binarization, Data Transformation; Measures of Similarity and Dissimilarity: Basics.
Unit II
Association Rules: Problem Definition, Frequent Itemset Generation, The Apriori Principle, Support and Confidence Measures, Association Rule Generation; Apriori Algorithm. Bayesian Belief Networks and Additional Topics Regarding Classification.
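As an illustration of the support and confidence measures named in this unit, here is a minimal Python sketch. The transactions are a hypothetical toy basket dataset invented for this example, and the function names `support` and `confidence` are illustrative choices, not taken from the course material.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented purely for illustration.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[FrozenSet[str]],
) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)
```

On this toy data, `support({"milk", "diapers"}, TRANSACTIONS)` is 3/5, and the rule {milk, diapers} -> {beer} has confidence of about 0.67, because two of the three transactions containing milk and diapers also contain beer.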
Unit III
Clustering: Problem Definition, Clustering Overview, Evaluation of Clustering Algorithms, Partitioning Clustering: K-Means Algorithm, K-Means Additional Issues, PAM Algorithm; Hierarchical Clustering: Agglomerative and Divisive Methods, Key Issues in Hierarchical Clustering, Strengths and Weaknesses.
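To give a flavour of the K-Means algorithm listed in this unit, the sketch below alternates the usual assignment and update steps in plain Python. The sample points, the fixed initial centroids, and the name `kmeans` are assumptions made for illustration; practical implementations normally choose initial centroids at random or with a seeding heuristic.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical 2-D points forming two obvious groups (illustration only).
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
INITIAL_CENTROIDS = [(1, 1), (8, 8)]  # assumed starting centroids


def kmeans(points, centroids, max_iter=100):
    """Basic K-Means: assign points to the nearest centroid, then recompute means."""
    centroids = [tuple(float(x) for x in c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(POINTS, INITIAL_CENTROIDS)
```

With these inputs the three points near the origin end up in one cluster and the three points near (8, 8) in the other. Different starting centroids can lead to different results, which is one of the additional issues this unit covers.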
Unit IV
Outlier Detection: Outliers and Outlier Analysis - What Are Outliers?, Types of Outliers, Challenges of Outlier Detection, Outlier Detection Methods, Statistical Approaches, Parametric Methods, Nonparametric Methods, Proximity-Based Approaches, Clustering-Based Approaches, Classification-Based Approaches, Mining Contextual and Collective Outliers.
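One simple instance of the parametric statistical approaches in this unit is a z-score test, which assumes the data are roughly normally distributed and flags values far from the mean. The sketch below is illustrative only: the readings are invented, and the threshold of 2.5 standard deviations is an assumed cut-off, not a value prescribed by the course.

```python
from statistics import mean, pstdev

# Hypothetical readings with one obviously unusual value (illustration only).
READINGS = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]


def zscore_outliers(values, threshold=2.5):
    """Return values whose absolute z-score exceeds `threshold`.

    Uses the population standard deviation; the threshold is an assumption.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical: nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


outliers = zscore_outliers(READINGS)
```

Here only the value 50 is flagged. In a small sample the outlier itself inflates the mean and standard deviation, so an overly strict threshold can miss it; this sensitivity is one reason nonparametric and proximity-based methods are also studied.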
Unit V
Dimensionality Reduction: Principal-Component Analysis, Singular-Value Decomposition, and CUR Decomposition. Link Analysis: PageRank, Efficient Computation of PageRank, Topic-Sensitive PageRank, Link Spam, Hubs and Authorities. Recommendation Systems: A Model for Recommendation Systems, Content-Based Recommendations, and the Netflix Challenge.
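As a small illustration of the PageRank topic in this unit, the sketch below runs power iteration with a damping factor on a four-page graph. The graph, the damping factor `beta=0.85`, and the function name `pagerank` are assumptions for this example. The code also assumes every page has at least one out-link and every link target is a key of the dictionary, so dead ends are not handled.

```python
# Hypothetical link graph: page -> list of pages it links to (illustration only).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "C"],
}


def pagerank(links, beta=0.85, tol=1e-10, max_iter=200):
    """Power iteration with teleportation; assumes no dead-end pages."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Every page receives the teleport share (1 - beta) / n.
        new_rank = {u: (1.0 - beta) / n for u in nodes}
        # Each page splits beta times its rank evenly among its out-links.
        for u in nodes:
            share = beta * rank[u] / len(links[u])
            for v in links[u]:
                new_rank[v] += share
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # ranks have stopped changing
            break
    return rank


ranks = pagerank(LINKS)
```

On this graph page A receives the highest rank, and the ranks sum to 1 because no page is a dead end.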
Objectives and Outcomes
Course Outcomes:
CO1: Become familiar with basic data mining concepts and understand association rule mining.
CO2: Implement clustering techniques on unlabeled data.
CO3: Implement various approaches for dealing with outliers.
CO4: Implement dimensionality reduction techniques on massive datasets.
CO5: Understand the working process of recommendation systems.
CO-PO Mapping:
| | PO1 | PO2 | PO3 | PO4 | PO5 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CO1 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | | | | | 2 | 2 |
| CO2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | | | | | 2 | 2 |
| CO3 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | | | | | 2 | 2 |
| CO4 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | | | | | 1 | 2 |
| CO5 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | | | | | 1 | 2 |
Text Books / References
Textbooks, Reference Books, and Websites:
- Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques. Elsevier.
- Rajaraman, A., & Ullman, J. D. (2011). Mining of Massive Datasets. Cambridge University Press.
- https://nptel.ac.in/courses/106/105/106105174/
- https://nptel.ac.in/content/storage2/nptel_data3/html/mhrd/ict/text/110105083/lec52.pdf
- Ngo, T. (2011). Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank, Mark A. Hall. ACM SIGSOFT Software Engineering Notes, 36(5), 51-52.