Syllabus
Basics of Data Mining – Computational Approaches – Statistical Limits on Data Mining – MapReduce – Distributed File Systems – Algorithms Using MapReduce – Extensions to MapReduce. Mining Data Streams: The Stream Data Model – Sampling Data in a Stream – Filtering Streams. Link Analysis, Frequent Itemsets, Clustering, Advertising on the Web, Recommendation Systems, Mining Social-Network Graphs, Dimensionality Reduction, Large-Scale Machine Learning.
Objectives and Outcomes
Course Outcomes:
CO1: Understand the basics of data mining and its limitations.
CO2: Gain knowledge about mining data streams.
CO3: Understand the clustering techniques for data mining.
CO4: Apply dimensionality reduction algorithms to social network analysis.
CO-PO Mapping:
CO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12
CO1 | 2 | 2 | 2 | 2 | 2 | 2 | – | – | – | – | 1 | 1
CO2 | 3 | 3 | 2 | 2 | 2 | 2 | – | – | – | – | 1 | 1
CO3 | 2 | 2 | 3 | 2 | 2 | 2 | – | – | – | – | 1 | 1
CO4 | 3 | 3 | 3 | 2 | 2 | 2 | – | – | – | – | 1 | 1
Text Books / References
1. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press, 2014.
2. Tom White, Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, O’Reilly Media, 4th edition, 2015.