Basics of Data Mining – Computational Approaches – Statistical Limits on Data Mining – Bonferroni’s Principle – Importance of Words in Documents – Hash Functions – Indexes – Secondary Storage – The Base of Natural Logarithms – Power Laws – MapReduce – Distributed File Systems – Algorithms Using MapReduce – Extensions to MapReduce. Finding Similar Items – Applications of Near-Neighbor Search – Shingling of Documents – Similarity-Preserving Summaries of Sets – Locality-Sensitive Hashing for Documents – Distance Measures.
Mining Data Streams: The Stream Data Model – Sampling Data in a Stream – Filtering Streams – Bloom Filters. Link Analysis: PageRank – Efficient Computation of PageRank – Topic-Sensitive PageRank – Link Spam. Frequent Itemsets: The Market-Basket Model – Market Baskets and the A-Priori Algorithm – Handling Larger Datasets in Main Memory – The Algorithm of Park, Chen, and Yu – The Multistage Algorithm – The Multihash Algorithm.
Clustering: Introduction to Clustering Techniques – Points, Spaces, and Distances – Clustering Strategies – The Curse of Dimensionality – Hierarchical Clustering – K-means Algorithms – The Algorithm of Bradley, Fayyad, and Reina – The CURE Algorithm – Clustering in Non-Euclidean Spaces. Recommendation Systems: A Model for Recommendation Systems – Content-Based Recommendations – Collaborative Filtering – UV-Decomposition. Mining Social-Network Graphs: Social Networks as Graphs – Clustering of Social-Network Graphs – Direct Discovery of Communities – Partitioning of Graphs – Finding Overlapping Communities – SimRank. Dimensionality Reduction: Eigenvalues and Eigenvectors of Symmetric Matrices – Principal-Component Analysis – Singular-Value Decomposition.
Text Book / References
- Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, “Mining of Massive Datasets”, ebook, Cambridge University Press, 2020.
- Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition (The Morgan Kaufmann Series in Data Management Systems), Elsevier, 2012.