Basics of Data Mining – Computational Approaches – Statistical Limits on Data Mining – Bonferroni’s Principle. MapReduce: Distributed File Systems – MapReduce – Algorithms Using MapReduce –
Extensions to MapReduce. Finding Similar Items: Applications of Near-Neighbor Search – Shingling of Documents – Similarity-Preserving Summaries of Sets – Locality-Sensitive Hashing for Documents – Distance Measures.
Mining Data Streams: The Stream Data Model – Sampling Data in a Stream – Filtering Streams. Link Analysis: PageRank – Efficient Computation of PageRank – Topic-Sensitive PageRank – Link Spam. Frequent Itemsets: The Market-Basket Model – Market Baskets and the A-Priori Algorithm – Handling Larger Datasets in Main Memory. Clustering: Introduction to Clustering Techniques – Hierarchical Clustering – K-means Algorithms – CURE Algorithm.
Recommendation Systems: A Model for Recommendation Systems – Content-Based Recommendations – Collaborative Filtering – Dimensionality Reduction. Mining Social-Network Graphs: Social Networks as Graphs – Clustering of Social-Network Graphs – Direct Discovery of Communities – Partitioning of Graphs – Finding Overlapping Communities – Simrank. Dimensionality Reduction: Eigenvalues and Eigenvectors of Symmetric Matrices – Principal-Component Analysis – Singular-Value Decomposition. Large-Scale Machine Learning: Machine-Learning Model – Perceptrons – Support-Vector Machines.