Course Detail

Course Name: Mining of Massive Datasets
Course Code: 24AI638
Program: M. Tech. in Artificial Intelligence
Semester: Soft Core
Credits: 4
Campus: Amritapuri, Coimbatore

Syllabus

Basics of Data Mining – Computational Approaches – Statistical Limits on Data Mining – Bonferroni’s Principle – Importance of Words in Documents – Hash Functions – Indexes – Secondary Storage – The Base of Natural Logarithms – Power Laws – MapReduce – Distributed File Systems – Algorithms Using MapReduce – Extensions to MapReduce. Finding Similar Items – Applications of Near-Neighbor Search – Shingling of Documents – Similarity-Preserving Summaries of Sets – Locality-Sensitive Hashing for Documents – Distance Measures.
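The shingling and minhashing steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not course material: it assumes character 3-shingles, uses `zlib.crc32` to map each shingle to an integer, and replaces true random permutations with a hypothetical family of random linear hash functions (a·x + b) mod p. The fraction of positions on which two signatures agree estimates the Jaccard similarity of the underlying shingle sets.

```python
import random
import zlib

# Assumption: a prime just above 2**32, so every crc32 value is below it.
PRIME = 4_294_967_311


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of character k-shingles (substrings of length k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_funcs(n: int, seed: int = 42) -> list[tuple[int, int]]:
    """Hypothetical hash family: n pairs (a, b) for h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(shingle_set: set[str], hash_funcs) -> list[int]:
    """For each hash function, keep the minimum hash value over the shingles.

    Assumes shingle_set is non-empty.
    """
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of signature positions on which the two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

In locality-sensitive hashing, such signatures are then divided into bands of several rows each, and two documents whose signatures agree in every row of at least one band become a candidate pair for exact comparison.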

Mining Data Streams: The Stream Data Model – Sampling Data in a Stream – Filtering Streams – Bloom Filter. Link Analysis: PageRank – Efficient Computation of PageRank – Topic-Sensitive PageRank – Link Spam. Frequent Itemsets: The Market-Basket Model – Market Baskets and the A-Priori Algorithm – Handling Larger Datasets in Main Memory – The Algorithm of Park, Chen, and Yu – The Multistage Algorithm – The Multihash Algorithm.
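The PageRank topics above can be illustrated with power iteration on a small link graph. This is a minimal Python sketch, assuming an in-memory adjacency dictionary, a taxation (damping) parameter beta = 0.85, and a fixed number of iterations. It handles dead ends by spreading their rank evenly over all pages, which is one common approach and not necessarily the one taught in the course; computation on web-scale graphs would typically use sparse, blocked matrix representations instead.

```python
def pagerank(links: dict[str, list[str]], beta: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration for PageRank with taxation.

    links maps each page to the pages it links to. With probability beta a
    surfer follows an out-link; with probability 1 - beta it teleports to a
    page chosen uniformly at random.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Teleport contribution is shared equally by all pages.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Split beta * rank(p) equally among p's out-links.
                share = beta * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Dead end (assumed handling): spread its rank over all pages.
                share = beta * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank
```

Because both the teleport term and the dead-end handling redistribute rank rather than lose it, the ranks continue to sum to 1 after every iteration.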

Clustering: Introduction to Clustering Techniques – Points, Spaces, and Distances – Clustering Strategies – The Curse of Dimensionality – Hierarchical Clustering – K-means Algorithms – The Algorithm of Bradley, Fayyad, and Reina – CURE Algorithm – Clustering in Non-Euclidean Spaces. Recommendation Systems: A Model for Recommendation Systems – Content-Based Recommendations – Collaborative Filtering – UV Decomposition. Mining Social-Network Graphs: Social Networks as Graphs – Clustering of Social-Network Graphs – Direct Discovery of Communities – Partitioning of Graphs – Finding Overlapping Communities – SimRank. Dimensionality Reduction: Eigenvalues and Eigenvectors of Symmetric Matrices – Principal-Component Analysis – Singular-Value Decomposition.
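The K-means topic can be illustrated with Lloyd's algorithm, the basic iterative form of k-means. This Python sketch assumes the points are in-memory tuples in a Euclidean space, picks the initial centroids at random from the data, and keeps a centroid unchanged if its cluster becomes empty. It does not implement the Bradley, Fayyad, and Reina extension for data that does not fit in main memory.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's k-means: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random data points
    assignment = None

    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignment
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)` returns two centroids and a list of cluster labels, one per input point.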

Textbook / References

  1. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, “Mining of Massive Datasets”, ebook, Cambridge University Press, 2020.
  2. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition (The Morgan Kaufmann Series in Data Management Systems), Elsevier, 2012.

Objectives and Outcomes

Preamble

With the growth of user interaction and networking on the web, together with advances in processing power and storage capacity, the demand for effective and sophisticated knowledge discovery techniques has grown rapidly. Businesses need to turn large quantities of information into intelligence that supports sound business decisions, and data now plays a central role in business decisions, strategy, and behavior. Predictive analytics, data mining, and machine learning provide new methods for analyzing massive data sets, and companies value people who can process and analyze large data sets to produce informative results.

Course Objectives

  • Acquire knowledge of existing systems and approaches for handling large-scale data science problems.
  • Apply different data mining techniques to large volumes of data.
  • Analyze the performance of various data mining techniques when applied to diverse data sets.
  • Gain insight into the design of machine learning algorithms used for data mining.

Course Outcomes

CO1: Understand the role of data in predictive analytics, with data mining and machine learning as tools for analyzing massive data sets.

CO2: Introduce the design and operation of big-data systems such as Hadoop, Spark, and Hive.

CO3: Apply suitable data mining algorithms to process large document databases and unbounded data streams, and to mine large social networks and web graphs.

CO4: Use case studies as an analytical tool to gain first-hand insight into how big data problems, and their solutions, have helped companies such as Google succeed in the market.

CO5: Design large-scale machine learning algorithms for analyzing very large amounts of data, through practical hands-on experience.

Prerequisites

  • Machine Learning.

CO-PO Mapping

COs

Description

PO1

PO2

PO3

PO4

PO5

CO1

Understand the role of data in predictive analytics, with data mining and machine learning as tools for analyzing massive data sets.

3

1

CO2

Introduce the design and operation of big-data systems such as Hadoop, Spark, and Hive.

2

2

3

CO3

Apply suitable data mining algorithms to process large document databases and unbounded data streams, and to mine large social networks and web graphs.

3

3

3

2

CO4

Use case studies as an analytical tool to gain first-hand insight into how big data problems, and their solutions, have helped companies such as Google succeed in the market.

2

3

2

2

2

CO5

Design large-scale machine learning algorithms for analyzing very large amounts of data, through practical hands-on experience.

3

3

3

3

Evaluation Pattern

Evaluation Pattern – 70:30 (Midterm Exam, Lab Assignments and Project: 70%; End Semester Exam: 30%)

  • Midterm Exam – 20%
  • Lab Assignments – 25%
  • Project – 25%
  • End Semester Exam – 30%

