Mining of Massive Datasets - Amrita Vishwa Vidyapeetham

Course Detail

Course Name	Mining of Massive Datasets
Course Code	24AI638
Program	M. Tech. in Artificial Intelligence
Semester	Soft Core
Credits	4
Campus	Amritapuri, Coimbatore

Syllabus

Basics of Data Mining – Computational Approaches – Statistical Limits on Data Mining – Bonferroni’s Principle – Importance of Words in Documents – Hash Functions – Indexes – Secondary Storage – The Base of Natural Logarithms – Power Laws – MapReduce – Distributed File Systems – Algorithms Using MapReduce. Extensions to MapReduce. Finding Similar Items – Applications of Near-Neighbor Search – Shingling of Documents – Similarity-Preserving Summaries of Sets – Locality-Sensitive Hashing for Documents – Distance Measures.

Mining Data Streams: The Stream Data Model – Sampling Data in a Stream – Filtering Streams – Bloom Filters. Link Analysis: PageRank – Efficient Computation of PageRank – Topic-Sensitive PageRank – Link Spam. Frequent Itemsets: The Market-Basket Model – Market Baskets and the A-Priori Algorithm – Handling Larger Datasets in Main Memory – The Algorithm of Park, Chen, and Yu – The Multistage Algorithm – The Multihash Algorithm.

Clustering: Introduction to Clustering Techniques – Points, Spaces, and Distances – Clustering Strategies – The Curse of Dimensionality. Hierarchical Clustering – K-means Algorithms – The Algorithm of Bradley, Fayyad, and Reina – The CURE Algorithm – Clustering in Non-Euclidean Spaces. Recommendation Systems: A Model for Recommendation Systems – Content-Based Recommendations – Collaborative Filtering – UV Decomposition. Dimensionality Reduction. Mining Social-Network Graphs: Social Networks as Graphs – Clustering of Social-Network Graphs – Direct Discovery of Communities – Partitioning of Graphs – Finding Overlapping Communities – SimRank. Dimensionality Reduction: Eigenvalues and Eigenvectors of Symmetric Matrices – Principal-Component Analysis – Singular-Value Decomposition.

Text Book / References

Jeffrey David Ullman, Jure Leskovec, Anand Rajaraman, “Mining of Massive Datasets”, ebook, Cambridge University Press, 2020.
Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition (The Morgan Kaufmann Series in Data Management Systems), Elsevier, 2012.

Objectives and Outcomes

Preamble

With the rise of user-web interaction and networking, as well as technological advances in processing power and storage capability, the demand for effective and sophisticated knowledge discovery techniques has grown rapidly. Businesses need to transform large quantities of information into intelligence that can be used to make sound business decisions. In recent years, data has become central to business decisions, strategy and behavior. Predictive analytics, data mining and machine learning provide new methods for analyzing massive data sets. Companies value individuals who can understand and manipulate large data sets to produce informative outcomes.

Course Objectives

To acquire knowledge of existing systems and approaches for handling large-scale data science problems.
To apply different data mining techniques to large amounts of data.
To analyze the performance of various data mining techniques when applied to diverse data sets.
To gain insight into the design of machine learning algorithms applied to data mining.

Course Outcomes

COs	Description
CO1	Understand the importance of data to be applied for predictive analytics that considers data mining and machine learning as tools to analyze massive data sets.
CO2	Introduce the design and operation of various big-data systems like Hadoop, Spark and Hive.
CO3	Apply suitable data mining algorithms to handle huge document databases and infinite streams of data to mine large social networks and web graphs.
CO4	Use case studies as a powerful analytical tool that will provide first-hand insight into how big data problems and their solutions allow companies like Google to succeed in the market.
CO5	Design large-scale machine learning algorithms with practical hands-on experience for analyzing very large amounts of data.

Prerequisites

Machine Learning.

CO-PO Mapping

COs	Description	PO1	PO2	PO3	PO4	PO5
CO1	Understand the importance of data to be applied for predictive analytics that considers data mining and machine learning as tools to analyze massive data sets.	3	1	–	–	–
CO2	Introduce the design and operation of various big-data systems like Hadoop, Spark and Hive.	2	2	–	–	3
CO3	Apply suitable data mining algorithms to handle huge document databases and infinite streams of data to mine large social networks and web graphs.	3	–	3	3	2
CO4	Use case studies as a powerful analytical tool that will provide first-hand insight into how big data problems and their solutions allow companies like Google to succeed in the market.	2	3	2	2	2
CO5	Design large-scale machine learning algorithms with practical hands-on experience for analyzing very large amounts of data.	3	3	3	–	3

Evaluation Pattern

Evaluation Pattern – 70:30

Midterm Exam – 20%
Lab Assignments – 25%
Project – 25%
End Semester Exam – 30%

