Mining of Massive Datasets - Amrita Vishwa Vidyapeetham

Course Detail

Course Name	Mining of Massive Datasets
Course Code	23CSE355
Program	B. Tech. in Computer Science and Engineering (CSE)
Credits	3
Campus	Amritapuri, Coimbatore, Bengaluru, Amaravati, Chennai

Syllabus

PROFESSIONAL ELECTIVES

Electives in Data Science

Unit I

Introduction to Spark: Spark Architecture, Spark Jobs and APIs. Resilient Distributed Datasets (RDDs): Creating RDDs, Transformations, Actions. DataFrames: Python-to-RDD Communication, Creating DataFrames, DataFrame Queries. MLlib: Loading and Transforming Data. Implementation of machine learning algorithms such as classification and clustering using MLlib.

Unit II

Approaches to Modelling – Importance of Words in Documents – Hash Functions – Indexes – Secondary Storage – The Base of Natural Logarithms – Power Laws – MapReduce. Finding Similar Items: Shingling – Locality-Sensitive Hashing (LSH) – Distance Measures. Mining Data Streams: Stream Data Model – Sampling Data – Filtering Streams. Link Analysis: PageRank, Link Spam.
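As an illustration of the "Finding Similar Items" topics above, the following pure-Python sketch shows character k-shingling, exact Jaccard similarity, MinHash signatures and LSH banding. It is not prescribed course material: the function names, k = 5, the 200 hash functions of the form (a·x + b) mod p, and the split into 20 bands of 10 rows are all assumptions chosen for illustration.

```python
import hashlib
import random

# Large Mersenne prime used as the modulus of the (assumed) hash family.
P = (1 << 61) - 1


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of character k-shingles (length-k substrings) of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _stable_int(token: str) -> int:
    """Map a shingle to a 64-bit integer that is stable across Python runs."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def minhash_signature(s: set[str], num_hashes: int = 200, seed: int = 42) -> list[int]:
    """For each hash h_i(x) = (a*x + b) mod P, keep the minimum value over the set."""
    if not s:
        raise ValueError("cannot sign an empty set")
    rng = random.Random(seed)  # same seed => same hash functions for every document
    params = [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(num_hashes)]
    xs = [_stable_int(tok) for tok in s]
    return [min((a * x + b) % P for x in xs) for a, b in params]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """The fraction of agreeing signature positions estimates the Jaccard similarity."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


def lsh_candidate(sig1: list[int], sig2: list[int], bands: int = 20) -> bool:
    """Banding: the pair becomes a candidate if any band of rows matches exactly."""
    rows = len(sig1) // bands
    return any(
        sig1[i * rows:(i + 1) * rows] == sig2[i * rows:(i + 1) * rows]
        for i in range(bands)
    )


doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumped over the lazy dog"
s1, s2 = shingles(doc1), shingles(doc2)
sig1, sig2 = minhash_signature(s1), minhash_signature(s2)
print(f"exact Jaccard:      {jaccard(s1, s2):.3f}")
print(f"MinHash estimate:   {estimated_jaccard(sig1, sig2):.3f}")
print(f"LSH candidate pair: {lsh_candidate(sig1, sig2)}")
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b, so the 20 × 10 split used here places the similarity threshold near (1/20)^(1/10) ≈ 0.74; other splits trade false positives against false negatives.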

Unit III

Frequent Itemsets: Market Basket Analysis, A-Priori Algorithm, PCY Algorithm. Recommender Systems. Dimensionality Reduction: SVD. Big Data Clustering: Clustering in Non-Euclidean Spaces, BFR, CURE. Structured Streaming: Spark Streaming, Application Dataflow.
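To illustrate the A-Priori algorithm listed above, here is a minimal in-memory sketch. It assumes the baskets fit in memory and uses a made-up five-basket dataset with a hypothetical support threshold of 3; it does not cover the PCY hash-bucket refinement or disk-based passes, and Spark MLlib itself ships FP-Growth (`pyspark.ml.fpm.FPGrowth`) rather than A-Priori.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    Level-wise search: by monotonicity, a k-itemset can be frequent only if all
    of its (k-1)-subsets are frequent, so level-k candidates are built only
    from frequent (k-1)-itemsets.
    """
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count individual items and keep the frequent ones.
    counts = Counter(item for b in baskets for item in b)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: size-k unions of frequent (k-1)-itemsets
        # whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Counting pass over the baskets for the surviving candidates only.
        cand_counts = Counter()
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    cand_counts[c] += 1

        frequent = {c: n for c, n in cand_counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers", "beer"},
    {"milk", "bread", "diapers", "beer"},
    {"milk", "bread", "diapers"},
]
frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data the sketch should report four frequent single items and four frequent pairs; the only triple that passes candidate generation, {milk, bread, diapers}, occurs in just two baskets, so the search stops at level 3.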

Objectives and Outcomes

Pre-Requisite(s): 23CSEXXX Machine Learning

Course Objectives

To understand scalable machine learning algorithms for solving big data problems.
To understand the Spark architecture.
To implement machine learning algorithms using PySpark.

Course Outcomes

CO1: Understand how machine learning algorithms are made scalable to solve big data problems.

CO2: Implement scalable machine learning algorithms using PySpark.

CO3: Apply and compare different strategies for big data analytics using various machine learning algorithms.

CO4: Understand streaming algorithms for analyzing voluminous, high-dimensional data.

CO-PO Mapping

CO	PO1	PO2	PO3	PO4	PO5	PO6	PO7	PO8	PO9	PO10	PO11	PO12	PSO1	PSO2
CO1	3	3	2	3	3	-	-	-	2	-	-	-	3	2
CO2	3	3	3	3	3	-	-	-	3	2	3	-	3	2
CO3	2	3	2	3	2	-	-	-	2	2	2	-	3	2
CO4	1	1	1	2	2	-	-	-	2	2	2	-	3	2

Evaluation Pattern

Evaluation Pattern: 70:30 (Internal : End Semester)

Assessment	Internal	End Semester
Midterm	20
*Continuous Assessment Theory (CAT)	10
*Continuous Assessment Lab (CAL)	40
**End Semester		30 (50 marks; 2-hour exam)

*CAT – Can be Quizzes, Assignments, and Reports

*CAL – Can be Lab Assessments, Project, and Report

**End Semester can be a theory examination, a lab-based examination, or a project presentation.

Text Books / References

Textbook(s)

Anand Rajaraman, Jure Leskovec and J. D. Ullman, “Mining of Massive Datasets”, e-book, Publisher, 2014.

Reference(s)

Viktor Mayer-Schönberger, Kenneth Cukier, “Big Data: A Revolution That Will Transform How We Live, Work, and Think”, Houghton Mifflin Harcourt, 2013.

Bill Chambers, Matei Zaharia, “Spark: The Definitive Guide”, O’Reilly Media, Inc., 2018, ISBN: 9781491912218.

Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The MIT Press, Cambridge, Massachusetts, 2012.
