Machine Learning for Big Data - Amrita Vishwa Vidyapeetham

Course Detail

Course Name	Machine Learning for Big Data
Course Code	24AI731
Program	M. Tech. in Artificial Intelligence
Credits	3
Campus	Amritapuri, Coimbatore

Syllabus

Introduction to Spark: Spark Architecture, Spark Jobs and APIs. Resilient Distributed Datasets: Creating RDDs, Transformations, Actions. DataFrames: Python to RDD communications, Creating DataFrames, DataFrame queries. MLlib: Loading and Transforming the data. Implementation of Machine Learning algorithms such as Classification and Clustering using MLlib.

Approaches to Modelling – Importance of Words in Documents – Hash Functions – Indexes – Secondary Storage – The Base of Natural Logarithms – Power Laws – MapReduce. Finding Similar Items: Shingling – LSH – Distance Measures. Mining Data Streams: Stream data model – Sampling data – Filtering streams. Link Analysis: PageRank, Link Spam.
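To illustrate the "Finding Similar Items" topics listed above, here is a minimal sketch of character shingling, Jaccard similarity, and a MinHash signature in plain Python. It is only an illustration and not part of the official course material: the function names, the shingle length `k = 5`, the 128 hash functions, and the use of seeded BLAKE2b digests as the hash family are all assumptions chosen for this example.

```python
from __future__ import annotations

import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of k-character shingles of a lowercased,
    whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |a & b| / |a | b| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(shingle: str, seed: int) -> int:
    """One member of a hypothetical hash family: a 64-bit BLAKE2b digest
    of the seed and the shingle."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_seeded_hash(s, seed) for s in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree; this estimates the
    Jaccard similarity of the underlying sets."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

For two documents, compare `jaccard(shingles(d1), shingles(d2))` with `estimate_similarity(minhash_signature(shingles(d1)), minhash_signature(shingles(d2)))`. The estimate gets closer to the exact value as `num_hashes` grows, and LSH then groups signatures into bands so that only likely-similar pairs are compared.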

Frequent Item Sets: Market Basket Analysis, A-Priori Algorithm, PCY Algorithm. Big Data Clustering: Clustering in Non-Euclidean Spaces, BFR, CURE. Structured Streaming: Spark Streaming, Application dataflow. Coresets: Coresets for K-means and K-median clustering.
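As a small illustration of the A-Priori idea named above, the following sketch finds frequent item pairs in two passes over the baskets: the first pass counts single items, and the second pass counts only pairs whose members are both frequent. The function name `apriori_pairs`, the in-memory list of baskets, and the absolute `min_support` count are assumptions made for this example; a course implementation on Spark would distribute both passes.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(baskets, min_support):
    """Return {(item_a, item_b): count} for all pairs that occur together
    in at least `min_support` baskets."""
    # Pass 1: count how many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, count in item_counts.items()
                      if count >= min_support}

    # Pass 2: count only pairs made of frequent items. A pair can be
    # frequent only if both of its items are frequent (monotonicity).
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: count for pair, count in pair_counts.items()
            if count >= min_support}


baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
]
frequent_pairs = apriori_pairs(baskets, min_support=2)
```

With these four baskets and a support threshold of 2, `frequent_pairs` contains `("bread", "milk")` and `("eggs", "milk")`, each with count 2. The PCY algorithm refines the first pass by also hashing pairs into buckets, so that fewer candidate pairs need to be counted in the second pass.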

Objectives and Outcomes

Preamble

This course deals with two aspects of big data analytics. The first is the infrastructure for big data analytics: the course introduces tools and algorithms that can be used to generate models from big data and to scale those models up to big data problems, with the Spark framework as the chosen platform. The second is the understanding and implementation of scalable and streaming algorithms to analyze voluminous data that is growing exponentially.

Course Objectives

To understand various scalable machine learning algorithms to solve big data problems.
To understand the Spark architecture.
To implement Machine Learning algorithms using PySpark.

Course Outcomes

COs	Description
CO1	Understand and explain how machine learning algorithms are made scalable to solve big data problems.
CO2	Implement scalable Machine Learning algorithms using PySpark.
CO3	Apply and compare different strategies for big data analytics using various machine learning algorithms.
CO4	Understand streaming algorithms and the coreset concept for analyzing high-dimensional data.

Prerequisites

Machine Learning.

CO-PO Mapping

COs	Description	PO1	PO2	PO3	PO4	PO5
CO1	Understand and explain how machine learning algorithms are made scalable to solve big data problems.	3	–	–	–	–
CO2	Implement scalable Machine Learning algorithms using PySpark.	3	3	3	3	3
CO3	Apply and compare different strategies for big data analytics using various machine learning algorithms.	3	2	1	1	–
CO4	Understand streaming algorithms and the coreset concept for analyzing high-dimensional data.	3	2	1	1	–

Evaluation Pattern

Evaluation Pattern – 70:30 (midterm, lab assignments and project : end semester exam)

Midterm Exam – 20%
Lab Assignments – 25%
Project – 25%
End Semester Exam – 30%

Text Books / References

Anand Rajaraman, Jure Leskovec and J. D. Ullman, “Mining of Massive Datasets”, e-book, Publisher, 2014.
Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The MIT Press, Cambridge, Massachusetts, 2012.
Tomasz Drabas and Denny Lee, “Learning PySpark”, Packt, February 2017.
Jeff M. Phillips, “Coresets and Sketches”, arXiv:1601.00617, 2016.
