Course Detail

Course Name Big Data Analytics
Course Code 24CS751
Program M. Tech. in Computer Science & Engineering
Semester Electives
Credits 3
Campus Coimbatore, Bengaluru, Nagercoil, Chennai

Syllabus

Characteristics of Big Data, Types of Big Data, Technologies for Big Data, Infrastructure for Big Data, Use of Data Analytics, Big Data Challenges, NoSQL, Comparison of SQL and NoSQL, Distributed Computing Challenges, Hadoop Ecosystem: HDFS (Hadoop Distributed File System), MapReduce: Inputs, Outputs, and Data Serialization, Managing Resources with Hadoop YARN, Interacting with the Hadoop Ecosystem, Functional Programming in Scala: Basic Syntax, Type Inference, Parameters, Recursive Arbitrary Collections, ConsList, Arrays, Tail Recursion, Higher-Order Functions.

MapReduce Programming: Mapper, Reducer, Combiner, Partitioner, Real-Time MapReduce Applications, Data Serialization, Apache Spark: Resilient Distributed Datasets (RDDs), Creating RDDs, Lineage and Fault Tolerance, DAGs, Immutability, Task Division and Partitions, Transformations and Actions, Lazy Evaluations and Optimization, Formatting and Housing Data from Spark RDDs, Persistence.

Hive Architecture: Hive Data Types, Hive File Format, Hive Query Language (HQL), User-Defined Functions (UDF) in Hive, Introduction to Machine Learning with Spark: MLlib, Building a Machine Learning Pipeline in Spark, Pig on Hadoop: Anatomy of Pig, Use Cases for Pig, ETL Processing, Data Types in Pig, Running Pig, Execution Modes of Pig, HDFS Commands, Relational Operators, Piggy Bank.
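
As an illustration of the Mapper, Combiner, Partitioner, and Reducer roles named in the MapReduce Programming unit, the following is a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`), the word-count task, and the sample input lines are all assumptions chosen for illustration, and treating each input line as its own mapper task is a simplification.

```python
from collections import defaultdict
from zlib import crc32


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine phase: pre-sum counts locally to shrink the mapper's output."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partitioner(key, num_reducers):
    """Partition phase: pick a reducer index from a stable hash of the key."""
    return crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce phase: sum every count that arrived for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Drive the four phases in one process (hypothetical driver, not Hadoop)."""
    # One bucket per reducer; grouping values by key stands in for the shuffle.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line is treated as a separate mapper task
        for word, count in combiner(mapper(line)):
            buckets[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):  # reducers see their keys in sorted order
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result


# Illustrative input only; not taken from the course material.
word_counts = run_job(["big data big ideas", "data beats opinions", "big data"])
print(word_counts)
```

The combiner here applies the same summation as the reducer, which is valid for word counting because addition is associative and commutative. Routing by a stable hash of the key means every count for a given word reaches the same reducer bucket.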

Summary

Pre-Requisite(s): None
Course Type: Theory

Course Objectives and Outcomes

Course Objectives

  • To understand the Big Data technologies and tools used for storing and processing large datasets.
  • To develop skills in analyzing and visualizing large datasets.
  • To explore machine learning for predictive analytics on Big Data.

Course Outcomes

CO1: Understand fundamental concepts of Big Data and its significance in modern data-driven environments.

CO2: Apply various Big Data technologies and tools to effectively store, retrieve, and process Big Data.

CO3: Analyze and visualize large datasets to extract meaningful insights and support decision-making processes.

CO4: Design and develop machine learning algorithms for predictive analytics and Big Data-based frameworks.

CO5: Utilize functional programming and distributed computing principles to optimize Big Data processing and management.

CO-PO Mapping

CO PO1 PO2 PO3 PO4 PO5 PO6
CO1 3 1 1 1
CO2 3 2 1 2 2
CO3 2 2 1 1 2 2
CO4 2 2 1 1 2 1
CO5 1 1 1 1 2 1

Evaluation Pattern: 60/40

Assessment              Internal Weightage   External Weightage
Midterm Examination     30                   -
Continuous Assessment   30                   -
End Semester            -                    40

Note: Continuous assessments can include quizzes, tutorials, lab assessments, case studies, and project reviews. Midterm and End Semester exams can each be a two-hour theory exam or lab-integrated exam.

Text Books/ References

  1. Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley Publication, 2015.
  2. Judith S. Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, “Big Data for Dummies”, John Wiley & Sons, 2013.
  3. Tom White, “Hadoop: The Definitive Guide”, O’Reilly Publications, 2011.
  4. Kyle Banker, “MongoDB in Action”, Manning Publications Company, 2012.
