Course Detail

Course Name: Big Data Analytics and Hadoop
Course Code: 24ASD511
Program: M.Sc. in Applied Statistics and Data Analytics
Semester: 2
Credits: 4
Campus: Coimbatore, Kochi

Syllabus

Unit I

Introduction to Big Data: Types of Digital Data – Characteristics of Data – Evolution of Big Data – Definition of Big Data – Challenges with Big Data – 3 Vs of Big Data – Non-definitional Traits of Big Data – Business Intelligence vs. Big Data – Data Warehouse and Hadoop Environment.

Unit II

Big Data Analytics: Classification of Analytics – Data Science – Terminologies in Big Data – Data Science Process – Roles and Stages in a Data Science Project – Working with Data from Files – Exploring Data – Managing Data – Cleaning and Sampling for Modeling and Validation – Working with Relational Databases – NoSQL: Types of Databases – Advantages – NewSQL – SQL vs. NoSQL vs. NewSQL.

Unit III

Introduction – Distributed File System – Hadoop Components – Architecture – HDFS – Algorithms Using MapReduce – Matrix-Vector Multiplication by MapReduce – Hadoop – Understanding the MapReduce Architecture – Writing Hadoop MapReduce Programs – Loading Data into HDFS – Executing the Map Phase – Shuffling and Sorting – Executing the Reduce Phase. Hadoop 2 (YARN): Architecture – Interacting with the Hadoop Ecosystem.
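To illustrate the Unit III topic of matrix-vector multiplication by MapReduce, here is a minimal in-memory Python sketch of the Map, shuffle-and-sort, and Reduce phases. It does not use Hadoop or its Java API; the function names `map_phase`, `shuffle_and_sort`, `reduce_phase` and `matrix_vector_multiply` are hypothetical, and it assumes, as in the usual textbook formulation, that the matrix M is stored as (i, j, m_ij) triples and that the vector v is small enough for every mapper to hold in memory.

```python
from collections import defaultdict


# Assumption: M is given as sparse (i, j, m_ij) triples and the whole
# vector v is available to every mapper (hypothetical in-memory setup).
def map_phase(matrix_entries, v):
    """Map: for each entry m_ij emit the key-value pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * v[j]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, return groups ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: sum the partial products for row i to obtain x_i."""
    return {i: sum(values) for i, values in grouped}


def matrix_vector_multiply(matrix_entries, v):
    """Chain the three phases to compute x = M v."""
    return reduce_phase(shuffle_and_sort(map_phase(matrix_entries, v)))


if __name__ == "__main__":
    # M = [[1, 2], [3, 4]] written as triples, v = [5, 6]
    M = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
    v = [5, 6]
    print(matrix_vector_multiply(M, v))  # {0: 17, 1: 39}
```

Rows of M with no stored entries emit no key, so their x_i is implicitly 0. In an actual Hadoop job the map and reduce steps would be written as a Mapper and a Reducer, and the framework itself performs the shuffle and sort between them.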

Unit IV

NoSQL Databases: MongoDB: Introduction – Features – Data Types – MongoDB Query Language – CRUD Operations – Arrays – Functions: Count – Sort – Limit – Skip – Aggregate – MapReduce. Cursors – Indexes – mongoimport – mongoexport. Cassandra: Introduction – Features – Data Types – cqlsh – Keyspaces – CRUD Operations – Collections – Counter – TTL – ALTER Commands – Import and Export – Querying System Tables.

Unit V

Hadoop Ecosystem: Hive – Architecture – Data Types – File Formats – HQL – SerDe – User-Defined Functions. Pig: Features – Anatomy – Pig on Hadoop – Pig Philosophy – Pig Latin Overview – Data Types – Running Pig – Execution Modes of Pig – HDFS Commands – Relational Operators – Eval Functions – Complex Data Types – PiggyBank – User-Defined Functions – Parameter Substitution – Diagnostic Operators.

Objectives and Outcomes

Course Outcomes:

CO1: Become familiar with the concepts of Big Data

CO2: Understand how to manage, clean and sample data

CO3: Understand the Hadoop architecture and implement the MapReduce concept

CO4: Manage and query NoSQL databases

CO5: Understand and use Pig and Hive to process data stored in HDFS

CO-PO Mapping:

      PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1   2    3    2    2    2    2    2    -    -    -     2     2
CO2   2    2    2    2    2    2    2    -    -    -     2     2
CO3   2    2    2    2    2    3    2    -    -    -     2     2
CO4   2    2    1    2    2    2    2    -    -    -     1     2
CO5   1    2    1    1    1    2    2    -    -    -     1     2

Text Books / References

  1. Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley, 2015.
  2. Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, “Big Data for Dummies”, John Wiley & Sons, 2013.
  3. EMC Education Services, “Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data”, John Wiley & Sons, 2015.
  4. Tom White, “Hadoop: The Definitive Guide”, O’Reilly Media, 2011.
  5. Kyle Banker, “MongoDB in Action”, Manning Publications, 2012.
  6. Russell Bradberry, Eric Lubow, “Practical Cassandra: A Developer’s Approach”, Pearson Education, 2014.
