Syllabus
Unit I
Introduction to Big Data: Types of Digital Data – Characteristics of Data – Evolution of Big Data – Definition of Big Data – Challenges with Big Data – 3Vs of Big Data – Non-definitional traits of Big Data – Business Intelligence vs. Big Data – Data warehouse and Hadoop environment.
Unit II
Big Data Analytics: Classification of analytics – Data Science – Terminologies in Big Data – Data science process – roles, stages in a data science project – working with data from files – exploring data – managing data – cleaning and sampling for modeling and validation – working with relational databases. NoSQL: Types of Databases – Advantages – NewSQL – SQL vs. NoSQL vs. NewSQL.
Unit III
Introduction – Distributed file system – Hadoop components – Architecture – HDFS – Algorithms using MapReduce – Matrix-vector multiplication by MapReduce – Hadoop – Understanding the MapReduce architecture – Writing Hadoop MapReduce programs – Loading data into HDFS – Executing the Map phase – Shuffling and sorting – Reducing phase execution. Hadoop 2 (YARN): Architecture – Interacting with the Hadoop ecosystem.
Unit IV
NoSQL databases: MongoDB: Introduction – Features – Data types – MongoDB Query Language – CRUD operations – Arrays – Functions: Count – Sort – Limit – Skip – Aggregate – MapReduce. Cursors – Indexes – Mongo Import – Mongo Export. Cassandra: Introduction – Features – Data types – CQLSH – Keyspaces – CRUD operations – Collections – Counter – TTL – Alter commands – Import and Export – Querying system tables.
Unit V
Hadoop ecosystem: Hive: Architecture – Data types – File formats – HQL – SerDe – User-defined functions. Pig: Features – Anatomy – Pig on Hadoop – Pig philosophy – Pig Latin overview – Data types – Running Pig – Execution modes of Pig – HDFS commands – Relational operators – Eval functions – Complex data types – Piggy Bank – User-defined functions – Parameter substitution – Diagnostic operators.
Objectives and Outcomes
Course Outcomes:
CO1: Become familiar with the concepts of Big Data
CO2: Understand the aspects of managing, cleaning and sampling data
CO3: Understand the Hadoop architecture and implement the MapReduce concept
CO4: Manage and query NoSQL databases
CO5: Understand and work with data in HDFS using Pig and Hive
CO-PO Mapping:
|     | PO1 | PO2 | PO3 | PO4 | PO5 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|------|
| CO1 | 2   | 3   | 2   | 2   | 2   | 2   | 2   |     |     |     |      | 2    | 2    |
| CO2 | 2   | 2   | 2   | 2   | 2   | 2   | 2   |     |     |     |      | 2    | 2    |
| CO3 | 2   | 2   | 2   | 2   | 2   | 3   | 2   |     |     |     |      | 2    | 2    |
| CO4 | 2   | 2   | 1   | 2   | 2   | 2   | 2   |     |     |     |      | 1    | 2    |
| CO5 | 1   | 2   | 1   | 1   | 1   | 2   | 2   |     |     |     |      | 1    | 2    |
Text Books / References
Text Books / Reference Books:
- Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley Publication, 2015.
- Judith Hurwitz, Alan Nugent, Dr. Fern Halper, Marcia Kaufman, “Big Data for Dummies”, John Wiley & Sons, Inc., 2013.
- EMC Education Services, “Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data”, John Wiley & Sons, 2015.
- Tom White, “Hadoop: The Definitive Guide”, O’Reilly Publications, 2011.
- Kyle Banker, “MongoDB in Action”, Manning Publications Company, 2012.
- Russell Bradberry, Eric Lubow, “Practical Cassandra: A Developer’s Approach”, Pearson Education, 2014.