Syllabus
Discipline Specific Electives: Business Analytics
Unit 1
Introduction to Big Data and Big Data Programming Models – Massively Parallel Processing (MPP) Database Systems – In-Memory Database Systems – MapReduce Systems – Bulk Synchronous Parallel (BSP) Systems, Big Data and Transactional Systems, Scaling of Database
Unit 2
Introduction to Hadoop, Components of Hadoop – Hadoop Distributed File System (HDFS), Hadoop 3.0 – Components of YARN, HDFS High Availability, Hadoop Program: Word Count in local mode versus cluster mode, Hadoop Administration: Hadoop Configuration Files, Configuring Hadoop Daemons, Precedence of Hadoop Configuration Files, Cluster Administration Utilities, Command Line HDFS Administration, Rebalancing HDFS Data – Copying Large Amounts of Data from the HDFS, Components of a MapReduce program, Basics of MapReduce Development: Hadoop and Data Processing, Working with large Datasets: Preparing the Development Environment – Preparing the Hadoop System – Word Count Implementation using MapReduce – Introduction to Hadoop I/O, Hadoop Input/Output: Compression Schemes: What Can Be Compressed? – Compression Schemes
Hadoop in the Cloud – Economics – Self-Hosted Cluster – Cloud-Hosted Cluster – Elasticity – On Demand – Bid Pricing – Hybrid Cloud – Logistics Ingress/Egress – Data Retention – Security – Cloud Usage Models – Cloud Providers – Amazon Web Services, Microsoft Azure – Choosing a Cloud Vendor – Case Study: Amazon Web Services – Elastic MapReduce – Elastic Compute Cloud
Unit 3
HBase, Architecture and role of HBase, HBase schema design, Basic programming for HBase, Combining the capabilities of HBase and HDFS, Log file Analysis.
Unit 4
Hive Architecture and Concepts, Data Definition Language, Data Manipulation Language, External Interfaces, Hive Scripts – Performance, MapReduce Integration, Creating Partitions – User-Defined Functions – HiveQL Compiler Details.
Unit 5
Data Processing Using Pig: An Introduction to Pig, Running Pig, Executing a Pig Script – Embedded Java Program, Pig Latin: Comments in a Pig Script – Execution of Pig Statements – Pig Commands, User-Defined Functions: Eval Functions Invoked in the Mapper – Eval Functions Invoked in the Reducer – Writing and Using a Custom FilterFunc, Comparison of Pig versus Hive, Understanding Automated Data Processing with Oozie
Objectives and Outcomes
Objective:
To expose students to Big Data technologies and environments.
Course Outcomes:
CO1: To gain knowledge of Big Data technologies.
CO2: To understand the framework of Big Data.
CO3: Ability to interact with the Big Data environment and analyze the data.
CO4: Knowledge of various tools related to Big Data analysis.
CO/PO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 |
CO1 | 3 | 2 | 3 | 2 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 2 |
CO2 | 3 | 3 | 3 | 2 | 2 | 1 | 2 | 2 | 2 | 3 | 3 | 2 |
CO3 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 |
CO4 | 3 | 2 | 3 | 2 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 2 |