Syllabus
Lab Component:
Installation and configuration of Hadoop in two operating modes (pseudo-distributed and fully distributed); Use web-based tools to monitor the Hadoop setup; Perform different file management tasks in HDFS; Run a basic Word Count program to understand the MapReduce paradigm; Stop word elimination using MapReduce; Mining a large dataset to find the average, maximum and minimum values using MapReduce; TeraSort benchmark comparison for YARN; Setting up a Hadoop cluster in AWS; Install Pig on Hadoop and write Pig Latin scripts to sort, group, join, project and filter data; Install Hive on Hadoop and use it to create, alter and drop databases, tables, views, functions and indexes; Use Tableau/Google Charts to visualize a dataset of your choice.
Unit I
Introduction to Big Data, Types of Digital Data, Characteristics of Big Data, Evolution of Big Data, Definition of Big Data, Data Appliance, Challenges with Big Data, Big Data Sources, Best Practices in Big Data Analytics, Introduction to Data Modelling
Unit II
Introduction to Elementary Data Analysis; Measures of Center: Mean, Median, Mode; Measures of Spread: Variance, Standard Deviation, Range; Normal Distribution: Center, Spread, Skewed Left, Skewed Right, Outlier; Correlation: Patterns, Magnitude and Direction of Relationship; Introduction to Bayesian Model
Unit III
History of Visualization, Goals of Visualization, Types of Data Visualization: Scientific Visualization, Information Visualization, Visual Analytics, Impact of Visualization, Big Data Visualization Tools: Tableau, Google Charts
Unit IV
Introduction to Big Data Processing and Apache Hadoop, Installation and Configuration of Hadoop on Ubuntu, HDFS Concepts, MapReduce Framework, Anatomy of a MapReduce Job Run, Job Scheduling, Shuffle and Sort, Task Execution
Unit V
Introduction to the Hadoop Ecosystem, Apache Hive, Apache Mahout, Apache Pig, Case studies: Analyzing Big Data with Twitter, Big Data for E-commerce, Big Data for Blogs.
Text Books / References
TEXTBOOKS:
1) Seema Acharya, Subhasini Chellappan, “Big Data Analytics”, Wiley, 2015.
2) Frank J Ohlhorst, “Big Data and Analytics: Turning Big Data into Big Money”, Wiley and SAS Business Series, 2012.
3) Tom White, “Hadoop: The Definitive Guide”, Third Edition, O’Reilly Media, 2012.
REFERENCES:
1) Michael C. Reingruber, William W. Gregory, “The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models”, Wiley QED Publications, First Edition.
2) Philip Bobko, “Correlation and Regression: Applications for Industrial Organizational Psychology and Management”, First Edition.
References for Lab Component:
1) https://hadoop.apache.org/docs/current/
2) https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
3) https://pig.apache.org/
4) https://hive.apache.org/
5) https://www.tableau.com/