Course Detail

Course Name Big Data Storage and Analysis
Course Code 24CSC541
Program Integrated M. Sc. Mathematics and Computing
Credits 3
Campus Coimbatore

Syllabus

Introduction: Scaling with Traditional Databases – Need for NoSQL – First Principles – Desired Properties – Lambda Architecture. Batch Layer: Big data model – properties – fact-based modeling – graph schemas – Apache Thrift.

Data Storage on the Batch Layer: Requirements – Solutions – Distributed File Systems and Partitioning – Hadoop basics. Computing on the Batch Layer: Algorithms – Scalability – MapReduce. Batch Layer Architecture and Algorithms: Design Overview and Workflow, Ingesting New Data, Normalization.

Serving Layer: Performance Metrics, Requirements and Design, ElephantDB. Speed Layer: Realtime Views, Cassandra basics, Query and Stream Processing, Apache Storm.

Text Books / References

Nathan Marz, James Warren, “Big Data: Principles and Best Practices of Scalable Real-Time Data Systems”, Manning Publications, 2015.

REFERENCES:

  1. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O’Reilly, 2012.
  2. Randy Abernethy, “Programmer’s Guide to Apache Thrift”, Manning Publications, 2019. https://thrift.apache.org/
  3. Jeff Carpenter, Eben Hewitt, “Cassandra: The Definitive Guide: Distributed Data at Web Scale”, 2nd Edition, O’Reilly, 2016.
  4. Ankit Jain, “Mastering Apache Storm”, Packt Publishing, 2017. https://www.elephantsql.com

