
Course Detail

Course Name: Database Management Systems for Data Science
Course Code: 23CSE354
Program: B. Tech. in Computer Science and Engineering (CSE)
Credits: 3
Campus: Amritapuri, Coimbatore, Bengaluru, Amaravati, Chennai

Syllabus

PROFESSIONAL ELECTIVES

Electives: Electives in Data Science

Unit I

Overview of Databases and Database Management Systems – SQL – SQL for Data Science – Analysis with SQL – Data Analysis Workflow – Database Types – Preparing Data for Analysis – Types of Data – SQL Query Structure – Profiling – Distributions – Data Quality – Deduplication with GROUP BY and DISTINCT – Data Cleaning – Dealing with Nulls: COALESCE, NULLIF, and NVL Functions – Missing Data – Preparing: Shaping Data – BI, Visualization, Statistics, ML – Pivoting with CASE Statements – Unpivoting with UNION Statements – PIVOT and UNPIVOT Functions
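The null-handling, deduplication, and pivoting topics in Unit I can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; the orders table and its values are hypothetical. SQLite provides COALESCE and NULLIF but not NVL, which is Oracle-specific; in SQLite, COALESCE (or IFNULL) covers the same two-argument case.

```python
import sqlite3

# Hypothetical toy table: one row per order, including a duplicate and a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'asha',  'north', 120.0),
  (1, 'asha',  'north', 120.0),  -- exact duplicate of the previous row
  (2, 'ravi',  'south', NULL),   -- missing amount
  (3, 'meera', 'north', 0.0),    -- 0.0 treated here as a sentinel value
  (4, 'ravi',  'south', 80.0);
""")

# Deduplication: DISTINCT collapses identical rows before counting.
dedup_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM orders)"
).fetchone()[0]

# COALESCE fills NULL with a default; NULLIF turns the sentinel 0.0 into NULL.
cleaned = conn.execute("""
    SELECT DISTINCT order_id,
           COALESCE(amount, 0.0) AS amount_filled,
           NULLIF(amount, 0.0)   AS amount_nonzero
    FROM orders
    ORDER BY order_id
""").fetchall()

# Pivoting with CASE: one output column per region, summed per customer.
pivot = conn.execute("""
    SELECT customer,
           SUM(CASE WHEN region = 'north' THEN COALESCE(amount, 0) ELSE 0 END) AS north,
           SUM(CASE WHEN region = 'south' THEN COALESCE(amount, 0) ELSE 0 END) AS south
    FROM (SELECT DISTINCT * FROM orders)
    GROUP BY customer
    ORDER BY customer
""").fetchall()

print(dedup_count, cleaned, pivot, sep="\n")
```

The pivot runs over the deduplicated rows, so the duplicate order for asha is counted only once.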

Unit II

Time Series Analysis – Date, Datetime, and Time Manipulations – Trending the Data – Cohorts – Cohort Analysis – Analysis Framework – Rolling Time Windows – Sparse Data – Analyzing with Seasonality – Retention – SQL for a Basic Retention Curve – Adjusting Time Series to Increase Retention Accuracy – Cohorts Derived from the Time Series – Defining the Cohort from a Separate Table – Dealing with Sparse Cohorts – Defining Cohorts from Dates Other Than the First Date – Related Cohort Analyses – Survivorship – Returnship, or Repeat Purchase Behavior – Cumulative Calculations – Cross-Section Analysis, Through a Cohort Lens
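Two of the Unit II techniques, a rolling time window and a basic retention curve, can be sketched as follows. The sketch assumes Python's sqlite3 module backed by SQLite 3.25 or newer (the first release with window functions); the daily_sales and logins tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical daily series with exactly one row per day.
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 30),
     ("2024-01-04", 40), ("2024-01-05", 50)],
)

# Rolling 3-day window: the current row plus the two preceding rows by date.
# ROWS framing assumes no missing days; sparse data would first need a
# complete date series joined in.
rolling = conn.execute("""
    SELECT sale_date,
           SUM(amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_3day_sum
    FROM daily_sales
    ORDER BY sale_date
""").fetchall()

# Basic retention curve: each customer is aligned on their own first login,
# and we count distinct customers active N days after that first login.
conn.execute("CREATE TABLE logins (customer TEXT, login_date TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [("a", "2024-01-01"), ("a", "2024-01-02"), ("a", "2024-01-03"),
     ("b", "2024-01-01"), ("b", "2024-01-03"),
     ("c", "2024-01-02")],
)
retention = conn.execute("""
    WITH firsts AS (
        SELECT customer, MIN(login_date) AS first_date
        FROM logins
        GROUP BY customer
    )
    SELECT CAST(julianday(l.login_date) - julianday(f.first_date) AS INTEGER) AS period,
           COUNT(DISTINCT l.customer) AS retained
    FROM logins AS l
    JOIN firsts AS f ON l.customer = f.customer
    GROUP BY period
    ORDER BY period
""").fetchall()

print(rolling, retention, sep="\n")
```

Dividing each retained count by the period-0 count (3 here) gives the retention curve as fractions (1, 1/3, 2/3). A fuller cohort analysis would additionally group by each customer's starting period.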

Unit III

Text Analysis with SQL – What Is Text Analysis – Why SQL Is a Good Choice for Text Analysis – When SQL Is Not a Good Choice – The UFO Sightings Data Set – Text Characteristics – Text Parsing – Text Transformations – Finding Elements Within Larger Blocks of Text – Wildcard Matches: LIKE, ILIKE – Exact Matches: IN, NOT IN – Regular Expressions – Constructing and Reshaping Text – Concatenation – Reshaping Text – Databases and the Cloud – Built-in Functions – Python Support for Accessing Databases
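Several Unit III text topics (wildcard matches, exact matches with IN, regular expressions, concatenation, and Python access to a database) fit in one sketch. It assumes Python's sqlite3 module; the reports table is hypothetical and only loosely modelled on free-text sighting descriptions. SQLite recognizes the REGEXP operator but ships no implementation, so the sketch registers one from Python with create_function; PostgreSQL, by contrast, offers native regex operators such as ~, and ILIKE for case-insensitive wildcard matching.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite rewrites `X REGEXP Y` as a call to regexp(Y, X), i.e. (pattern, value).
def regexp(pattern, value):
    return value is not None and re.search(pattern, value) is not None

conn.create_function("regexp", 2, regexp)

# Hypothetical free-text reports.
conn.execute("CREATE TABLE reports (id INTEGER, shape TEXT, description TEXT)")
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", [
    (1, "light", "Bright LIGHT moving fast over the lake"),
    (2, "disk", "A silver disk hovered for 10 minutes"),
    (3, "triangle", "Three lights in a triangle, silent"),
])

# Wildcard match: SQLite's LIKE is case-insensitive for ASCII by default,
# so '%light%' also matches 'LIGHT'.
like_ids = [r[0] for r in conn.execute(
    "SELECT id FROM reports WHERE description LIKE '%light%' ORDER BY id")]

# Exact match against a list of values.
in_ids = [r[0] for r in conn.execute(
    "SELECT id FROM reports WHERE shape IN ('disk', 'triangle') ORDER BY id")]

# Regular expression: a number followed by ' minutes'.
regex_ids = [r[0] for r in conn.execute(
    r"SELECT id FROM reports WHERE description REGEXP '\d+ minutes' ORDER BY id")]

# Concatenation with the standard || operator (the integer id is cast to text).
labels = [r[0] for r in conn.execute(
    "SELECT UPPER(shape) || '#' || id FROM reports ORDER BY id")]

print(like_ids, in_ids, regex_ids, labels, sep="\n")
```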

SQL for Anomaly Detection – Experiment Analysis with SQL – Correlation Is Not Causation – Experiments with Binary Outcomes: The Chi-Squared Test – Experiments with Continuous Outcomes: The t-Test – Challenges – Variant Assignment – Outliers – Time Boxing – Pre-/Post-Analysis – Natural Experiment Analysis
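For the binary-outcome experiment topic, the sketch below uses SQL to aggregate conversions per variant and plain Python to compute Pearson's chi-squared statistic for the resulting 2x2 table. The experiment table and its counts are hypothetical. The statistic is computed without Yates' continuity correction, and the p-value uses the closed form for one degree of freedom, P(X > x) = erfc(sqrt(x / 2)). In practice a library routine such as scipy.stats.chi2_contingency is the more usual choice; note that it applies Yates' correction to 2x2 tables by default.

```python
import math
import sqlite3

# Hypothetical experiment log: one row per user, assigned variant, 0/1 outcome.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (variant TEXT, converted INTEGER)")
rows = ([("A", 1)] * 20 + [("A", 0)] * 80 +
        [("B", 1)] * 35 + [("B", 0)] * 65)
conn.executemany("INSERT INTO experiment VALUES (?, ?)", rows)

# SQL does the aggregation: converted and non-converted counts per variant.
counts = conn.execute("""
    SELECT variant,
           SUM(converted)            AS conversions,
           COUNT(*) - SUM(converted) AS non_conversions
    FROM experiment
    GROUP BY variant
    ORDER BY variant
""").fetchall()

def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) and its
    p-value for a 2x2 contingency table, with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the survival function is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_squared_2x2([(conv, non) for _, conv, non in counts])
print(counts, round(chi2, 3), round(p_value, 4))
```

With these counts the statistic is about 5.64 and the p-value falls between 0.01 and 0.05, so the difference in conversion rates would be judged significant at the 5% level.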

Objectives and Outcomes

Course Objectives

  • To help students learn SQL for preparing, querying, and analysing data stored in a database.
  • To give students a thorough understanding of data analysis in specific applications such as time series analysis, cohort analysis, text analysis, anomaly detection, and experiment analysis.

Course Outcomes

CO1: Understand how to use SQL queries for data preparation, data cleaning, and profiling of data stored in databases.

CO2: Apply SQL features to output data to business intelligence (BI) tools for creating reports and dashboards.

CO3: Conduct time series analysis and cohort analysis to calculate rolling time windows and to identify seasonal patterns, repeat behaviour, and cumulative actions.

CO4: Carry out text analysis using SQL functions.

CO5: Analyse data using experiment analysis techniques.

CO-PO Mapping

 PO/PSO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO
CO1 3 3 2 3 3 3 3 2
CO2 1 3 3 3 3 3 2 3 2
CO3 2 3 2 3 2 2 2 2 3 2
CO4 1 1 1 2 3 2
CO5 1 1

Evaluation Pattern

Evaluation Pattern: 70:30

Assessment | Internal | End Semester
Midterm | 20 |
*Continuous Assessment (Theory) (CAT) | 10 |
*Continuous Assessment (Lab) (CAL) | 40 |
**End Semester | | 30 (50 marks; 2-hour exam)

*CAT – Can be Quizzes, Assignments, and Reports

*CAL – Can be Lab Assessments, Projects, and Reports

**End Semester can be a theory examination, a lab-based examination, or a project presentation.

Text Books / References

Textbook(s)

Cathy Tanimura, “SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights”, O’Reilly Media, 2021.

Richard Machina, “SQL Programming For Beginners: The Guide With Step by Step Processes on Data Analysis”, 2020.

 

Reference(s)

Anthony DeBarros, “Practical SQL: A Beginner’s Guide to Storytelling with Data”, 2nd Edition, No Starch Press, 2022.

Upom Malik, Matt Goldwasser, Benjamin Johnston, “SQL for Data Analytics: Perform fast and efficient data analysis with the power of SQL”, Packt Publishing, 2019.

Silberschatz, A., Korth, H. F. and Sudarshan, S., “Database System Concepts”, 6th Edition, TMH, 2010.

Elmasri, R. and Navathe, S. B., “Fundamentals of Database Systems”, 5th Edition, Addison-Wesley, 2006.

Date, C. J., “An Introduction to Database Systems”, 8th Edition, Addison-Wesley, 2003.

Ramakrishnan, R. and Gehrke, J., “Database Management Systems”, 3rd Edition, McGraw-Hill, 2003.

DISCLAIMER: The appearance of external links on this web site does not constitute endorsement by the School of Biotechnology/Amrita Vishwa Vidyapeetham of the information, products or services contained therein. For other than authorized activities, the Amrita Vishwa Vidyapeetham does not exercise any editorial control over the information you may find at these locations. These links are provided consistent with the stated purpose of this web site.
