Course Detail

Course Name Foundations of Data Science
Course Code 21CS643
Program M. Tech. in Computer Science & Engineering
Semester Soft Core
Credits 4
Campus Coimbatore, Bengaluru, Nagercoil, Chennai

Syllabus

Introduction to Data Science, Causality and Experiments, Python libraries for data wrangling: Basics of NumPy arrays, aggregations, computations on arrays, fancy indexing, structured arrays. Data Pre-processing – Data manipulation with Pandas – Data indexing and selection – Operating on data – Missing data – Hierarchical indexing – Combining datasets – Aggregation and grouping – Pivot tables – Data cleaning – Data reduction – Data transformation. Visualization and Graphing: Visualizing Categorical Distributions – Visualizing Numerical Distributions – Overlaid Graphs and plots – Summary statistics of exploratory data analysis – Randomness – Probability – Introduction to Statistics. Ethics and privacy.

Sampling, Sample Means and Sample Sizes. Probability distributions and density functions (univariate and multivariate), Error Probabilities; Expectations and moments; Covariance and correlation; Sampling and Empirical distributions; Permutation Testing, Statistical Inference; Central Limit Theorem, Hypothesis testing of means, proportions, variances and correlations – Assessing Models – Decisions and Uncertainty, Comparing Samples – A/B Testing, P-Values, Causality. Estimation – Resampling and Bootstrap – Confidence Intervals, Properties of Mean – Variability of mean – Choosing Sample Size – Graphical Models. Case Studies.

Time Series Analysis: Time Series patterns – Statistical fundamentals – Descriptive statistics – Measuring errors – Correlation and Covariance – Autocorrelations – ACF and PACF – Stationarity – Durbin-Watson Statistic – Overview of Univariate methods – Moving averages; WMA; Exponential smoothing – ARIMA model identification – ARIMA(1,0,0) – ARIMA(0,0,1) – ARIMA(0,1,0) – Akaike Information Criterion – Schwarz Bayesian Information Criterion.

Summary

Pre-Requisite(s): Basic Probability
Course Type: Lab

Course Objectives and Outcomes

Course Objectives

  • Statistical foundations of data science.
  • Techniques to pre-process raw data (data wrangling, munging) with NumPy, Pandas and other Python statistical packages; visualization with Matplotlib, Plotly and Bokeh; EDA; statistical inference.
  • Predictions using statistical tests.
  • Estimation of statistical parameters.
  • Introduction to Time Series.

Course Outcomes
CO1: Understand the statistical foundations of data science.
CO2: Apply pre-processing techniques to raw data, conduct exploratory data analysis, create insightful visualizations, and identify patterns to enable further analysis.
CO3: Identify machine learning algorithms for prediction/classification and for deriving insights.
CO4: Analyze the degree of certainty of predictions using statistical tests and models.
CO5: Explore the statistical foundations of time series, and employ basic ARIMA models for time series prediction.

CO-PO Mapping

CO PO1 PO2 PO3 PO4 PO5 PO6
CO1 1
CO2 1 1 1 3
CO3 3 1 1 2 3
CO4 3 1 1 2 2
CO5 3 3 1 3 3

Evaluation Pattern: 70/30

Assessment                        Internal Weightage   External Weightage
Midterm Examination               20                   -
Continuous Assessment (Theory)    10                   -
Continuous Assessment (Lab)       40                   -
End Semester                      -                    40

Note: Continuous assessments can include quizzes, tutorials, lab assessments, case studies and project reviews. The Midterm and End Semester exams can each be a two-hour theory exam or lab-integrated exam.

Text Books/References

  1. Ani Adhikari and John DeNero, "Computational and Inferential Thinking: The Foundations of Data Science", e-book.
  2. Joel Grus, "Data Science from Scratch: First Principles with Python", 2/e, O’Reilly Media, 2019.
  3. Jake VanderPlas, "Python Data Science Handbook: Essential Tools for Working with Data", O’Reilly Media, 2016.
  4. Wes McKinney, "Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter", 3/e, O’Reilly Media, 2022.
  5. Cathy O’Neil and Rachel Schutt, "Doing Data Science", O’Reilly Media, 2013.
  6. Stephen A. DeLurgio, "Forecasting Principles and Applications", McGraw-Hill International Editions, 1998.
  7. Jason Brownlee, "Introduction to Time Series Forecasting with Python", 2017.
