Syllabus
Unit 1
Introduction, Causality and Experiments, Data Preprocessing: Data cleaning, Data reduction, Data transformation, Data discretization. Visualization and Graphing: Visualizing Categorical Distributions, Visualizing Numerical Distributions, Overlaid Graphs, plots, and summary statistics of exploratory data analysis. (15 hrs)
Unit 2
Randomness, Probability, Introduction to Statistics, Sampling, Sample Means and Sample Sizes, Descriptive statistics – Central tendency, dispersion, variance, covariance, kurtosis, five-point summary, Distributions, Bayes' Theorem, Error Probabilities; Permutation Testing, Statistical Inference. (15 hrs)
Unit 3
Hypothesis Testing, Decisions and Uncertainty, Comparing Samples, A/B Testing, P-Values, Causality, Frequency Analysis, Assessing Models, Estimation, Prediction, Confidence Intervals, Inference for Regression, Classification, Graphical Models, Updating Predictions. (15 hrs)
Objectives and Outcomes
Course Outcomes:
CO1: Understand the various data visualization methods.
CO2: Understand the basics of descriptive statistics.
CO3: Understand and apply the basic concepts of correlation and regression to the given data.
CO4: Understand and apply the basic concepts of sampling techniques and simple hypothesis testing to the given data.
CO-PO Mapping
| CO \ PO/PSO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2 | PSO3 |
| CO1         | 2   | 2   |     |     |     |     |     |     |     |      |      |      |      | 1    |      |
| CO2         | 2   | 2   |     |     |     |     |     |     |     |      |      |      |      | 1    |      |
| CO3         | 2   | 3   |     |     |     |     |     |     |     |      |      |      |      | 2    |      |
| CO4         | 2   | 3   |     |     |     |     |     |     |     |      |      |      |      | 2    |      |
Text Books / References
Textbook(s)
Ani Adhikari and John DeNero, “Computational and Inferential Thinking: The Foundations of Data Science”, e-book.
Reference(s)
- Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr., “Data Mining for Business Analytics: Concepts, Techniques and Applications in R”, Wiley India, 2018.
- Rachel Schutt and Cathy O’Neil, “Doing Data Science”, O’Reilly, First Edition, 2013.