Introduction to Data Science, Causality and Experiments. Python libraries for data wrangling: Basics of NumPy arrays, aggregations, computations on arrays, fancy indexing, structured arrays. Data Pre-processing – Data manipulation with Pandas – Data indexing and selection – Operating on data – Missing data – Hierarchical indexing – Combining datasets – Aggregation and grouping – Pivot tables – Data cleaning – Data reduction – Data transformation. Visualization and Graphing: Visualizing Categorical Distributions – Visualizing Numerical Distributions – Overlaid Graphs and plots – Summary statistics of exploratory data analysis – Randomness – Probability – Introduction to Statistics. Ethics and privacy.
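A minimal sketch of the NumPy topics listed above (aggregations, computations on arrays, fancy indexing, structured arrays), assuming NumPy is installed. The scores and student names are invented purely for illustration.

```python
import numpy as np

# Hypothetical exam scores, invented for illustration only.
scores = np.array([72, 85, 90, 64, 58, 77])

# Aggregations over the whole array.
mean_score = scores.mean()
max_score = scores.max()

# Computation on arrays: subtract the mean from every element, no explicit loop.
centered = scores - mean_score

# Fancy indexing: select elements by a list of positions.
picked = scores[[0, 2, 5]]

# Boolean masking: keep only the scores of at least 75.
high = scores[scores >= 75]

# Structured array: named, typed fields stored in a single array.
students = np.array(
    [("Ana", 72), ("Ben", 85), ("Cy", 90)],
    dtype=[("name", "U10"), ("score", "i4")],
)
top_names = students["name"][students["score"] > 80]
```

Fancy indexing and boolean masking both return copies rather than views, so modifying `picked` or `high` leaves `scores` unchanged.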
Sampling, Sample Means and Sample Sizes. Probability distributions and density functions (univariate and multivariate), Error Probabilities; Expectations and moments; Covariance and correlation; Sampling and Empirical distributions; Permutation Testing, Statistical Inference; Central Limit Theorem, Hypothesis testing of means, proportions, variances and correlations – Assessing Models – Decisions and Uncertainty, Comparing Samples – A/B Testing, P-Values, Causality. Estimation – Resampling and Bootstrap – Confidence Intervals, Properties of the Mean – Variability of the Mean – Choosing Sample Size – Graphical Models. Case Studies.
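A minimal sketch of two resampling topics from this unit, using only the Python standard library: a percentile bootstrap confidence interval for a mean, and a two-sided permutation test of the kind used in A/B testing. The function names, default parameters, and the two groups of measurements are assumptions made for illustration. The percentile method shown is one of several ways to build a bootstrap interval.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, i.e. reassign group labels at random.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[: len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # Fraction of shuffles at least as extreme as the observed difference.
    return extreme / n_permutations


# Hypothetical measurements for two groups, invented for illustration.
group_a = [10, 11, 9, 10, 12, 11, 10, 9]
group_b = [14, 15, 13, 16, 14, 15, 13, 14]

ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_p_value(group_a, group_b)
```

A small `p_value` means that few random relabelings produce a difference as large as the observed one, which is evidence against the hypothesis that both groups come from the same distribution.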
Time Series Analysis: Time Series patterns – Statistical fundamentals – Descriptive statistics – Measuring errors – Correlation and Covariance – Autocorrelations – ACF and PACF – Stationarity – Durbin-Watson Statistic – Overview of Univariate methods – Moving averages; WMA; Exponential smoothing – ARIMA model identification – ARIMA(1,0,0) – ARIMA(0,0,1) – ARIMA(0,1,0) – Akaike Information Criterion – Schwarz Bayesian Information Criterion.
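A minimal sketch of the smoothing and autocorrelation topics in this unit, written in plain Python with no external libraries. The function names and the short example series are assumptions made for illustration. The autocorrelation uses the common sample estimator that divides by the lag-0 sum of squared deviations.

```python
def moving_average(series, window):
    """Simple moving average over a trailing window."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def weighted_moving_average(series, weights):
    """Weighted moving average; weights run from oldest to newest value."""
    k = len(weights)
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i - k + 1 : i + 1])) / total
        for i in range(k - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, initialized with the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (one point of the ACF)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


# Hypothetical series, invented for illustration.
series = [2, 4, 6, 8, 10]

sma = moving_average(series, window=3)
wma = weighted_moving_average(series, weights=[1, 2, 3])
ses = exponential_smoothing(series, alpha=0.5)
acf_lag1 = autocorrelation(series, lag=1)
```

A larger `alpha` makes exponential smoothing follow recent observations more closely, while a smaller `alpha` gives a smoother, slower-reacting result.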