
Course Detail

Course Name Data Science
Course Code 19EAC313
Program B. Tech. in Electronics and Computer Engineering
Semester 6
Year Taught 2019

Syllabus

Module I

Introduction: What is Data Science? Big Data and Data Science – Datafication – Current landscape of perspectives – Skill sets needed; Matrices – Matrices to represent relations between data, and necessary linear algebraic operations on matrices – Approximately representing matrices by decompositions (SVD and PCA); Statistics: Descriptive Statistics: distributions and probability – Statistical Inference: Populations and samples – Statistical modeling – probability distributions – fitting a model – Hypothesis Testing.

Module II

Data pre-processing: Data cleaning – data integration – Data Reduction – Data Transformation and Data Discretization. Evaluation of classification methods – Confusion matrix, Student's t-tests and ROC curves. Exploratory Data Analysis – Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA – The Data Science Process.

Module III

Basic Machine Learning Algorithms: Association Rule mining – Linear Regression – Logistic Regression – Classifiers – k-Nearest Neighbors (k-NN), k-means – Decision tree – Naive Bayes – Ensemble Methods – Random Forest. Feature Generation and Feature Selection – Feature Selection algorithms – Filters; Wrappers; Decision Trees; Random Forests.

Data Visualization: Basic principles, ideas and tools for data visualization.

Objectives and Outcomes

Course Objectives

  • To draw useful conclusions from large and diverse data sets through exploration, prediction, and inference.

Course Outcomes

  • CO1: Ability to understand the statistical foundations of data science.
  • CO2: Ability to apply pre-processing techniques over raw data so as to enable further analysis.
  • CO3: Ability to conduct exploratory data analysis and create insightful visualizations to identify patterns.
  • CO4: Ability to identify machine learning algorithms for predictions and classification.
  • CO5: Ability to analyze the degree of certainty of predictions using statistical tests and models.

CO – PO Mapping

PO/PSO/CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 1 3 2
CO2 1 1 1 3 3 2
CO3 3 1 1 2 3 3 2
CO4 3 1 1 2 2 3 2
CO5 3 3 1 3 2 3 2

Textbook / References

Textbook(s)

  • Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.
  • Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition. ISBN 0123814790, 2011.
  • Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.
  • Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization”, O’Reilly, 2016.
  • Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.
  • Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.
  • Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel and Kenneth C. Lichtendahl Jr., “Data Mining for Business Analytics: Concepts, Techniques, and Applications in R”, ISBN: 978-1-118-87936-8, Wiley.

Evaluation Pattern 50:50 (Internal: External)

Assessment                     Internal   External
Periodical 1 (P1)              15
Periodical 2 (P2)              15
*Continuous Assessment (CA)    20
End Semester                              50
*CA – Can be Quizzes, Assignment, Projects, and Reports.
