Syllabus
Unit 1
Measuring the central tendency – measuring the dispersion of data – graphic displays of basic descriptive data summaries – missing values – noisy data – data cleaning as a process – data integration – data transformation – data cube aggregation – attribute subset selection – dimensionality reduction.
Unit 2
Cluster analysis using k-means – k-medoids – single linkage – complete linkage – UPGMA and expectation maximization (EM) – assessing clustering tendency – determining the number of clusters – measuring clustering quality – k-nearest neighbor (k-NN) – Bayes – decision tree and support vector machine (SVM) classifiers – classifier accuracy measures – evaluating the accuracy of a classifier.
Unit 3
Efficient and Scalable Frequent Itemset Mining Methods – Mining Various Kinds of Association Rules – From Association Mining to Correlation Analysis – Constraint-Based Association Mining.
Lab Component
Experiments on machine learning and artificial intelligence algorithms using MATLAB or Python.