Course Syllabus
Computational Statistics: Probability Concepts, Sampling Concepts, Generating Random Variables, Exploratory Data Analysis, Monte Carlo Methods for Inferential Statistics, Data Partitioning, Probability Density Estimation, Statistical Pattern Recognition, Nonparametric Regression.
Data Mining: Data Mining Algorithms, Instances and Features, Types of Features (Data), Concept Learning and Concept Description, Output of Data Mining (Knowledge Representation).
Decision Trees: Classification and Regression Trees, Constructing Classification Trees, Algorithm for Normal Attributes, Information Theory and Information Entropy, Building the Tree, Highly-Branching Attributes, ID3 to C4.5, CHAID, CART, Regression Trees, Model Trees, Pruning.
Preprocessing and Postprocessing in Data Mining: Steps in Preprocessing, Discretization, Manual Approach, Binning, Entropy-based Discretization, Gaussian Approximation, K-tile Method, ChiMerge, Feature Extraction, Selection and Construction (Feature Extraction Algorithms, Feature Selection, Feature Construction), Missing Data, Postprocessing.
Association Rule Mining: The Apriori Algorithm.
Multiple Regression Analysis, Logistic Regression, k-Nearest Neighbor Classification, Constructing New Attributes for Decision Tree Induction Algorithms, Quick, Unbiased and Efficient Statistical Tree (QUEST).