Introduction to Machine Learning and its various types; Motivation and Introduction to Reinforcement Learning; Multi-armed Bandits; Markov Decision Processes; Value functions; Dynamic programming: Policy evaluation and improvement, Value iteration and Policy iteration algorithms.
Value prediction problems: Temporal-difference learning in finite state spaces, Algorithms for large state spaces; Control: Closed-loop interactive learning, Online and active learning in bandits, Q-learning in finite MDPs, Q-learning with function approximation.
On-policy approximation of action values: Value prediction with function approximation, Gradient-descent methods; Policy approximation: Actor-critic methods; Monte Carlo methods: Monte Carlo prediction, Estimation of action values, Off-policy prediction via importance sampling.