Syllabus
Unit 1
Introduction - Reinforcement Learning as MDP - Learnable Functions - Deep Reinforcement Learning Algorithms Overview - Deep Learning for Reinforcement Learning - Reinforcement Learning and Supervised Learning. Reinforcement Learning Environment Design: States - Actions - Rewards - Transition Function.
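To make the environment-design topics concrete (states, actions, rewards, transition function), here is a minimal illustrative sketch. The environment (`CorridorMDP`), its state space, and its reward scheme are hypothetical teaching examples, not part of the syllabus or any library API:

```python
# A hypothetical Markov Decision Process: a 1-D corridor of 5 cells.
# States: positions 0..4. Actions: 0 = move left, 1 = move right.
# Reward: +1 on reaching the rightmost (terminal) cell, 0 otherwise.
class CorridorMDP:
    N_STATES = 5
    ACTIONS = (0, 1)

    def __init__(self):
        self.state = 0  # episode starts at the leftmost cell

    def transition(self, state, action):
        """Deterministic transition function T(s, a) -> s'."""
        step = -1 if action == 0 else 1
        return min(max(state + step, 0), self.N_STATES - 1)

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        self.state = self.transition(self.state, action)
        done = self.state == self.N_STATES - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done
```

Separating the transition function from `step` mirrors the MDP formalism covered in this unit: the tuple (S, A, R, T) fully specifies the environment.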
Unit 2
REINFORCE: Policy - The Objective Function - The Policy Gradient - Policy Gradient Derivation - Monte Carlo Sampling - REINFORCE Algorithm - Implementing REINFORCE. SARSA: The Q- and V-Functions - Temporal Difference Learning - Intuition for Temporal Difference Learning - Action Selection in SARSA - Exploration and Exploitation - SARSA Algorithm - On-Policy Algorithms - Implementing SARSA. Deep Q-Networks (DQN): Learning the Q-Function in DQN - Action Selection in DQN - The Boltzmann Policy - Experience Replay - DQN Algorithm - Implementing DQN.
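As a preview of the temporal-difference material in this unit, the tabular SARSA update Q(s, a) ← Q(s, a) + α(r + γQ(s′, a′) − Q(s, a)) can be sketched as follows. The function names (`epsilon_greedy`, `sarsa_update`) and the dict-based Q-table are illustrative choices, not a prescribed implementation:

```python
import random

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Explore with probability epsilon; otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """On-policy TD update: the target uses the action a2 actually selected next.

    This is what makes SARSA on-policy -- contrast with Q-learning/DQN,
    where the target takes a max over actions regardless of the policy.
    """
    td_target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```

Example usage: with `Q = {(s, a): 0.0 ...}`, a single call `sarsa_update(Q, 0, 1, 1.0, 1, 0, alpha=0.5, gamma=0.9)` moves Q(0, 1) halfway toward the TD target of 1.0.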
Unit 3
Advantage Actor-Critic (A2C): The Actor - The Critic - The Advantage Function - Learning the Advantage Function - A2C Algorithm - Implementing A2C. Proximal Policy Optimization (PPO): Surrogate Objective - Proximal Policy Optimization (PPO) - PPO Algorithm - Implementing PPO.
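Two small sketches tie this unit's topics together: the one-step advantage estimate A(s, a) ≈ r + γV(s′) − V(s) used by A2C, and PPO's clipped surrogate term min(ρ·A, clip(ρ, 1−ε, 1+ε)·A), where ρ is the probability ratio π_new(a|s)/π_old(a|s). Both function names are illustrative, and these are per-transition scalar versions of what an implementation would compute in batches:

```python
def td_advantage(reward, v_s, v_next, gamma=0.99, done=False):
    """One-step advantage estimate: A(s, a) ~ r + gamma * V(s') - V(s).

    The critic supplies V(s) and V(s'); the actor is updated in the
    direction that increases the probability of positive-advantage actions.
    """
    target = reward + (0.0 if done else gamma * v_next)
    return target - v_s

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO's pessimistic (min) of the unclipped and clipped surrogate terms.

    Clipping the ratio to [1 - epsilon, 1 + epsilon] removes the incentive
    to move the new policy far from the old one -- the "proximal" constraint.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + epsilon), 1 - epsilon) * advantage
    return min(unclipped, clipped)
```

For example, with ratio 1.5 and advantage +1.0 the clipped term (1.2) wins, capping how much credit a single large policy step can claim.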
Objectives and Outcomes
Prerequisite(s): Nil
Course Objectives
- To introduce Reinforcement Learning
- To introduce techniques used for training artificial neural networks
- To enable design of deep learning models for classification and sequence analysis
Course Outcomes
- CO1: Able to understand the mathematical basics of reinforcement learning.
- CO2: Able to understand the working of different types of Reinforcement Learning agents.
- CO3: Able to formulate a problem as a Reinforcement Learning problem.
- CO4: Able to implement Reinforcement Learning algorithms.
CO – PO Mapping

| | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CO1 | 3 | 3 | – | – | – | – | – | – | – | – | – | – | – | – |
| CO2 | 3 | 3 | – | 2 | – | – | – | – | – | – | – | – | – | – |
| CO3 | 3 | 3 | 2 | 2 | – | – | – | – | – | – | – | – | – | 2 |
| CO4 | 3 | – | 2 | – | – | – | – | – | – | – | – | – | – | 2 |