Course Detail

Course Name: Reinforcement Learning
Course Code: 23CSE477
Program: B. Tech. in Computer Science and Engineering (CSE)
Credits: 3
Campus: Amritapuri, Coimbatore, Bengaluru, Amaravati, Chennai



Electives in Artificial Intelligence

Unit I

Introduction to Reinforcement Learning, Markov Decision Process (MDP) – Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs, Planning by Dynamic Programming (DP) – Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions, Model-free Prediction and Control.
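
As an illustration of the Value Iteration topic listed above, the following is a minimal sketch in Python. The two-state MDP, its rewards, and the discount factor are hypothetical values chosen only for this example; they are not part of the course material.

```python
# Hypothetical 2-state, 2-action MDP used purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a, gamma=GAMMA):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(gamma=GAMMA, tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            return V


def greedy_policy(V, gamma=GAMMA):
    """Pick, in every state, the action with the highest one-step lookahead value."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a, gamma)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
print(V, policy)
```

In this toy MDP, staying in state 1 with action 1 yields a reward of 2 per step, so its value approaches 2 / (1 - 0.9) = 20, and the greedy policy moves to state 1 and stays there.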

Unit II

Integrating Planning with Learning – Model-based RL, Integrated Architecture and Simulation-based Search, Monte-Carlo (MC) Learning, Exploration and Exploitation – Multi-armed Bandits, Contextual Bandits and MDP Extensions, Integrating AI Search and Learning – Classical Games: Combining Minimax Search and RL.
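
The exploration and exploitation trade-off in the bandit topics above can be sketched with an epsilon-greedy agent. This is one common strategy, shown here under assumptions: the Bernoulli arm probabilities, step count, epsilon, and the function name `run_epsilon_greedy` are hypothetical choices for illustration.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy agent.

    true_means holds the (hidden) success probability of each arm.
    Returns the agent's value estimates and the pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_epsilon_greedy([0.1, 0.9])
print(estimates, counts)
```

With a small epsilon the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps the other estimates from going stale.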

Unit III

Hierarchical RL – Semi-Markov Decision Process, Learning with Options, Deep RL – Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Double Q-Learning, Multi-agent RL – Cooperative vs. Competitive Settings, Mixed Setting.
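
For the Double Q-Learning topic above, the core idea can be sketched in its tabular form: one table selects the next action and the other evaluates it, which reduces the overestimation bias of standard Q-learning. The function names, state and action labels, and hyperparameters below are hypothetical and chosen only for illustration; deep variants replace the tables with neural networks.

```python
import random
from collections import defaultdict


def double_q_update(q_sel, q_eval, s, a, r, s_next, actions, done,
                    alpha=0.5, gamma=0.9):
    """Update q_sel[(s, a)] in place.

    q_sel selects the greedy next action; q_eval supplies its value.
    """
    if done:
        target = r
    else:
        best = max(actions, key=lambda act: q_sel[(s_next, act)])
        target = r + gamma * q_eval[(s_next, best)]
    q_sel[(s, a)] += alpha * (target - q_sel[(s, a)])


def double_q_step(q_a, q_b, transition, actions, rng=random):
    """Flip a fair coin to decide which of the two tables gets updated."""
    s, a, r, s_next, done = transition
    if rng.random() < 0.5:
        double_q_update(q_a, q_b, s, a, r, s_next, actions, done)
    else:
        double_q_update(q_b, q_a, s, a, r, s_next, actions, done)


# Tiny worked example with made-up values.
q_a, q_b = defaultdict(float), defaultdict(float)
q_a[("s1", "left")], q_a[("s1", "right")] = 1.0, 2.0
q_b[("s1", "left")], q_b[("s1", "right")] = 10.0, 4.0
double_q_update(q_a, q_b, "s0", "right", 1.0, "s1", ["left", "right"], False)
print(q_a[("s0", "right")])
```

In the worked example, `q_a` prefers "right" in state "s1", so the target uses `q_b`'s value for "right" (4.0) rather than `q_b`'s own maximum (10.0).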

Objectives and Outcomes

Course Objectives

  • This course trains students to frame problems as reinforcement learning tasks and to apply algorithms from dynamic programming, Monte Carlo methods, and temporal-difference learning.
  • It extends these methods to environments with larger state spaces using function approximation, deep Q-networks, and state-of-the-art policy gradient algorithms.

Course Outcomes

CO1: Understand Markov decision processes and reinforcement learning.

CO2: Apply AI search, planning, and learning.

CO3: Apply Hierarchical learning techniques.

CO4: Analyze Q-learning and multi-agent systems.

CO-PO Mapping

CO1 3 2 3 3 2 0 0 2 2 2 0 0 3 3
CO2 3 2 3 3 3 0 0 2 2 2 0 0 3 3
CO3 3 2 3 3 3 0 0 2 2 2 0 0 3 3
CO4 3 2 3 3 3 0 0 2 2 2 0 0 3 3

Evaluation Pattern

Evaluation Pattern: 70:30

Assessment                               Internal   End Semester
Midterm                                  20
Continuous Assessment – Theory (*CAT)    10
Continuous Assessment – Lab (*CAL)       40
**End Semester                                      30 (50 Marks; 2 hours exam)

*CAT – Can be Quizzes, Assignments, and Reports

*CAL – Can be Lab Assessments, Project, and Report

**End Semester can be theory examination/ lab-based examination/ project presentation

Text Books / References


Richard S. Sutton and Andrew G. Barto; “Reinforcement Learning: An Introduction”; 2nd Edition, MIT Press, 2018.


Dimitri P. Bertsekas; “Reinforcement Learning and Optimal Control”; 1st Edition, Athena Scientific, 2019.

Dimitri P. Bertsekas; “Dynamic Programming and Optimal Control (Vol. I and Vol. II)”; 4th Edition, Athena Scientific, 2017.

Csaba Szepesvári; “Algorithms for Reinforcement Learning”; Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2010.

