Back close

Course Detail

Course Name Reinforcement Learning
Course Code 23CSE477
Program B. Tech. in Computer Science and Engineering (CSE)
Credits 3
Campus Amritapuri ,Coimbatore,Bengaluru, Amaravati, Chennai

Syllabus

PROFESSIONAL ELECTIVES

Electives in Artificial Intelligence

Unit I

Introduction to Reinforcement learning, Markov Decision Process (MDP) – Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs, Planning by Dynamic programming (DP) – Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions, model-free prediction and control.

Unit II

Integrating planning with learning – Model-based RL, Integrated Architecture and Simulation-based Search, Monte-Carlo (MC) Learning, Exploration and exploitation – Multi-arm Bandits, Contextual Bandits and MDP Extensions, integrating AI search and learning – Classical Games: Combining Minimax Search and RL.

Unit III

Hierarchical RL – Semi-Markov Decision Process, Learning with Options, Deep RL – Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Double Q-Learning, Multi-agent RL – Cooperative vs. Competitive Settings, Mixed Setting.

Objectives and Outcomes

Course Objectives

  • This course primarily focuses on training students to frame reinforcement learning problems and to tackle algorithms from dynamic programming, Monte Carlo and temporal-difference learning.
  • It involves larger state space environments using function approximation, deep Q-networks and state-of-the-art policy gradient algorithms.

Course Outcomes

CO1: Understand Markov decision process and reinforcement learning.

CO2: Apply AI search, planning, and learning.

CO3: Apply Hierarchical learning techniques.

CO4: Analyze Q-learning and multi-agent systems.

CO-PO Mapping

 PO/PSO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO
CO1 3 2 3 3 2 0 0 2 2 2 0 0 3 3
CO2 3 2 3 3 3 0 0 2 2 2 0 0 3 3
CO3 3 2 3 3 3 0 0 2 2 2 0 0 3 3
CO4 3 2 3 3 3 0 0 2 2 2 0 0 3 3

Evaluation Pattern

Evaluation Pattern: 70:30

Assessment Internal End Semester
Midterm 20
Continuous Assessment – Theory (*CAT) 10
Continuous Assessment – Lab (*CAL) 40
**End Semester 30 (50 Marks; 2 hours exam)

*CAT – Can be Quizzes, Assignments, and Reports

*CAL – Can be Lab Assessments, Project, and Report

**End Semester can be theory examination/ lab-based examination/ project presentation

Text Books / References

Textbook(s)

Richard S. Sutton and Andrew G. Barto; “Reinforcement Learning: An Introduction”; 2nd Edition, MIT Press, 2018.

Reference(s)

Dimitri P. Bertsekas; “Reinforcement Learning and Optimal Control”; 1st Edition, Athena Scientific, 2019.

Dimitri P. Bertsekas; “Dynamic Programming and Optimal Control (Vol. I and Vol. II)”; 4th Edition, Athena Scientific, 2017.

Csaba Szepesvári; “Algorithms of Reinforcement Learning (Synthesis Lectures on Artificial Intelligence and Machine Learning)”, Morgan & Claypool Publishers, 2010.

DISCLAIMER: The appearance of external links on this web site does not constitute endorsement by the School of Biotechnology/Amrita Vishwa Vidyapeetham or the information, products or services contained therein. For other than authorized activities, the Amrita Vishwa Vidyapeetham does not exercise any editorial control over the information you may find at these locations. These links are provided consistent with the stated purpose of this web site.

Admissions Apply Now