Publication Type : Journal Article
Publisher : Springer
Source : Discrete Event Dynamic Systems, Volume 26, Number 3, p.477–509 (2016)
Url : http://dx.doi.org/10.1007/s10626-015-0216-z
Campus : Bengaluru
School : School of Engineering
Department : Computer Science
Year : 2016
Abstract : We present in this article a two-timescale variant of Q-learning with linear function approximation. Both Q-values and policies are assumed to be parameterized with the policy parameter updated on a faster timescale as compared to the Q-value parameter. This timescale separation is seen to result in significantly improved numerical performance of the proposed algorithm over Q-learning. We show that the proposed algorithm converges almost surely to a closed connected internally chain transitive invariant set of an associated differential inclusion.
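The abstract can be illustrated with a minimal sketch of the two-timescale scheme: a linearly parameterized Q-value estimate updated with a slowly decaying step size, and a softmax policy parameter updated with a faster (more slowly decaying) step size. This is an illustrative approximation of the idea, not the paper's algorithm: the toy 2-state/2-action MDP, the one-hot features, the step-size exponents, and the score-function policy update are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
n_states, n_actions, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)
P = np.full((n_states, n_actions, n_states), 0.5)  # uniform transitions
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])                          # action 1 is rewarding

def phi(s, a):
    """One-hot state-action features, so the linear Q is tabular here."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

theta = np.zeros(n_states * n_actions)  # Q-value parameter (slow timescale)
w = np.zeros(n_states * n_actions)      # policy parameter  (fast timescale)

def policy_probs(s, w):
    """Softmax policy over linear action preferences w^T phi(s, a)."""
    prefs = np.array([w @ phi(s, a) for a in range(n_actions)])
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

s = 0
for n in range(1, 20001):
    a_n = 1.0 / n        # slow step size for the Q-value parameter
    b_n = 1.0 / n**0.6   # faster step size for the policy parameter
    pi = policy_probs(s, w)
    a = rng.choice(n_actions, p=pi)
    s2 = rng.choice(n_states, p=P[s, a])
    # Q-learning temporal-difference update on the slow timescale.
    q_next = max(theta @ phi(s2, b) for b in range(n_actions))
    delta = R[s, a] + gamma * q_next - theta @ phi(s, a)
    theta += a_n * delta * phi(s, a)
    # Score-function policy update toward higher current Q-estimates
    # on the fast timescale (an illustrative stand-in for the paper's update).
    grad = phi(s, a) - sum(pi[b] * phi(s, b) for b in range(n_actions))
    w += b_n * (theta @ phi(s, a)) * grad
    s = s2
```

After enough iterations the learned Q-values prefer the rewarding action in both states; the point of the separation is that the policy tracks the current Q-estimate quickly while the Q-estimate itself drifts slowly, which is what the paper's differential-inclusion analysis formalizes.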
Cite this Research Publication : S. Bhatnagar and K. Lakshmanan, “Multiscale Q-learning with linear function approximation”, Discrete Event Dynamic Systems, vol. 26, no. 3, pp. 477–509, 2016.