8/28 |
Overview of MDP models: Finite horizon problems & the dynamic programming algorithm.
- W. B. Powell, Approximate Dynamic Programming, 2nd Edition, 2011.
- D. P. Bertsekas, Dynamic Programming & Optimal Control, Vol. 2, 4th Edition, 2012.
- M. L. Puterman, Markov Decision Processes, 1994.
|
Notes (Jing Yang)
|
9/4 |
Overview of MDP models: Infinite horizon & value/policy iteration; Intro to ADP.
- W. B. Powell, Approximate Dynamic Programming, 2nd Edition, 2011.
- D. P. Bertsekas, Dynamic Programming & Optimal Control, Vol. 2, 4th Edition, 2012.
- M. L. Puterman, Markov Decision Processes, 1994.
|
Notes (Ziyue Sun)
|
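As a companion to the value-iteration material in this session, a minimal sketch of the Bellman optimality update on a toy two-state, two-action MDP; all transition probabilities and rewards below are made up purely for illustration, not taken from the readings:

```python
import numpy as np

# Toy MDP (hypothetical numbers): P[a, s, s'] = transition probability,
# R[a, s] = expected one-step reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality update: V(s) <- max_a [ R(a,s) + gamma * sum_s' P(a,s,s') V(s') ]
    Q = R + gamma * P @ V        # Q[a, s]; P @ V sums over s'
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = (R + gamma * P @ V).argmax(axis=0)  # greedy policy w.r.t. V
```

Because the update is a gamma-contraction in the sup-norm, the loop converges geometrically and the final `V` satisfies the Bellman equation to within the tolerance.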
9/11 |
Variants of the Value Iteration Algorithm.
- D. P. Bertsekas, Dynamic Programming & Optimal Control, Vol. 2, 4th Edition, 2012.
|
Notes (Ziyue Sun, Tarik Bilgic)
|
9/18 |
Asynchronous VI and Bellman Reformulations.
- D. P. Bertsekas, Dynamic Programming & Optimal Control, Vol. 2, 4th Edition, 2012.
- W. B. Powell, Approximate Dynamic Programming, 2nd Edition, 2011.
|
Notes (Mingyuan Xu)
|
9/25 |
Q-Learning and Stochastic Approximation.
- D. P. Bertsekas, J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
- H. Robbins, S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, 1951.
|
Notes (Shaoning Han)
|
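The tabular Q-learning update covered in this session can be sketched on a simulated toy MDP; the MDP numbers, exploration rate, and step-size exponent are illustrative assumptions, with the per-state-action step sizes diminishing in the Robbins-Monro style:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP simulator (hypothetical numbers): P[a, s, s'], R[a, s].
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.7, 0.3]],
])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma, n_states, n_actions = 0.9, 2, 2

Q = np.zeros((n_states, n_actions))
N = np.zeros((n_states, n_actions))   # visit counts for per-(s,a) step sizes
s = 0
for _ in range(50_000):
    # Epsilon-greedy exploration (epsilon = 0.2, an arbitrary choice here).
    a = int(rng.integers(n_actions)) if rng.random() < 0.2 else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[a, s])
    r = R[a, s]
    N[s, a] += 1
    alpha = 1.0 / N[s, a] ** 0.6      # satisfies sum alpha = inf, sum alpha^2 < inf
    # Q-learning update toward the one-step bootstrap target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

Under the step-size conditions in the comment (and provided every state-action pair is visited infinitely often), the iterates converge to the optimal Q-factors; the convergence analysis is the subject of the 10/2 session.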
10/2 |
Convergence of Q-Learning and DQN.
- D. P. Bertsekas, J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
- C. J. Watkins, P. Dayan, Q-learning, Machine Learning, 1992.
- J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning, 1994.
- V. Mnih, et al., Human-level control through deep reinforcement learning, Nature, 2015.
|
Notes (Kamal Basulaiman)
|
10/9 |
Value Function Approximation, Fitted VI, and the LP Approach.
- D. P. Bertsekas, J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
- D. P. De Farias, B. Van Roy, On the existence of fixed points for approximate value iteration and temporal-difference learning, Journal of Optimization Theory and Applications, 2000.
- D. P. De Farias, B. Van Roy, The LP approach to approximate dynamic programming, Operations Research, 2003.
- D. P. De Farias, B. Van Roy, On constraint sampling in the LP approach to approximate dynamic programming, Mathematics of Operations Research, 2004.
|
Notes (Shaoning Han, Mingyuan Xu)
|
10/23 |
Policy Evaluation and TD Learning.
- D. P. Bertsekas, Dynamic Programming & Optimal Control, Vol. 2, 4th Edition, 2012.
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, 2018.
- W. B. Powell and H. Topaloglu, Approximate dynamic programming for large-scale resource allocation problems, Tutorials in Operations Research, 2012.
|
Notes (Tarik Bilgic, Shaoning Han)
|
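A minimal TD(0) policy-evaluation sketch on a toy two-state chain, compared against the exact solution of the policy's Bellman equation; the transition matrix, rewards, and step-size schedule are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed policy induces a Markov chain with transition matrix P_pi and
# expected one-step rewards r_pi (hypothetical numbers).
P_pi = np.array([[0.9, 0.1],
                 [0.7, 0.3]])
r_pi = np.array([1.0, 2.0])
gamma = 0.9

# Exact solution of V = r_pi + gamma * P_pi V, for comparison.
V_exact = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# TD(0): follow the chain and move V(s) toward the one-step bootstrap target.
V = np.zeros(2)
s = 0
for t in range(1, 200_001):
    s_next = rng.choice(2, p=P_pi[s])
    alpha = 1.0 / t ** 0.6            # diminishing step size
    V[s] += alpha * (r_pi[s] + gamma * V[s_next] - V[s])
    s = s_next
```

Unlike the value-iteration sweep, TD(0) never touches the model's transition probabilities directly; it only samples them, which is what makes the method applicable in the simulation-based setting of the readings.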
10/30 |
Q-Learning for Optimal Stopping.
- J. N. Tsitsiklis, B. Van Roy, Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to high-dimensional financial derivatives, IEEE Transactions on Automatic Control, 1999.
- D. P. Bertsekas, J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
|
Notes (Kamal Basulaiman, Jing Yang)
|
11/13 |
Aggregation and Feature-based Value Iteration for Control, Natural Policy Gradient.
- J. N. Tsitsiklis, B. Van Roy, Feature-based methods for large scale dynamic programming. Machine Learning, 1996.
- S. M. Kakade, A natural policy gradient. Advances in Neural Information Processing Systems, 2002.
|
Notes (Mingyuan Xu, Tarik Bilgic)
|
11/20 |
Approximate Policy Iteration and Fitted VI.
- D. P. Bertsekas, Dynamic Programming & Optimal Control, Vol. 2, 4th Edition, 2012.
- D. P. Bertsekas, Approximate policy iteration: A survey and some new methods, Journal of Control Theory and Applications, 2011.
- R. Munos, C. Szepesvári, Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 2008.
|
Notes (Boyuan Lai, Ibrahim El Shar)
|
11/30 |
Policy Gradient Methods and Dual PI.
- R. S. Sutton, D. A. McAllester, S. P. Singh, Y. Mansour, Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 2000.
- S. Kunnumkal, H. Topaloglu, Using stochastic approximation methods to compute optimal base-stock levels in inventory control problems. Operations Research, 2008.
- W. Sun, G. J. Gordon, B. Boots, J. A. Bagnell, Dual policy iteration. Advances in Neural Information Processing Systems, 2018.
|
Notes (Jing Yang)
|
12/4 |
Benchmarking and Information Relaxation.
- D. B. Brown, J. E. Smith, P. Sun, Information relaxations and duality in stochastic dynamic programs. Operations Research, 2010.
|
Notes (Ibrahim El Shar, Boyuan Lai)
|