Publications
RL theory monograph: A monograph on RL theory, based on notes from courses taught by Nan Jiang at UIUC and jointly with Sham Kakade at UW. The notes are actively updated; feedback and reports of typos are welcome.
Ph.D. Thesis
Recent preprints
Journal Publications
- Practical Evaluation and Optimization of Contextual Bandit Algorithms
with Alberto Bietti and John Langford. To appear in Journal of Machine Learning Research.
- On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
with Sham Kakade, Jason Lee and Gaurav Mahajan. In Journal of Machine Learning Research, Vol. 22, 2020.
- Active Learning for Cost-Sensitive Classification
with Akshay Krishnamurthy, T.-K. Huang, Hal Daumé and John Langford. In Journal of Machine Learning Research, Vol. 20, 2019.
- Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization
with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli and Rashish Tandon. To appear in SIAM Journal on Optimization.
- Exact Recovery of Sparsely Used Overcomplete Dictionaries
with Anima Anandkumar and Praneeth Netrapalli. In IEEE Transactions on Information Theory, Vol. 63, Issue 1, 2017.
- A Reliable Effective Terascale Linear Learning System
with Olivier Chapelle, Miroslav Dudik and John Langford. In Journal of Machine Learning Research, Vol. 15, 2014.
- The Generalization Ability of Online Algorithms for Dependent Data
with John Duchi. In IEEE Transactions on Information Theory, Vol. 59, Issue 1, 2013.
- Stochastic convex optimization with bandit feedback
with Dean Foster, Daniel Hsu, Sham Kakade and Alexander Rakhlin. In SIAM Journal on Optimization, Vol. 23, Issue 1, 2013.
- Ergodic Mirror Descent
with John Duchi, Mikael Johansson and Mike Jordan. In SIAM Journal on Optimization, Vol. 22, Issue 4, 2012.
- Fast global convergence of gradient methods for high-dimensional statistical recovery
with Sahand Negahban and Martin Wainwright. In The Annals of Statistics, Vol. 40, Number 5, 2012.
- Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions (Annals formatted version)
with Sahand Negahban and Martin Wainwright. In The Annals of Statistics, Vol. 40, Number 2, July 2012.
- Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization
with Peter Bartlett, Pradeep Ravikumar and Martin Wainwright. In IEEE Transactions on Information Theory, Vol. 58, Issue 5, May 2012.
- Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
with John Duchi and Martin Wainwright. In IEEE Transactions on Automatic Control, Vol. 57, Issue 3, 2012.
- Message-passing for graph structured linear programs: Proximal projections, convergence and rounding schemes
with Pradeep Ravikumar and Martin Wainwright. In Journal of Machine Learning Research, Vol. 11, 2010.
Conference Publications (see Google Scholar page for the most up-to-date list)
- Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling
with Tong Zhang. In COLT 2022
- Minimax Regret Optimization for Robust Machine Learning under Distribution Shift
with Tong Zhang. In COLT 2022
- Adversarially Trained Actor Critic for Offline Reinforcement Learning (Outstanding paper award)
with Ching-An Cheng, Tengyang Xie and Nan Jiang. In ICML 2022
- Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
with Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang and Wen Sun. In ICML 2022
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
with Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy and John Langford. In ICLR 2022
- Bellman-consistent Pessimism for Offline Reinforcement Learning
with Tengyang Xie, Ching-An Cheng, Nan Jiang and Paul Mineiro. In NeurIPS 2021
- Provably Correct Optimization and Exploration with Non-linear Policies
with Fei Feng, Lin Yang and Wotao Yin. In ICML 2021
- Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
with Andrea Zanette and Ching-An Cheng. In COLT 2021
- Towards a Dimension-Free Understanding of Adaptive Linear Control
with Juan Perdomo, Max Simchowitz and Peter Bartlett. In COLT 2021
- PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
with Mikael Henaff, Sham Kakade and Wen Sun. In NeurIPS 2020
- Provably Good Batch Reinforcement Learning Without Great Exploration
with Yao Liu, Adith Swaminathan and Emma Brunskill. In NeurIPS 2020
- Policy Improvement from Multiple Experts
with Ching-An Cheng and Andrey Kolobov. In NeurIPS 2020
- Safe Reinforcement Learning via Curriculum Induction
with Matteo Turchetta, Andrey Kolobov, Shital Shah and Andreas Krause. In NeurIPS 2020
- FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
with Sham Kakade, Akshay Krishnamurthy and Wen Sun. In NeurIPS 2020, oral presentation
- Taking a hint: How to leverage loss predictors in contextual bandits?
with Chen-Yu Wei and Haipeng Luo. In COLT 2020
- On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
with Sham Kakade, Jason Lee and Gaurav Mahajan. In COLT 2020
- On the Optimality of Sparse Model-Based Planning for Markov Decision Processes
with Sham Kakade and Lin Yang. In COLT 2020
- Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
with Jordan Ash, Chicheng Zhang, Akshay Krishnamurthy and John Langford. In ICLR 2020
- Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations
with Aditya Modi, Debadeepta Dey, Adith Swaminathan, Besmira Nushi, Sean Andrist and Eric Horvitz. In AAAI 2020
- Off-Policy Policy Gradient with State Distribution Correction
with Yao Liu, Adith Swaminathan and Emma Brunskill. In UAI 2019
- Fair Regression: Quantitative Definitions and Reduction-based Algorithms
with Steven Wu and Miro Dudik. In ICML 2019
- Provably efficient RL with Rich Observations via Latent State Decoding
with Simon Du, Akshay Krishnamurthy, Nan Jiang, Miro Dudik and John Langford. In ICML 2019
- Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
with Chicheng Zhang, Hal Daumé, John Langford and Sahand Negahban. In ICML 2019
- Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches
with Wen Sun, Nan Jiang, Akshay Krishnamurthy and John Langford. In COLT 2019
- On Polynomial Time PAC Reinforcement Learning with Rich Observations
with Christoph Dann, Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire. In NeurIPS 2018
- A Reductions Approach to Fair Classification
with Alina Beygelzimer, Miro Dudik, John Langford and Hanna Wallach. In ICML 2018
- Practical Contextual Bandits with Regression Oracles
with Dylan Foster, Haipeng Luo, Miro Dudik and Rob Schapire. In ICML 2018
- Hierarchical Imitation and Reinforcement Learning
with Hoang Le, Nan Jiang, Miro Dudik, Yisong Yue and Hal Daumé. In ICML 2018
- Efficient Contextual Bandits in Non-stationary Worlds
with Haipeng Luo, Chen-Yu Wei and John Langford. In COLT 2018
- Off-policy evaluation for slate recommendation
with Adith Swaminathan, Akshay Krishnamurthy, Miro Dudik, John Langford, Damien Jose and Imed Zitouni. In NIPS 2017, oral presentation
- Corralling a Band of Bandit Algorithms
with Haipeng Luo, Behnam Neyshabur and Rob Schapire. In COLT 2017
- Active Learning for Cost-Sensitive Classification
with Akshay Krishnamurthy, T-K Huang, Hal Daumé III and John Langford. In ICML 2017
- Contextual Decision Processes with Low Bellman Rank are PAC-Learnable
with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire. In ICML 2017
- Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
with Yu-Xiang Wang and Miro Dudik. In ICML 2017
- Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations
with Akshay Krishnamurthy and John Langford. In NIPS 2016
- Efficient Second Order Online Learning by Sketching
with Haipeng Luo, Nicolo Cesa-Bianchi and John Langford. In NIPS 2016
- Efficient Contextual Semi-Bandit Learning
with Akshay Krishnamurthy and Miro Dudik. In NIPS 2016
- Fast Convergence of Regularized Learning in Games (Best paper award)
with Vasilis Syrgkanis, Haipeng Luo and Rob Schapire. In NIPS 2015
- Efficient and Parsimonious Agnostic Active Learning
with T-K Huang, Daniel Hsu, John Langford and Rob Schapire. In NIPS 2015
- Learning to Search Better Than Your Teacher
with Kai-Wei Chang, Akshay Krishnamurthy, Hal Daumé and John Langford. In ICML 2015
- A Lower Bound for the Optimization of Finite Sums
with Léon Bottou. In ICML 2015
- Scalable Nonlinear Learning with Adaptive Polynomial Expansions
with Alina Beygelzimer, Daniel Hsu, John Langford and Matus Telgarsky. In NIPS 2014
- Learning sparsely used overcomplete dictionaries
with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli and Rashish Tandon. In COLT 2014
- Robust Multi-Objective Learning with Mentor Feedback
with Ashwinkumar BV, Miro Dudik, Rob Schapire and Alex Slivkins. In COLT 2014
- Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
with Daniel Hsu, Satyen Kale, John Langford, Lihong Li and Rob Schapire. In ICML 2014
- Least Squares Revisited: Scalable Approaches for Multi-class Prediction
with Sham Kakade, Nikos Karampatziakis, Le Song and Greg Valiant. In ICML 2014
- Selective sampling algorithms for cost-sensitive multiclass prediction (long version with proofs)
In ICML 2013
- Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions (Long version)
with Sahand Negahban and Martin Wainwright. In NIPS 2012
- Contextual Bandit Learning with Predictable Rewards
with Miroslav Dudik, Satyen Kale, John Langford and Robert Schapire. In AISTATS 2012
- Stochastic convex optimization with bandit feedback
with Dean Foster, Daniel Hsu, Sham Kakade and Alexander Rakhlin. In NIPS 2011
- Distributed Delayed Stochastic Optimization (Long version)
with John Duchi. In NIPS 2011
- Ergodic Subgradient Descent
with John Duchi, Mikael Johansson and Mike Jordan. In Allerton 2011
- Learning with Missing Features
with Afshin Rostamizadeh and Peter Bartlett. In UAI 2011
- Oracle inequalities for computationally budgeted model selection (Long version)
with John Duchi, Peter Bartlett and Clement Levrard. In COLT 2011
- Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
with Sahand Negahban and Martin Wainwright. In ICML 2011
- Distributed Dual Averaging in Networks
with John Duchi and Martin Wainwright. In NIPS 2010.
- Convergence rates of gradient methods for high-dimensional statistical recovery
with Sahand Negahban and Martin Wainwright. In NIPS 2010, oral presentation
- Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback (longer version with additional proofs)
with Ofer Dekel and Lin Xiao. In COLT 2010.
- Optimal Allocation Strategies for the Dark Pool Problem
with Peter Bartlett and Max Dama. In AISTATS 2010.
- Information-theoretic lower bounds on the oracle complexity of convex optimization
with Peter Bartlett, Pradeep Ravikumar and Martin Wainwright. In NIPS 2009.
- A Stochastic View of Optimal Regret through Minimax Duality
with Jake Abernethy, Alexander Rakhlin and Peter Bartlett. arXiv preprint; short version appeared in COLT 2009.
- Message-passing for graph structured linear programs: Proximal projections, convergence and rounding schemes
with Pradeep Ravikumar and Martin Wainwright. In ICML 2008.
- An Analysis of Inference with the Universum
with Fabian Sinz, Olivier Chapelle and Bernhard Schölkopf. In NIPS 2007
- Learning Random Walks to Rank Nodes in Graphs
with Soumen Chakrabarti. In ICML 2007
- Learning Parameters in Entity-relationship Graphs from Ranking Preferences
with Soumen Chakrabarti. In ECML/PKDD 2006
- Learning to Rank Networked Entities
with Soumen Chakrabarti and Sunny Aggarwal. In SIGKDD 2006
Teaching
CSE 599: Reinforcement Learning and Bandits, taught at University of Washington in Spring 2019 with Sham Kakade.
Bandits and Reinforcement Learning, taught at Columbia University in Fall 2017 with Alex Slivkins.
Professional Activities
Program Chair for NeurIPS 2022.
Fundraising Chair for AISTATS 2016.
Co-organized NIPS 2015 workshop on Optimization for Machine Learning.
Co-organized NIPS 2014 workshop on Optimization for Machine Learning.
Co-organized NIPS 2013 workshop on Optimization for Machine Learning.
Co-organized NIPS 2012 workshop on Optimization for Machine Learning.
Co-organized NIPS 2011 workshop on Computational Trade-offs in Statistical Learning.
Co-organized NIPS 2010 workshop on Learning on Cores, Clusters and Clouds.
Senior Area Chair: NeurIPS 2019, NeurIPS 2020.
Area chair or equivalent: ICML 2013-2020, NeurIPS 2013-2018, COLT 2013-2020, AISTATS 2013.
Journal Reviewing: JMLR, Annals of Statistics, IEEE Transactions on Automatic Control, IEEE Transactions on Information Theory, SIAM Journal on Optimization, Machine Learning.