Pan Xu - Publications

Full publications on Google Scholar. ^* indicates equal contribution.

Recent Preprints

Linear Mixture Distributionally Robust Markov Decision Processes [Paper]
Zhishuai Liu, Pan Xu, arXiv: 2505.18044.

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning [Paper]
Ruhan Wang*, Yu Yang*, Zhishuai Liu, Dongruo Zhou, Pan Xu, arXiv: 2410.23450.

Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning [Paper]
Zhishuai Liu*, Weixin Wang*, Pan Xu, arXiv: 2409.20521.

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [Paper]
Yu Yang, Pan Xu, arXiv: 2408.01402.

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization [Paper]
Zhen Qin*, Zhishuai Liu*, Pan Xu, arXiv: 2310.15976.

Peer Reviewed Publications

2025

Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction
Yiting He*, Zhishuai Liu*, Weixin Wang, Pan Xu
In Proc. of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada, 2025.
[Summary]

Existing literature on robust Markov decision processes (RMDPs) mostly assumes access to generative models allowing arbitrary state-action queries or pre-collected datasets with a good state coverage of the deployment environment, bypassing the challenge of exploration. In this work, we study a more realistic and challenging setting where the agent is limited to online interaction with the training environment. To capture the intrinsic difficulty of exploration in online RMDPs, we introduce the supremal visitation ratio, a novel quantity that measures the mismatch between the training dynamics and the deployment dynamics. We show that if this ratio is unbounded, online learning becomes exponentially hard. We propose the first computationally efficient algorithm that achieves sublinear regret in online RMDPs under $f$-divergence based transition uncertainties. We also establish matching regret lower bounds, demonstrating that our algorithm achieves optimal dependence on both the supremal visitation ratio and the number of interaction episodes.

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization
Cheng Tang*, Zhishuai Liu*, Pan Xu
In Proc. of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada, 2025.
[Summary]

We propose a novel framework, the $d$-rectangular linear Robust Regularized Markov Decision Process ($d$-RRMDP), which introduces latent structures into both transition kernels and regularization. We focus on offline reinforcement learning, where an agent learns robust policies from a precollected dataset in the nominal environment. We develop the Robust Regularized Pessimistic Value Iteration (R2PVI) algorithm that employ linear function approximation and $f$-divergence-based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, demonstrating that these bounds are influenced by how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. We also establish information-theoretic lower bounds for $d$-RRMDPs, revealing that our algorithm is near-optimal. Finally, numerical experiments validate that R2PVI learns robust policies and exhibits superior computational efficiency compared to baseline methods.

2024

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
Hao-Lun Hsu*, Weixin Wang*, Miroslav Pajic, Pan Xu
In Proc. of the 38th Conference on Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024.
[Summary] [Paper] [Code]

We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (\textit{i.e.,} $N$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models.

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning
Zhishuai Liu, Pan Xu
In Proc. of the 38th Conference on Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024.
[Summary] [Paper] [Code]

Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces. However, the consideration of dynamics uncertainty introduces essential nonlinearity and computational burden, posing unique challenges for analyzing and practically employing function approximation. When the nominal model and perturbed models are linearly parameterized, we propose minimax optimal and computationally efficient algorithms realizing function approximation and initiate the study on instance-dependent suboptimality analysis in the context of robust offline RL.

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
Yihong Guo, Yixuan Wang, Yuanyuan Shi, Pan Xu, Anqi Liu
In Proc. of the 38th Conference on Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024.
[Summary] [Paper]

Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching distributions between the source and the target optimal trajectories. However, pure modified rewards in the source domain do not guarantee optimal performance when the learned policy is actually deployed to the target domain. We propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain so that the new policy can generate the same trajectories in the target domain. Theoretically, we present an error bound for our method under a mild assumption regarding the dynamics shift to justify the motivation of our method. Empirically, our method outperforms the pure modified reward method without imitation learning by at least 30% and also outperforms other baselines in benchmark off-dynamics environments.

Optimal Batched Best Arm Identification
Tianyuan Jin, Yu Yang, Jing Tang, Xiaokui Xiao, Pan Xu
In Proc. of the 38th Conference on Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024.
[Summary] [Paper]

We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as less as possible. In particular, we aim to find the best arm with probability $1-\delta$ for some small constant $\delta>0$ while minimizing both the sample complexity (total number of arm pulls) and the batch complexity (total number of batches). We propose the three-batch best arm identification (Tri-BBAI) algorithm, which is the first batched algorithm that achieves the optimal sample complexity in the asymptotic setting (i.e., $\delta\rightarrow 0$) and runs in $3$ batches in expectation. Additionally, we propose the Opt-BBAI algorithm which reaches near-optimal sample complexity in the non-asymptotic setting as well.

Efficient and Robust Sequential Decision Making Algorithms
Pan Xu
AI Magazine, 2024.
[Summary] [Paper]

Sequential decision-making involves making informed decisions based on continuous interactions with a complex environment. This process is ubiquitous in various applications, including recommendation systems and clinical treatment design. My research has concentrated on addressing two pivotal challenges in sequential decision-making: (1) How can we design algorithms that efficiently learn the optimal decision strategy with minimal interactions and limited sample data? (2) How can we ensure robustness in decision-making algorithms when faced with distributional shifts due to environmental changes and the sim-to-real gap? This paper summarizes and expands upon the talk I presented at the AAAI 2024 New Faculty Highlights program, detailing how my research aims to tackle these challenges.

PhyGCN: Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning
Yihe Deng*, Ruochi Zhang*, Pan Xu, Jian Ma, Quanquan Gu
Transactions on Machine Learning Research (TMLR), 2024.
[Summary] [Paper] [Code]

We propose, Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning (PhyGCN), which leverages hypergraph structure for self-supervision to enhance node representations. PhyGCN introduces a unique training strategy that integrates variable hyperedge sizes with self-supervised learning, enabling improved generalization to unseen data. Applications on multi-way chromatin interactions and polypharmacy side-effects demonstrate the effectiveness of PhyGCN. As a generic framework for high-order interaction datasets with abundant unlabeled data, PhyGCN holds strong potential for enhancing hypergraph node representations across various domains.

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Haque Ishfaq*, Yixin Tan*, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu
Presented at the 1st Reinforcement Learning Conference (RLC), Amherst, Massachusetts, USA, 2024.
In Reinforcement Learning Journal (RLJ), vol. 1, no. 1, 2024.
[Summary] [Paper] [Code]

We propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach, which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

Optimal Batched Linear Bandits
Xuanfei Ren, Tianyuan Jin, Pan Xu
In Proc. of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024.
[Summary] [Paper] [Code]

We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret with only $O(\log\log T)$ batches, and the asymptotically optimal regret with only $3$ batches as $T\rightarrow\infty$, where $T$ is the time horizon. We further prove a lower bound on the batch complexity of linear contextual bandits showing that any asymptotically optimal algorithm must require at least $3$ batches in expectation as $T\rightarrow\infty$, which indicates E$^4$ achieves the asymptotic optimality in regret and batch complexity simultaneously. To the best of our knowledge, E$^4$ is the first algorithm for linear bandits that simultaneously achieves the minimax and asymptotic optimality in regret with the corresponding optimal batch complexities.

Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation
Zhishuai Liu, Pan Xu
In Proc. of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain, 2024.
[Summary] [Paper] [Code]

We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain. We cast this problem as an online distributionally robust Markov decision process (DRMDP), where the policy is learned by actively interacting with the source domain while seeking the optimal performance under the worst possible dynamics within an uncertainty set of the source domain's transition kernel. We introduce DR-LSVI-UCB, the first provably efficient online DRMDP algorithm for off-dynamics RL with function approximation, and establish a polynomial suboptimality bound that is independent of the state and action space sizes. Our work makes the first step towards a deeper understanding of the provable efficiency of online DRMDPs with linear function approximation.

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq*, Qingfeng Lan*, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli
In Proc. of the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, 2024.
[Summary] [Paper] [Code]

We present a scalable and effective exploration strategy using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it achieves the state-of-the-art regret bound $\tilde{O}(d^{3/2}H^{3/2}\sqrt{T})$ for randomized methods, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $T$ is the total number of steps. We apply this approach to deep RL, by using Adam optimizer to perform gradient updates. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs
Tianyuan Jin, Hao-Lun Hsu, William Chang, Pan Xu
In Proc. of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI), Vancouver, Canada, 2024. [Oral Presentation, 2.3%]
[Summary] [Paper] [Code]

We study the multi-agent multi-armed bandit (MAMAB) problem. We propose the $\epsilon$-exploring Multi-Agent Thompson Sampling ($\epsilon$-MATS) algorithm, which performs TS exploration with probability $\epsilon$ while adopts a greedy policy otherwise. We prove that $\epsilon$-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies our frequentist regret upper bound is optimal up to constant and logarithm terms. We conduct thorough experiments on standard MAMAB problems including 0101-Chains, Gem Mining, and Wind Farm Control, which demonstrate the superior performance and the improved computational efficiency of $\epsilon$-MATS compared with existing algorithms in the same setting.

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits
Yi Shen, Pan Xu, Michael M. Zavlanos
Transactions on Machine Learning Research (TMLR), 2024. [Featured Certification]
[Summary] [Paper]

We propose a novel DRO approach that employs the Wasserstein distance to encompass distributions with varying support and take into consideration the geometry of the distribution support. We present a regularized (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We validate the performance of our approach on the International Stroke Trial (IST) dataset.

2023

Thompson Sampling with Less Exploration is Fast and Optimal
Tianyuan Jin, Xianglin Yang, Xiaokui Xiao, Pan Xu
In Proc. of the 40th International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023.
[Summary] [Paper]

We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In $\epsilon$-TS, arms are selected greedily based on empirical mean rewards with probability $1-\epsilon$, and based on posterior samples obtained from TS with probability $\epsilon$. Here, $\epsilon\in(0,1)$ is a user-defined constant. By reducing exploration, $\epsilon$-TS improves computational efficiency compared to TS while achieving better regret bounds. We establish that $\epsilon$-TS is both minimax optimal and asymptotically optimal for various popular reward distributions, including Gaussian, Bernoulli, Poisson, and Gamma. Our algorithm is as easy to implement as TS, but operates significantly faster due to reduced exploration. Empirical evaluations confirm the efficiency and optimality of $\epsilon$-TS.

Distributionally Robust Policy Gradient for Offline Contextual Bandits
Zhouhao Yang*, Yihong Guo*, Pan Xu, Anqi Liu, Anima Anandkumar
In Proc. of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain, 2023.
[Summary] [Paper]

Learning an optimal policy from offline data is notoriously challenging, which requires the evaluation of the learning policy using data pre-collected from a static logging policy. We study the policy optimization problem in offline contextual bandits using policy gradient methods. We employ a distributionally robust policy gradient method, DROPO, to account for the distributional shift between the static logging policy and the learning policy in policy gradient. Our approach conservatively estimates the conditional reward distributional and updates the policy accordingly.

Queer In AI: A Case Study in Community-Led Participatory AI
Organizers of QueerInAI, ... , Pan Xu, ..., Luke Stark. [Show all authors]

Full author list: Organizers of QueerInAI, Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker, Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubička, Hang Yuan, Hetvi Jethwani, Huan Zhang, Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth, Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Milind Agarwal, Nyx Mclean, Pan Xu, A. Pranav, Raj Korpan, Ruchira Ray, Sarah Mathew, Sarthak Arora, ST John, Tanvi Anand, Vishakha Agrawal, William Agnew, Yanan Long, Zijie J. Wang, Zeerak Talat, Avijit Ghosh, Nathaniel Dennler, Michael Noseworthy, Sharvani Jha, Emi Baylor, Aditya Joshi, Natalia Y. Bilenko, Andrew Mcnamara, Raphael Gontijo-Lopes, Alex Markham, Evyn Jackie Kay, Manu Saraswat, Nikhil Vytla and Luke Stark
In Proc. of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT), Chicago, IL, USA, 2023. [Best Paper Award]
[Summary] [Paper]

We present Queer in AI as a case study for community-led participatory design in AI. We examine how participatory design and intersectional tenets started and shaped this community's programs over the years. We discuss different challenges that emerged in the process, look at ways this organization has fallen short of operationalizing participatory and intersectional principles, and then assess the organization's impact. Queer in AI provides important lessons and insights for practitioners and theorists of participatory methods broadly through its rejection of hierarchy in favor of decentralization, success at building aid and programs by and for the queer community, and effort to change actors and institutions outside of the queer community. Finally, we theorize how communities like Queer in AI contribute to the participatory design in AI more broadly by fostering cultures of participation in AI, welcoming and empowering marginalized participants, critiquing poor or exploitative participatory practices, and bringing participation to institutions outside of individual research projects. Queer in AI's work serves as a case study of grassroots activism and participatory methods within AI, demonstrating the potential of community-led participatory methods and intersectional praxis, while also providing challenges, case studies, and nuanced insights to researchers developing and using participatory methods.

Multiple Models for Outbreak Decision Support in the Face of Uncertainty
Katriona Shea, ... , Pan Xu, ... , Michael C. Runge [Show all authors]

Full author list: Katriona Shea, Rebecca K. Borchering, William J. M. Probert, Emily Howerton, Tiffany L. Bogich, Shou-Li Li, Willem G. van Panhuis, Cecile Viboud, Ricardo Aguás, Artur A. Belov, Sanjana H. Bhargava, Sean M. Cavany, Joshua C. Chang, Cynthia Chen, Jinghui Chen, Shi Chen, YangQuan Chen, Lauren M. Childs, Carson C. Chow, Isabel Crooker, Sara Y. Del Valle, Guido España, Geoffrey Fairchild, Richard C. Gerkin, Timothy C. Germann, Quanquan Gu, Xiangyang Guan, Lihong Guo, Gregory R. Hart, Thomas J. Hladish, Nathaniel Hupert, Daniel Janies, Cliff C. Kerr, Daniel J. Klein, Eili Y. Klein, Gary Lin, Carrie Manore, Lauren Ancel Meyers, John E. Mittler, Kunpeng Mu, Rafael C. Núñez, Rachel J. Oidtman, Remy Pasco, Ana Pastore y Piontti, Rajib Paul, Carl A. B. Pearson, Dianela R. Perdomo, T. Alex Perkins, Kelly Pierce, Alexander N. Pillai, Rosalyn Cherie Rael, Katherine Rosenfeld, Chrysm Watson Ross, Julie A. Spencer, Arlin B. Stoltzfus, Kok Ben Toh, Shashaank Vattikuti, Alessandro Vespignani, Lingxiao Wang, Lisa J. White, Pan Xu, Yupeng Yang, Osman N. Yogurtcu, Weitong Zhang, Yanting Zhao, Difan Zou, Matthew J. Ferrari, David Pannell, Michael J. Tildesley, Jack Seifarth, Elyse Johnson, Matthew Biggerstaff, Michael A. Johansson, Rachel B. Slayton, John D. Levander, Jeff Stazer, Jessica Kerr, and Michael C. Runge
Proceedings of the National Academy of Sciences (PNAS), Volume 120, No. 18, 2023.
[Summary] [Paper]

During infectious disease outbreaks, uncertainty hinders our ability to forecast dynamics and to make critical decisions about management. Motivated by the COVID-19 pandemic, leading agencies have initiated efforts to prepare for future outbreaks, for example, the US Centers for Disease Control and Prevention’s National Center for Forecasting and Outbreak Analytics and the WHO’s Hub for Pandemic and Epidemic Intelligence were recently inaugurated. Critically, such efforts need to inform policy as well as provide insight into expected disease dynamics. We present a validated case study from early in the pandemic, drawing on recommendations to minimize cognitive biases and incorporate decision theory, to illustrate how a policy-focused process could work for urgent, important, time-sensitive outbreak decision making in the face of uncertainty.

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Yizhou Zhang*, Guannan Qu*, Pan Xu*, Yiheng Lin, Zaiwei Chen, Adam Wierman
Presented at the ACM SIGMETRICS conference, Orlando, Florida, USA, 2023.
In Proc. of the ACM on Measurement and Analysis of Computing Systems (POMACS).
[Summary] [Paper]

We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its $\kappa$-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in $\kappa$. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing $\kappa$. Numerical simulations demonstrate the effectiveness of LPI.

2022

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits
Tianyuan Jin, Pan Xu, Xiaokui Xiao, Anima Anandkumar
In Proc. of the 35th Conference on Advances in Neural Information Processing Systems (NeurIPS), New Orleans, Louisiana, USA, 2022.
[Summary] [Paper]

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponential, etc. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm. We provide a tight regret analysis for ExpTS, which simultaneously yields both the finite-time regret bound as well as the asymptotic regret bound.

Active Ranking without Strong Stochastic Transitivity
Hao Lou, Tao Jin, Yue Wu, Pan Xu, Farzad Farnoud, Quanquan Gu
In Proc. of the 35th Conference on Advances in Neural Information Processing Systems (NeurIPS), New Orleans, Louisiana, USA, 2022.
[Summary] [Paper]

We consider the problem of recovering the exact full ranking for a list of items under ranking models that do NOT assume the Strong Stochastic Transitivity property. We propose a $\delta$-correct algorithm, Probe-Rank, that actively learns the ranking of the items from noisy pairwise comparisons. We prove a sample complexity upper bound for Probe-Rank, which only depends on the preference probabilities between items that are adjacent in the true ranking. This improves upon existing sample complexity results that depend on the preference probabilities for all pairs of items.

Langevin Monte Carlo for Contextual Bandits
Pan Xu, Hongkai Zheng, Eric Mazumdar, Kamyar Azizzadenesheli, Anima Anandkumar
In Proc. of the 39th International Conference on Machine Learning (ICML), Baltimore, Maryland, USA, 2022.
[Summary] [Paper] [Code]

We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. Our method is computationally efficient since it only needs to perform noisy gradient descent updates without constructing the Laplace approximation of the posterior distribution. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits, viz., linear contextual bandits. We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrates that directly sampling from the posterior is both computationally efficient and competitive in performance.

Neural Contextual Bandits with Deep Representation and Shallow Exploration
Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu
In Proc. of the 10th International Conference on Learning Representations (ICLR), Online, 2022.
[Summary] [Paper] [Code]

We study neural contextual bandits, a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the specific reward generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning), and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that under standard assumptions, our proposed algorithm achieves $\widetilde{O}(\sqrt{T})$ finite-time regret, where $T$ is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons
Yue Wu*, Tao Jin*, Hao Lou, Pan Xu, Farzad Farnoud, Quanquan Gu
In Proc. of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), Online, 2022.
[Summary] [Paper]

In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus, a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons from multiple users and improves the users' average accuracy by maintaining an active set of users. We prove that our algorithm can return the true ranking of items with high probability. We also provide a sample complexity bound for the proposed algorithm, which outperforms the non-active strategies in the literature and close to oracle under mild conditions. Experiments are provided to show the empirical advantage of the proposed methods over the state-of-the-art baselines.

Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US
Estee Y Cramer, ... , Pan Xu, ... , Nicholas G. Reich. [Show all authors]

Full author list: Estee Y. Cramer, Evan L. Ray, Velma K. Lopez, Johannes Bracher, Andrea Brennen, Alvaro J. Castro Rivadeneira, Aaron Gerding, Tilmann Gneiting, Katie H. House, Yuxin Huang, Dasuni Jayawardena, Abdul H. Kanji, Ayush Khandelwal, Khoa Le, Anja Mühlemann, Jarad Niemi, Apurv Shah, Ariane Stark, Yijin Wang, Nutcha Wattanachit, Martha W. Zorn, Youyang Gu, Sansiddh Jain, Nayana Bannur, Ayush Deva, Mihir Kulkarni, Srujana Merugu, Alpan Raval, Siddhant Shingi, Avtansh Tiwari, Jerome White, Neil F. Abernethy, Spencer Woody, Maytal Dahan, Spencer Fox, Kelly Gaither, Michael Lachmann, Lauren Ancel Meyers, James G. Scott, Mauricio Tec, Ajitesh Srivastava, Glover E. George, Jeffrey C. Cegan, Ian D. Dettwiller, William P. England, Matthew W. Farthing, Robert H. Hunter, Brandon Lafferty, Igor Linkov, Michael L. Mayo, Matthew D. Parno, Michael A. Rowland, Benjamin D. Trump, Yanli Zhang-James, Samuel Chen, Stephen V. Faraone, Jonathan Hess, Christopher P. Morley, Asif Salekin, Dongliang Wang, Sabrina M. Corsetti, Thomas M. Baer, Marisa C. Eisenberg, Karl Falb, Yitao Huang, Emily T. Martin, Ella McCauley, Robert L. Myers, Tom Schwarz, Daniel Sheldon, Graham Casey Gibson, Rose Yu, Liyao Gao, Yian Ma, Dongxia Wu, Xifeng Yan, Xiaoyong Jin, Yu-Xiang Wang, YangQuan Chen, Lihong Guo, Yanting Zhao, Quanquan Gu, Jinghui Chen, Lingxiao Wang, Pan Xu, Weitong Zhang, Difan Zou, Hannah Biegel, Joceline Lega, Steve McConnell, V. P. Nagraj, Stephanie L. Guertin, Christopher Hulme-Lowe, Stephen D. Turner, Yunfeng Shi, Xuegang Ban, Robert Walraven, Qi-Jun Hong, Stanley Kong, Axel van de Walle, James A. Turtle, Michal Ben-Nun, Steven Riley, Pete Riley, Ugur Koyluoglu, David DesRoches, Pedro Forli, Bruce Hamory, Christina Kyriakides, Helen Leis, John Milliken, Michael Moloney, James Morgan, Ninad Nirgudkar, Gokce Ozcan, Noah Piwonka, Matt Ravi, Chris Schrader, Elizabeth Shakhnovich, Daniel Siegel, Ryan Spatz, Chris Stiefeling, Barrie Wilkinson, Alexander Wong, Sean Cavany, Guido España, Sean Moore, Rachel Oidtman, Alex Perkins, David Kraus, Andrea Kraus, Zhifeng Gao, Jiang Bian, Wei Cao, Juan Lavista Ferres, Chaozhuo Li, Tie-Yan Liu, Xing Xie, Shun Zhang, Shun Zheng, Alessandro Vespignani, Matteo Chinazzi, Jessica T. Davis, Kunpeng Mu, Ana Pastore y Piontti, Xinyue Xiong, Andrew Zheng, Jackie Baek, Vivek Farias, Andreea Georgescu, Retsef Levi, Deeksha Sinha, Joshua Wilde, Georgia Perakis, Mohammed Amine Bennouna, David Nze-Ndong, Divya Singhvi, Ioannis Spantidakis, Leann Thayaparan, Asterios Tsiourvas, Arnab Sarker, Ali Jadbabaie, Devavrat Shah, Nicolas Della Penna, Leo A. Celi, Saketh Sundar, Russ Wolfinger, Dave Osthus, Lauren Castro, Geoffrey Fairchild, Isaac Michaud, Dean Karlen, Matt Kinsey, Luke C. Mullany, Kaitlin Rainwater-Lovett, Lauren Shin, Katharine Tallaksen, Shelby Wilson, Elizabeth C. Lee, Juan Dent, Kyra H. Grantz, Alison L. Hill, Joshua Kaminsky, Kathryn Kaminsky, Lindsay T. Keegan, Stephen A. Lauer, Joseph C. Lemaitre, Justin Lessler, Hannah R. Meredith, Javier Perez-Saez, Sam Shah, Claire P. Smith, Shaun A. Truelove, Josh Wills, Maximilian Marshall, Lauren Gardner, Kristen Nixon, John C. Burant, Lily Wang, Lei Gao, Zhiling Gu, Myungjin Kim, Xinyi Li, Guannan Wang, Yueying Wang, Shan Yu, Robert C. Reiner, Ryan Barber, Emmanuela Gakidou, Simon I. Hay, Steve Lim, Chris Murray, David Pigott, Heidi L. Gurung, Prasith Baccam, Steven A. Stage, Bradley T. Suchoski, B. Aditya Prakash, Bijaya Adhikari, Jiaming Cui, Alexander Rodríguez, Anika Tabassum, Jiajia Xie, Pinar Keskinocak, John Asplund, Arden Baxter, Buse Eylul Oruc, Nicoleta Serban, Sercan O. Arik, Mike Dusenberry, Arkady Epshteyn, Elli Kanal, Long T. Le, Chun-Liang Li, Tomas Pfister, Dario Sava, Rajarishi Sinha, Thomas Tsai, Nate Yoder, Jinsung Yoon, Leyou Zhang, Sam Abbott, Nikos I. Bosse, Sebastian Funk, Joel Hellewell, Sophie R. Meakin, Katharine Sherratt, Mingyuan Zhou, Rahi Kalantari, Teresa K. Yamana, Sen Pei, Jeffrey Shaman, Michael L. Li, Dimitris Bertsimas, Omar Skali Lami, Saksham Soni, Hamza Tazi Bouardi, Turgay Ayer, Madeline Adee, Jagpreet Chhatwal, Ozden O. Dalgic, Mary A. Ladd, Benjamin P. Linas, Peter Mueller, Jade Xiao, Yuanjia Wang, Qinxia Wang, Shanghong Xie, Donglin Zeng, Alden Green, Jacob Bien, Logan Brooks, Addison J. Hu, Maria Jahja, Daniel McDonald, Balasubramanian Narasimhan, Collin Politsch, Samyak Rajanala, Aaron Rumack, Noah Simon, Ryan J. Tibshirani, Rob Tibshirani, Valerie Ventura, Larry Wasserman, Eamon B. O’Dea, John M. Drake, Robert Pagano, Quoc T. Tran, Lam Si Tung Ho, Huong Huynh, Jo W. Walker, Rachel B. Slayton, Michael A. Johansson, Matthew Biggerstaff, Nicholas G.
Proceedings of the National Academy of Sciences (PNAS), Volume 119, No. 15, 2022.
[Summary] [Paper]

This paper compares the probabilistic accuracy of short-term forecasts of reported deaths due to COVID-19 during the first year and a half of the pandemic in the United States. Results show high variation in accuracy between and within stand-alone models and more consistent accuracy from an ensemble model that combined forecasts from all eligible models. This demonstrates that an ensemble model provided a reliable and comparatively accurate means of forecasting deaths during the COVID-19 pandemic that exceeded the performance of all of the models that contributed to it. This work strengthens the evidence base for synthesizing multiple models to support public-health action.

2021

Double Explore-then-Commit: Asymptotic Optimality and Beyond
Tianyuan Jin, Pan Xu, Xiaokui Xiao, Quanquan Gu
In Proc. of the 34th Annual Conference on Learning Theory (COLT), Online, 2021.
This work has been presented at the NeurIPS 2020 Offline Reinforcement Learning Workshop.
[Summary] [Paper]

We propose a double explore-then-commit (DETC) algorithm that has two exploration and exploitation phases and show that it can achieve the asymptotically optimal regret bound. To our knowledge, DETC is the first non-fully-sequential algorithm that achieves asymptotic optimality. In addition, we extend DETC to batched bandit problems, where (i) the exploration process is split into a small number of batches and (ii) the round complexity is of central interest. We prove that a batched version of DETC can achieve the asymptotic optimality with only a constant round complexity. This is the first batched bandit algorithm that can attain the optimal asymptotic regret bound and optimal round complexity simultaneously.

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling
Difan Zou, Pan Xu, Quanquan Gu
In Proc. of the 37th International Conference on Uncertainty in Artificial Intelligence (UAI), Online, 2021.
[Summary] [Paper]

We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain. Under certain conditions on the target distribution, we prove that $\widetilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in terms of the total variation distance, where $d$ is the problem dimension. This improves existing results on the convergence rate of SGLD \citep{raginsky2017non,xu2018global}. We further show that provided an additional Hessian Lipschitz condition on the log-density function, SGLD is guaranteed to achieve $\epsilon$-sampling error within $\widetilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin based algorithms, and sheds some light on the design of fast stochastic gradient based sampling algorithms.

MOTS: Minimax Optimal Thompson Sampling
Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, Quanquan Gu
In Proc. of the 38th International Conference on Machine Learning (ICML), Online, 2021.
[Summary] [Paper]

Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can match the minimax lower bound $\Omega(\sqrt{KT})$ for $K$-armed bandit problems, where $T$ is the total time horizon. In this paper, we solve this long open problem by proposing a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the asymptotic optimal regret bound for Gaussian rewards when $T$ approaches infinity. To our knowledge, MOTS is the first Thompson sampling type algorithm that achieves the minimax optimality for multi-armed bandit problems.

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits
Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu
In Proc. of the 38th International Conference on Machine Learning (ICML), Online, 2021.
[Summary] [Paper]

In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assume that the time horizon $T$ is known in advance. However, many applications involve an unpredictable stopping time. In this paper, we study the anytime batched multi-armed bandit problem. We propose an anytime algorithm that achieves the asymptotically optimal regret for exponential families of reward distributions with $\mathcal{O}(\log \log T\cdot \ilog^{\alpha} (T))$ batches, where $\alpha\in \mathcal{O}_{T}(1)$. Moreover, we prove that for any constant $c>0$, no algorithm can achieve the asymptotically optimal regret within $c \log \log T$ batches.

2020

A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods
Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu
In Proc. of the 33rd Conference on Advances in Neural Information Processing Systems (NeurIPS), Online, 2020.
This work has been presented at the ICML 2020 Theoretical Foundations of Reinforcement Learning Workshop.
[Summary] [Paper]

We provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\|\nabla J(\theta)\|_2^2 \le \epsilon$) of the non-concave performance function $J({\theta})$, with $\mathcal{\widetilde{O}}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Pan Xu, Quanquan Gu
In Proc. of the 37th International Conference on Machine Learning (ICML), Online, 2020.
[Summary] [Paper]

We present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.

Stochastic Nested Variance Reduction for Nonconvex Optimization
Dongruo Zhou, Pan Xu, Quanquan Gu
Journal of Machine Learning Research (JMLR), Volume 21, No. 103, 2020.
The short version of this paper has been published in NeurIPS 2018. The journal version extends the original SNVRG algorithm for finding first order stationary points to local minimum finding algorithms by incorporating this manuscript: Finding Local Minima via Stochastic Nested Variance Reduction.
[Summary] [Paper]

In this journal version of SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed $\text{SNVRG}+\text{Neon2}^{\text{finite}}$ algorithm achieves $\tilde{O}(n^{1/2}\epsilon^{-2}+n\epsilon_H^{-3}+n^{3/4}\epsilon_H^{-7/2})$ gradient complexity to converge to an $(\epsilon, \epsilon_H)$-second-order stationary point, which outperforms $\text{SVRG}+\text{Neon2}^{\text{finite}}$ \citep{allen2018neon2}, the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed $\text{SNVRG}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ \citep{allen2018neon2} and Natasha2 \citep{allen2017natasha2} in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Pan Xu, Felicia Gao, Quanquan Gu
In Proc. of the 8th International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 2020.
Partial results of this work have been presented at the NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop, Vancouver, Canada, 2019 and the 2020 Virtual Conference on Reinforcement Learning for Real Life.
[Summary] [Paper] [Poster] [Code] [Video]

We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2\leq\epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.

Rank Aggregation via Heterogeneous Thurstone Preference Models
Tao Jin*, Pan Xu*, Quanquan Gu, Farzad Farnoud
In Proc. of the 34th Conference on Artificial Intelligence (AAAI), New York, New York, USA, 2020. [Oral presentation, 4.5%]
[Summary] [Paper] [Code]

We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. Under this framework, we also propose a rank aggregation algorithm based on alternating gradient descent to estimate the underlying item scores and accuracy levels of different users simultaneously from noisy pairwise comparisons. We theoretically prove that the proposed algorithm converges linearly up to a statistical error which matches that of the state-of-the-art method for the single-user BTL model. We evaluate the proposed HTM model and algorithm on both synthetic and real data, demonstrating that it outperforms existing methods.

2019

Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction
Difan Zou, Pan Xu, Quanquan Gu
In Proc. of the 32nd Conference on Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019.
[Summary] [Paper] [Poster]

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stochastic gradient estimator that recursively accumulates the gradient information to reduce the variance of the stochastic gradient. We provide a convergence analysis of SRVR-HMC for sampling from a class of non-log-concave distributions and show that SRVR-HMC converges faster than all existing HMC-type algorithms based on underdamped Langevin dynamics. Thorough experiments on synthetic and real-world datasets validate our theory and demonstrate the superiority of SRVR-HMC.

Stochastic Variance-Reduced Cubic Regularization Methods
Dongruo Zhou, Pan Xu, Quanquan Gu
Journal of Machine Learning Research (JMLR), Volume 20, No. 134, 2019.
The short version of this paper has been published in ICML 2018. The journal version extends the SVRC algorithm to a sample-efficient one proposed in this manuscript: Sample Efficient Stochastic Variance-Reduced Cubic Regularization Method.
[Summary] [Paper]

To reduce the sample complexity of Hessian matrix computation in SVRC, we also propose a sample efficient stochastic variance-reduced cubic regularization (Lite-SVRC) algorithm for finding the local minimum more efficiently. Lite-SVRC converges to an $(\epsilon,\sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n+n^{2/3}/\epsilon^{3/2})$ Hessian sample complexity, which is faster than all existing cubic regularization based methods. Numerical experiments with different nonconvex optimization problems conducted on real datasets validate our theoretical results for both SVRC and Lite-SVRC.

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
Pan Xu, Felicia Gao, Quanquan Gu
In Proc. of the 35th International Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, Israel, 2019. [Oral presentation, 7.8%]
[Summary] [Paper]

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by \citet{papini2018stochastic} for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories. This sample complexity improves upon the best known result $O(1/\epsilon^2)$ by a factor of $O(1/\epsilon^{1/3})$. At the core of our analysis is (i) a tighter upper bound for the variance of importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters such that we can significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically demonstrate the effectiveness of our theoretical claims of batch sizes on reinforcement learning benchmark tasks.

Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics
Difan Zou, Pan Xu, Quanquan Gu
In Proc. of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), Naha, Okinawa, Japan, 2019.
[Summary] [Paper]

We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD \citep{dubey2016variance}, for sampling from non-log-concave distributions. Under certain assumptions on the log density function, we establish the convergence guarantees of SVRG-LD and SAGA-LD in $2$-Wasserstein distance. More specifically, we show that both SVRG-LD and SAGA-LD require $ \tilde O\big(n+n^{3/4}/\epsilon^2 + n^{1/2}/\epsilon^4\big)\cdot \exp\big(\tilde O(d+\gamma)\big)$ stochastic gradient evaluations to achieve $\epsilon$-accuracy in $2$-Wasserstein distance, which outperforms the $ \tilde O\big(n/\epsilon^4\big)\cdot \exp\big(\tilde O(d+\gamma)\big)$ gradient complexity achieved by Langevin Monte Carlo Method \citep{raginsky2017non}. Experiments on synthetic data and real data back up our theory.

2018 and earlier

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Pan Xu*, Jinghui Chen*, Difan Zou, Quanquan Gu
In Proc. of the 31st Conference on Advances in Neural Information Processing Systems (NeurIPS), Montréal, Canada, 2018. [Spotlight presentation, 3.5%]
[Summary] [Paper] [Video]

We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the $\textit{almost minimizer}$ within $\widetilde O\big(nd/(\lambda\epsilon) \big)$ and $\widetilde O\big(d^7/(\lambda^5\epsilon^5) \big)$ stochastic gradient evaluations respectively, where $d$ is the problem dimension, and $\lambda$ is the spectral gap of the Markov chain generated by GLD. Both results improve upon the best known gradient complexity results \citep{raginsky2017non}. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (SVRG-LD) to the almost minimizer within $\widetilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees.

Stochastic Nested Variance Reduction for Nonconvex Optimization
Dongruo Zhou, Pan Xu, Quanquan Gu
In Proc. of the 31st Conference on Advances in Neural Information Processing Systems (NeurIPS), Montréal, Canada, 2018. [Spotlight presentation, 3.5%]
[Summary] [Paper] [Video]

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic variance reduced gradient (SVRG) algorithm that uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, the proposed algorithm converges to an $\epsilon$-approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2\leq \epsilon$) within $\widetilde O(n\land \epsilon^{-2}+\epsilon^{-3}\land n^{1/2}\epsilon^{-2})$ number of stochastic gradient evaluations. This improves the best known gradient complexity of SVRG $O(n+n^{2/3}\epsilon^{-2})$ and that of SCSG $O(n\land \epsilon^{-2}+\epsilon^{-10/3}\land n^{2/3}\epsilon^{-2})$. For gradient dominated functions, our algorithm also achieves better gradient complexity than the state-of-the-art algorithms. Thorough experimental results on different nonconvex optimization problems back up our theory.

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima
Yaodong Yu*, Pan Xu*, Quanquan Gu
In Proc. of the 31st Conference on Advances in Neural Information Processing Systems (NeurIPS), Montréal, Canada, 2018.
[Summary] [Paper] [Video]

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specifically, the proposed algorithm only needs $\tilde{O}(\epsilon^{-10/3})$ stochastic gradient evaluations to converge to an approximate local minimum $\mathbf{x}$, which satisfies $\|\nabla f(\mathbf{x})\|_2\leq\epsilon$ and $\lambda_{\min}(\nabla^2 f(\mathbf{x}))\geq -\sqrt{\epsilon}$ in unconstrained stochastic optimization, where $\tilde{O}(\cdot)$ hides logarithm polynomial terms and constants. This improves upon the $\tilde{O}(\epsilon^{-7/2})$ gradient complexity achieved by the state-of-the-art stochastic local minima finding algorithms by a factor of $\tilde{O}(\epsilon^{-1/6})$. Experiments on two nonconvex optimization problems demonstrate the effectiveness of our algorithm and corroborate our theory.

Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics
Difan Zou*, Pan Xu*, Quanquan Gu
In Proc. of the 34th International Conference on Uncertainty in Artificial Intelligence (UAI), Monterey, California, USA, 2018.
[Summary] [Paper]

Stochastic variance-reduced gradient Langevin dynamics (SVRG-LD) was recently proposed to improve the performance of stochastic gradient Langevin dynamics (SGLD) by reducing the variance of the stochastic gradient. In this paper, we propose a variant of SVRG-LD, namely SVRG-LD$^+$, which replaces the full gradient in each epoch with a subsampled one. We provide a nonasymptotic analysis of the convergence of SVRG-LD$^+$ in $2$-Wasserstein distance, and show that SVRG-LD$^+$ enjoys a lower gradient complexity than SVRG-LD, when the sample size is large or the target accuracy requirement is moderate. Our analysis directly implies a sharper convergence rate for SVRG-LD, which improves the existing convergence rate by a factor of $\kappa^{1/6}n^{1/6}$, where $\kappa$ is the condition number of the log-density function and $n$ is the sample size. Experiments on both synthetic and real-world datasets validate our theoretical results.

Continuous and Discrete-Time Accelerated Stochastic Mirror Descent for Strongly Convex Functions
Pan Xu*, Tianhao Wang*, Quanquan Gu
In Proc. of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018.
[Summary] [Paper]

We provide a second-order stochastic differential equation (SDE), which characterizes the continuous-time dynamics of accelerated stochastic mirror descent (ASMD) for strongly convex functions. This SDE plays a central role in designing new discrete-time ASMD algorithms via numerical discretization and providing neat analyses of their convergence rates based on Lyapunov functions. Our results suggest that the only existing ASMD algorithm, namely, AC-SA proposed in \citet{ghadimi2012optimal} is one instance of its kind, and we can derive new instances of ASMD with fewer tuning parameters.This sheds light on revisiting accelerated stochastic optimization through the lens of SDEs, which can lead to a better understanding as well as new simpler algorithms of acceleration in stochastic optimization. Numerical experiments on both synthetic and real data support our theory.

Stochastic Variance-Reduced Cubic Regularized Newton Method
Dongruo Zhou, Pan Xu, Quanquan Gu
In Proc. of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018.
[Summary] [Paper]

We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularization method. We show that our algorithm is guaranteed to converge to an $(\epsilon,\sqrt{\epsilon})$-approximate local minimum within $\widetilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms including subsampled cubic regularization. Our work also sheds light on the application of variance reduction technique to high-order non-convex optimization methods. Thorough experiments on various non-convex optimization problems support our theory.

Stochastic Variance-Reduced Hamilton Monte Carlo Methods
Difan Zou*, Pan Xu*, Quanquan Gu
In Proc. of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018.
[Summary] [Paper]

We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization. We show that, to achieve $\epsilon$ accuracy in 2-Wasserstein distance, our algorithm achieves $\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}\big)$ gradient complexity (i.e., number of component gradient evaluations), which outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm for sampling from smooth and general log-concave distributions, and prove the corresponding gradient complexity as well. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.

Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization
Jinghui Chen, Pan Xu, Lingxiao Wang, Jian Ma, Quanquan Gu
In Proc. of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018. [Long talk, 8.6%]
[Summary] [Paper]

We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints. To solve this estimator, we propose an alternating gradient descent algorithm with hard thresholding. Compared with existing methods along this line of research, which lack theoretical guarantees in optimization error and/or statistical error, the proposed algorithm not only is computationally much more efficient with a linear rate of convergence, but also attains the optimal statistical rate up to a logarithmic factor. Thorough experiments on both synthetic and real data support our theory.

Accelerated Stochastic Mirror Descent: From Continuous-time Dynamics to Discrete-time Algorithms
Pan Xu*, Tianhao Wang*, Quanquan Gu
In Proc. of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), Playa Blanca, Lanzarote, Canary Islands, 2018.
[Summary] [Paper]

We present a new framework to analyze accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamic systems. It enables us to design new algorithms, and perform a unified and simple analysis of the convergence rates of these algorithms. More specifically, under this framework, we provide a Lyapunov function based analysis for the continuous-time stochastic dynamics, as well as several new discrete-time algorithms derived from the continuous-time dynamics. We show that for general convex objective functions, the derived discrete-time algorithms attain the optimal convergence rate. Empirical experiments corroborate our theory.

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization
Pan Xu, Jian Ma, Quanquan Gu
In Proc. of the 30th Conference on Advances in Neural Information Processing Systems (NIPS), Long Beach, California, USA, 2017.
[Summary] [Paper] [Poster] [Code]

We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization, and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory.

Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference
Aditya Chaudhry, Pan Xu, Quanquan Gu
In Proc. of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 2017.
[Summary] [Paper]

Causal inference among high-dimensional time series data proves an important research problem in many fields. In the classical regime, one often establishes causality among time series via a concept known as “Granger causality.” However, existing approaches for Granger causal inference in high-dimensional data lack the means to characterize the uncertainty associated with Granger causality estimates (e.g. p-values and confidence intervals). We make two contributions in this work. First, we introduce a novel unbiased estimator for assessing Granger causality in the high-dimensional regime. We propose test statistics and confidence intervals for our estimator to allow, for the first time, uncertainty characterization in high-dimensional Granger causal inference. Second, we introduce a novel method for false discovery rate control that achieves higher power in multiple testing than existing techniques and that can cope with dependent test statistics and dependent observations. We corroborate our theoretical results with experiments on both synthetic data and real-world climatological data.

Efficient Algorithm for Sparse Tensor-variate Gaussian Graphical Models via Gradient Descent
Pan Xu, Tingting Zhang, Quanquan Gu
In Proc. of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, Florida, USA, 2017.
[Summary] [Paper]

We study the sparse tensor-variate Gaussian graphical model (STGGM), where each way of the tensor follows a multivariate normal distribution whose precision matrix has sparse structures. To estimate the precision matrices, we propose a sparsity constrained maximum likelihood estimator. However, due to the complex structure of the tensor-variate GGMs, the likelihood based estimator is non-convex, which poses great challenges for both computation and theoretical analysis. In order to address these challenges, we propose an efficient alternating gradient descent algorithm to solve this estimator and prove that, under certain conditions on the initial estimator, our algorithm is guaranteed to linearly converge to the unknown precision matrices up to the optimal statistical error. Experiments on both synthetic data and real world brain imaging data corroborate our theory.

Semiparametric Differential Graph Models
Pan Xu, Quanquan Gu
In Proc. of the 29th Conference on Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016.
[Summary] [Paper] [Video]

In many cases of network analysis, it is more attractive to study how a network varies under different conditions than an individual static network. We propose a novel graphical model, namely the Latent Differential Graph Model, where the networks under two different conditions are represented by two semiparametric elliptical distributions respectively, and the variation of these two networks (i.e., differential graph) is characterized by the difference between their latent precision matrices. We propose an estimator for the differential graph based on quasi-likelihood maximization with nonconvex regularization. We show that our estimator attains a faster statistical rate in parameter estimation than the state-of-the-art methods, and enjoys the oracle property under mild conditions. Thorough experiments on both synthetic and real-world data support our theory.

Forward Backward Greedy Algorithms for Multi-Task Learning with Faster Rates
Lu Tian, Pan Xu, Quanquan Gu
In Proc. of the 32nd International Conference on Uncertainty in Artificial Intelligence (UAI), New York / New Jersey, USA, 2016.
[Summary] [Paper]

In this paper, we develop a multi-task learning algorithm with faster convergence rates. In particular, we propose a general estimator for multitask learning with row sparsity constraints on the parameter matrix. The proposed estimator is a nonconvex optimization problem. To solve it, we develop a forward backward greedy algorithm with provable guarantees. More specifically, we prove that the output of the greedy algorithm attains a sharper estimation error bound than state-of-the-art multi-task learning methods. Moreover, our estimator enjoys model selection consistency under mild conditions. Thorough experiments on both synthetic and real-world data demonstrate the effectiveness of our method and back up our theory.