Publications

Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

Submitted to AISTATS, 2025

Policy design in non-stationary Markov Decision Processes (MDPs) is inherently challenging because time-varying transitions and rewards make it difficult for a learner to determine the actions that maximize cumulative future reward. Fortunately, in many practical applications, such as energy systems, look-ahead predictions are available, including forecasts of renewable energy generation and demand. In this paper, we propose an algorithm that leverages these look-ahead predictions to achieve low regret in non-stationary MDPs. Our theoretical analysis demonstrates that, under certain assumptions, the regret decreases exponentially as the look-ahead window expands. When the predictions are subject to error, the regret does not explode even if the prediction error grows sub-exponentially as a function of the prediction horizon. We validate our approach through simulations, confirming the efficacy of our algorithm in non-stationary environments.

Download here

Fast Bandit-based Policy Adaptation in Diverse Environments

Submitted to ACC, 2025

Autonomous systems must be able to adapt quickly to various situations. However, adaptation methods often require strong assumptions about system structure, environmental homogeneity, and multiple rollouts. In this work, we integrate multi-armed bandits and model-based reinforcement learning (RL) to design a fast adaptation algorithm that operates on a single trajectory. Our approach achieves sublinear regret, and the performance guarantee does not require homogeneity of the environment. This regret bound is achieved using a novel prediction error metric that is minimized in the ground-truth MDP. To the best of our knowledge, all existing results with provable guarantees depend on the Bregman divergence between the optimal policies of the MDPs. We show by simulation that our algorithm performs well in puzzle navigation and quadcopter path-tracking.

Download here

Sample Complexity of Stabilizing LTI Systems on a Single Trajectory under Stochastic Noise

Published in AAAI, 2025

We study the problem of learning to stabilize unknown noisy Linear Time-Invariant (LTI) systems on a single trajectory. It is well known in the literature that the learn-to-stabilize problem suffers from exponential blow-up, in which the state norm grows exponentially in the state dimension. This blow-up is due to open-loop instability while exploring the n-dimensional state space. To address this issue, we develop a novel algorithm that decouples the unstable subspace of the LTI system from the stable subspace, so that the algorithm only explores and stabilizes the unstable subspace, the dimension of which can be much smaller than n. With a new singular-value-decomposition (SVD)-based analytical framework, we prove that the system is stabilized before the state norm exceeds a bound that is exponential only in the dimension of the unstable subspace, which is advantageous when the unstable subspace is small. Critically, this bound avoids the exponential blow-up in the state dimension seen in previous works, and to the best of our knowledge, this is the first paper to avoid exponential blow-up in dimension for stabilizing LTI systems with noise.

Download here

Polyhedra of small relative mixed volume

Published in Contributions to Algebra and Geometry, 2020

We classify all tuples of lattice polyhedra of relative mixed volume 1 and all minimal (by inclusion) tuples of polyhedra of relative mixed volume 2. We also prove a conjecture by Esterov, which states that all tuples with finite relative mixed volume are contained in one of finitely many tuples that are minimal by inclusion.

Download here