Search | arXiv e-print repository

Error Bounds for Learning Fourier Linear Operators

Abstract: We investigate the problem of learning operators between function spaces, focusing on the linear layer of the Fourier Neural Operator. First, we identify three main errors that occur during the learning process: statistical error due to finite sample size, truncation error from finite rank approximation of the operator, and discretization error from handling functional data on a finite grid of dom… ▽ More We investigate the problem of learning operators between function spaces, focusing on the linear layer of the Fourier Neural Operator. First, we identify three main errors that occur during the learning process: statistical error due to finite sample size, truncation error from finite rank approximation of the operator, and discretization error from handling functional data on a finite grid of domain points. Finally, we analyze a Discrete Fourier Transform (DFT) based least squares estimator, establishing both upper and lower bounds on the aforementioned errors. △ Less

Submitted 16 August, 2024; originally announced August 2024.

Comments: 30 pages

arXiv:2405.17324 [pdf, other]

Leveraging Offline Data in Linear Latent Bandits

Authors: Chinmaya Kausik, Kevin Tan, Ambuj Tewari

Abstract: Sequential decision-making domains such as recommender systems, healthcare and education often have unobserved heterogeneity in the population that can be modeled using latent bandits $-$ a framework where an unobserved latent state determines the model for a trajectory. While the latent bandit framework is compelling, the extent of its generality is unclear. We first address this by establishing… ▽ More Sequential decision-making domains such as recommender systems, healthcare and education often have unobserved heterogeneity in the population that can be modeled using latent bandits $-$ a framework where an unobserved latent state determines the model for a trajectory. While the latent bandit framework is compelling, the extent of its generality is unclear. We first address this by establishing a de Finetti theorem for decision processes, and show that $\textit{every}$ exchangeable and coherent stateless decision process is a latent bandit. The latent bandit framework lends itself particularly well to online learning with offline datasets, a problem of growing interest in sequential decision-making. One can leverage offline latent bandit data to learn a complex model for each latent state, so that an agent can simply learn the latent state online to act optimally. We focus on a linear model for a latent bandit with $d_A$-dimensional actions, where the latent states lie in an unknown $d_K$-dimensional subspace for $d_K \ll d_A$. We present SOLD, a novel principled method to learn this subspace from short offline trajectories with guarantees. We then provide two methods to leverage this subspace online: LOCAL-UCB and ProBALL-UCB. We demonstrate that LOCAL-UCB enjoys $\tilde O(\min(d_A\sqrt{T}, d_K\sqrt{T}(1+\sqrt{d_AT/d_KN})))$ regret guarantees, where the effective dimension is lower when the size $N$ of the offline dataset is larger. ProBALL-UCB enjoys a slightly weaker guarantee, but is more practical and computationally efficient. Finally, we establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 40 pages. 14 pages for main paper, 26 pages for references + appendix

arXiv:2405.16250 [pdf, other]

Conformal Robust Control of Linear Systems

Authors: Yash Patel, Sahana Rayan, Ambuj Tewari

Abstract: End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Curre… ▽ More End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications in a collection of engineering control systems, specifically for airfoils and a load-positioning system. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16246 [pdf, other]

Conformalized Late Fusion Multi-View Learning

Authors: Eduardo Ochoa Rivera, Yash Patel, Ambuj Tewari

Abstract: Uncertainty quantification for multi-view learning is motivated by the increasing use of multi-view data in scientific problems. A common variant of multi-view learning is late fusion: train separate predictors on individual views and combine them after single-view predictions are available. Existing methods for uncertainty quantification for late fusion often rely on undesirable distributional as… ▽ More Uncertainty quantification for multi-view learning is motivated by the increasing use of multi-view data in scientific problems. A common variant of multi-view learning is late fusion: train separate predictors on individual views and combine them after single-view predictions are available. Existing methods for uncertainty quantification for late fusion often rely on undesirable distributional assumptions for validity. Conformal prediction is one approach that avoids such distributional assumptions. However, naively applying conformal prediction to late-stage fusion pipelines often produces overly conservative and uninformative prediction regions, limiting its downstream utility. We propose a novel methodology, Multi-View Conformal Prediction (MVCP), where conformal prediction is instead performed separately on the single-view predictors and only fused subsequently. Our framework extends the standard scalar formulation of a score function to a multivariate score that produces more efficient downstream prediction regions in both classification and regression settings. We then demonstrate that such improvements can be realized in methods built atop conformalized regressors, specifically in robust predict-then-optimize pipelines. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15050 [pdf, ps, other]

Provably Efficient Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs

Authors: Kihyuk Hong, Yufan Zhang, Ambuj Tewari

Abstract: We resolve the open problem of designing a computationally efficient algorithm for infinite-horizon average-reward linear Markov Decision Processes (MDPs) with $\widetilde{O}(\sqrt{T})$ regret. Previous approaches with $\widetilde{O}(\sqrt{T})$ regret either suffer from computational inefficiency or require strong assumptions on dynamics, such as ergodicity. In this paper, we approximate the avera… ▽ More We resolve the open problem of designing a computationally efficient algorithm for infinite-horizon average-reward linear Markov Decision Processes (MDPs) with $\widetilde{O}(\sqrt{T})$ regret. Previous approaches with $\widetilde{O}(\sqrt{T})$ regret either suffer from computational inefficiency or require strong assumptions on dynamics, such as ergodicity. In this paper, we approximate the average-reward setting by the discounted setting and show that running an optimistic value iteration-based algorithm for learning the discounted setting achieves $\widetilde{O}(\sqrt{T})$ regret when the discounting factor $γ$ is tuned appropriately. The challenge in the approximation approach is to get a regret bound with a sharp dependency on the effective horizon $1 / (1 - γ)$. We use a computationally efficient clipping operator that constrains the span of the optimistic state value function estimate to achieve a sharp regret bound in terms of the effective horizon, which leads to $\widetilde{O}(\sqrt{T})$ regret. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14066 [pdf, ps, other]

Online Classification with Predictions

Authors: Vinod Raman, Ambuj Tewari

Abstract: We study online classification when the learner has access to predictions about future examples. We design an online learner whose expected regret is never worse than the worst-case regret, gracefully improves with the quality of the predictions, and can be significantly better than the worst-case regret when the predictions of future examples are accurate. As a corollary, we show that if the lear… ▽ More We study online classification when the learner has access to predictions about future examples. We design an online learner whose expected regret is never worse than the worst-case regret, gracefully improves with the quality of the predictions, and can be significantly better than the worst-case regret when the predictions of future examples are accurate. As a corollary, we show that if the learner is always guaranteed to observe data where future examples are easily predictable, then online learning can be as easy as transductive online learning. Our results complement recent work in online algorithms with predictions and smoothed online classification, which go beyond a worse-case analysis by using machine-learned predictions and distributional assumptions respectively. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 24 pages

arXiv:2403.01636 [pdf, other]

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

Authors: Ziping Xu, Zifan Xu, Runxuan Jiang, Peter Stone, Ambuj Tewari

Abstract: Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for its wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on the improved statistical efficiency by assuming a shared structure across tasks, exploration--a crucial aspect of RL--has been largely overlooked. This paper addresses thi… ▽ More Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for its wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on the improved statistical efficiency by assuming a shared structure across tasks, exploration--a crucial aspect of RL--has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design like $ε$-greedy that are inefficient in general can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic success of the wide applications of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection by automatic curriculum learning, which is empirically shown to improve sample-efficiency. △ Less

Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.09467 [pdf, other]

Optimal Thresholding Linear Bandit

Authors: Eduardo Ochoa Rivera, Ambuj Tewari

Abstract: We study a novel pure exploration problem: the $ε$-Thresholding Bandit Problem (TBP) with fixed confidence in stochastic linear bandits. We prove a lower bound for the sample complexity and extend an algorithm designed for Best Arm Identification in the linear case to TBP that is asymptotically optimal. We study a novel pure exploration problem: the $ε$-Thresholding Bandit Problem (TBP) with fixed confidence in stochastic linear bandits. We prove a lower bound for the sample complexity and extend an algorithm designed for Best Arm Identification in the linear case to TBP that is asymptotically optimal. △ Less

Submitted 11 February, 2024; originally announced February 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2006.16073 by other authors

arXiv:2402.06614 [pdf, other]

The Complexity of Sequential Prediction in Dynamical Systems

Authors: Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system, and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the rea… ▽ More We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system, and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the realizable and agnostic setting respectively. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: 35 pages

arXiv:2402.04493 [pdf, ps, other]

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

Authors: Kihyuk Hong, Ambuj Tewari

Abstract: We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an $ε$-optimal policy with $O(ε^{-2})$ s… ▽ More We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an $ε$-optimal policy with $O(ε^{-2})$ sample complexity. In this paper, we propose a primal dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves sample complexity of $O(ε^{-2})$ with partial data coverage assumption. Our work is an improvement upon a recent work that requires $O(ε^{-4})$ samples. Moreover, we extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals. △ Less

Submitted 2 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03282 [pdf, ps, other]

A Theoretical Framework for Partially Observed Reward-States in RLHF

Authors: Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari

Abstract: The growing deployment of reinforcement learning from human feedback (RLHF) calls for a deeper theoretical investigation of its underlying models. The prevalent models of RLHF do not account for neuroscience-backed, partially-observed "internal states" that can affect human feedback, nor do they accommodate intermediate feedback during an interaction. Both of these can be instrumental in speeding… ▽ More The growing deployment of reinforcement learning from human feedback (RLHF) calls for a deeper theoretical investigation of its underlying models. The prevalent models of RLHF do not account for neuroscience-backed, partially-observed "internal states" that can affect human feedback, nor do they accommodate intermediate feedback during an interaction. Both of these can be instrumental in speeding up learning and improving alignment. To address these limitations, we model RLHF as reinforcement learning with partially observed reward-states (PORRL). We accommodate two kinds of feedback $-$ cardinal and dueling feedback. We first demonstrate that PORRL subsumes a wide class of RL problems, including traditional RL, RLHF, and reward machines. For cardinal feedback, we present two model-based methods (POR-UCRL, POR-UCBVI). We give both cardinal regret and sample complexity guarantees for the methods, showing that they improve over naive history-summarization. We then discuss the benefits of a model-free method like GOLF with naive history-summarization in settings with recursive internal states and dense intermediate feedback. For this purpose, we define a new history aware version of the Bellman-eluder dimension and give a new guarantee for GOLF in our setting, which can be exponentially sharper in illustrative examples. For dueling feedback, we show that a naive reduction to cardinal feedback fails to achieve sublinear dueling regret. We then present the first explicit reduction that converts guarantees for cardinal regret to dueling regret. In both feedback settings, we show that our models and guarantees generalize and extend existing ones. △ Less

Submitted 27 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 58 pages. 12 pages formain paper, 46 pages for references + appendix

arXiv:2310.19064 [pdf, other]

Apple Tasting: Combinatorial Dimensions and Minimax Rates

Authors: Vinod Raman, Unique Subedi, Ananth Raman, Ambuj Tewari

Abstract: In online binary classification under \emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1". First studied by \cite{helmbold2000apple}, we revisit this classical partial-feedback setting and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tast… ▽ More In online binary classification under \emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1". First studied by \cite{helmbold2000apple}, we revisit this classical partial-feedback setting and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \emph{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner, under apple tasting feedback, can be $Θ(1), Θ(\sqrt{T})$, or $Θ(T)$. This is in contrast to the full-information realizable setting where only $Θ(1)$ and $Θ(T)$ are possible. △ Less

Submitted 18 June, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: 21 pages, COLT 2024 Camera Ready

arXiv:2310.13088 [pdf, other]

Sequence Length Independent Norm-Based Generalization Bounds for Transformers

Authors: Jacob Trauger, Ambuj Tewari

Abstract: This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization… ▽ More This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings. △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: 18 pages

arXiv:2310.10003 [pdf, other]

Conformal Contextual Robust Optimization

Authors: Yash Patel, Sahana Rayan, Ambuj Tewari

Abstract: Data-driven approaches to predict-then-optimize decision-making problems seek to mitigate the risk of uncertainty region misspecification in safety-critical settings. Current approaches, however, suffer from considering overly conservative uncertainty regions, often resulting in suboptimal decisionmaking. To this end, we propose Conformal-Predict-Then-Optimize (CPO), a framework for leveraging hig… ▽ More Data-driven approaches to predict-then-optimize decision-making problems seek to mitigate the risk of uncertainty region misspecification in safety-critical settings. Current approaches, however, suffer from considering overly conservative uncertainty regions, often resulting in suboptimal decisionmaking. To this end, we propose Conformal-Predict-Then-Optimize (CPO), a framework for leveraging highly informative, nonconvex conformal prediction regions over high-dimensional spaces based on conditional generative models, which have the desired distribution-free coverage guarantees. Despite guaranteeing robustness, such black-box optimization procedures alone inspire little confidence owing to the lack of explanation of why a particular decision was found to be optimal. We, therefore, augment CPO to additionally provide semantically meaningful visual summaries of the uncertainty regions to give qualitative intuition for the optimal decision. We highlight the CPO framework by demonstrating results on a suite of simulation-based inference benchmark tasks and a vehicle routing task based on probabilistic weather prediction. △ Less

Submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.07852 [pdf, other]

On the Computational Complexity of Private High-dimensional Model Selection

Authors: Saptarshi Roy, Zehua Wang, Ambuj Tewari

Abstract: We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private best subset selection method with strong utility properties by adopting the well-known exponential mechanism for selecting the best model. We propose an efficient Metropolis-Hastings algorithm and establish that it enjoys polynomial mixing t… ▽ More We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private best subset selection method with strong utility properties by adopting the well-known exponential mechanism for selecting the best model. We propose an efficient Metropolis-Hastings algorithm and establish that it enjoys polynomial mixing time to its stationary distribution. Furthermore, we also establish approximate differential privacy for the estimates of the mixed Metropolis-Hastings chain. Finally, we perform some illustrative experiments that show the strong utility of our algorithm. △ Less

Submitted 23 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 27 pages, 2 figures

arXiv:2309.06548 [pdf, ps, other]

Online Infinite-Dimensional Regression: Learning Linear Operators

Authors: Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: We consider the problem of learning linear operators under squared loss between two infinite-dimensional Hilbert spaces in the online setting. We show that the class of linear operators with uniformly bounded $p$-Schatten norm is online learnable for any $p \in [1, \infty)$. On the other hand, we prove an impossibility result by showing that the class of uniformly bounded linear operators with res… ▽ More We consider the problem of learning linear operators under squared loss between two infinite-dimensional Hilbert spaces in the online setting. We show that the class of linear operators with uniformly bounded $p$-Schatten norm is online learnable for any $p \in [1, \infty)$. On the other hand, we prove an impossibility result by showing that the class of uniformly bounded linear operators with respect to the operator norm is \textit{not} online learnable. Moreover, we show a separation between sequential uniform convergence and online learnability by identifying a class of bounded linear operators that is online learnable but uniform convergence does not hold. Finally, we prove that the impossibility result and the separation between uniform convergence and learnability also hold in the batch setting. △ Less

Submitted 24 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: 21 pages, ALT 2024 Camera Ready

arXiv:2309.02425 [pdf, ps, other]

On the Minimax Regret in Online Ranking with Top-k Feedback

Authors: Mingyuan Zhang, Ambuj Tewari

Abstract: In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to ana… ▽ More In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top $k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top $k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates with the top $k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n. △ Less

Submitted 12 April, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.04620 [pdf, other]

Multiclass Online Learnability under Bandit Feedback

Authors: Ananth Raman, Vinod Raman, Unique Subedi, Idan Mehalel, Ambuj Tewari

Abstract: We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not su… ▽ More We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded. △ Less

Submitted 20 January, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: 16 pages, ALT 2024 Camera Ready

arXiv:2306.07818 [pdf, other]

A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari

Abstract: Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function… ▽ More Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate and the dual player acts greedily to minimize the Lagrangian estimate. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and a strong Bellman completeness assumption, PDCA only requires concentrability and realizability assumptions for sample-efficient learning. △ Less

Submitted 19 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

arXiv:2306.06247 [pdf, ps, other]

Online Learning with Set-Valued Feedback

Authors: Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} ev… ▽ More We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension characterizes online learnability in the agnostic setting and tightly quantifies the minimax regret. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response. △ Less

Submitted 18 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted to COLT 2024

arXiv:2305.14275 [pdf, other]

Variational Inference with Coverage Guarantees in Simulation-Based Inference

Authors: Yash Patel, Declan McNamara, Jackson Loper, Jeffrey Regier, Ambuj Tewari

Abstract: Amortized variational inference is an often employed framework in simulation-based inference that produces a posterior approximation that can be rapidly computed given any new observation. Unfortunately, there are few guarantees about the quality of these approximate posteriors. We propose Conformalized Amortized Neural Variational Inference (CANVI), a procedure that is scalable, easily implemente… ▽ More Amortized variational inference is an often employed framework in simulation-based inference that produces a posterior approximation that can be rapidly computed given any new observation. Unfortunately, there are few guarantees about the quality of these approximate posteriors. We propose Conformalized Amortized Neural Variational Inference (CANVI), a procedure that is scalable, easily implemented, and provides guaranteed marginal coverage. Given a collection of candidate amortized posterior approximators, CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor. CANVI ensures that the resulting predictor constructs regions that contain the truth with a user-specified level of probability. CANVI is agnostic to design decisions in formulating the candidate approximators and only requires access to samples from the forward model, permitting its use in likelihood-free settings. We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation. Finally, we demonstrate the accurate calibration and high predictive efficiency of CANVI on a suite of simulation-based inference benchmark tasks and an important scientific task: analyzing galaxy emission spectra. △ Less

Submitted 25 July, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.03337 [pdf, ps, other]

On the Learnability of Multilabel Ranking

Authors: Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: Multilabel ranking is a central task in machine learning. However, the most fundamental question of learnability in a multilabel ranking setting with relevance-score feedback remains unanswered. In this work, we characterize the learnability of multilabel ranking problems in both batch and online settings for a large family of ranking losses. Along the way, we give two equivalence classes of ranki… ▽ More Multilabel ranking is a central task in machine learning. However, the most fundamental question of learnability in a multilabel ranking setting with relevance-score feedback remains unanswered. In this work, we characterize the learnability of multilabel ranking problems in both batch and online settings for a large family of ranking losses. Along the way, we give two equivalence classes of ranking losses based on learnability that capture most, if not all, losses used in practice. △ Less

Submitted 25 May, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 28 pages

arXiv:2303.17716 [pdf, ps, other]

Multiclass Online Learning and Uniform Convergence

Authors: Steve Hanneke, Shay Moran, Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. W… ▽ More We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, Pál, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero. △ Less

Submitted 7 July, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: COLT Camera-Ready, 15 pages

arXiv:2302.07409 [pdf, other]

Quantum Learning Theory Beyond Batch Binary Classification

Authors: Preetham Mohan, Ambuj Tewari

Abstract: Arunachalam and de Wolf (2018) showed that the sample complexity of quantum batch learning of boolean functions, in the realizable and agnostic settings, has the same form and order as the corresponding classical sample complexities. In this paper, we extend this, ostensibly surprising, message to batch multiclass learning, online boolean learning, and online multiclass learning. For our online le… ▽ More Arunachalam and de Wolf (2018) showed that the sample complexity of quantum batch learning of boolean functions, in the realizable and agnostic settings, has the same form and order as the corresponding classical sample complexities. In this paper, we extend this, ostensibly surprising, message to batch multiclass learning, online boolean learning, and online multiclass learning. For our online learning results, we first consider an adaptive adversary variant of the classical model of Dawid and Tewari (2022). Then, we introduce the first (to the best of our knowledge) model of online learning with quantum examples. △ Less

Submitted 26 December, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 30 pages, 2 figures, 2 tables; v4: entirely reorganized paper with more detailed proofs; handles the adversary-provides-a-distribution model independently;

arXiv:2302.02033 [pdf, other]

An Asymptotically Optimal Algorithm for the Convex Hull Membership Problem

Authors: Gang Qiao, Ambuj Tewari

Abstract: We study the convex hull membership (CHM) problem in the pure exploration setting where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHM problem in the one-dimensional case. We present the first asymptotically optimal algorithm called Thompson-… ▽ More We study the convex hull membership (CHM) problem in the pure exploration setting where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHM problem in the one-dimensional case. We present the first asymptotically optimal algorithm called Thompson-CHM, whose modular design consists of a stopping rule and a sampling rule. In addition, we extend the algorithm to settings that generalize several important problems in the multi-armed bandit literature. Furthermore, we discuss the extension of Thompson-CHM to higher dimensions. Finally, we provide numerical experiments to demonstrate the empirical behavior of the algorithm matches our theoretical results for realistic time horizons. △ Less

Submitted 23 May, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

arXiv:2301.06259 [pdf, other]

Understanding Best Subset Selection: A Tale of Two C(omplex)ities

Authors: Saptarshi Roy, Ambuj Tewari, Ziwei Zhu

Abstract: For decades, best subset selection (BSS) has eluded statisticians mainly due to its computational bottleneck. However, until recently, modern computational breakthroughs have rekindled theoretical interest in BSS and have led to new findings. Recently, \cite{guo2020best} showed that the model selection performance of BSS is governed by a margin quantity that is robust to the design dependence, unl… ▽ More For decades, best subset selection (BSS) has eluded statisticians mainly due to its computational bottleneck. However, until recently, modern computational breakthroughs have rekindled theoretical interest in BSS and have led to new findings. Recently, \cite{guo2020best} showed that the model selection performance of BSS is governed by a margin quantity that is robust to the design dependence, unlike modern methods such as LASSO, SCAD, MCP, etc. Motivated by their theoretical results, in this paper, we also study the variable selection properties of best subset selection for high-dimensional sparse linear regression setup. We show that apart from the identifiability margin, the following two complexity measures play a fundamental role in characterizing the margin condition for model consistency: (a) complexity of \emph{residualized features}, (b) complexity of \emph{spurious projections}. In particular, we establish a simple margin condition that depends only on the identifiability margin and the dominating one of the two complexity measures. Furthermore, we show that a margin condition depending on similar margin quantity and complexity measures is also necessary for model consistency of BSS. For a broader understanding, we also consider some simple illustrative examples to demonstrate the variation in the complexity measures that refines our theoretical understanding of the model selection performance of BSS under different correlation structures. △ Less

Submitted 17 July, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 46 pages, 2 Figures

arXiv:2301.02729 [pdf, ps, other]

A Characterization of Multioutput Learnability

Authors: Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: We consider the problem of learning multioutput function classes in batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online setting… ▽ More We consider the problem of learning multioutput function classes in batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online settings. As an extension, we also consider multilabel learnability in the bandit feedback setting and show a similar characterization as in the full-feedback setting. △ Less

Submitted 22 October, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: 37 pages

arXiv:2211.16583 [pdf, other]

Offline Policy Evaluation and Optimization under Confounding

Authors: Chinmaya Kausik, Yangyi Lu, Kevin Tan, Maggie Makar, Yixin Wang, Ambuj Tewari

Abstract: Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline… ▽ More Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline policy evaluation for confounded MDPs, distinguishing assumptions on confounding based on whether they are memoryless and on their effect on the data-collection policies. We characterize settings where consistent value estimates are provably not achievable, and provide algorithms with guarantees to instead estimate lower bounds on the value. When consistent estimates are achievable, we provide algorithms for value estimation with sample complexity guarantees. We also present new algorithms for offline policy improvement and prove local convergence guarantees. Finally, we experimentally evaluate our algorithms on both a gridworld environment and a simulated healthcare setting of managing sepsis patients. In gridworld, our model-based method provides tighter lower bounds than existing methods, while in the sepsis simulator, our methods significantly outperform confounder-oblivious benchmarks. △ Less

Submitted 6 November, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Overhauled terminology and presentation, strengthened presentation of results

arXiv:2211.10771 [pdf, other]

RL Boltzmann Generators for Conformer Generation in Data-Sparse Environments

Authors: Yash Patel, Ambuj Tewari

Abstract: The generation of conformers has been a long-standing interest to structural chemists and biologists alike. A subset of proteins known as intrinsically disordered proteins (IDPs) fail to exhibit a fixed structure and, therefore, must also be studied in this light of conformer generation. Unlike in the small molecule setting, ground truth data are sparse in the IDP setting, undermining many existin… ▽ More The generation of conformers has been a long-standing interest to structural chemists and biologists alike. A subset of proteins known as intrinsically disordered proteins (IDPs) fail to exhibit a fixed structure and, therefore, must also be studied in this light of conformer generation. Unlike in the small molecule setting, ground truth data are sparse in the IDP setting, undermining many existing conformer generation methods that rely on such data for training. Boltzmann generators, trained solely on the energy function, serve as an alternative but display a mode collapse that similarly preclude their direct application to IDPs. We investigate the potential of training an RL Boltzmann generator against a closely related "Gibbs score," and demonstrate that conformer coverage does not track well with such training. This suggests that the inadequacy of solely training against the energy is independent of the modeling modality △ Less

Submitted 19 November, 2022; originally announced November 2022.

Comments: Accepted to the NeurIPS 2022 Workshop on Machine Learning in Structural Biology

arXiv:2211.09403 [pdf, other]

Learning Mixtures of Markov Chains and MDPs

Authors: Chinmaya Kausik, Kevin Tan, Ambuj Tewari

Abstract: We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refineme… ▽ More We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refinement using the EM algorithm, (3) a model estimation step, and (4) a classification step for predicting labels of new trajectories. We provide end-to-end performance guarantees, where we only explicitly require the length of trajectories to be linear in the number of states and the number of trajectories to be linear in a mixing time parameter. Experimental results support these guarantees, where we attain 96.6% average accuracy on a mixture of two MDPs in gridworld, outperforming the EM algorithm with random initialization (73.2% average accuracy). △ Less

Submitted 6 February, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 51 pages (13 page paper, 38 page appendix). Paper restructured and refined, corrections made to proofs, experiments added

arXiv:2211.05964 [pdf, other]

Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits

Authors: Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari

Abstract: We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guaran… ▽ More We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones. △ Less

Submitted 28 January, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 38 pages, 4 figures

arXiv:2211.05656 [pdf, other]

On Proper Learnability between Average- and Worst-case Robustness

Authors: Vinod Raman, Unique Subedi, Ambuj Tewari

Abstract: Recently, Montasser et al. [2019] showed that finite VC dimension is not sufficient for proper adversarially robust PAC learning. In light of this hardness, there is a growing effort to study what type of relaxations to the adversarially robust PAC learning setup can enable proper learnability. In this work, we initiate the study of proper learning under relaxations of the worst-case robust loss.… ▽ More Recently, Montasser et al. [2019] showed that finite VC dimension is not sufficient for proper adversarially robust PAC learning. In light of this hardness, there is a growing effort to study what type of relaxations to the adversarially robust PAC learning setup can enable proper learnability. In this work, we initiate the study of proper learning under relaxations of the worst-case robust loss. We give a family of robust loss relaxations under which VC classes are properly PAC learnable with sample complexity close to what one would require in the standard PAC learning setup. On the other hand, we show that for an existing and natural relaxation of the worst-case robust loss, finite VC dimension is not sufficient for proper learning. Lastly, we give new generalization guarantees for the adversarially robust empirical risk minimizer. △ Less

Submitted 25 May, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 19 pages

arXiv:2205.15113 [pdf, other]

Online Agnostic Multiclass Boosting

Authors: Vinod Raman, Ambuj Tewari

Abstract: Boosting is a fundamental approach in machine learning that enjoys both strong theoretical and practical guarantees. At a high-level, boosting algorithms cleverly aggregate weak learners to generate predictions with arbitrarily high accuracy. In this way, boosting algorithms convert weak learners into strong ones. Recently, Brukhim et al. extended boosting to the online agnostic binary classificat… ▽ More Boosting is a fundamental approach in machine learning that enjoys both strong theoretical and practical guarantees. At a high-level, boosting algorithms cleverly aggregate weak learners to generate predictions with arbitrarily high accuracy. In this way, boosting algorithms convert weak learners into strong ones. Recently, Brukhim et al. extended boosting to the online agnostic binary classification setting. A key ingredient in their approach is a clean and simple reduction to online convex optimization, one that efficiently converts an arbitrary online convex optimizer to an agnostic online booster. In this work, we extend this reduction to multiclass problems and give the first boosting algorithm for online agnostic mutliclass classification. Our reduction also enables the construction of algorithms for statistical agnostic, online realizable, and statistical realizable multiclass boosting. △ Less

Submitted 17 October, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Camera-Ready Version

arXiv:2205.14829 [pdf, other]

Adaptive Sampling for Discovery

Authors: Ziping Xu, Eunjae Shim, Ambuj Tewari, Paul Zimmerman

Abstract: In this paper, we study a sequential decision-making problem, called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label the points with the goal to maximize the sum of responses. This problem has wide applications to real-world discovery problems, for example drug discovery with the help of machine learning models. ASD algorithms f… ▽ More In this paper, we study a sequential decision-making problem, called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label the points with the goal to maximize the sum of responses. This problem has wide applications to real-world discovery problems, for example drug discovery with the help of machine learning models. ASD algorithms face the well-known exploration-exploitation dilemma. The algorithm needs to choose points that yield information to improve model estimates but it also needs to exploit the model. We rigorously formulate the problem and propose a general information-directed sampling (IDS) algorithm. We provide theoretical guarantees for the performance of IDS in linear, graph and low-rank models. The benefits of IDS are shown in both simulation experiments and real-data experiments for discovering chemical reaction conditions. △ Less

Submitted 2 January, 2023; v1 submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.14775 [pdf, other]

An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

Authors: Kihyuk Hong, Yuhang Li, Ambuj Tewari

Abstract: We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret… ▽ More We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on the non-stationary kernel bandit setting. Moreover, when applied to the non-stationary linear bandit setting by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound of the extension using the neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity. △ Less

Submitted 19 February, 2023; v1 submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.00894 [pdf, other]

Modeling and mitigation of occupational safety risks in dynamic industrial environments

Authors: Ashutosh Tewari, Antonio R. Paiva

Abstract: Identifying and mitigating safety risks is paramount in a number of industries. In addition to guidelines and best practices, many industries already have safety management systems (SMSs) designed to monitor and reinforce good safety behaviors. The analytic capabilities to analyze the data acquired through such systems, however, are still lacking in terms of their ability to robustly quantify risk… ▽ More Identifying and mitigating safety risks is paramount in a number of industries. In addition to guidelines and best practices, many industries already have safety management systems (SMSs) designed to monitor and reinforce good safety behaviors. The analytic capabilities to analyze the data acquired through such systems, however, are still lacking in terms of their ability to robustly quantify risks posed by various occupational hazards. Moreover, best practices and modern SMSs are unable to account for dynamically evolving environments/behavioral characteristics commonly found in many industrial settings. This article proposes a method to address these issues by enabling continuous and quantitative assessment of safety risks in a data-driven manner. The backbone of our method is an intuitive hierarchical probabilistic model that explains sparse and noisy safety data collected by a typical SMS. A fully Bayesian approach is developed to calibrate this model from safety data in an online fashion. Thereafter, the calibrated model holds necessary information that serves to characterize risk posed by different safety hazards. Additionally, the proposed model can be leveraged for automated decision making, for instance solving resource allocation problems -- targeted towards risk mitigation -- that are often encountered in resource-constrained industrial environments. The methodology is rigorously validated on a simulated test-bed and its scalability is demonstrated on real data from large maintenance projects at a petrochemical plant. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.06664 [pdf, other]

Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms

Authors: Laura Niss, Yuekai Sun, Ambuj Tewari

Abstract: Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately re… ▽ More Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in the Bernoulli and a multinomial setting. △ Less

Submitted 13 April, 2022; originally announced April 2022.

arXiv:2203.10416 [pdf, other]

doi 10.1016/j.ssci.2022.105737

Methodology for Testing and Evaluation of Safety Analytics Approaches

Authors: Antonio R. Paiva, Ashutosh Tewari

Abstract: There has been a significant increase in the development of data-driven safety analytics approaches in recent years. In light of these advances it has become imperative to evaluate such approaches in a principled way to determine their merits and limitations. To that end, we propose an evaluation methodology underpinned by a simulated environment that allows for a comprehensive assessment of safet… ▽ More There has been a significant increase in the development of data-driven safety analytics approaches in recent years. In light of these advances it has become imperative to evaluate such approaches in a principled way to determine their merits and limitations. To that end, we propose an evaluation methodology underpinned by a simulated environment that allows for a comprehensive assessment of safety analytics approaches. While assessing those approaches with historical field data is undoubtedly important, such an assessment has limited statistical power because it corresponds to only one realization. The proposed methodology enables validation over a large number of realizations, thereby circumventing the statistical limitations of evaluation on historical data. Moreover, by using a simulated environment one is able to clearly distinguish between the variability in the observed data and differences in performance between safety approaches. A simulated environment does this by comparing the approaches under controlled circumstances, resulting in a fair and systematic evaluation of the potential long-term benefits. We demonstrate the utility of the proposed methodology via a case study that compares a few candidate safety analytics approaches. These approaches differ in how they assimilate field safety data to assess safety risk and suggest mitigative actions. We show that the proposed methodology indeed reveals useful insights and quantifies the relative merits and drawbacks of the different approaches, which would be otherwise difficult to objectively determine in a real-world scenario. △ Less

Submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted to Safety Science

arXiv:2112.10955 [pdf, other]

Joint Learning of Linear Time-Invariant Dynamical Systems

Authors: Aditya Modi, Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the trans… ▽ More Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the transition matrices of multiple systems. It is assumed that the transition matrices are unknown linear functions of some unknown shared basis matrices. We establish finite-time estimation error rates that fully reflect the roles of trajectory lengths, dimension, and number of systems under consideration. The presented results are fairly general and show the significant gains that can be achieved by pooling data across systems in comparison to learning each system individually. Further, they are shown to be robust against model misspecifications. To obtain the results, we develop novel techniques that are of interest for addressing similar joint-learning problems. They include tightly bounding estimation errors in terms of the eigen-structures of transition matrices, establishing sharp high probability bounds for singular values of dependent random matrices, and capturing effects of misspecified transition matrices as the systems evolve over time. △ Less

Submitted 2 January, 2024; v1 submitted 20 December, 2021; originally announced December 2021.

arXiv:2112.10314 [pdf, other]

Balancing Adaptability and Non-exploitability in Repeated Games

Authors: Anthony DiGiovanni, Ambuj Tewari

Abstract: We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value. Our solution is an expert algorithm (LAFF) that searches within a set of… ▽ More We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value. Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms that are optimal for each opponent class and uses a punishment policy upon detecting evidence of exploitation by the opponent. With benchmarks that depend on the opponent class, we show that LAFF has sublinear regret uniformly over the possible opponents, except exploitative ones, for which we guarantee that the opponent has linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning. △ Less

Submitted 2 July, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted at Uncertainty in Artificial Intelligence 2022

arXiv:2111.07126 [pdf, other]

On the Statistical Benefits of Curriculum Learning

Authors: Ziping Xu, Ambuj Tewari

Abstract: Curriculum learning (CL) is a commonly used machine learning training strategy. However, we still lack a clear theoretical understanding of CL's benefits. In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. For both settings, we derive the minimax rates for CL with the oracle that provides the optimal curriculum and… ▽ More Curriculum learning (CL) is a commonly used machine learning training strategy. However, we still lack a clear theoretical understanding of CL's benefits. In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. For both settings, we derive the minimax rates for CL with the oracle that provides the optimal curriculum and without the oracle, where the agent has to adaptively learn a good curriculum. Our results reveal that adaptive learning can be fundamentally harder than the oracle learning in the unstructured setting, but it merely introduces a small extra term in the structured setting. To connect theory with practice, we provide justification for a popular empirical method that selects tasks with highest local prediction gain by comparing its guarantees with the minimax rates mentioned above. △ Less

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2108.04782 [pdf, ps, other]

Bandit Algorithms for Precision Medicine

Authors: Yangyi Lu, Ziping Xu, Ambuj Tewari

Abstract: The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular profiling." It is not an entirely new idea: physicians from ancient times have recognized that medical treatment needs to consider individual variations in patient characteristics. However, the m… ▽ More The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular profiling." It is not an entirely new idea: physicians from ancient times have recognized that medical treatment needs to consider individual variations in patient characteristics. However, the modern precision medicine movement has been enabled by a confluence of events: scientific advances in fields such as genetics and pharmacology, technological advances in mobile devices and wearable sensors, and methodological advances in computing and data sciences. This chapter is about bandit algorithms: an area of data science of special relevance to precision medicine. With their roots in the seminal work of Bellman, Robbins, Lai and others, bandit algorithms have come to occupy a central place in modern data science ( Lattimore and Szepesvari, 2020). Bandit algorithms can be used in any situation where treatment decisions need to be made to optimize some health outcome. Since precision medicine focuses on the use of patient characteristics to guide treatment, contextual bandit algorithms are especially useful since they are designed to take such information into account. The role of bandit algorithms in areas of precision medicine such as mobile health and digital phenotyping has been reviewed before (Tewari and Murphy, 2017; Rabbi et al., 2019). Since these reviews were published, bandit algorithms have continued to find uses in mobile health and several new topics have emerged in the research on bandit algorithms. This chapter is written for quantitative researchers in fields such as statistics, machine learning, and operations research who might be interested in knowing more about the algorithmic and mathematical details of bandit algorithms that have been used in mobile health. △ Less

Submitted 10 August, 2021; originally announced August 2021.

Comments: To appear as a chapter in the Handbook of Statistical Methods for Precision Medicine edited by Tianxi Cai, Bibhas Chakraborty, Eric Laber, Erica Moodie, and Mark van der Laan

arXiv:2106.02988 [pdf, other]

Causal Bandits with Unknown Graph Structure

Authors: Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Abstract: In causal bandit problems, the action set consists of interventions on variables of a causal graph. Several researchers have recently studied such bandit problems and pointed out their practical applications. However, all existing works rely on a restrictive and impractical assumption that the learner is given full knowledge of the causal graph structure upfront. In this paper, we develop novel ca… ▽ More In causal bandit problems, the action set consists of interventions on variables of a causal graph. Several researchers have recently studied such bandit problems and pointed out their practical applications. However, all existing works rely on a restrictive and impractical assumption that the learner is given full knowledge of the causal graph structure upfront. In this paper, we develop novel causal bandit algorithms without knowing the causal graph. Our algorithms work well for causal trees, causal forests and a general class of causal graphs. The regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit (MAB) algorithms under mild conditions. Lastly, we prove our mild conditions are necessary: without them one cannot do better than standard MAB algorithms. △ Less

Submitted 9 November, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2105.14989 [pdf, other]

Representation Learning Beyond Linear Prediction Functions

Authors: Ziping Xu, Ambuj Tewari

Abstract: Recent papers on the theory of representation learning has shown the importance of a quantity called diversity when generalizing from a set of source tasks to a target task. Most of these papers assume that the function mapping shared representations to predictions is linear, for both source and target tasks. In practice, researchers in deep learning use different numbers of extra layers following… ▽ More Recent papers on the theory of representation learning has shown the importance of a quantity called diversity when generalizing from a set of source tasks to a target task. Most of these papers assume that the function mapping shared representations to predictions is linear, for both source and target tasks. In practice, researchers in deep learning use different numbers of extra layers following the pretrained model based on the difficulty of the new task. This motivates us to ask whether diversity can be achieved when source tasks and the target task use different prediction function spaces beyond linear functions. We show that diversity holds even if the target task uses a neural network with multiple layers, as long as source tasks use linear functions. If source tasks use nonlinear prediction functions, we provide a negative result by showing that depth-1 neural networks with ReLu activation function need exponentially many source tasks to achieve diversity. For a general function class, we find that eluder dimension gives a lower bound on the number of tasks required for diversity. Our theoretical results imply that simpler tasks generalize better. Though our theoretical results are shown for the global minimizer of empirical risks, their qualitative predictions still hold true for gradient-based optimization algorithms as verified by our simulations on deep neural networks. △ Less

Submitted 31 May, 2021; originally announced May 2021.

Comments: 1 Figure

arXiv:2102.07663 [pdf, other]

Causal Markov Decision Processes: Learning Good Interventions Efficiently

Authors: Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Abstract: We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relati… ▽ More We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an $\tilde{O}(HS\sqrt{ZT})$ regret bound, where $T$ is the the total time steps, $H$ is the episodic horizon, and $S$ is the cardinality of the state space. Notably, our regret bound does not scale with the size of actions/interventions ($A$), but only scales with a causal graph dependent quantity $Z$ which can be exponentially smaller than $A$. By extending C-UCBVI to the factored MDP setting, we propose the causal factored UCBVI (CF-UCBVI) algorithm, which further reduces the regret exponentially in terms of $S$. Furthermore, we show that RL algorithms for linear MDP problems can also be incorporated in C-MDPs. We empirically show the benefit of our causal approaches in various settings to validate our algorithms and theoretical results. △ Less

Submitted 15 February, 2021; originally announced February 2021.

arXiv:2010.08048 [pdf, other]

Decision Making Problems with Funnel Structure: A Multi-Task Learning Approach with Application to Email Marketing Campaigns

Authors: Ziping Xu, Amirhossein Meisami, Ambuj Tewari

Abstract: This paper studies the decision making problem with Funnel Structure. Funnel structure, a well-known concept in the marketing field, occurs in those systems where the decision maker interacts with the environment in a layered manner receiving far fewer observations from deep layers than shallow ones. For example, in the email marketing campaign application, the layers correspond to Open, Click and… ▽ More This paper studies the decision making problem with Funnel Structure. Funnel structure, a well-known concept in the marketing field, occurs in those systems where the decision maker interacts with the environment in a layered manner receiving far fewer observations from deep layers than shallow ones. For example, in the email marketing campaign application, the layers correspond to Open, Click and Purchase events. Conversions from Click to Purchase happen very infrequently because a purchase cannot be made unless the link in an email is clicked on. We formulate this challenging decision making problem as a contextual bandit with funnel structure and develop a multi-task learning algorithm that mitigates the lack of sufficient observations from deeper layers. We analyze both the prediction error and the regret of our algorithms. We verify our theory on prediction errors through a simple simulation. Experiments on both a simulated environment and an environment based on real-world data from a major email marketing company show that our algorithms offer significant improvement over previous methods. △ Less

Submitted 31 January, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

arXiv:2008.04489 [pdf, other]

Federated Learning via Synthetic Data

Authors: Jack Goetz, Ambuj Tewari

Abstract: Federated learning allows for the training of a model using data on multiple clients without the clients transmitting that raw data. However the standard method is to transmit model parameters (or updates), which for modern neural networks can be on the scale of millions of parameters, inflicting significant computational costs on the clients. We propose a method for federated learning where inste… ▽ More Federated learning allows for the training of a model using data on multiple clients without the clients transmitting that raw data. However the standard method is to transmit model parameters (or updates), which for modern neural networks can be on the scale of millions of parameters, inflicting significant computational costs on the clients. We propose a method for federated learning where instead of transmitting a gradient update back to the server, we instead transmit a small amount of synthetic `data'. We describe the procedure and show some experimental results suggesting this procedure has potential, providing more than an order of magnitude reduction in communication costs with minimal model degradation. △ Less

Submitted 26 September, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

arXiv:2006.07078 [pdf, other]

TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search

Authors: Tarun Gogineni, Ziping Xu, Exequiel Punzalan, Runxuan Jiang, Joshua Kammeraad, Ambuj Tewari, Paul Zimmerman

Abstract: Molecular geometry prediction of flexible molecules, or conformer search, is a long-standing challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to genera… ▽ More Molecular geometry prediction of flexible molecules, or conformer search, is a long-standing challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to generate diverse and representative conformer sets for medium to large molecules, which are yet intractable to chemoinformatic conformer search methods. We present TorsionNet, an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation. The model is trained via curriculum learning, whose theoretical benefit is explored in detail, to maximize a novel metric grounded in thermodynamics called the Gibbs Score. Our experimental results show that TorsionNet outperforms the highest scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy. △ Less

Submitted 12 June, 2020; originally announced June 2020.

arXiv:2006.02948 [pdf, other]

Low-Rank Generalized Linear Bandit Problems

Authors: Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Abstract: In a low-rank linear bandit problem, the reward of an action (represented by a matrix of size $d_1 \times d_2$) is the inner product between the action and an unknown low-rank matrix $Θ^*$. We propose an algorithm based on a novel combination of online-to-confidence-set conversion~\citep{abbasi2012online} and the exponentially weighted average forecaster constructed by a covering of low-rank matri… ▽ More In a low-rank linear bandit problem, the reward of an action (represented by a matrix of size $d_1 \times d_2$) is the inner product between the action and an unknown low-rank matrix $Θ^*$. We propose an algorithm based on a novel combination of online-to-confidence-set conversion~\citep{abbasi2012online} and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. In $T$ rounds, our algorithm achieves $\widetilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ regret that improves upon the standard linear bandit regret bound of $\widetilde{O}(d_1d_2\sqrt{T})$ when the rank of $Θ^*$: $r \ll \min\{d_1,d_2\}$. We also extend our algorithmic approach to the generalized linear setting to get an algorithm which enjoys a similar bound under regularity conditions on the link function. To get around the computational intractability of covering based approaches, we propose an efficient algorithm by extending the "Explore-Subspace-Then-Refine" algorithm of~\citet{jun2019bilinear}. Our efficient algorithm achieves $\widetilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ regret under a mild condition on the action set $\mathcal{X}$ and the $r$-th singular value of $Θ^*$. Our upper bounds match the conjectured lower bound of \cite{jun2019bilinear} for a subclass of low-rank linear bandit problems. Further, we show that existing lower bounds for the sparse linear bandit problem strongly suggest that our regret bounds are unimprovable. To complement our theoretical contributions, we also conduct experiments to demonstrate that our algorithm can greatly outperform the performance of the standard linear bandit approach when $Θ^*$ is low-rank. △ Less

Submitted 19 October, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

arXiv:2006.01980 [pdf, other]

On the Equivalence between Online and Private Learnability beyond Binary Classification

Authors: Young Hun Jung, Baekjin Kim, Ambuj Tewari

Abstract: Alon et al. [2019] and Bun et al. [2020] recently showed that online learnability and private PAC learnability are equivalent in binary classification. We investigate whether this equivalence extends to multi-class classification and regression. First, we show that private learnability implies online learnability in both settings. Our extension involves studying a novel variant of the Littlestone… ▽ More Alon et al. [2019] and Bun et al. [2020] recently showed that online learnability and private PAC learnability are equivalent in binary classification. We investigate whether this equivalence extends to multi-class classification and regression. First, we show that private learnability implies online learnability in both settings. Our extension involves studying a novel variant of the Littlestone dimension that depends on a tolerance parameter and on an appropriate generalization of the concept of threshold functions beyond binary classification. Second, we show that while online learnability continues to imply private learnability in multi-class classification, current proof techniques encounter significant hurdles in the regression setting. While the equivalence for regression remains open, we provide non-trivial sufficient conditions for an online learnable class to also be privately learnable. △ Less

Submitted 8 October, 2021; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: An earlier version of this manuscript claimed an upper bound over the sample complexity that is exponential in the Littlestone dimension. The argument contained a technical mistake, and the current version presents a correction that deteriorates the dependence on the Littlestone dimension from exponential to doubly exponential. arXiv admin note: text overlap with arXiv:2003.00563 by other authors

Showing 1–50 of 101 results for author: Tewari, A