Showing 1–50 of 58 results for author: György, A

Searching in archive cs.
  1. arXiv:2406.06811

    cs.LG

    Learning Continually by Spectral Regularization

    Authors: Alex Lewandowski, Saurabh Kumar, Dale Schuurmans, András György, Marlos C. Machado

    Abstract: Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability. We develop new techniques for improving continual learning by first reconsidering how initialization can ensure trainability during early ph…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2406.02543

    cs.LG cs.AI cs.CL

    To Believe or Not to Believe Your LLM

    Authors: Yasin Abbasi Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári

    Abstract: We explore uncertainty quantification in large language models (LLMs), with the goal to identify when uncertainty in responses given a query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from the lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.01563

    cs.LG cs.AI cs.CL

    Mitigating LLM Hallucinations via Conformal Abstention

    Authors: Yasin Abbasi Yadkori, Ilja Kuzborskij, David Stutz, András György, Adam Fisch, Arnaud Doucet, Iuliya Beloshapka, Wei-Hung Weng, Yao-Yuan Yang, Csaba Szepesvári, Ali Taylan Cemgil, Nenad Tomasev

    Abstract: We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a nonsensical or incorrect answer. Building on earlier approaches that use self-consistency as a more reliable measure of model confidence, we propose using the LLM itself to self-e…

    Submitted 4 April, 2024; originally announced May 2024.

  4. arXiv:2403.01518

    cs.CL cs.LG

    Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

    Authors: Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, Michalis K. Titsias

    Abstract: We consider the problem of online fine-tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves the overall predictive performance, especially when considering distributional shift between training and evaluation data, we here emphasize the perspective that online adaptation turns parameters into temporally ch…

    Submitted 3 March, 2024; originally announced March 2024.

  5. arXiv:2402.05878

    stat.ML cs.LG

    Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits

    Authors: Nicolas Nguyen, Imad Aouali, András György, Claire Vernade

    Abstract: We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. Our key contribution is in…

    Submitted 8 February, 2024; originally announced February 2024.

  6. arXiv:2402.03928

    cs.GT cs.MA

    Approximating the Core via Iterative Coalition Sampling

    Authors: Ian Gemp, Marc Lanctot, Luke Marris, Yiran Mao, Edgar Duéñez-Guzmán, Sarah Perrin, Andras Gyorgy, Romuald Elie, Georgios Piliouras, Michael Kaisers, Daniel Hennes, Kalesha Bullard, Kate Larson, Yoram Bachrach

    Abstract: The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, a…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Published in AAMAS 2024

  7. arXiv:2310.07811

    cs.LG stat.ML

    Online RL in Linearly $q^π$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

    Authors: Gellért Weisz, András György, Csaba Szepesvári

    Abstract: We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^π$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear fun…

    Submitted 20 December, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  8. Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

    Authors: Hazem Ibrahim, Fengyuan Liu, Rohail Asim, Balaraju Battu, Sidahmed Benabderrahmane, Bashar Alhafni, Wifag Adnan, Tuka Alhanai, Bedoor AlShebli, Riyadh Baghdadi, Jocelyn J. Bélanger, Elena Beretta, Kemal Celik, Moumena Chaqfeh, Mohammed F. Daqaq, Zaynab El Bernoussi, Daryl Fougnie, Borja Garcia de Soto, Alberto Gandolfi, Andras Gyorgy, Nizar Habash, J. Andrew Harris, Aaron Kaufman, Lefteris Kirousis, Korhan Kocak, et al. (14 additional authors not shown)

    Abstract: The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artific…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: 17 pages, 4 figures

  9. arXiv:2305.11032

    cs.LG stat.ML

    Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

    Authors: Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

    Abstract: While policy optimization algorithms have played an important role in recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited -- they are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary. This paper proposes a simple efficie…

    Submitted 3 December, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  10. arXiv:2302.05371

    cs.LG math.OC stat.ML

    A Second-Order Method for Stochastic Bandit Convex Optimisation

    Authors: Tor Lattimore, András György

    Abstract: We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5} \sqrt{n} + d^3] \operatorname{polylog}(n, d, r)$ where $n$ is the horizon, $d$ the dimension and $r$ is the radius of a known ball containing the minimiser of the loss.

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: 27 pages

  11. arXiv:2301.03236

    cs.LG cs.AI math.OC

    Optimistic Meta-Gradients

    Authors: Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh

    Abstract: We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single task setting. While a meta-learned update rule can yield faster convergence up to a constant factor, it is not sufficient fo…

    Submitted 9 January, 2023; originally announced January 2023.

  12. arXiv:2212.12532

    cs.LG

    Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers

    Authors: Tomer Galanti, András György, Marcus Hutter

    Abstract: We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. We offer a theoretical expl…

    Submitted 16 July, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2112.15121

  13. arXiv:2212.03319

    cs.LG cs.AI

    Understanding Self-Predictive Learning for Reinforcement Learning

    Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

    Abstract: We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirabl…

    Submitted 6 December, 2022; originally announced December 2022.

  14. arXiv:2210.15755

    cs.LG cs.AI stat.ML

    Confident Approximate Policy Iteration for Efficient Local Planning in $q^π$-realizable MDPs

    Authors: Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

    Abstract: We consider approximate dynamic programming in $γ$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with t…

    Submitted 27 October, 2022; originally announced October 2022.

  15. arXiv:2205.13170

    cs.LG stat.ML

    Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

    Authors: Sanae Amani, Tor Lattimore, András György, Lin F. Yang

    Abstract: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $Ω(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. W…

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  16. arXiv:2202.13001

    cs.LG stat.ML

    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

    Authors: MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

    Abstract: We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M\le K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximizati…

    Submitted 18 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  17. arXiv:2201.06532

    cs.LG stat.ML

    A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

    Authors: Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

    Abstract: We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative r…

    Submitted 8 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

  18. arXiv:2112.15121

    cs.LG

    On the Role of Neural Collapse in Transfer Learning

    Authors: Tomer Galanti, András György, Marcus Hutter

    Abstract: We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. In this paper we provide an…

    Submitted 3 January, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

  19. arXiv:2110.02195

    cs.LG cs.AI stat.ML

    TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

    Authors: Gellért Weisz, Csaba Szepesvári, András György

    Abstract: We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^\star$ or (ii) the optimal action-value function $q^\star$ lie in the linear span of some features; or (iii) both…

    Submitted 10 March, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

  20. arXiv:2106.16037

    cs.IT cs.LG cs.NI

    Learning to Minimize Age of Information over an Unreliable Channel with Energy Harvesting

    Authors: Elif Tugce Ceran, Deniz Gunduz, Andras Gyorgy

    Abstract: The time average expected age of information (AoI) is studied for status updates sent over an error-prone channel from an energy-harvesting transmitter with a finite-capacity battery. The energy cost of sensing new status updates is taken into account alongside the transmission energy cost, which better captures practical systems. The optimal scheduling policy is first studied under the hybrid automatic re…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1902.09467

  21. arXiv:2106.08199

    cs.LG cs.RO

    On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

    Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and au…

    Submitted 1 August, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  22. arXiv:2104.01086

    cs.CV cs.LG

    Defending Against Image Corruptions Through Adversarial Augmentations

    Authors: Dan A. Calian, Florian Stimberg, Olivia Wiles, Sylvestre-Alvise Rebuffi, Andras Gyorgy, Timothy Mann, Sven Gowal

    Abstract: Modern neural networks excel at image classification, yet they remain vulnerable to common image corruptions such as blur, speckle noise or fog. Recent methods that focus on this problem, such as AugMix and DeepAugment, introduce defenses that operate in expectation over a distribution of image corruptions. In contrast, the literature on $\ell_p$-norm bounded perturbations focuses on defenses agai…

    Submitted 16 December, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  23. arXiv:2102.09774

    cs.IT cs.LG

    A Reinforcement Learning Approach to Age of Information in Multi-User Networks with HARQ

    Authors: Elif Tugce Ceran, Deniz Gunduz, Andras Gyorgy

    Abstract: Scheduling the transmission of time-sensitive information from a source node to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users. A long-term average resource constraint is imposed on the source, which limits the average number of transmissions. The source can transmit only to a single user at…

    Submitted 19 February, 2021; originally announced February 2021.

  24. arXiv:2102.07140

    cs.LG cs.CR stat.ML

    Perceptually Constrained Adversarial Attacks

    Authors: Muhammad Zaid Hameed, Andras Gyorgy

    Abstract: Motivated by previous observations that the usually applied $L_p$ norms ($p=1,2,\infty$) do not capture the perceptual quality of adversarial examples in image classification, we propose to replace these norms with the structural similarity index (SSIM) measure, which was developed originally to measure the perceptual similarity of images. Through extensive experiments with adversarially trained c…

    Submitted 14 February, 2021; originally announced February 2021.

  25. arXiv:2012.00708

    stat.ML cs.CL cs.LG

    Mutual Information Constraints for Monte-Carlo Objectives

    Authors: Gábor Melis, András György, Phil Blunsom

    Abstract: A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the…

    Submitted 9 May, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: 32 pages, 29 figures

  26. arXiv:2010.06022

    cs.LG math.OC stat.ML

    Adapting to Delays and Data in Adversarial Multi-Armed Bandits

    Authors: András György, Pooria Joulani

    Abstract: We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step-size using only information (about the losses and delays) available at the time of the decisions, and obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses. First, through a remarkably simple proof…

    Submitted 12 October, 2020; originally announced October 2020.

  27. arXiv:2009.12228

    math.OC cs.LG stat.ML

    Mirror Descent and the Information Ratio

    Authors: Tor Lattimore, András György

    Abstract: We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directe…

    Submitted 25 September, 2020; originally announced September 2020.

  28. arXiv:2006.10460

    cs.LG stat.ML

    Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

    Authors: Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

    Abstract: We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies. We propose a new method to compute a lower bound on the value of an arbitrary target policy given some logged data in contextual bandits for a…

    Submitted 21 March, 2022; v1 submitted 18 June, 2020; originally announced June 2020.

  29. arXiv:2006.02119

    stat.ML cs.LG

    Non-Stationary Delayed Bandits with Intermediate Observations

    Authors: Claire Vernade, Andras Gyorgy, Timothy Mann

    Abstract: Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is well-understood in stationary environments, the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about th…

    Submitted 11 August, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: 18 pages, 17 figures, ICML 2020

  30. arXiv:1910.05558

    cs.LO cs.FL

    Minimal Assumptions Refinement for GR(1) Specifications

    Authors: Davide G. Cavezza, Dalal Alrajeh, Andras Gyorgy

    Abstract: Reactive synthesis is concerned with finding a correct-by-construction controller from formal specifications, typically expressed in Linear Temporal Logic (LTL). The specifications describe assumptions about an environment and guarantees to be achieved by the controller operating in that environment. If a controller exists, given the assumptions, the specification is said to be realizable. This pa…

    Submitted 12 October, 2019; originally announced October 2019.

  31. arXiv:1905.03030

    cs.LG cs.AI stat.ML

    Meta-learning of Sequential Strategies

    Authors: Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

    Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal pred…

    Submitted 18 July, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: DeepMind Technical Report (15 pages, 6 figures). Version V1.1

  32. arXiv:1903.02380

    cs.LG stat.ML

    Detecting Overfitting via Adversarial Examples

    Authors: Roman Werpachowski, András György, Csaba Szepesvári

    Abstract: The repeated community-wide reuse of test sets in popular benchmark problems raises doubts about the credibility of reported test-error rates. Verifying whether a learned model is overfitted to a test set is challenging as independent test sets drawn from the same data distribution are usually unavailable, while other test sets may introduce a distribution shift. We propose a new hypothesis test t…

    Submitted 14 November, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: 17 pages

    Journal ref: Part of: Advances in Neural Information Processing Systems 32 (NIPS 2019) pre-proceedings

  33. Degenerate Feedback Loops in Recommender Systems

    Authors: Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli

    Abstract: Machine learning is used extensively in recommender systems deployed in products. The decisions made by these systems can influence user beliefs and preferences which in turn affect the feedback the learning system receives - thus creating a feedback loop. This phenomenon can give rise to the so-called "echo chambers" or "filter bubbles" that have user and societal implications. In this paper, we…

    Submitted 27 March, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Journal ref: Proceedings of AAAI/ACM Conference on AI, Ethics, and Society, Honolulu, HI, USA, January 27-28, 2019 (AIES '19)

  34. arXiv:1902.10674

    cs.LG cs.CR stat.ML

    The Best Defense Is a Good Offense: Adversarial Attacks to Avoid Modulation Detection

    Authors: Muhammad Zaid Hameed, Andras Gyorgy, Deniz Gunduz

    Abstract: We consider a communication scenario, in which an intruder tries to determine the modulation scheme of the intercepted signal. Our aim is to minimize the accuracy of the intruder, while guaranteeing that the intended receiver can still recover the underlying message with the highest reliability. This is achieved by perturbing channel input symbols at the encoder, similarly to adversarial attacks a…

    Submitted 7 April, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

  35. arXiv:1902.09467

    eess.SP cs.IT cs.NI cs.SI

    Reinforcement Learning to Minimize Age of Information with an Energy Harvesting Sensor with HARQ and Sensing Cost

    Authors: Elif Tuğçe Ceran, Deniz Gündüz, András György

    Abstract: The time average expected age of information (AoI) is studied for status updates sent from an energy-harvesting transmitter with a finite-capacity battery. The optimal scheduling policy is first studied under different feedback mechanisms when the channel and energy harvesting statistics are known. For the case of unknown environments, an average-cost reinforcement learning algorithm is proposed t…

    Submitted 24 January, 2019; originally announced February 2019.

  36. arXiv:1807.09387

    cs.LG stat.ML

    Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems

    Authors: Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Prav Srinivasan

    Abstract: Predicting delayed outcomes is an important problem in recommender systems (e.g., if customers will finish reading an ebook). We formalize the problem as an adversarial, delayed online learning problem and consider how a proxy for the delayed outcome (e.g., if customers read a third of the book in 24 hours) can help minimize regret, even though the proxy is not available when making a prediction.…

    Submitted 15 October, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

  37. arXiv:1807.00755

    cs.LG cs.AI math.OC stat.ML

    LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

    Authors: Gellért Weisz, András György, Csaba Szepesvári

    Abstract: We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose Lea…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: to appear at ICML 2018

  38. arXiv:1806.03816

    cs.LG math.NA stat.ML

    Adaptive MCMC via Combining Local Samplers

    Authors: Kiarash Shaloudegi, András György

    Abstract: Markov chain Monte Carlo (MCMC) methods are widely used in machine learning. One of the major problems with MCMC is the question of how to design chains that mix fast over the whole state space; in particular, how to select the parameters of an MCMC algorithm. Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the w…

    Submitted 12 July, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

  39. arXiv:1806.00336

    cs.LG stat.ML

    A Reinforcement Learning Approach to Age of Information in Multi-User Networks

    Authors: Elif Tuğçe Ceran, Deniz Gündüz, András György

    Abstract: Scheduling the transmission of time-sensitive data to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users under a constraint on the average number of transmissions at the source node. After each transmission, the source receives an instantaneous ACK/NACK feedback from the intended receiver and dec…

    Submitted 1 June, 2018; originally announced June 2018.

  40. arXiv:1805.03151

    cs.LO

    A Weakness Measure for GR(1) Formulae

    Authors: Davide G. Cavezza, Dalal Alrajeh, András György

    Abstract: In spite of the theoretical and algorithmic developments for system synthesis in recent years, little effort has been dedicated to quantifying the quality of the specifications used for synthesis. When dealing with unrealizable specifications, finding the weakest environment assumptions that would ensure realizability is typically a desirable property; in such context the weakness of the assumptio…

    Submitted 8 May, 2018; originally announced May 2018.

    Comments: To appear in FM2018 proceedings

  41. arXiv:1802.03041

    stat.ML cs.CR cs.LG

    Detection of Adversarial Training Examples in Poisoning Attacks through Anomaly Detection

    Authors: Andrea Paudice, Luis Muñoz-González, Andras Gyorgy, Emil C. Lupu

    Abstract: Machine learning has become an important component for many systems and applications including computer vision, spam filtering, malware and network intrusion detection, among others. Despite the capabilities of machine learning algorithms to extract valuable information from data and produce accurate predictions, it has been shown that these algorithms are vulnerable to attacks. Data poisoning is…

    Submitted 8 February, 2018; originally announced February 2018.

    Comments: 10 pages, 3 figures

  42. arXiv:1712.07084

    eess.SY cs.IT cs.NI

    A Reinforcement-Learning Approach to Proactive Caching in Wireless Networks

    Authors: Samuel O. Somuyiwa, Andras Gyorgy, Deniz Gunduz

    Abstract: We consider a mobile user accessing contents in a dynamic environment, where new contents are generated over time (by the user's contacts), and remain relevant to the user for random lifetimes. The user, equipped with a finite-capacity cache memory, randomly accesses the system, and requests all the relevant contents at the time of access. The system incurs an energy cost associated with the numbe…

    Submitted 19 December, 2017; originally announced December 2017.

  43. arXiv:1710.04971  [pdf, ps, other]

    cs.IT

    Average Age of Information with Hybrid ARQ under a Resource Constraint

    Authors: Elif Tugce Ceran, Deniz Gunduz, Andras Gyorgy

    Abstract: Scheduling of the transmission of status updates over an error-prone communication channel is studied in order to minimize the long-term average age of information (AoI) at the destination, under an average resource constraint at the source node, which limits the average number of transmissions. After each transmission, the source receives an instantaneous ACK/NACK feedback, and decides on the nex…

    Submitted 5 December, 2018; v1 submitted 1 October, 2017; originally announced October 2017.

  44. arXiv:1709.02726  [pdf, other]

    cs.LG math.OC stat.ML

    A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

    Authors: Pooria Joulani, András György, Csaba Szepesvári

    Abstract: Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this paper we contribute to this effort in two ways: First, based on a new regret decomposition and a generalization of Bregman divergences, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror De…

    Submitted 8 September, 2017; originally announced September 2017.

    Comments: Accepted to The 28th International Conference on Algorithmic Learning Theory (ALT 2017). 40 pages

  45. arXiv:1702.06382  [pdf, ps, other]

    cs.NI cs.IT math.OC

    Energy-Efficient Wireless Content Delivery with Proactive Caching

    Authors: Samuel O. Somuyiwa, András György, Deniz Gündüz

    Abstract: We propose an intelligent proactive content caching scheme to reduce the energy consumption in wireless downlink. We consider an online social network (OSN) setting where new contents are generated over time, and remain relevant to the user for a random lifetime. Contents are downloaded to the user equipment (UE) through a time-varying wireless channel at an energy cost that depends on th…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: 6 pages, 2 figures. Submitted, under review

  46. arXiv:1702.03040  [pdf, other]

    cs.LG

    Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

    Authors: Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári

    Abstract: The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved. In this paper we ask whether there are other "lucky" settings when FTL achieves sublinear, "small" regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact…

    Submitted 9 February, 2017; originally announced February 2017.

  47. arXiv:1610.09491  [pdf, ps, other]

    cs.LG

    SDP Relaxation with Randomized Rounding for Energy Disaggregation

    Authors: Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu

    Abstract: We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find…

    Submitted 29 October, 2016; originally announced October 2016.

  48. arXiv:1609.07087  [pdf, other]

    cs.LG stat.ML

    (Bandit) Convex Optimization with Biased Noisy Gradient Oracles

    Authors: Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

    Abstract: Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients. Depending on the properties of the function to be optimized and the nature of "noise" in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In t…

    Submitted 4 July, 2020; v1 submitted 22 September, 2016; originally announced September 2016.

  49. arXiv:1609.01872  [pdf, ps, other]

    stat.ML cs.LG

    Chaining Bounds for Empirical Risk Minimization

    Authors: Gábor Balázs, András György, Csaba Szepesvári

    Abstract: This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization with random design settings even if the magnitude of the noise and the estimates is unbounded. The bound applies to many loss functions besides the squared loss, and scales only with the sub-Gaussian or subexponential parameters without further statistical assumptions such as the bo…

    Submitted 7 September, 2016; originally announced September 2016.

  50. arXiv:1510.08108  [pdf, ps, other]

    stat.ML cs.LG

    Online Learning with Gaussian Payoffs and Side Observations

    Authors: Yifan Wu, András György, Csaba Szepesvári

    Abstract: We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one act…

    Submitted 27 October, 2015; originally announced October 2015.