Showing 1–50 of 60 results for author: Wen, Z

Searching in archive stat.
  1. arXiv:2403.02233  [pdf, other]

    cs.LG math.OC stat.ML

    How Transformers Learn Diverse Attention Correlations in Masked Vision Pretraining

    Authors: Yu Huang, Zixin Wen, Yuejie Chi, Yingbin Liang

    Abstract: Masked reconstruction, which predicts randomly masked patches from unmasked ones, has emerged as an important approach in self-supervised pretraining. However, the theoretical understanding of masked pretraining is rather limited, especially for the foundational architecture of transformers. In this paper, to the best of our knowledge, we provide the first end-to-end theoretical guarantee of learn…

    Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: v2 polishes writing

  2. arXiv:2310.11531  [pdf, ps, other]

    cs.LG cs.AI eess.SY stat.ML

    Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach

    Authors: Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen

    Abstract: In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with. We assume that the offline dataset is generated by an expert but with unknown level of competence, i.e., it is not perfect and not necessarily using the optimal policy. We show that if the learning agent models the behavioral policy (paramet…

    Submitted 1 February, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 22 pages

    MSC Class: 93E35

  3. arXiv:2310.06713  [pdf, other]

    cs.LG stat.AP

    Interpretable Traffic Event Analysis with Bayesian Networks

    Authors: Tong Yuan, Jian Yang, Zeyi Wen

    Abstract: Although existing machine learning-based methods for traffic accident analysis can provide good quality results to downstream tasks, they lack interpretability which is crucial for this critical problem. This paper proposes an interpretable framework based on Bayesian Networks for traffic accident prediction. To enable the ease of interpretability, we design a dataset construction pipeline to feed…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 11 pages, 7 figures

    MSC Class: 62F15; ACM Class: G.3

  4. arXiv:2303.10599  [pdf, ps, other]

    stat.ML math.OC

    Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators

    Authors: Tianyou Li, Fan Chen, Huajie Chen, Zaiwen Wen

    Abstract: Understanding stochastic gradient descent (SGD) and its variants is essential for machine learning. However, most of the preceding analyses are conducted under amenable conditions such as unbiased gradient estimator and bounded objective functions, which does not encompass many sophisticated applications, such as variational Monte Carlo, entropy-regularized reinforcement learning and variational i…

    Submitted 23 March, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

  5. arXiv:2302.03319  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Leveraging Demonstrations to Improve Online Learning: Quality Matters

    Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

    Abstract: We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning…

    Submitted 17 May, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  6. arXiv:2212.07632  [pdf, other]

    stat.ML cs.LG

    Reinforcement Learning in Credit Scoring and Underwriting

    Authors: Seksan Kiatsupaibul, Pakawan Chansiripas, Pojtanut Manopanjasiri, Kantapong Visantavarakul, Zheng Wen

    Abstract: This paper proposes a novel reinforcement learning (RL) framework for credit underwriting that tackles ungeneralizable contextual challenges. We adapt RL principles for credit scoring, incorporating action space renewal and multi-choice actions. Our work demonstrates that the traditional underwriting approach aligns with the RL greedy strategy. We introduce two new RL-based credit underwriting alg…

    Submitted 26 June, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

  7. arXiv:2207.06147  [pdf, other]

    cs.LG cs.AI stat.ML

    A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

    Authors: Fan Chen, Junyu Zhang, Zaiwen Wen

    Abstract: As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, there still lacks some essential understanding of the offline CMDP problems, in terms of both the algorithm design and the information theoretic sample complexity…

    Submitted 13 July, 2022; originally announced July 2022.

  8. arXiv:2206.03633  [pdf, other]

    cs.LG cs.AI stat.ML

    Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

    Authors: Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

    Abstract: In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been proposed for training ensembles, and conflicting views prevail with regards to the importance of various ingredients of these approaches. In this paper…

    Submitted 7 June, 2022; originally announced June 2022.

  9. arXiv:2205.06226  [pdf, other]

    cs.LG stat.ML

    The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning

    Authors: Zixin Wen, Yuanzhi Li

    Abstract: Recently the surprising discovery of the Bootstrap Your Own Latent (BYOL) method by Grill et al. shows the negative term in contrastive loss can be removed if we add the so-called prediction head to the network. This initiated the research of non-contrastive self-supervised learning. It is mysterious why even when there exist trivial collapsed global optimal solutions, neural networks trained by (…

    Submitted 15 January, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: 88 pages, comments welcome

  10. arXiv:2203.01303  [pdf, other]

    cs.LG stat.ML

    An Analysis of Ensemble Sampling

    Authors: Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem. This represents the first rigorous regret analysis of ensemble sampling and is made…

    Submitted 1 March, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: [NeurIPS 2022 camera-ready version](https://openreview.net/forum?id=c6ibx0yl-aG) with improved regret bounds

  11. arXiv:2202.13509  [pdf, other]

    stat.ML cs.AI cs.LG

    Evaluating High-Order Predictive Distributions in Deep Learning

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Most work on supervised learning research has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive distributions with inputs sampled i.i.d. from the testing distribution. With low-dimensional inputs, these methods distinguish agents that effectively estimate u…

    Submitted 27 February, 2022; originally announced February 2022.

  12. arXiv:2110.04629  [pdf, other]

    cs.LG cs.AI stat.ML

    The Neural Testbed: Evaluating Joint Predictions

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

    Abstract: Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a…

    Submitted 1 November, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  13. arXiv:2107.09224  [pdf, ps, other]

    cs.LG stat.ML

    From Predictions to Decisions: The Importance of Joint Predictive Distributions

    Authors: Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy

    Abstract: A fundamental challenge for any intelligent system is prediction: given some inputs, can you predict corresponding outcomes? Most work on supervised learning has focused on producing accurate marginal predictions for each input. However, we show that for a broad class of decision problems, accurate joint predictions are required to deliver good performance. In particular, we establish several resu…

    Submitted 23 May, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

  14. arXiv:2107.08924  [pdf, other]

    cs.LG cs.AI stat.ML

    Epistemic Neural Networks

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joint predictions, but the computational costs of training large ensembles can become prohibitive. We introduce the epinet: an architecture that can supplement any…

    Submitted 17 May, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

  15. arXiv:2106.07454  [pdf, other]

    math.OC cs.AI cs.LG stat.ML

    NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

    Authors: Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu

    Abstract: In this paper, a novel second-order method called NG+ is proposed. By following the rule ``the shape of the gradient equals the shape of the parameter", we define a generalized fisher information matrix (GFIM) using the products of gradients in the matrix form rather than the traditional vectorization. Then, our generalized natural gradient direction is simply the inverse of the GFIM multiplies th…

    Submitted 14 June, 2021; originally announced June 2021.

  16. arXiv:2105.15134  [pdf, other]

    cs.LG cs.CV stat.ML

    Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning

    Authors: Zixin Wen, Yuanzhi Li

    Abstract: How can neural networks trained by contrastive learning extract features from the unlabeled data? Why does contrastive learning usually need much stronger data augmentations than supervised learning to ensure good representations? These questions involve both the optimization and statistical aspects of deep learning, but can hardly be answered by analyzing supervised learning, where the target fun…

    Submitted 5 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: V3 corrected related works. Accepted to ICML2021

  17. arXiv:2105.07911  [pdf, other]

    cs.CL stat.ML

    SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising

    Authors: Kuan Xu, Yongbo Wang, Yongliang Wang, Zujie Wen, Yang Dong

    Abstract: In text-to-SQL task, seq-to-seq models often lead to sub-optimal performance due to limitations in their architecture. In this paper, we present a simple yet effective approach that adapts transformer-based seq-to-seq model to robust text-to-SQL generation. Instead of inducing constraint to decoder or reformat the task as slot-filling, we propose to train seq-to-seq model with Schema aware Denoisi…

    Submitted 30 January, 2023; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: 9 pages

  18. arXiv:2012.01780  [pdf, other]

    cs.LG stat.ML

    Neural Contextual Bandits with Deep Representation and Shallow Exploration

    Authors: Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

    Abstract: We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning), and uses an upper confidence bound (UCB) approach to explore in th…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: 28 pages, 1 figure, 1 table

  19. arXiv:2008.07353  [pdf, ps, other]

    cs.LG cs.AI cs.DS stat.ML

    On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

    Authors: Wenlong Mou, Zheng Wen, Xi Chen

    Abstract: We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which a…

    Submitted 17 August, 2020; originally announced August 2020.

  20. arXiv:2007.15788  [pdf, other]

    stat.ML cs.LG

    Stochastic Low-rank Tensor Bandits for Multi-dimensional Online Decision Making

    Authors: Jie Zhou, Botao Hao, Zheng Wen, Jingfei Zhang, Will Wei Sun

    Abstract: Multi-dimensional online decision making plays a crucial role in many real applications such as online recommendation and digital marketing. In these problems, a decision at each time is a combination of choices from different types of entities. To solve it, we introduce stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor. We consider t…

    Submitted 13 February, 2024; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted by Journal of the American Statistical Association

  21. arXiv:2007.04915  [pdf, other]

    cs.LG stat.ML

    Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

    Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

    Abstract: We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act effic…

    Submitted 9 July, 2020; originally announced July 2020.

  22. arXiv:2006.09606  [pdf, other]

    math.OC stat.ML

    Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

    Authors: Minghan Yang, Dong Xu, Hongyu Chen, Zaiwen Wen, Mengyun Chen

    Abstract: In this paper, we consider stochastic second-order methods for minimizing a finite summation of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method by using partial Hessian informat…

    Submitted 25 March, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

  23. arXiv:2006.07464  [pdf, other]

    cs.LG math.OC stat.ML

    Hypermodels for Exploration

    Authors: Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

    Abstract: We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gain…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2020

  24. arXiv:2006.05924  [pdf, ps, other]

    math.OC stat.ML

    Sketchy Empirical Natural Gradient Methods for Deep Learning

    Authors: Minghan Yang, Dong Xu, Zaiwen Wen, Mengyun Chen, Pengxiang Xu

    Abstract: In this paper, we develop an efficient sketchy empirical natural gradient method (SENG) for large-scale deep learning problems. The empirical Fisher information matrix is usually low-rank since the sampling is only practical on a small amount of data at each iteration. Although the corresponding natural gradient direction lies in a small subspace, both the computational cost and memory requirement…

    Submitted 25 March, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

  25. arXiv:2002.06979  [pdf, ps, other]

    cs.LG stat.ML

    Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning

    Authors: Zixin Wen

    Abstract: Unsupervised contrastive learning has gained increasing attention in the latest research and has proven to be a powerful method for learning representations from unlabeled data. However, little theoretical analysis was known for this framework. In this paper, we study the optimization of deep unsupervised contrastive learning. We prove that, by applying end-to-end training that simultaneously upda…

    Submitted 30 May, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

  26. arXiv:1911.04209  [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy-Preserving Gradient Boosting Decision Trees

    Authors: Qinbin Li, Zhaomin Wu, Zeyi Wen, Bingsheng He

    Abstract: The Gradient Boosting Decision Tree (GBDT) is a popular machine learning model for various tasks in recent years. In this paper, we study how to improve model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differential private models. Existing solutions for GBDT with differential pri…

    Submitted 10 October, 2022; v1 submitted 11 November, 2019; originally announced November 2019.

  27. arXiv:1911.04206  [pdf, other]

    cs.LG stat.ML

    Practical Federated Gradient Boosting Decision Trees

    Authors: Qinbin Li, Zeyi Wen, Bingsheng He

    Abstract: Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, with many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, ex…

    Submitted 13 December, 2019; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: Accepted to AAAI-20

  28. arXiv:1911.03011  [pdf, other]

    cs.LG cs.PF stat.ML

    Adaptive Kernel Value Caching for SVM Training

    Authors: Qinbin Li, Zeyi Wen, Bingsheng He

    Abstract: Support Vector Machines (SVMs) can solve structured multi-output learning problems such as multi-label classification, multiclass classification and vector regression. SVM training is expensive especially for large and high dimensional datasets. The bottleneck of the SVM training often lies in the kernel value computation. In many real-world problems, the same kernel values are used in many iterat…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

  29. arXiv:1910.09373  [pdf, ps, other]

    math.OC stat.ML

    A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization

    Authors: Minghan Yang, Andre Milzarek, Zaiwen Wen, Tong Zhang

    Abstract: In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems. We assume that the gradient of the smooth part of the objective function can only be approximated by stochastic oracles. The proposed method combines general stochastic higher order steps derived from an underlying proximal type fixed-point equation…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: 41 pages

    MSC Class: 90C06; 90C15; 90C26; 90C53

  30. arXiv:1907.09693  [pdf, other]

    cs.LG cs.CR cs.DB stat.ML

    A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection

    Authors: Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, Bingsheng He

    Abstract: Federated learning has been a hot research topic in enabling the collaborative training of machine learning models among different organizations under the privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a requirement in developing systems and infrastructures to ease the development of various federated learning…

    Submitted 4 December, 2021; v1 submitted 23 July, 2019; originally announced July 2019.

    Comments: Accepted to IEEE Transactions on Knowledge and Data Engineering (TKDE)

  31. arXiv:1906.05247  [pdf, other]

    stat.ML cs.LG

    Bootstrapping Upper Confidence Bound

    Authors: Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

    Abstract: Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration inequalities, which thus lead to over-exploration. In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap…

    Submitted 30 October, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Accepted by NeurIPS 2019

  32. arXiv:1904.09404  [pdf, ps, other]

    cs.LG stat.ML

    Waterfall Bandits: Learning to Sell Ads Online

    Authors: Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

    Abstract: A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then subsequently solve the optimization problem for the given demand model. This will incu…

    Submitted 20 April, 2019; originally announced April 2019.

  33. arXiv:1903.01083  [pdf, other]

    cs.LG stat.ML

    Stochastic Online Learning with Probabilistic Graph Feedback

    Authors: Shuai Li, Wei Chen, Zheng Wen, Kwong-Sak Leung

    Abstract: We consider a problem of stochastic online learning with general probabilistic graph feedback, where each directed edge in the feedback graph has probability $p_{ij}$. Two cases are covered. (a) The one-step case, where after playing arm $i$ the learner observes a sample reward feedback of arm $j$ with independent probability $p_{ij}$. (b) The cascade case where after playing arm $i$ the learner o…

    Submitted 21 November, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

  34. arXiv:1902.07239  [pdf, other]

    stat.ML cs.LG

    Scalable Thompson Sampling via Optimal Transport

    Authors: Ruiyi Zhang, Zheng Wen, Changyou Chen, Lawrence Carin

    Abstract: Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a model. However, calculating exact posterior distributions is intractable for all but the simplest models. Consequently, efficient computation of an approximate posterior distribution is a crucial problem for scalable TS with complex models, such as neural netwo…

    Submitted 19 February, 2019; originally announced February 2019.

    Comments: Infer to Control Workshop on Probabilistic Reinforcement Learning and Structured Control at NIPS 2018; Long version accepted by AISTATS 2019

  35. arXiv:1811.05154  [pdf, other]

    cs.LG stat.ML

    Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

    Authors: Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

    Abstract: We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We an…

    Submitted 20 June, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: Proceedings of the 36th International Conference on Machine Learning

  36. arXiv:1811.00911  [pdf, other]

    cs.IR cs.LG stat.ML

    Online Diverse Learning to Rank from Partial-Click Feedback

    Authors: Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin

    Abstract: Learning to rank is an important problem in machine learning and recommender systems. In a recommender system, a user is typically recommended a list of items. Since the user is unlikely to examine the entire recommended list, partial feedback arises naturally. At the same time, diverse recommendations are important because it is challenging to model all tastes of the user in practice. In this pap…

    Submitted 21 November, 2018; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: The first three authors contributed equally to this work. 24 pages, 4 figures, 1 table

  37. arXiv:1810.06032  [pdf, other]

    math.OC stat.ML

    Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains

    Authors: Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan

    Abstract: This paper develops a low-nonnegative-rank approximation method to identify the state aggregation structure of a finite-state Markov chain under an assumption that the state space can be mapped into a handful of meta-states. The number of meta-states is characterized by the nonnegative rank of the Markov transition matrix. Motivated by the success of the nuclear norm relaxation in low rank minimiz…

    Submitted 14 October, 2018; originally announced October 2018.

  38. arXiv:1806.00892  [pdf, other]

    stat.ML cs.LG

    Conservative Exploration using Interleaving

    Authors: Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru

    Abstract: In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, which is significantly worse than the default production action. In general, this is impossible because the agent has to explore unknown actions, some of which can be bad, to learn better actions. However, when the actions are combinatorial, this may be possible if the unkn…

    Submitted 3 June, 2018; originally announced June 2018.

  39. arXiv:1805.09793  [pdf, other]

    cs.LG stat.ML

    New Insights into Bootstrapping for Bandits

    Authors: Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori

    Abstract: We investigate the use of bootstrapping in the bandit setting. We first show that the commonly used non-parametric bootstrapping (NPB) procedure can be provably inefficient and establish a near-linear lower bound on the regret incurred by it under the bandit model with Bernoulli rewards. We show that NPB with an appropriate amount of forced exploration can result in sub-linear albeit sub-optimal r…

    Submitted 24 May, 2018; originally announced May 2018.

  40. arXiv:1804.10488  [pdf, other]

    cs.LG stat.ML

    Offline Evaluation of Ranking Policies with Click Models

    Authors: Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

    Abstract: Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algor…

    Submitted 13 June, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

  41. arXiv:1803.03466  [pdf, ps, other]

    math.OC stat.ML

    A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization

    Authors: Andre Milzarek, Xiantao Xiao, Shicong Cen, Zaiwen Wen, Michael Ulbrich

    Abstract: In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems involving smooth nonconvex and nonsmooth convex terms in the objective function. We assume that only noisy gradient and Hessian information of the smooth part of the objective function is available via calling stochastic first and second order oracles. The proposed method can be s…

    Submitted 9 March, 2018; originally announced March 2018.

    MSC Class: 49M15; 65C60; 65K05; 90C06

  42. arXiv:1802.03692  [pdf, other]

    stat.ML cs.LG

    Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

    Authors: Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

    Abstract: Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. We show that by incorporating a simple change-detection component wi…

    Submitted 24 January, 2019; v1 submitted 10 February, 2018; originally announced February 2018.

  43. arXiv:1712.04644  [pdf, other]

    cs.LG stat.ML

    Stochastic Low-Rank Bandits

    Authors: Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan

    Abstract: Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobs…

    Submitted 13 December, 2017; originally announced December 2017.

  44. arXiv:1709.07172  [pdf, ps, other]

    cs.LG stat.ML

    SpectralLeader: Online Spectral Learning for Single Topic Models

    Authors: Tong Yu, Branislav Kveton, Zheng Wen, Hung Bui, Ole J. Mengshoel

    Abstract: We study the problem of learning a latent variable model from a stream of data. Latent variable models are popular in practice because they can explain observed data in terms of unobserved concepts. These models have been traditionally studied in the offline setting. In the online setting, on the other hand, the online EM is arguably the most popular algorithm for learning latent variable models.…

    Submitted 25 April, 2018; v1 submitted 21 September, 2017; originally announced September 2017.

    Comments: 17 pages, 2 figures

  45. arXiv:1703.07608  [pdf, other]

    stat.ML cs.AI cs.LG

    Deep Exploration via Randomized Value Functions

    Authors: Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

    Abstract: We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through comp…

    Submitted 23 September, 2019; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: Accepted for publication in Journal of Machine Learning Research 2019

  46. arXiv:1703.06513  [pdf, other]

    cs.LG stat.ML

    Bernoulli Rank-$1$ Bandits for Click Feedback

    Authors: Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

    Abstract: The probability that a user will click a search result depends both on its relevance and its position on the results page. The position based model explains this behavior by ascribing to every item an attraction probability, and to every position an examination probability. To be clicked, a result must be both attractive and examined. The probabilities of an item-position pair being clicked thus f…

    Submitted 19 March, 2017; originally announced March 2017.

  47. arXiv:1703.02527  [pdf, other]

    cs.LG stat.ML

    Online Learning to Rank in Stochastic Click Models

    Authors: Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Zheng Wen

    Abstract: Online learning to rank is a core problem in information retrieval and machine learning. Many provably efficient algorithms have been recently proposed for this problem in specific click models. The click model is a model of how the user interacts with a list of documents. Though these results are significant, their impact on practice is limited, because all proposed algorithms are designed for sp…

    Submitted 20 June, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

    Comments: Proceedings of the 34th International Conference on Machine Learning

  48. arXiv:1608.03023  [pdf, other]

    cs.LG stat.ML

    Stochastic Rank-1 Bandits

    Authors: Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen

    Abstract: We propose stochastic rank-$1$ bandits, a class of online learning problems where at each step a learning agent chooses a pair of row and column arms, and receives the product of their values as a reward. The main challenge of the problem is that the individual values of the row and column are unobserved. We assume that these values are stochastic and drawn independently. We propose a computationa…

    Submitted 8 March, 2017; v1 submitted 9 August, 2016; originally announced August 2016.

    Comments: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics

  49. arXiv:1605.06593  [pdf, other]

    cs.LG cs.AI cs.SI math.OC stat.ML

    Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

    Authors: Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

    Abstract: We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, an…

    Submitted 19 June, 2018; v1 submitted 21 May, 2016; originally announced May 2016.

    Comments: Compared with the previous version, this version has fixed a mistake. This version is also consistent with the NIPS camera-ready version

    Journal ref: Z. Wen, B. Kveton, M. Valko, and S. Vaswani, "Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback", Advances in Neural Information Processing Systems 30 Proceedings, 2017

  50. arXiv:1603.05359  [pdf, ps, other]

    cs.LG stat.ML

    Cascading Bandits for Large-Scale Recommendation Problems

    Authors: Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton

    Abstract: Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of user behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend $K$ most attractive items f…

    Submitted 30 June, 2016; v1 submitted 17 March, 2016; originally announced March 2016.

    Comments: Accepted to UAI 2016