Showing 1–50 of 194 results for author: Sugiyama, M

Searching in archive stat.
  1. arXiv:2405.16168  [pdf, other]

    cs.LG stat.ML

    Multi-Player Approaches for Dueling Bandits

    Authors: Or Raveh, Junya Honda, Masashi Sugiyama

    Abstract: Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, common in scenarios with only preference-based information like human feedback, introduces challenges related to controlling collaborative exploration of non-informative arm pairs, but has received little attention. To fill this gap, we demonstrate that the direct use of a Follow…

    Submitted 25 May, 2024; originally announced May 2024.

  2. arXiv:2402.18805  [pdf, other]

    cs.SI stat.ML

    VEC-SBM: Optimal Community Detection with Vectorial Edges Covariates

    Authors: Guillaume Braun, Masashi Sugiyama

    Abstract: Social networks are often associated with rich side information, such as texts and images. While numerous methods have been developed to identify communities from pairwise interactions, they usually ignore such side information. In this work, we study an extension of the Stochastic Block Model (SBM), a widely used statistical framework for community detection, that integrates vectorial edges covar…

    Submitted 28 February, 2024; originally announced February 2024.

  3. arXiv:2310.00539  [pdf, other]

    stat.ML cs.LG

    Thompson Exploration with Best Challenger Rule in Best Arm Identification

    Authors: Jongyeong Lee, Junya Honda, Masashi Sugiyama

    Abstract: This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving an optimization problem at every round and/or are forced to explore an arm at least a certain number of times except those restricted to the Gaussian model. To…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: TBA ACML2023, 49 pages

  4. arXiv:2308.10238  [pdf, other]

    cs.LG stat.ML

    Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

    Authors: Shintaro Nakamura, Masashi Sugiyama

    Abstract: We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s\in\{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, a player pulls a single arm and observes its reward. The player's goal is to identify the optimal \emph{action}…

    Submitted 15 November, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

  5. arXiv:2302.14407  [pdf, other]

    cs.LG math.ST stat.ML

    The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models

    Authors: Jongyeong Lee, Chao-Kai Chiang, Masashi Sugiyama

    Abstract: Thompson sampling (TS) has been known for its outstanding empirical performance supported by theoretical guarantees across various reward models in the classical stochastic multi-armed bandit problems. Nonetheless, its optimality is often restricted to specific priors due to the common observation that TS is fairly insensitive to the choice of the prior when it comes to asymptotic regret bounds. H…

    Submitted 12 December, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 55 pages, TBA AAAI2024

  6. arXiv:2302.02552  [pdf, other]

    cs.LG stat.ML

    Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

    Authors: Yu-Jie Zhang, Zhen-Yu Zhang, Peng Zhao, Masashi Sugiyama

    Abstract: Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the covariate shift, where the input distributions of data change from training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario -- continuous covariate shift -- in which t…

    Submitted 27 May, 2024; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  7. arXiv:2302.01544  [pdf, other]

    cs.LG math.ST stat.ML

    Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

    Authors: Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama

    Abstract: In the stochastic multi-armed bandit problem, a randomized probability matching policy called Thompson sampling (TS) has shown excellent performance in various reward models. In addition to the empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models. However, its optimality has been mainly addressed under light-tailed or one-parameter models…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 49 pages, a preprint

  8. arXiv:2209.15338  [pdf, other]

    stat.ML cs.LG

    Many-body Approximation for Non-negative Tensors

    Authors: Kazu Ghalamkari, Mahito Sugiyama, Yoshinobu Kawahara

    Abstract: We present an alternative approach to decompose non-negative tensors, called many-body approximation. Traditional decomposition methods assume low-rankness in the representation, resulting in difficulties in global optimization and target rank selection. We avoid these problems by energy-based modeling of tensors, where a tensor and its mode correspond to a probability distribution and a random va…

    Submitted 30 October, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: 25 pages, 14 figures

    ACM Class: I.2.6

  9. arXiv:2207.02121  [pdf, other]

    cs.LG stat.ML

    Adapting to Online Label Shift with Provable Guarantees

    Authors: Yong Bai, Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama, Zhi-Hua Zhou

    Abstract: The standard supervised learning paradigm works effectively when training data shares the same distribution as the upcoming testing samples. However, this stationary assumption is often violated in real-world applications, especially when testing data appear in an online fashion. In this paper, we formulate and investigate the problem of \emph{online label shift} (OLaS): the learner trains an init…

    Submitted 14 January, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2022; the first two authors contributed equally

  10. arXiv:2206.03019  [pdf, other]

    cs.LG stat.ML

    The Survival Bandit Problem

    Authors: Charles Riou, Junya Honda, Masashi Sugiyama

    Abstract: We introduce and study a new variant of the multi-armed bandit problem (MAB), called the survival bandit problem (S-MAB). While in both problems, the objective is to maximize the so-called cumulative reward, in this new variant, the procedure is interrupted if the cumulative reward falls below a preset threshold. This simple yet unexplored extension of the MAB follows from many practical applicati…

    Submitted 6 January, 2024; v1 submitted 7 June, 2022; originally announced June 2022.

  11. arXiv:2206.01606  [pdf, ps, other]

    stat.ML cs.LG

    Excess risk analysis for epistemic uncertainty with application to variational inference

    Authors: Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

    Abstract: Bayesian deep learning plays an important role especially for its ability evaluating epistemic uncertainty (EU). Due to computational complexity issues, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis…

    Submitted 11 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  12. arXiv:2205.12904  [pdf, other]

    cs.LG stat.ML

    Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel

    Authors: Ryuichi Kanoh, Mahito Sugiyama

    Abstract: A soft tree is an actively studied variant of a decision tree that updates splitting rules using the gradient method. Although soft trees can take various architectures, their impact is not theoretically well known. In this paper, we formulate and analyze the Neural Tangent Kernel (NTK) induced by soft tree ensembles for arbitrary tree architectures. This kernel leads to the remarkable finding tha…

    Submitted 7 February, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted to ICLR 2023

  13. arXiv:2204.07415  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    Universal approximation property of invertible neural networks

    Authors: Isao Ishikawa, Takeshi Teshima, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama

    Abstract: Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a q…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: This paper extends our previous work of the following two papers: "Coupling-based invertible neural networks are universal diffeomorphism approximators" [arXiv:2006.11469] (published as a conference paper in NeurIPS 2020) and "Universal approximation property of neural ordinary differential equations" [arXiv:2012.02414] (presented at DiffGeo4DL Workshop in NeurIPS 2020)

  14. arXiv:2202.00395  [pdf, other]

    cs.LG stat.ML

    Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification

    Authors: Takashi Ishida, Ikko Yamane, Nontawat Charoenphakdee, Gang Niu, Masashi Sugiyama

    Abstract: There is a fundamental limitation in the prediction performance that a machine learning model can achieve due to the inevitable uncertainty of the prediction target. In classification problems, this can be characterized by the Bayes error, which is the best achievable error with any classifier. The Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance and…

    Submitted 13 March, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: ICLR 2023 (notable-top-5%)

  15. arXiv:2112.10157  [pdf, other]

    cs.LG stat.ML

    Rethinking Importance Weighting for Transfer Learning

    Authors: Nan Lu, Tianyi Zhang, Tongtong Fang, Takeshi Teshima, Masashi Sugiyama

    Abstract: A key assumption in supervised learning is that training and test data follow the same probability distribution. However, this fundamental assumption is not always satisfied in practice, e.g., due to changing environments, sample selection bias, privacy concerns, or high labeling costs. Transfer learning (TL) relaxes this assumption and allows us to learn under distribution shift. Classical TL met…

    Submitted 19 December, 2021; originally announced December 2021.

  16. arXiv:2110.12595  [pdf, other]

    stat.ML cs.LG

    Fast Rank-1 NMF for Missing Data with KL Divergence

    Authors: Kazu Ghalamkari, Mahito Sugiyama

    Abstract: We propose a fast non-gradient-based method of rank-1 non-negative matrix factorization (NMF) for missing data, called A1GM, that minimizes the KL divergence from an input matrix to the reconstructed rank-1 matrix. Our method is based on our new finding of an analytical closed-formula of the best rank-1 non-negative multiple matrix factorization (NMMF), a variety of NMF. NMMF is known to exactly s…

    Submitted 18 February, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

    Comments: 16 pages, 5 figures, accepted to the 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022)

    ACM Class: I.2.6

  17. arXiv:2109.04983  [pdf, other]

    cs.LG stat.ML

    A Neural Tangent Kernel Perspective of Infinite Tree Ensembles

    Authors: Ryuichi Kanoh, Mahito Sugiyama

    Abstract: In practical situations, the tree ensemble is one of the most popular models along with neural networks. A soft tree is a variant of a decision tree. Instead of using a greedy method for searching splitting rules, the soft tree is trained using a gradient method in which the entire splitting operation is formulated in a differentiable form. Although ensembles of such soft trees have been used incr…

    Submitted 21 March, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted to ICLR 2022

  18. arXiv:2107.08135  [pdf, other]

    stat.ML cs.LG

    Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences

    Authors: Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama

    Abstract: Ordinary supervised learning is useful when we have paired training data of input $X$ and output $Y$. However, such paired data can be difficult to collect in practice. In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data of them, but we have two separate, independent datasets of $X$ and $Y$ each observed with some mediating variable $U$, that is, we have two…

    Submitted 17 July, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: ICML 2021 version with correction to Figure 1 and the appendices

  19. arXiv:2106.08864  [pdf, other]

    cs.LG stat.ML

    Multi-Class Classification from Single-Class Data with Confidences

    Authors: Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

    Abstract: Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 23 pages, 1 figure

  20. arXiv:2106.05010  [pdf, ps, other]

    stat.ML cs.LG

    Loss function based second-order Jensen inequality and its application to particle variational inference

    Authors: Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

    Abstract: Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure…

    Submitted 9 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

  21. arXiv:2105.14900  [pdf, other]

    cs.LG cs.AI stat.ML

    A unified view of likelihood ratio and reparameterization gradients

    Authors: Paavo Parmas, Masashi Sugiyama

    Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature. We use a first principles approach to explain that LR and RP are alternative methods of keeping track of the movement of prob…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: AISTATS2021; Earlier paper was split in two (arXiv:1910.06419). Refer to the current paper for the unified view, but see the earlier paper for discussion on an importance sampling technique

    Journal ref: In International Conference on Artificial Intelligence and Statistics (pp. 4078-4086). PMLR (2021, March)

  22. arXiv:2103.13569  [pdf, other]

    cs.LG stat.ML

    Approximating Instance-Dependent Noise via Instance-Confidence Embedding

    Authors: Yivan Zhang, Masashi Sugiyama

    Abstract: Label noise in multiclass classification is a major obstacle to the deployment of learning systems. However, unlike the widely used class-conditional noise (CCN) assumption that the noisy label is independent of the input feature given the true label, label noise in real-world datasets can be aleatory and heavily dependent on individual instances. In this work, we investigate the instance-dependen…

    Submitted 24 March, 2021; originally announced March 2021.

  23. arXiv:2103.07084  [pdf, other]

    stat.ML cs.AI cs.LG

    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information

    Authors: Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama

    Abstract: Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an app…

    Submitted 12 April, 2022; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: 35 pages

  24. arXiv:2103.03466  [pdf, other]

    cs.LG stat.ML

    Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change

    Authors: Ryuichi Kanoh, Mahito Sugiyama

    Abstract: A multiplicative constant scaling factor is often applied to the model output to adjust the dynamics of neural network parameters. This has been used as one of the key interventions in an empirical study of lazy and active behavior. However, we show that the combination of such scaling and a commonly used adaptive learning rate optimizer strongly affects the training behavior of the neural network…

    Submitted 2 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  25. arXiv:2103.02898  [pdf, other]

    stat.ML cs.LG

    Fast Tucker Rank Reduction for Non-Negative Tensors Using Mean-Field Approximation

    Authors: Kazu Ghalamkari, Mahito Sugiyama

    Abstract: We present an efficient low-rank approximation algorithm for non-negative tensors. The algorithm is derived from our two findings: First, we show that rank-1 approximation for tensors can be viewed as a mean-field approximation by treating each tensor as a probability distribution. Second, we theoretically provide a sufficient condition for distribution parameters to reduce Tucker ranks of tensors…

    Submitted 23 October, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: 19 pages, 4 figures, accepted to the 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021)

    ACM Class: I.2.6

  26. arXiv:2103.02893  [pdf, other]

    stat.ML cs.LG

    Lower-Bounded Proper Losses for Weakly Supervised Classification

    Authors: Shuhei M. Yoshida, Takashi Takenouchi, Masashi Sugiyama

    Abstract: This paper discusses the problem of weakly supervised classification, in which instances are given weak labels that are produced by some label-corruption process. The goal is to derive conditions under which loss functions for weak-label learning are proper and lower-bounded -- two essential requirements for the losses used in class-probability estimation. To this end, we derive a representation t…

    Submitted 11 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: ICML2021 camera ready, code available at https://github.com/yoshum/lower-bounded-proper-losses

  27. arXiv:2103.00719  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    LocalDrop: A Hybrid Regularization for Deep Neural Networks

    Authors: Ziqing Lu, Chang Xu, Bo Du, Takashi Ishida, Lefei Zhang, Masashi Sugiyama

    Abstract: In neural networks, developing regularization algorithms to settle overfitting is one of the major study areas. We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been dev…

    Submitted 28 February, 2021; originally announced March 2021.

  28. arXiv:2103.00136  [pdf, other]

    cs.LG stat.ML

    Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation

    Authors: Takeshi Teshima, Masashi Sugiyama

    Abstract: Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions. When a CG is available, e.g., from the domain knowledge, we can infer the conditional independence (CI) relations that should hold in the data distribution. However, it is not straightforward how to incorporate this knowledge into predictive modeling. In this work, we pr…

    Submitted 17 August, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: 29 pages, 5 figures, 2 tables. Camera-ready version of the paper accepted at the Thirty-seventh Conference on Uncertainty in Artificial Intelligence (UAI 2021)

  29. arXiv:2102.06879  [pdf, other]

    stat.ML cs.LG

    Learning from Similarity-Confidence Data

    Authors: Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

    Abstract: Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data. In this paper, we investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data, where we aim to learn an effective binary classifier from only unlabeled data pairs equipped with confidence that illustrates th…

    Submitted 13 February, 2021; originally announced February 2021.

    Comments: 33 pages, 5 figures

  30. arXiv:2102.02414  [pdf, other]

    stat.ML cs.LG

    Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization

    Authors: Yivan Zhang, Gang Niu, Masashi Sugiyama

    Abstract: Many weakly supervised classification methods employ a noise transition matrix to capture the class-conditional label corruption. To estimate the transition matrix from noisy data, existing methods often need to estimate the noisy class-posterior, which could be unreliable due to the overconfidence of neural networks. In this work, we propose a theoretically grounded method that can estimate the n…

    Submitted 14 June, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: ICML 2021 camera-ready version

  31. arXiv:2102.00678  [pdf, other]

    cs.LG stat.ML

    Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification

    Authors: Nan Lu, Shida Lei, Gang Niu, Issei Sato, Masashi Sugiyama

    Abstract: To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing…

    Submitted 11 June, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: ICML2021 camera-ready version

  32. arXiv:2101.01366  [pdf, other]

    stat.ML cs.LG

    A Symmetric Loss Perspective of Reliable Machine Learning

    Authors: Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama

    Abstract: When minimizing the empirical risk in binary classification, it is a common practice to replace the zero-one loss with a surrogate loss to make the learning objective feasible to optimize. Examples of well-known surrogate losses for binary classification include the logistic loss, hinge loss, and sigmoid loss. It is known that the choice of a surrogate loss can highly influence the performance of…

    Submitted 5 June, 2023; v1 submitted 5 January, 2021; originally announced January 2021.

    Comments: Invited article preprint

  33. arXiv:2012.15584  [pdf, other]

    cs.LG cs.DM cs.DS cs.SI stat.ML

    Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation

    Authors: Yuko Kuroki, Junya Honda, Masashi Sugiyama

    Abstract: Combinatorial optimization is one of the fundamental research fields that has been extensively studied in theoretical computer science and operations research. When developing an algorithm for combinatorial optimization, it is commonly assumed that parameters such as edge weights are exactly known as inputs. However, this assumption may not be fulfilled since input parameters are often uncertain o…

    Submitted 29 August, 2023; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Preprint of an Invited Review Article, In Fields Institute

  34. arXiv:2011.09172  [pdf, other]

    stat.ML cs.LG

    On Focal Loss for Class-Posterior Probability Estimation: A Theoretical Perspective

    Authors: Nontawat Charoenphakdee, Jayakorn Vongkulbhisal, Nuttapong Chairatanakul, Masashi Sugiyama

    Abstract: The focal loss has demonstrated its effectiveness in many real-world applications such as object detection and image classification, but its theoretical understanding has been limited so far. In this paper, we first prove that the focal loss is classification-calibrated, i.e., its minimizer surely yields the Bayes-optimal classifier and thus the use of the focal loss in classification can be theor…

    Submitted 13 December, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: 57 pages

  35. arXiv:2010.11748  [pdf, other]

    stat.ML cs.LG

    Classification with Rejection Based on Cost-sensitive Classification

    Authors: Nontawat Charoenphakdee, Zhenghang Cui, Yivan Zhang, Masashi Sugiyama

    Abstract: The goal of classification with rejection is to avoid risky misclassification in error-critical applications such as medical diagnosis and product inspection. In this paper, based on the relationship between classification with rejection and cost-sensitive classification, we propose a novel method of classification with rejection by learning an ensemble of cost-sensitive classifiers, which satisfi…

    Submitted 29 September, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 40 pages. Added the discussion of the recent work by Gangrade et al. (2021) at the end of Section 3.4, where the idea of constructing cost-sensitive classifiers for classification with rejection has also been explored in a different framework of classification with rejection (where the goal is not minimizing the 0-1-c risk as in our paper)

  36. arXiv:2010.11415  [pdf, other]

    cs.LG stat.ML

    Maximum Mean Discrepancy Test is Aware of Adversarial Attacks

    Authors: Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Masashi Sugiyama

    Abstract: The maximum mean discrepancy (MMD) test could in principle detect any distributional discrepancy between two datasets. However, it has been shown that the MMD test is unaware of adversarial attacks -- the MMD test failed to detect the discrepancy between natural and adversarial data. Given this phenomenon, we raise a question: are natural and adversarial data really from different distributions? T…

    Submitted 11 July, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

  37. arXiv:2010.10181  [pdf, other]

    stat.ML cs.AI cs.LG

    Robust Imitation Learning from Noisy Demonstrations

    Authors: Voot Tangkaratt, Nontawat Charoenphakdee, Masashi Sugiyama

    Abstract: Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning. In this paper, we first theoretically show that robust imitation learning can be achieved by optimizing a classification risk with a symmetric loss. Based on this theoretical finding, we then propose a new imitation learning method that optimizes the classification risk by effectively com…

    Submitted 19 February, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 16 pages, 9 figures. Accepted to AISTATS 2021

  38. arXiv:2010.01875  [pdf, other]

    cs.LG stat.ML

    Pointwise Binary Classification with Pairwise Confidence Comparisons

    Authors: Lei Feng, Senlin Shu, Nan Lu, Bo Han, Miao Xu, Gang Niu, Bo An, Masashi Sugiyama

    Abstract: To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed. Among them, some consider using pairwise but not pointwise labels, when pointwise labels are not accessible due to privacy, confidentiality, or security reasons. However, as a pairwise label denotes whether or not two data points share a…

    Submitted 13 January, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to ICML 2021

  39. arXiv:2007.08929  [pdf, other]

    cs.LG stat.ML

    Provably Consistent Partial-Label Learning

    Authors: Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama

    Abstract: Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, there lacks a theoretical understanding of the consistency of those methods -- none of the PLL methods hitherto possesses a generation process of candidate label sets, and then…

    Submitted 23 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020 camera-ready version

  40. arXiv:2007.04043  [pdf, ps, other]

    cs.LG stat.ML

    A One-step Approach to Covariate Shift Adaptation

    Authors: Tianyi Zhang, Ikko Yamane, Nan Lu, Masashi Sugiyama

    Abstract: A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution. However, such an assumption is often violated in the real world due to non-stationarity of the environment or bias in sample selection. In this work, we consider a prevalent setting called covariate shift, where the input distribution differs between the t…

    Submitted 3 May, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

  41. arXiv:2007.02235  [pdf, other]

    cs.LG stat.ML

    Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels

    Authors: Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama

    Abstract: In weakly supervised learning, unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex like deep networks. In this paper, we investigate reasons for such overfitting by studying a weakly supervised problem called learning w…

    Submitted 21 August, 2020; v1 submitted 5 July, 2020; originally announced July 2020.

    Comments: Accepted at ICML 2020

  42. arXiv:2006.15815  [pdf, other]

    cs.LG stat.ML

    Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

    Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

    Abstract: Adaptive Moment Estimation (Adam), which combines Adaptive Learning Rate and Momentum, would be the most popular stochastic optimizer for accelerating the training of deep neural networks. However, it is empirically known that Adam often generalizes worse than Stochastic Gradient Descent (SGD). The purpose of this paper is to unveil the mystery of this behavior in the diffusion theoretical framewo…

    Submitted 14 June, 2022; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML2022, Long Oral Presentation, 30 pages, 14 figures, Key Words: Deep Learning Theory, Optimization, Adam, Adaptive Inertia, Flat Minima

  43. arXiv:2006.13642  [pdf, other]

    cs.LG cs.DS cs.SI stat.ML

    Online Dense Subgraph Discovery via Blurred-Graph Feedback

    Authors: Yuko Kuroki, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama

    Abstract: Dense subgraph discovery aims to find a dense component in edge-weighted graphs. This is a fundamental graph-mining task with a variety of applications and thus has received much attention recently. Although most existing methods assume that each individual edge weight is easily obtained, such an assumption is not necessarily valid in practice. In this paper, we introduce a novel learning problem…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: ICML2020

  44. arXiv:2006.11942  [pdf, other]

    stat.ML cs.LG

    Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent

    Authors: Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama

    Abstract: In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. Orthogonal Gradient Descent was proposed to tackle the challenge. However, no theoretical guarantees have been proven yet. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel regime. This framework comprises closed form expression of the model through tasks…

    Submitted 4 December, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

  45. arXiv:2006.11469  [pdf, other]

    cs.LG cs.NE math.CA math.DG stat.ML

    Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators

    Authors: Takeshi Teshima, Isao Ishikawa, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama

    Abstract: Invertible neural networks based on coupling flows (CF-INNs) have various machine learning applications such as image synthesis and representation learning. However, their desirable characteristics such as analytic invertibility come at the cost of restricting the functional forms. This poses a question on their representation power: are CF-INNs universal approximators for invertible functions? Wi…

    Submitted 3 November, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 29 pages, 3 figures. Accepted at Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020) for oral presentation

  46. arXiv:2006.09668  [pdf, other]

    stat.ML cs.LG

    Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

    Authors: Taira Tsuchiya, Junya Honda, Masashi Sugiyama

    Abstract: We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the p…

    Submitted 10 June, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Published version in NeurIPS 2020 (https://proceedings.neurips.cc/paper/2020/hash/649d45bf179296e31731adfd4df25588-Abstract.html), 39 pages, 4 figures

  47. arXiv:2006.08982  [pdf, other]

    stat.ML cs.LG

    Additive Poisson Process: Learning Intensity of Higher-Order Interaction in Stochastic Processes

    Authors: Simon Luo, Feng Zhou, Lamiae Azizi, Mahito Sugiyama

    Abstract: We present the Additive Poisson Process (APP), a novel framework that can model the higher-order interaction effects of the intensity functions in stochastic processes using lower dimensional projections. Our model combines the techniques in information geometry to model higher-order interactions on a statistical manifold and in generalized additive models to use lower-dimensional projections to o…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: 14 pages, 8 figures, pre-print

  48. arXiv:2006.08306  [pdf, other]

    cs.LG stat.ML

    LFD-ProtoNet: Prototypical Network Based on Local Fisher Discriminant Analysis for Few-shot Learning

    Authors: Kei Mukaiyama, Issei Sato, Masashi Sugiyama

    Abstract: The prototypical network (ProtoNet) is a few-shot learning framework that performs metric learning and classification using the distance to prototype representations of each class. It has attracted a great deal of attention recently since it is simple to implement, highly extensible, and performs well in experiments. However, it only takes into account the mean of the support vectors as prototypes…

    Submitted 25 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 20 pages

    MSC Class: 68T01 (Primary); 68T05 (Secondary)

  49. arXiv:2006.07836  [pdf, other]

    cs.LG stat.ML

    Part-dependent Label Noise: Towards Instance-dependent Label Noise

    Authors: Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, Dacheng Tao, Masashi Sugiyama

    Abstract: Learning with instance-dependent label noise is challenging, because it is hard to model such real-world noise. Note that there is psychological and physiological evidence showing that we humans perceive instances by decomposing them into parts. Annotators are therefore more likely to annotate instances based on the parts rather than the whole instances, where a wrong mapping from p…

    Submitted 2 December, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

  50. arXiv:2006.07805  [pdf, other]

    cs.LG stat.ML

    Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning

    Authors: Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Jiankang Deng, Gang Niu, Masashi Sugiyama

    Abstract: The transition matrix, denoting the transition relationship from clean labels to noisy labels, is essential to build statistically consistent classifiers in label-noise learning. Existing methods for estimating the transition matrix rely heavily on estimating the noisy class posterior. However, the estimation error for noisy class posterior could be large due to the randomness of label noise, whic…

    Submitted 23 June, 2021; v1 submitted 14 June, 2020; originally announced June 2020.