-
Rejection via Learning Density Ratios
Authors:
Alexander Soen,
Hisham Husain,
Philip Schulz,
Vu Nguyen
Abstract:
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $φ$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $φ$-divergences are specified by the family of $α$-divergences. Our framework is tested empirically over clean and noisy datasets.
Submitted 28 May, 2024;
originally announced May 2024.
-
Confident Sinkhorn Allocation for Pseudo-Labeling
Authors:
Vu Nguyen,
Hisham Husain,
Sachin Farfade,
Anton van den Hengel
Abstract:
Semi-supervised learning is a critical tool in reducing machine learning's dependence on labeled data. It has been successfully applied to structured data, such as images and natural language, by exploiting the inherent spatial and semantic structure therein with pretrained models or data augmentation. These methods are not applicable, however, when the data does not have the appropriate structure or invariances. Due to their simplicity, pseudo-labeling (PL) methods can be widely used without any domain assumptions. However, the greedy mechanism in PL is sensitive to a threshold and can perform poorly if wrong assignments are made due to overconfidence. This paper theoretically studies the role of uncertainty in pseudo-labeling and proposes Confident Sinkhorn Allocation (CSA), which identifies the best pseudo-label allocation via optimal transport only to samples with high confidence scores. CSA outperforms the current state-of-the-art in this practically important area of semi-supervised learning. Additionally, we propose to use Integral Probability Metrics to extend and improve the existing PAC-Bayes bound, which relies on the Kullback-Leibler (KL) divergence, for ensemble models. Our code is publicly available at https://github.com/amzn/confident-sinkhorn-allocation.
Submitted 5 March, 2024; v1 submitted 12 June, 2022;
originally announced June 2022.
-
Distributionally Robust Bayesian Optimization with $\varphi$-divergences
Authors:
Hisham Husain,
Vu Nguyen,
Anton van den Hengel
Abstract:
The study of robustness has received much attention due to its inevitability in data-driven settings where many systems face uncertainty. One such example of concern is Bayesian Optimization (BO), where uncertainty is multi-faceted, yet there only exists a limited number of works dedicated to this direction. In particular, there is the work of Kirschner et al. (2020), which bridges the existing literature of Distributionally Robust Optimization (DRO) by casting the BO problem through the lens of DRO. While this work is pioneering, it admittedly suffers from various practical shortcomings such as finite-context assumptions, leaving behind the main question: Can one devise a computationally tractable algorithm for solving this DRO-BO problem? In this work, we tackle this question to a large degree of generality by considering robustness against data-shift in $\varphi$-divergences, which subsumes many popular choices, such as the $χ^2$-divergence, Total Variation, and the extant Kullback-Leibler (KL) divergence. We show that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds. We then show experimentally that our method surpasses existing methods, attesting to the theoretical results.
Submitted 27 October, 2023; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Not another computer algebra system: Highlighting wxMaxima in calculus
Authors:
N. Karjanto,
H. S. Husain
Abstract:
This article introduces and explains a computer algebra system (CAS) wxMaxima for Calculus teaching and learning at the tertiary level. The didactic reasoning behind this approach is the need to implement an element of technology into classrooms to enhance students' understanding of Calculus concepts. For many mathematics educators who have been using CAS, this material is of great interest, particularly for secondary teachers and university instructors who plan to introduce an alternative CAS into their classrooms. By highlighting both the strengths and limitations of the software, we hope that it will stimulate further debate not only among mathematics educators and software users but also among symbolic computation and software developers.
Submitted 28 September, 2021;
originally announced September 2021.
-
A Law of Robustness for Weight-bounded Neural Networks
Authors:
Hisham Husain,
Borja Balle
Abstract:
Robustness of deep neural networks against adversarial perturbations is a pressing concern motivated by recent findings showing the pervasive nature of such vulnerabilities. One method of characterizing the robustness of a neural network model is through its Lipschitz constant, which forms a robustness certificate. A natural question to ask is, for a fixed model class (such as neural networks) and a dataset of size $n$, what is the smallest achievable Lipschitz constant among all models that fit the dataset? Recently, Bubeck et al. (2020) conjectured that when using two-layer networks with $k$ neurons to fit a generic dataset, the smallest Lipschitz constant is $Ω(\sqrt{\frac{n}{k}})$. This implies that one would require one neuron per data point to robustly fit the data. In this work we derive a lower bound on the Lipschitz constant for any arbitrary model class with bounded Rademacher complexity. Our result coincides with that conjectured by Bubeck et al. (2020) for two-layer networks under the assumption of bounded weights. However, due to our result's generality, we also derive bounds for multi-layer neural networks, discovering that one requires $\log n$ constant-sized layers to robustly fit the data. Thus, our work establishes a law of robustness for weight-bounded neural networks and provides formal evidence on the necessity of over-parametrization in deep learning.
Submitted 12 March, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Regularized Policies are Reward Robust
Authors:
Hisham Husain,
Kamil Ciosek,
Ryota Tomioka
Abstract:
Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy. The primary motivation for using entropy is for exploration and disambiguating optimal policies; however, the theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive the dual problem, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy found by a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to the generality of our results, we apply them to other existing regularization schemes. Our results thus give insights into the effects of regularization of policies and deepen our understanding of exploration through robust rewards at large.
Submitted 18 January, 2021;
originally announced January 2021.
-
Fair Densities via Boosting the Sufficient Statistics of Exponential Families
Authors:
Alexander Soen,
Hisham Husain,
Richard Nock
Abstract:
We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we are able to theoretically prove that the learned distribution will have a representation rate and statistical rate data fairness guarantee. Unlike recent optimization-based pre-processing methods, our approach can be easily adapted for continuous domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are presented to demonstrate the quality of the results on real-world data.
Submitted 15 August, 2023; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Optimal Continual Learning has Perfect Memory and is NP-hard
Authors:
Jeremias Knoblauch,
Hisham Husain,
Tom Diethe
Abstract:
Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms would have to possess in order to avoid catastrophic forgetting. Our main finding is that such optimal CL algorithms generally solve an NP-hard problem and will require perfect memory to do so. The findings are of theoretical interest, but also explain the excellent performance of CL algorithms using experience replay, episodic memory and core sets relative to regularization-based approaches.
Submitted 9 June, 2020;
originally announced June 2020.
-
Distributional Robustness with IPMs and links to Regularization and GANs
Authors:
Hisham Husain
Abstract:
Robustness to adversarial attacks is an important concern due to the fragility of deep neural networks to small perturbations and has received an abundance of attention in recent years. Distributionally Robust Optimization (DRO), a particularly promising way of addressing this challenge, studies robustness via divergence-based uncertainty sets and has provided valuable insights into robustification strategies such as regularization. In the context of machine learning, the majority of existing results have chosen $f$-divergences, Wasserstein distances and, more recently, the Maximum Mean Discrepancy (MMD) to construct uncertainty sets. We extend this line of work for the purposes of understanding robustness via regularization by studying uncertainty sets constructed with Integral Probability Metrics (IPMs) - a large family of divergences including the MMD, Total Variation and Wasserstein distances. Our main result shows that DRO under any choice of IPM corresponds to a family of regularization penalties, which recover and improve upon existing results in the setting of MMD and Wasserstein distances. Due to the generality of our result, we show that other choices of IPMs correspond to other commonly used penalties in machine learning. Furthermore, we extend our results to shed light on adversarial generative modelling via $f$-GANs, constituting the first study of distributional robustness for the $f$-GAN objective. Our results unveil the inductive properties of the discriminator set with regard to robustness, allowing us to give positive comments for several penalty-based GAN methods such as Wasserstein-, MMD- and Sobolev-GANs. In summary, our results intimately link GANs to distributional robustness, extend previous results on DRO and contribute to our understanding of the link between regularization and robustness at large.
Submitted 8 June, 2020;
originally announced June 2020.
-
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Authors:
Hamel Husain,
Ho-Hsiang Wu,
Tiferet Gazit,
Miltiadis Allamanis,
Marc Brockschmidt
Abstract:
Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly technical) and natural language more suitable to describe vague concepts and ideas.
To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from CodeSearchNet Corpus. The corpus contains about 6 million functions from open-source code spanning six programming languages (Go, Java, JavaScript, PHP, Python, and Ruby). The CodeSearchNet Corpus also contains automatically generated query-like natural language for 2 million functions, obtained from mechanically scraping and preprocessing associated function documentation. In this article, we describe the methodology used to obtain the corpus and expert labels, as well as a number of simple baseline solutions for the task.
We hope that the CodeSearchNet Challenge encourages researchers and practitioners to study this interesting task further, and we will host a competition and leaderboard to track the progress on the challenge. We are also keen on extending the CodeSearchNet Challenge to more queries and programming languages in the future.
Submitted 8 June, 2020; v1 submitted 20 September, 2019;
originally announced September 2019.
-
Derived Functor Cohomology Groups with Yoneda Product
Authors:
Hafiz Syed Husain,
Mariam Sultana
Abstract:
This work presents an exposition of both the internal structure of the derived category D*(A) of an abelian category and its contribution to solving problems, particularly in algebraic geometry. Some morphisms between objects in D*(A) will be calculated as elements of appropriate cohomology groups, along with their compositions, with the help of the Yoneda construction, under the assumption that the homological dimension of D*(A) is greater than or equal to 2. These computational settings will then be considered in a sheaf-cohomological context with a particular case from projective geometry.
Submitted 29 March, 2019;
originally announced April 2019.
-
Adversarial Networks and Autoencoders: The Primal-Dual Relationship and Generalization Bounds
Authors:
Hisham Husain,
Richard Nock,
Robert C. Williamson
Abstract:
Since the introduction of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAE), the literature on generative modelling has witnessed an overwhelming resurgence. The impressive, yet elusive empirical performance of GANs has led to the rise of many GAN-VAE hybrids, with the hopes of GAN-level performance and additional benefits of VAE, such as an encoder for feature reduction, which is not offered by GANs. Recently, the Wasserstein Autoencoder (WAE) was proposed, achieving performance similar to that of GANs, yet it is still unclear whether the two are fundamentally different or can be further improved into a unified model. In this work, we study the $f$-GAN and WAE models and make two main discoveries. First, we find that the $f$-GAN and WAE objectives partake in a primal-dual relationship and are equivalent under some assumptions, which then allows us to explicate the success of WAE. Second, the equivalence result allows us to, for the first time, prove generalization bounds for Autoencoder models, which is a pertinent problem when it comes to theoretical analyses of generative models. Furthermore, we show that the WAE objective is related to other statistical quantities such as the $f$-divergence and, in particular, is upper bounded by the Wasserstein distance, which then allows us to tap into existing efficient (regularized) optimal transport solvers. Our findings thus present the first primal-dual relationship between GANs and Autoencoder models, comment on generalization abilities and make a step towards unifying these models.
Submitted 26 April, 2019; v1 submitted 3 February, 2019;
originally announced February 2019.
-
Integral Privacy for Sampling
Authors:
Hisham Husain,
Zac Cranko,
Richard Nock
Abstract:
Differential privacy is a leading protection setting, focused by design on individual privacy. Many applications, in medical / pharmaceutical domains or social networks, rather posit privacy at a group level, a setting we call integral privacy. We aim for the strongest form of privacy: the group size is in particular not known in advance. We study a problem with related applications in the domains cited above that have recently met with substantial press: sampling.
Keeping correct utility levels in such a strong model of statistical indistinguishability looks difficult to achieve with the usual differential privacy toolbox because it would typically scale, in the worst case, the sensitivity by the sample size and so the noise variance by up to its square. We introduce a trick specific to sampling that bypasses the sensitivity analysis. Privacy enforces an information-theoretic barrier on approximation, and we show how to reach this barrier with guarantees on the approximation of the target non-private density. We do so using a recent approach to non-private density estimation relying on the original boosting theory, learning the sufficient statistics of an exponential family with classifiers. Approximation guarantees cover the mode capture problem. In the context of learning, the sampling problem is particularly important: because integral privacy enjoys the same closure under post-processing as differential privacy does, any algorithm using integrally privately sampled data would result in an output that is equally integrally private. We also show that this brings fairness guarantees on post-processing that would eventually elude classical differential privacy: any decision process has bounded data-dependent bias when the data is integrally privately sampled. Experimental results against private kernel density estimation and private GANs display the quality of our results.
Submitted 2 July, 2019; v1 submitted 12 June, 2018;
originally announced June 2018.
-
A Cloud-based Service for Real-Time Performance Evaluation of NoSQL Databases
Authors:
Omar Almootassem,
Syed Hamza Husain,
Denesh Parthipan,
Qusay H. Mahmoud
Abstract:
We have created a cloud-based service that allows the end users to run tests on multiple different databases to find which databases are most suitable for their project. From our research, we could not find another application that enables the user to test several databases to gauge the difference between them. This application allows the user to choose which type of test to perform and which databases to target. The application also displays the results of different tests that were run by other users previously. There is also a map to show the location where all the tests are run to give the user an estimate of the location. Unlike the orthodox static tests and reports conducted to evaluate NoSQL databases, we have created a web application to run and analyze these tests in real time. This web application evaluates the performance of several NoSQL databases. The databases covered are MongoDB, DynamoDB, CouchDB, and Firebase. The web service is accessible from: nosqldb.nextproject.ca.
Submitted 23 May, 2017;
originally announced May 2017.
-
Fuzzy Model on Human Emotions Recognition
Authors:
Kaveh Bakhtiyari,
Hafizah Husain
Abstract:
This paper discusses a fuzzy model for multi-level human emotion recognition by computer systems through keyboard keystrokes, mouse and touchscreen interactions. This model can also be used to detect the other possible emotions at the time of recognition. Accuracy measurements of human emotions by the fuzzy model are discussed through two methods: the first is accuracy analysis and the second is false positive rate analysis. This fuzzy model detects more emotions but, for some emotions, obtains lower accuracy than non-fuzzy human emotion detection methods. This system was trained and tested by Support Vector Machine (SVM) to recognize the users' emotions. Overall, this model represents a closer similarity between human brain detection of emotions and computer systems.
Submitted 6 July, 2014;
originally announced July 2014.