Search | arXiv e-print repository

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

arXiv:2402.00355 [pdf, other]

Adaptive Primal-Dual Method for Safe Reinforcement Learning

Authors: Weiqin Chen, James Onyejizu, Long Vu, Lan Hoang, Dharmashankar Subramanian, Koushik Kar, Sandipan Mishra, Santiago Paternain

Abstract: Primal-dual methods have a natural application in Safe Reinforcement Learning (SRL), posed as a constrained policy optimization problem. In practice however, applying primal-dual methods to SRL is challenging, due to the inter-dependency of the learning rate (LR) and Lagrangian multipliers (dual variables) each time an embedded unconstrained RL problem is solved. In this paper, we propose, analyze… ▽ More Primal-dual methods have a natural application in Safe Reinforcement Learning (SRL), posed as a constrained policy optimization problem. In practice however, applying primal-dual methods to SRL is challenging, due to the inter-dependency of the learning rate (LR) and Lagrangian multipliers (dual variables) each time an embedded unconstrained RL problem is solved. In this paper, we propose, analyze and evaluate adaptive primal-dual (APD) methods for SRL, where two adaptive LRs are adjusted to the Lagrangian multipliers so as to optimize the policy in each iteration. We theoretically establish the convergence, optimality and feasibility of the APD algorithm. Finally, we conduct numerical evaluation of the practical APD algorithm with four well-known environments in Bullet-Safey-Gym employing two state-of-the-art SRL algorithms: PPO-Lagrangian and DDPG-Lagrangian. All experiments show that the practical APD algorithm outperforms (or achieves comparable performance) and attains more stable training than the constant LR cases. Additionally, we substantiate the robustness of selecting the two adaptive LRs by empirical evidence. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.06436 [pdf, other]

doi 10.1109/KSE53942.2021.9648823

Improving Graph Convolutional Networks with Transformer Layer in social-based items recommendation

Authors: Thi Linh Hoang, Tuan Dung Pham, Viet Cuong Ta

Abstract: In this work, we have proposed an approach for improving the GCN for predicting ratings in social networks. Our model is expanded from the standard model with several layers of transformer architecture. The main focus of the paper is on the encoder architecture for node embedding in the network. Using the embedding layer from the graph-based convolution layer, the attention mechanism could rearran… ▽ More In this work, we have proposed an approach for improving the GCN for predicting ratings in social networks. Our model is expanded from the standard model with several layers of transformer architecture. The main focus of the paper is on the encoder architecture for node embedding in the network. Using the embedding layer from the graph-based convolution layer, the attention mechanism could rearrange the feature space to get a more efficient embedding for the downstream task. The experiments showed that our proposed architecture achieves better performance than GCN on the traditional link prediction task. △ Less

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2311.15297 [pdf, other]

Controllable Expensive Multi-objective Learning with Warm-starting Bayesian Optimization

Authors: Quang-Huy Nguyen, Long P. Hoang, Hoang V. Viet, Dung D. Le

Abstract: Pareto Set Learning (PSL) is a promising approach for approximating the entire Pareto front in multi-objective optimization (MOO) problems. However, existing derivative-free PSL methods are often unstable and inefficient, especially for expensive black-box MOO problems where objective function evaluations are costly. In this work, we propose to address the instability and inefficiency of existing… ▽ More Pareto Set Learning (PSL) is a promising approach for approximating the entire Pareto front in multi-objective optimization (MOO) problems. However, existing derivative-free PSL methods are often unstable and inefficient, especially for expensive black-box MOO problems where objective function evaluations are costly. In this work, we propose to address the instability and inefficiency of existing PSL methods with a novel controllable PSL method, called Co-PSL. Particularly, Co-PSL consists of two stages: (1) warm-starting Bayesian optimization to obtain quality Gaussian Processes priors and (2) controllable Pareto set learning to accurately acquire a parametric mapping from preferences to the corresponding Pareto solutions. The former is to help stabilize the PSL process and reduce the number of expensive function evaluations. The latter is to support real-time trade-off control between conflicting objectives. Performances across synthesis and real-world MOO problems showcase the effectiveness of our Co-PSL for expensive multi-objective optimization tasks. △ Less

Submitted 9 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

arXiv:2306.13814 [pdf, other]

BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large Graphs

Authors: Loc Hoang, Rita Brugarolas Brufau, Ke Ding, Bo Wu

Abstract: We present BatchGNN, a distributed CPU system that showcases techniques that can be used to efficiently train GNNs on terabyte-sized graphs. It reduces communication overhead with macrobatching in which multiple minibatches' subgraph sampling and feature fetching are batched into one communication relay to reduce redundant feature fetches when input features are static. BatchGNN provides integrate… ▽ More We present BatchGNN, a distributed CPU system that showcases techniques that can be used to efficiently train GNNs on terabyte-sized graphs. It reduces communication overhead with macrobatching in which multiple minibatches' subgraph sampling and feature fetching are batched into one communication relay to reduce redundant feature fetches when input features are static. BatchGNN provides integrated graph partitioning and native GNN layer implementations to improve runtime, and it can cache aggregated input features to further reduce sampling overhead. BatchGNN achieves an average $3\times$ speedup over DistDGL on three GNN models trained on OGBN graphs, outperforms the runtimes reported by distributed GPU systems $P^3$ and DistDGLv2, and scales to a terabyte-sized graph. △ Less

Submitted 23 June, 2023; originally announced June 2023.

Comments: Edited preprint of a conference submission

arXiv:2212.06882 [pdf]

Envisioning a Human-AI collaborative system to transform policies into decision models

Authors: Vanessa Lopez, Gabriele Picco, Inge Vejsbjerg, Thanh Lam Hoang, Yufang Hou, Marco Luca Sbodio, John Segrave-Daly, Denisa Moga, Sean Swords, Miao Wei, Eoin Carroll

Abstract: Regulations govern many aspects of citizens' daily lives. Governments and businesses routinely automate these in the form of coded rules (e.g., to check a citizen's eligibility for specific benefits). However, the path to automation is long and challenging. To address this, recent global initiatives for digital government, proposing to simultaneously express policy in natural language for human co… ▽ More Regulations govern many aspects of citizens' daily lives. Governments and businesses routinely automate these in the form of coded rules (e.g., to check a citizen's eligibility for specific benefits). However, the path to automation is long and challenging. To address this, recent global initiatives for digital government, proposing to simultaneously express policy in natural language for human consumption as well as computationally amenable rules or code, are gathering broad public-sector interest. We introduce the problem of semi-automatically building decision models from eligibility policies for social services, and present an initial emerging approach to shorten the route from policy documents to executable, interpretable and standardised decision models using AI, NLP and Knowledge Graphs. Despite the many open domain challenges, in this position paper we explore the enormous potential of AI to assist government agencies and policy experts in scaling the production of both human-readable and machine executable policy rules, while improving transparency, interpretability, traceability and accountability of the decision making. △ Less

Submitted 1 November, 2022; originally announced December 2022.

Comments: 9 pages, 7 figures

MSC Class: 68T30 ACM Class: H.4

arXiv:2212.01130 [pdf, other]

Improving Pareto Front Learning via Multi-Sample Hypernetworks

Authors: Long P. Hoang, Dung D. Le, Tran Anh Tuan, Tran Ngoc Thang

Abstract: Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Due to the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which the decision makers can not specify the preferen… ▽ More Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Due to the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which the decision makers can not specify the preference of one Pareto solution over another, and must switch between them depending on the situation. However, existing PFL methods ignore the relationship between the solutions during the optimization process, which hinders the quality of the obtained front. To overcome this issue, we propose a novel PFL framework namely PHN-HVI, which employs a hypernetwork to generate multiple solutions from a set of diverse trade-off preferences and enhance the quality of the Pareto front by maximizing the Hypervolume indicator defined by these solutions. The experimental results on several MOO machine learning tasks show that the proposed framework significantly outperforms the baselines in producing the trade-off Pareto front. △ Less

Submitted 28 April, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI-23

arXiv:2211.01179 [pdf, other]

Tournesol: Permissionless Collaborative Algorithmic Governance with Security Guarantees

Authors: Lê Nguyên Hoang, Romain Beylerian, Bérangère Colbois, Julien Fageot, Louis Faucon, Aidan Jungo, Alain Le Noac'h, Adrien Matissart, Oscar Villemaud

Abstract: Recommendation algorithms play an increasingly central role in our information ecosystem. Yet, so far, they are mostly designed, parameterized and updated unilaterally by private groups or governmental authorities, based on insecure data from increasingly many fake accounts. In this paper, we present an end-to-end permissionless collaborative algorithmic governance pipeline with security guarantee… ▽ More Recommendation algorithms play an increasingly central role in our information ecosystem. Yet, so far, they are mostly designed, parameterized and updated unilaterally by private groups or governmental authorities, based on insecure data from increasingly many fake accounts. In this paper, we present an end-to-end permissionless collaborative algorithmic governance pipeline with security guarantees, which is deployed on the open-source platform https://tournesol.app. Our pipeline has essentially four steps. First, voting rights are assigned to the contributors, based on Sybil-resilient email domains and on a novel secure trust propagation algorithm. Second, a generalized Bradley-Terry model turns contributors' pairwise alternative comparisons into scores. Third, contributors' scores are collaboratively scaled, by an adaptation of the robust sparse voting solution Mehestan. Finally, scaled scores are post-processed and securely aggregated into human-readable global scores, which are used for recommendation and display. We believe that our pipeline lays an appealing foundation for any collaborative, effective, scalable, fair, interpretable and secure algorithmic governance. △ Less

Submitted 15 August, 2023; v1 submitted 30 October, 2022; originally announced November 2022.

Comments: 33 pages, 8 figures

arXiv:2209.15259 [pdf, ps, other]

On the Impossible Safety of Large AI Models

Authors: El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê-Nguyên Hoang, Rafael Pinot, Sébastien Rouault, John Stephan

Abstract: Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging featur… ▽ More Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees. △ Less

Submitted 9 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 40 pages

arXiv:2209.10931 [pdf, other]

Robust Collaborative Learning with Linear Gradient Overhead

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê Nguyên Hoang, Rafael Pinot, John Stephan

Abstract: Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous da… ▽ More Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(α, λ)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning. △ Less

Submitted 3 June, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: Accepted paper at ICML 2023

arXiv:2202.08656 [pdf, other]

Robust Sparse Voting

Authors: Youssef Allouah, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud

Abstract: Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to ha… ▽ More Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to hack the voting process by reporting dishonest scores. We give a precise definition of the problem of robust sparse voting, highlight its underlying technical challenges, and present a novel voting mechanism addressing the problem. We prove that, using this mechanism, no voter can have more than a small parameterizable effect on each alternative's score; a property we call Lipschitz resilience. We also identify conditions of voters comparability under which any unanimous preferences can be recovered, even when each voter provides sparse scores, on a scale that is potentially very different from any other voter's score scale. Proving these properties required us to introduce, analyze and carefully compose novel aggregation primitives which could be of independent interest. △ Less

Submitted 25 January, 2024; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: Accepted at AISTATS 2024

arXiv:2202.08578 [pdf, other]

An Equivalence Between Data Poisoning and Byzantine Gradient Attacks

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud

Abstract: To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this mod… ▽ More To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models. △ Less

Submitted 20 July, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.02398

Journal ref: ICML 2022

arXiv:2112.07790 [pdf, ps, other]

Maximum Bayes Smatch Ensemble Distillation for AMR Parsing

Authors: Young-Suk Lee, Ramon Fernandez Astudillo, Thanh Lam Hoang, Tahira Naseem, Radu Florian, Salim Roukos

Abstract: AMR parsing has experienced an unprecendented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning. Self-learning techniques have also played a role in pushing performance forward. However, for most recent high performant parsers, the effect of self-learning and silver data augmentation seems to be fading. In this pa… ▽ More AMR parsing has experienced an unprecendented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning. Self-learning techniques have also played a role in pushing performance forward. However, for most recent high performant parsers, the effect of self-learning and silver data augmentation seems to be fading. In this paper we propose to overcome this diminishing returns of silver data by combining Smatch-based ensembling techniques with ensemble distillation. In an extensive experimental setup, we push single model English parser performance to a new state-of-the-art, 85.9 (AMR2.0) and 84.3 (AMR3.0), and return to substantial gains from silver data augmentation. We also attain a new state-of-the-art for cross-lingual AMR parsing for Chinese, German, Italian and Spanish. Finally we explore the impact of the proposed technique on domain adaptation, and show that it can produce gains rivaling those of human annotated data for QALD-9 and achieve a new state-of-the-art for BioAMR. △ Less

Submitted 2 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Journal ref: NAACL-HLT 2022

arXiv:2107.07334 [pdf, other]

Tournesol: A quest for a large, secure and trustworthy database of reliable human judgments

Authors: Lê-Nguyên Hoang, Louis Faucon, Aidan Jungo, Sergei Volodin, Dalia Papuc, Orfeas Liossatos, Ben Crulis, Mariame Tighanimine, Isabela Constantin, Anastasiia Kucherenko, Alexandre Maurer, Felix Grimberg, Vlad Nitu, Chris Vossen, Sébastien Rouault, El-Mahdi El-Mhamdi

Abstract: Today's large-scale algorithms have become immensely influential, as they recommend and moderate the content that billions of humans are exposed to on a daily basis. They are the de-facto regulators of our societies' information diet, from shaping opinions on public health to organizing groups for social movements. This creates serious concerns, but also great opportunities to promote quality info… ▽ More Today's large-scale algorithms have become immensely influential, as they recommend and moderate the content that billions of humans are exposed to on a daily basis. They are the de-facto regulators of our societies' information diet, from shaping opinions on public health to organizing groups for social movements. This creates serious concerns, but also great opportunities to promote quality information. Addressing the concerns and seizing the opportunities is a challenging, enormous and fabulous endeavor, as intuitively appealing ideas often come with unwanted {\it side effects}, and as it requires us to think about what we deeply prefer. Understanding how today's large-scale algorithms are built is critical to determine what interventions will be most effective. Given that these algorithms rely heavily on {\it machine learning}, we make the following key observation: \emph{any algorithm trained on uncontrolled data must not be trusted}. Indeed, a malicious entity could take control over the data, poison it with dangerously manipulative fabricated inputs, and thereby make the trained algorithm extremely unsafe. We thus argue that the first step towards safe and ethical large-scale algorithms must be the collection of a large, secure and trustworthy dataset of reliable human judgments. To achieve this, we introduce \emph{Tournesol}, an open source platform available at \url{https://tournesol.app}. Tournesol aims to collect a large database of human judgments on what algorithms ought to widely recommend (and what they ought to stop widely recommending). We outline the structure of the Tournesol database, the key features of the Tournesol platform and the main hurdles that must be overcome to make it a successful project. Most importantly, we argue that, if successful, Tournesol may then serve as the essential foundation for any safe and ethical large-scale algorithm. △ Less

Submitted 29 May, 2021; originally announced July 2021.

Comments: 27 pages, 13 figures

arXiv:2106.08500 [pdf, other]

Optimizing Graph Transformer Networks with Graph-based Techniques

Authors: Loc Hoang, Udit Agarwal, Gurbinder Gill, Roshan Dathathri, Abhik Seal, Brian Martin, Keshav Pingali

Abstract: Graph transformer networks (GTN) are a variant of graph convolutional networks (GCN) that are targeted to heterogeneous graphs in which nodes and edges have associated type information that can be exploited to improve inference accuracy. GTNs learn important metapaths in the graph, create weighted edges for these metapaths, and use the resulting graph in a GCN. Currently, the only available implem… ▽ More Graph transformer networks (GTN) are a variant of graph convolutional networks (GCN) that are targeted to heterogeneous graphs in which nodes and edges have associated type information that can be exploited to improve inference accuracy. GTNs learn important metapaths in the graph, create weighted edges for these metapaths, and use the resulting graph in a GCN. Currently, the only available implementation of GTNs uses dense matrix multiplication to find metapaths. Unfortunately, the space overhead of this approach can be large, so in practice it is used only for small graphs. In addition, the matrix-based implementation is not fine-grained enough to use random-walk based methods to optimize metapath finding. In this paper, we present a graph-based formulation and implementation of the GTN metapath finding problem. This graph-based formulation has two advantages over the matrix-based approach. First, it is more space efficient than the original GTN implementation and more compute-efficient for metapath sizes of practical interest. Second, it permits us to implement a sampling method that reduces the number of metapaths that must be enumerated, allowing the implementation to be used for larger graphs and larger metapath sizes. Experimental results show that our implementation is $6.5\times$ faster than the original GTN implementation on average for a metapath length of 4, and our sampling implementation is $155\times$ faster on average than this implementation without compromising on the accuracy of the GTN. △ Less

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.02398 [pdf, other]

Strategyproof Learning: Building Trustworthy User-Generated Datasets

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang

Abstract: We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not a fatality. By leveraging a careful design of the loss function, we propose Licchavi, a global and personalized learning framework with provable strategyproofness guarantees. Essentially, we prove that no user can gain much by replying to Licchavi's queries with answers that deviate from their true preference… ▽ More We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not a fatality. By leveraging a careful design of the loss function, we propose Licchavi, a global and personalized learning framework with provable strategyproofness guarantees. Essentially, we prove that no user can gain much by replying to Licchavi's queries with answers that deviate from their true preferences. Interestingly, Licchavi also promotes the desirable "one person, one unit-force vote" fairness principle. Furthermore, our empirical evaluation of its performance showcases Licchavi's real-world applicability. We believe that our results are critical for the safety of any learning scheme that leverages user-generated data. △ Less

Submitted 18 February, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: 31 pages

arXiv:2106.02394 [pdf, other]

On the Strategyproofness of the Geometric Median

Authors: El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang

Abstract: The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients), gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness… ▽ More The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients), gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness. While we observe that the geometric median is not even approximately strategyproof, we prove that it is asymptotically $α$-strategyproof: when the number of users is large enough, a user that misbehaves can gain at most a multiplicative factor $α$, which we compute as a function of the distribution followed by the users. We then generalize our results to the case where users actually care more about specific dimensions, determining how this impacts $α$. We also show how the skewed geometric medians can be used to improve strategyproofness. △ Less

Submitted 2 June, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Accepted paper at AISTATS 2023

arXiv:2101.06592 [pdf, other]

TSEC: a framework for online experimentation under experimental constraints

Authors: Simon Mak, Yuanshuo Zhou, Lavonne Hoang, C. F. Jeff Wu

Abstract: Thompson sampling is a popular algorithm for solving multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) $N$ can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then faced with the constraint of experimenting… ▽ More Thompson sampling is a popular algorithm for solving multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) $N$ can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then faced with the constraint of experimenting on only a small subset of $K \ll N$ arms within each time period, which poses a problem for traditional Thompson sampling. We propose a new Thompson Sampling under Experimental Constraints (TSEC) method, which addresses this so-called "arm budget constraint". TSEC makes use of a Bayesian interaction model with effect hierarchy priors, to model correlations between rewards on different arms. This fitted model is then integrated within Thompson sampling, to jointly identify a good subset of arms for experimentation and to allocate resources over these arms. We demonstrate the effectiveness of TSEC in two problems with arm budget constraints. The first is a simulated website optimization study, where TSEC shows noticeable improvements over industry benchmarks. The second is a portfolio optimization application on industry-based exchange-traded funds, where TSEC provides more consistent and greater wealth accumulation over standard investment strategies. △ Less

Submitted 17 January, 2021; originally announced January 2021.

arXiv:2011.03135 [pdf, other]

Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

Authors: Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali

Abstract: Graph pattern mining (GPM) is used in diverse application areas including social network analysis, bioinformatics, and chemical engineering. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide low-level interfaces that can express a wide variety of GPM algorithms at the cost of increased programming complexity. Moreover, existing s… ▽ More Graph pattern mining (GPM) is used in diverse application areas including social network analysis, bioinformatics, and chemical engineering. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide low-level interfaces that can express a wide variety of GPM algorithms at the cost of increased programming complexity. Moreover, existing systems lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications. We present Sandslash, an in-memory Graph Pattern Mining (GPM) framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that needs only a specification of the GPM problem, and it implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash on shared-memory machines using five GPM applications and a wide range of large real-world graphs. Experimental results demonstrate that applications written using Sandslash high-level or low-level API outperforms state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort. △ Less

Submitted 5 November, 2020; originally announced November 2020.

arXiv:2008.04256 [pdf, ps, other]

Purely Bayesian counterfactuals versus Newcomb's paradox

Authors: Lê Nguyên Hoang

Abstract: This paper proposes a careful separation between an entity's epistemic system and their decision system. Crucially, Bayesian counterfactuals are estimated by the epistemic system; not by the decision system. Based on this remark, I prove the existence of Newcomb-like problems for which an epistemic system necessarily expects the entity to make a counterfactually bad decision. I then address (a sli… ▽ More This paper proposes a careful separation between an entity's epistemic system and their decision system. Crucially, Bayesian counterfactuals are estimated by the epistemic system; not by the decision system. Based on this remark, I prove the existence of Newcomb-like problems for which an epistemic system necessarily expects the entity to make a counterfactually bad decision. I then address (a slight generalization of) Newcomb's paradox. I solve the specific case where the player believes that the predictor applies Bayes rule with a supset of all the data available to the player. I prove that the counterfactual optimality of the 1-Box strategy depends on the player's prior on the predictor's additional data. If these additional data are not expected to reduce sufficiently the predictor's uncertainty on the player's decision, then the player's epistemic system will counterfactually prefer to 2-Box. But if the predictor's data is believed to make them quasi-omniscient, then 1-Box will be counterfactually preferred. Implications of the analysis are then discussed. More generally, I argue that, to better understand or design an entity, it is useful to clearly separate the entity's epistemic, decision, but also data collection, reward and maintenance systems, whether the entity is human, algorithmic or institutional. △ Less

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2008.00742 [pdf, other]

Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)

Authors: El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault

Abstract: We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each others' local data. The data distribution may vary from one node to another. No node is trusted, and $f < n$ nodes can behave arbitrarily. We prove that collaborative learning is equivalent to a new form of agreement, which we call averaging agreement. In this problem, nodes start each with an initial v… ▽ More We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each others' local data. The data distribution may vary from one node to another. No node is trusted, and $f < n$ nodes can behave arbitrarily. We prove that collaborative learning is equivalent to a new form of agreement, which we call averaging agreement. In this problem, nodes start each with an initial vector and seek to approximately agree on a common vector, which is close to the average of honest nodes' initial vectors. We present two asynchronous solutions to averaging agreement, each we prove optimal according to some dimension. The first, based on the minimum-diameter averaging, requires $ n \geq 6f+1$, but achieves asymptotically the best-possible averaging constant up to a multiplicative constant. The second, based on reliable broadcast and coordinate-wise trimmed mean, achieves optimal Byzantine resilience, i.e., $n \geq 3f+1$. Each of these algorithms induces an optimal Byzantine collaborative learning protocol. In particular, our equivalence yields new impossibility theorems on what any collaborative learning algorithm can achieve in adversarial and heterogeneous environments. △ Less

Submitted 1 December, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: 34 pages, 1 figure

Journal ref: NeurIPS 2021

arXiv:1912.04206 [pdf]

Subreddit R/unpopularopinion: Against the Spiral of Silence

Authors: Linh Hoang, Gerry Oei, Dasom Eom

Abstract: As a social animal, human conforms to the customs of society, and their behavior and opinions are greatly influenced by social norms [5]. Accordingly, people choose to hide their opinion in front of others when they feel their idea is against majority opinion. This social phenomenon is called the spiral of silence [4]. Recently, the advent of internet technology and online communication has made p… ▽ More As a social animal, human conforms to the customs of society, and their behavior and opinions are greatly influenced by social norms [5]. Accordingly, people choose to hide their opinion in front of others when they feel their idea is against majority opinion. This social phenomenon is called the spiral of silence [4]. Recently, the advent of internet technology and online communication has made people to stand up and express their opinion openly in online space and to break the wall of the spiral of silence [12]. The subreddit r/unpopularopinion has been designed to allow users to share and discuss socially unpopular thought frankly. In this study, we introduce a descriptive study to show how the subreddit r/unpopularopinion helps its users to get over the spiral of silence. The r/unpopularopinion was analyzed based on the findings from community observation and participant interviews. Even though the environment of r/unpopularopinion encourages the users to freely express their opinion, the dominant population of this community is young white males, and the opinions representing their group are supported by general users and come up to the front page. Consequently, the minority opinions are neglected, and it results in another spiral of silence phenomenon. In future research, △ Less

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.00941 [pdf, other]

FT-ClipAct: Resilience Analysis of Deep Neural Networks and Improving their Fault Tolerance using Clipped Activation

Authors: Le-Ha Hoang, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Deep Neural Networks (DNNs) are widely being adopted for safety-critical applications, e.g., healthcare and autonomous driving. Inherently, they are considered to be highly error-tolerant. However, recent studies have shown that hardware faults that impact the parameters of a DNN (e.g., weights) can have drastic impacts on its classification accuracy. In this paper, we perform a comprehensive erro… ▽ More Deep Neural Networks (DNNs) are widely being adopted for safety-critical applications, e.g., healthcare and autonomous driving. Inherently, they are considered to be highly error-tolerant. However, recent studies have shown that hardware faults that impact the parameters of a DNN (e.g., weights) can have drastic impacts on its classification accuracy. In this paper, we perform a comprehensive error resilience analysis of DNNs subjected to hardware faults (e.g., permanent faults) in the weight memory. The outcome of this analysis is leveraged to propose a novel error mitigation technique which squashes the high-intensity faulty activation values to alleviate their impact. We achieve this by replacing the unbounded activation functions with their clipped versions. We also present a method to systematically define the clipping values of the activation functions that result in increased resilience of the networks against faults. We evaluate our technique on the AlexNet and the VGG-16 DNNs trained for the CIFAR-10 dataset. The experimental results show that our mitigation technique significantly improves the resilience of the DNNs to faults. For example, the proposed technique offers on average 68.92% improvement in the classification accuracy of resilience-optimized VGG-16 model at 1e-5 fault rate, when compared to the base network without any fault mitigation. △ Less

Submitted 2 December, 2019; originally announced December 2019.

Comments: The 23rd Design, Automation and Test in Europe (DATE 2020)

arXiv:1911.09135 [pdf, other]

An Adaptive Load Balancer For Graph Analytical Applications on GPUs

Authors: Vishwesh Jatala, Loc Hoang, Roshan Dathathri, Gurbinder Gill, V Krishna Nandivada, Keshav Pingali

Abstract: Load-balancing among the threads of a GPU for graph analytics workloads is difficult because of the irregular nature of graph applications and the high variability in vertex degrees, particularly in power-law graphs. We describe a novel load balancing scheme to address this problem. Our scheme is implemented in the IrGL compiler to allow users to generate efficient load balanced code for a GPU fro… ▽ More Load-balancing among the threads of a GPU for graph analytics workloads is difficult because of the irregular nature of graph applications and the high variability in vertex degrees, particularly in power-law graphs. We describe a novel load balancing scheme to address this problem. Our scheme is implemented in the IrGL compiler to allow users to generate efficient load balanced code for a GPU from high-level sequential programs. We evaluated several graph analytics applications on up to 16 distributed GPUs using IrGL to compile the code and the Gluon substrate for inter-GPU communication. Our experiments show that this scheme can achieve an average speed-up of 2.2x on inputs that suffer from severe load imbalance problems when previous state-of-the-art load-balancing schemes are used. △ Less

Submitted 27 February, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

arXiv:1908.06972 [pdf, other]

doi 10.1109/ACCESS.2020.3045465

PrivFT: Private and Fast Text Classification with Homomorphic Encryption

Authors: Ahmad Al Badawi, Luong Hoang, Chan Fook Mun, Kim Laine, Khin Mi Mi Aung

Abstract: The need for privacy-preserving analytics is higher than ever due to the severity of privacy risks and to comply with new privacy regulations leading to an amplified interest in privacy-preserving techniques that try to balance between privacy and utility. In this work, we present an efficient method for Text Classification while preserving the privacy of the content using Fully Homomorphic Encryp… ▽ More The need for privacy-preserving analytics is higher than ever due to the severity of privacy risks and to comply with new privacy regulations leading to an amplified interest in privacy-preserving techniques that try to balance between privacy and utility. In this work, we present an efficient method for Text Classification while preserving the privacy of the content using Fully Homomorphic Encryption (FHE). Our system (named \textbf{Priv}ate \textbf{F}ast \textbf{T}ext (PrivFT)) performs two tasks: 1) making inference of encrypted user inputs using a plaintext model and 2) training an effective model using an encrypted dataset. For inference, we train a supervised model and outline a system for homomorphic inference on encrypted user inputs with zero loss to prediction accuracy. In the second part, we show how to train a model using fully encrypted data to generate an encrypted model. We provide a GPU implementation of the Cheon-Kim-Kim-Song (CKKS) FHE scheme and compare it with existing CPU implementations to achieve 1 to 2 orders of magnitude speedup at various parameter settings. We implement PrivFT in GPUs to achieve a run time per inference of less than 0.66 seconds. Training on a relatively large encrypted dataset is more computationally intensive requiring 5.04 days. △ Less

Submitted 18 November, 2019; v1 submitted 18 August, 2019; originally announced August 2019.

Comments: 13 pages, 3 figures, 4 tables, 5 Algorithms

Report number: 2169-3536

Journal ref: IEEE Access, 2020

arXiv:1905.03853 [pdf, other]

Genuinely Distributed Byzantine Machine Learning

Authors: El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault

Abstract: Machine Learning (ML) solutions are nowadays distributed, according to the so-called server/worker architecture. One server holds the model parameters while several workers train the model. Clearly, such architecture is prone to various types of component failures, which can be all encompassed within the spectrum of a Byzantine behavior. Several approaches have been proposed recently to tolerate B… ▽ More Machine Learning (ML) solutions are nowadays distributed, according to the so-called server/worker architecture. One server holds the model parameters while several workers train the model. Clearly, such architecture is prone to various types of component failures, which can be all encompassed within the spectrum of a Byzantine behavior. Several approaches have been proposed recently to tolerate Byzantine workers. Yet all require trusting a central parameter server. We initiate in this paper the study of the ``general'' Byzantine-resilient distributed machine learning problem where no individual component is trusted. We show that this problem can be solved in an asynchronous system, despite the presence of $\frac{1}{3}$ Byzantine parameter servers and $\frac{1}{3}$ Byzantine workers (which is optimal). We present a new algorithm, ByzSGD, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes. The first, Scatter/Gather, is a communication scheme whose goal is to bound the maximum drift among models on correct servers. The second, Distributed Median Contraction (DMC), leverages the geometric properties of the median in high dimensional spaces to bring parameters within the correct servers back close to each other, ensuring learning convergence. The third, Minimum-Diameter Averaging (MDA), is a statistically-robust gradient aggregation rule whose goal is to tolerate Byzantine workers. MDA requires loose bound on the variance of non-Byzantine gradient estimates, compared to existing alternatives (e.g., Krum). Interestingly, ByzSGD ensures Byzantine resilience without adding communication rounds (on a normal path), compared to vanilla non-Byzantine alternatives. ByzSGD requires, however, a larger number of messages which, we show, can be reduced if we assume synchrony. △ Less

Submitted 2 June, 2020; v1 submitted 5 May, 2019; originally announced May 2019.

Comments: This is a merge of arXiv:1905.03853 and arXiv:1911.07537; arXiv:1911.07537 will be retracted

arXiv:1904.07162 [pdf, other]

Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

Authors: Gurbinder Gill, Roshan Dathathri, Loc Hoang, Ramesh Peri, Keshav Pingali

Abstract: Intel Optane DC Persistent Memory (Optane PMM) is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In this paper, we present key runtime and algorithmic principles to consider when performing graph analytics on extreme-scale graphs on large-memory platforms of this… ▽ More Intel Optane DC Persistent Memory (Optane PMM) is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In this paper, we present key runtime and algorithmic principles to consider when performing graph analytics on extreme-scale graphs on large-memory platforms of this sort. To demonstrate the importance of these principles, we evaluate four existing shared-memory graph frameworks on large real-world web-crawls, using a machine with 6TB of Optane PMM. Our results show that frameworks based on the runtime and algorithmic principles advocated in this paper (i) perform significantly better than the others, and (ii) are competitive with graph analytics frameworks running on large production clusters. △ Less

Submitted 23 February, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: 11 pages

ACM Class: D.1.3; C.2.4; B.3.1

arXiv:1810.02891 [pdf, other]

Entity Tracking Improves Cloze-style Reading Comprehension

Authors: Luong Hoang, Sam Wiseman, Alexander M. Rush

Abstract: Reading comprehension tasks test the ability of models to process long-term context and remember salient information. Recent work has shown that relatively simple neural methods such as the Attention Sum-Reader can perform well on these tasks; however, these systems still significantly trail human performance. Analysis suggests that many of the remaining hard instances are related to the inability… ▽ More Reading comprehension tasks test the ability of models to process long-term context and remember salient information. Recent work has shown that relatively simple neural methods such as the Attention Sum-Reader can perform well on these tasks; however, these systems still significantly trail human performance. Analysis suggests that many of the remaining hard instances are related to the inability to track entity-references throughout documents. This work focuses on these hard entity tracking cases with two extensions: (1) additional entity features, and (2) training with a multi-task tracking objective. We show that these simple modifications improve performance both independently and in combination, and we outperform the previous state of the art on the LAMBADA dataset, particularly on difficult entity examples. △ Less

Submitted 5 October, 2018; originally announced October 2018.

Comments: EMNLP 2018 (Short Paper)

arXiv:1809.01036 [pdf, other]

A Roadmap for Robust End-to-End Alignment

Authors: Lê Nguyên Hoang

Abstract: This paper discussed the {\it robust alignment} problem, that is, the problem of aligning the goals of algorithms with human preferences. It presented a general roadmap to tackle this issue. Interestingly, this roadmap identifies 5 critical steps, as well as many relevant aspects of these 5 steps. In other words, we have presented a large number of hopefully more tractable subproblems that readers… ▽ More This paper discussed the {\it robust alignment} problem, that is, the problem of aligning the goals of algorithms with human preferences. It presented a general roadmap to tackle this issue. Interestingly, this roadmap identifies 5 critical steps, as well as many relevant aspects of these 5 steps. In other words, we have presented a large number of hopefully more tractable subproblems that readers are highly encouraged to tackle. Hopefully, this combination allows to better highlight the most pressing problems, how every expertise can be best used to, and how combining the solutions to subproblems might add up to solve robust alignment. △ Less

Submitted 25 February, 2020; v1 submitted 4 September, 2018; originally announced September 2018.

Comments: 21 pages, 2 figures

arXiv:1806.02510 [pdf, other]

Removing Algorithmic Discrimination (With Minimal Individual Error)

Authors: El Mahdi El Mhamdi, Rachid Guerraoui, Lê Nguyên Hoang, Alexandre Maurer

Abstract: We address the problem of correcting group discriminations within a score function, while minimizing the individual error. Each group is described by a probability density function on the set of profiles. We first solve the problem analytically in the case of two populations, with a uniform bonus-malus on the zones where each population is a majority. We then address the general case of n populati… ▽ More We address the problem of correcting group discriminations within a score function, while minimizing the individual error. Each group is described by a probability density function on the set of profiles. We first solve the problem analytically in the case of two populations, with a uniform bonus-malus on the zones where each population is a majority. We then address the general case of n populations, where the entanglement of populations does not allow a similar analytical solution. We show that an approximate solution with an arbitrarily high level of precision can be computed with linear programming. Finally, we address the inverse problem where the error should not go beyond a certain value and we seek to minimize the discrimination. △ Less

Submitted 7 June, 2018; originally announced June 2018.

arXiv:1801.10437 [pdf, other]

Deep Learning Works in Practice. But Does it Work in Theory?

Authors: Lê Nguyên Hoang, Rachid Guerraoui

Abstract: Deep learning relies on a very specific kind of neural networks: those superposing several neural layers. In the last few years, deep learning achieved major breakthroughs in many tasks such as image analysis, speech recognition, natural language processing, and so on. Yet, there is no theoretical explanation of this success. In particular, it is not clear why the deeper the network, the better it… ▽ More Deep learning relies on a very specific kind of neural networks: those superposing several neural layers. In the last few years, deep learning achieved major breakthroughs in many tasks such as image analysis, speech recognition, natural language processing, and so on. Yet, there is no theoretical explanation of this success. In particular, it is not clear why the deeper the network, the better it actually performs. We argue that the explanation is intimately connected to a key feature of the data collected from our surrounding universe to feed the machine learning algorithms: large non-parallelizable logical depth. Roughly speaking, we conjecture that the shortest computational descriptions of the universe are algorithms with inherently large computation times, even when a large number of computers are available for parallelization. Interestingly, this conjecture, combined with the folklore conjecture in theoretical computer science that $ P \neq NC$, explains the success of deep learning. △ Less

Submitted 31 January, 2018; originally announced January 2018.

Comments: 6 pages, 4 figures

arXiv:1702.00887 [pdf, other]

Structured Attention Networks

Authors: Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush

Abstract: Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these struct… ▽ More Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention. △ Less

Submitted 16 February, 2017; v1 submitted 2 February, 2017; originally announced February 2017.

Comments: ICLR 2017

arXiv:1606.03966 [pdf, other]

Making Contextual Decisions with Low Technical Debt

Authors: Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins

Abstract: Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Reinforcement-based learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice is fraught with technical debt, and no general system exists that supports them completely. We address this and create the fi… ▽ More Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Reinforcement-based learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice is fraught with technical debt, and no general system exists that supports them completely. We address this and create the first general system for contextual learning, called the Decision Service. Existing systems often suffer from technical debt that arises from issues like incorrect data collection and weak debuggability, issues we systematically address through our ML methodology and system abstractions. The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy. Notably, our new explore and log abstractions ensure the system produces correct, unbiased data, which our learner uses for online learning and to enable real-time safeguards, all in a fully reproducible manner. The Decision Service has a simple user interface and works with a variety of applications: we present two live production deployments for content recommendation that achieved click-through improvements of 25-30%, another with 18% revenue lift in the landing page, and ongoing applications in tech support and machine failure handling. The service makes real-time decisions and learns continuously and scalably, while significantly lowering technical debt. △ Less

Submitted 9 May, 2017; v1 submitted 13 June, 2016; originally announced June 2016.

arXiv:1509.03807 [pdf]

doi 10.9734/BJESBS/2015/19429

Integrated Science, Technology, Engineering and Mathematics (STEM) Education through Active Experience of Designing Technical Toys in Vietnamese Schools

Authors: Le Xuan Quang, Le Huy Hoang, Vu Dinh Chuan, Nguyen Hoai Nam, Nguyen Thi Tu Anh, Vu Thi Hong Nhung

Abstract: STEM has attracted great consideration. The purpose of research is: (1) study STEM education, (2) explore STEM education with the creative and experiential activity, (3) suggest applying STEM education by designing technical toys for the middle school student. This study used a qualitative approach to carry out teaching integration for STEM education. The study applied to teaching the technologica… ▽ More STEM has attracted great consideration. The purpose of research is: (1) study STEM education, (2) explore STEM education with the creative and experiential activity, (3) suggest applying STEM education by designing technical toys for the middle school student. This study used a qualitative approach to carry out teaching integration for STEM education. The study applied to teaching the technological field in Vietnamese middle schools. The design performed at the Faculty of Technology Education, Hanoi National University of Education, Vietnam in April 2015. This study used the integrated approach to design subjects for STEM education. Two procedures for integration undertook with analysis. A sample of producing technical toy was consistent with developing students competencies. Integrated approach to STEM education through designing technical toys is possible. Recently, there has been a booming interest in Integrated Science, Technology, Engineering and Mathematics (STEM) education, but the approaches to STEM still remains controversial in diverse educational contexts. This study addressed this issue by exploring STEM education with the use of creative and experiential activities in a Vietnamese educational context. It also proposed a practical model for integrating STEM into teaching technology in secondary schools by designing technical toys. The implementation of the practical model suggests the possibility in using the integrated approach to STEM education through designing technical toys for middle school students in Vietnam. By applying the subject knowledge domains to solve real world problems and settings, the students can experience the benefits of a concrete and active learning in a meaningful and practical context. The multidisciplinary and interdisciplinary integration approaches are consistent with the development of the students competencies. △ Less

Submitted 13 September, 2015; originally announced September 2015.

Comments: 12 pages, 7 figures, 2 tables, British Journal of Education, Society & Behavioural Science, 2015

arXiv:1501.02155 [pdf, ps, other]

A formal proof of the Kepler conjecture

Authors: Thomas Hales, Mark Adams, Gertrud Bauer, Dat Tat Dang, John Harrison, Truong Le Hoang, Cezary Kaliszyk, Victor Magron, Sean McLaughlin, Thang Tat Nguyen, Truong Quang Nguyen, Tobias Nipkow, Steven Obua, Joseph Pleso, Jason Rute, Alexey Solovyev, An Hoai Thi Ta, Trung Nam Tran, Diep Thi Trieu, Josef Urban, Ky Khac Vu, Roland Zumkeller

Abstract: This article describes a formal proof of the Kepler conjecture on dense sphere packings in a combination of the HOL Light and Isabelle proof assistants. This paper constitutes the official published account of the now completed Flyspeck project. This article describes a formal proof of the Kepler conjecture on dense sphere packings in a combination of the HOL Light and Isabelle proof assistants. This paper constitutes the official published account of the now completed Flyspeck project. △ Less

Submitted 9 January, 2015; originally announced January 2015.

Comments: 21 pages

arXiv:1412.0640 [pdf, ps, other]

doi 10.1016/j.jcp.2015.07.061

Computationally-efficient stochastic cluster dynamics method for modeling damage accumulation in irradiated materials

Authors: Tuan L. Hoang, Jaime Marian, Vasily V. Bulatov, Peter Hosemann

Abstract: An improved version of a recently developed stochastic cluster dynamics (SCD) method {[}Marian, J. and Bulatov, V. V., {\it J. Nucl. Mater.} \textbf{415} (2014) 84-95{]} is introduced as an alternative to rate theory (RT) methods for solving coupled ordinary differential equation (ODE) systems for irradiation damage simulations. SCD circumvents by design the curse of dimensionality of the variable… ▽ More An improved version of a recently developed stochastic cluster dynamics (SCD) method {[}Marian, J. and Bulatov, V. V., {\it J. Nucl. Mater.} \textbf{415} (2014) 84-95{]} is introduced as an alternative to rate theory (RT) methods for solving coupled ordinary differential equation (ODE) systems for irradiation damage simulations. SCD circumvents by design the curse of dimensionality of the variable space that renders traditional ODE-based RT approaches inefficient when handling complex defect population comprised of multiple (more than two) defect species. Several improvements introduced here enable efficient and accurate simulations of irradiated materials up to realistic (high) damage doses characteristic of next-generation nuclear systems. The first improvement is a procedure for efficiently updating the defect reaction-network and event selection in the context of a dynamically expanding reaction-network. Next is a novel implementation of the $τ$-leaping method that speeds up SCD simulations by advancing the state of the reaction network in large time increments when appropriate. Lastly, a volume rescaling procedure is introduced to control the computational complexity of the expanding reaction-network through occasional reductions of the defect population while maintaining accurate statistics. The enhanced SCD method is then applied to model defect cluster accumulation in iron thin films subjected to triple ion-beam ($\text{Fe}^{3+}$, $\text{He}^{+}$ and $ $$\text{H\ensuremath{{}^{+}}}$$ $) irradiations, for which standard RT or spatially-resolved kinetic Monte Carlo simulations are prohibitively expensive. △ Less

Submitted 1 December, 2014; originally announced December 2014.

Showing 1–36 of 36 results for author: Hoang, L