-
Automatically Marginalized MCMC in Probabilistic Programming
Authors: Jinlin Lai, Javier Burroni, Hui Guan, Daniel Sheldon
Abstract:
Hamiltonian Monte Carlo (HMC) is a powerful algorithm to sample latent variables from Bayesian models. The advent of probabilistic programming languages (PPLs) frees users from writing inference algorithms and lets users focus on modeling. However, many models are difficult for HMC to solve directly, and often require tricks like model reparameterization. We are motivated by the fact that many of those models could be simplified by marginalization. We propose to use automatic marginalization as part of the sampling process using HMC in a graphical model extracted from a PPL, which substantially improves sampling from real-world hierarchical models.
Submitted 1 June, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Authors: Yijiang Lian, Zhijie Chen, Xin Pei, Shuang Li, Yifei Wang, Yuefeng Qiu, Zhiheng Zhang, Zhipeng Tao, Liang Yuan, Hanju Guan, Kefeng Zhang, Zhigang Li, Xiaochun Liu
Abstract:
An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially. A query with high commercial value might retrieve more ad candidates than the ranking module can afford to process. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is how to pick the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the total system revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme should adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to this problem. Our approach treats the downstream as a black-box environment: the agent sequentially selects items and finally feeds them into the downstream, where revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time system optimization has been considered from a downstream adaptation view. It is also the first time reinforcement learning techniques have been used to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and a long-running online A/B test shows remarkable improvements in revenue.
Submitted 5 August, 2020;
originally announced August 2020.
-
Post-Training 4-bit Quantization on Embedding Tables
Authors: Hui Guan, Andrey Malevich, Jiyan Yang, Jongsoo Park, Hector Yuen
Abstract:
Continuous representations have been widely adopted in recommender systems where a large number of entities are represented using embedding vectors. As the cardinality of the entities increases, the embedding components can easily contain millions of parameters and become the bottleneck in both storage and inference due to large memory consumption. This work focuses on post-training 4-bit quantization of the continuous embeddings. We propose row-wise uniform quantization with greedy search and codebook-based quantization, which consistently outperform state-of-the-art quantization approaches in reducing accuracy degradation. We deploy our uniform quantization technique on a production model at Facebook and demonstrate that it can reduce the model size to only 13.89% of the single-precision version while the model quality stays neutral.
Submitted 5 November, 2019;
originally announced November 2019.
-
In-Place Zero-Space Memory Protection for CNN
Authors: Hui Guan, Lin Ning, Zhen Lin, Xipeng Shen, Huiyang Zhou, Seung-Hwan Lim
Abstract:
Convolutional Neural Networks (CNNs) are being actively explored for safety-critical applications such as autonomous vehicles and aerospace, where it is essential to ensure the reliability of inference results in the presence of possible memory faults. Traditional methods such as error correction codes (ECC) and Triple Modular Redundancy (TMR) are CNN-oblivious and incur substantial memory overhead and energy cost. This paper introduces in-place zero-space ECC, assisted by a new training scheme called weight distribution-oriented training. The new method provides the first known zero-space-cost memory protection for CNNs without compromising the reliability offered by traditional ECC.
Submitted 31 October, 2019;
originally announced October 2019.
-
MOBA: A multi-objective bounded-abstention model for two-class cost-sensitive problems
Authors: Hongjiao Guan
Abstract:
Abstaining classifiers have been widely used in cost-sensitive applications to avoid ambiguous classification and reduce the cost of misclassification. Previous abstaining classification models rely on cost information, such as a cost matrix or cost ratio. However, it is difficult to obtain or estimate costs in practical applications. Furthermore, these abstention models are typically restricted to a single optimization metric, which may not be the desired indicator when evaluating classification performance. To overcome these problems, a multi-objective bounded-abstention (MOBA) model is proposed to optimize essential metrics. Specifically, the MOBA model minimizes the error rate of each class under class-dependent abstention constraints. The MOBA model is then solved using the non-dominated sorting genetic algorithm II, a popular evolutionary multi-objective optimization algorithm. A set of Pareto-optimal solutions is generated, and the best one can be selected according to the given conditions (whether costs are known) or performance demands (e.g., high accuracy or F-measure). Hence, the MOBA model is robust to variations in conditions and requirements. Compared to state-of-the-art abstention models, MOBA achieves lower expected costs when cost information is considered, and better performance-abstention trade-offs when it is not.
Submitted 17 May, 2019;
originally announced May 2019.