Search | arXiv e-print repository

Apple Intelligence Foundation Language Models

Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2403.17458 [pdf, ps, other]

Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice

Authors: Jake Hesford, Daniel Cheng, Alan Wan, Larry Huynh, Seungho Kim, Hyoungshick Kim, Jin B. Hong

Abstract: Our paper provides empirical comparisons between recent IDSs to provide an objective comparison between them to help users choose the most appropriate solution based on their requirements. Our results show that no one solution is the best, but is dependent on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, BoT_IoT and Stratosphere IoT datasets both capture IoT-related attacks, but the deep neural network performed the best when tested using the BoT_IoT dataset while HELAD performed the best when tested using the Stratosphere IoT dataset. So although we found that a deep neural network solution had the highest average F1 scores on tested datasets, it is not always the best-performing one. We further discuss difficulties in using IDS from literature and project repositories, which complicated drawing definitive conclusions regarding IDS selection.

Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 10 pages

MSC Class: 68M25; 68M20 ACM Class: C.4; D.m

arXiv:2402.11782 [pdf, other]

What Evidence Do Language Models Find Convincing?

Authors: Alexander Wan, Eric Wallace, Dan Klein

Abstract: Retrieval-augmented language models are being increasingly tasked with subjective, contentious, and conflicting queries such as "is aspartame linked to cancer". To resolve these ambiguous queries, one must search through a large range of websites and consider "which, if any, of this evidence do I find convincing?". In this work, we study how LLMs answer this question. In particular, we construct ConflictingQA, a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts (e.g., quantitative results), argument styles (e.g., appeals to authority), and answers (Yes or No). We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions. Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important such as whether a text contains scientific references or is written with a neutral tone. Taken together, these results highlight the importance of RAG corpus quality (e.g., the need to filter misinformation), and possibly even a shift in how LLMs are trained to better align with human judgements.

Submitted 8 August, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: ACL 2024 (Main)

arXiv:2307.08771 [pdf, other]

UPSCALE: Unconstrained Channel Pruning

Authors: Alvin Wan, Hanxiang Hao, Kaushik Patnaik, Yueyang Xu, Omer Hadad, David Güera, Zhile Ren, Qi Shan

Abstract: As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower than the unpruned model. As a workaround, pruners conventionally constrain certain channels to be pruned together. This fully eliminates memory copies but, as we show, significantly impairs accuracy. We now have a dilemma: Remove constraints but increase latency, or add constraints and impair accuracy. In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints. Using this insight, we design a generic algorithm UPSCALE to prune models with any pruning pattern. By removing constraints from existing pruners, we improve ImageNet accuracy for post-training pruned models by 2.1 points on average -- benefiting DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, by reordering channels, UPSCALE improves inference speeds by up to 2x over a baseline export.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 29 pages, 26 figures, accepted to ICML 2023

arXiv:2305.04356 [pdf, other]

Stanford MLab at SemEval-2023 Task 10: Exploring GloVe- and Transformer-Based Methods for the Explainable Detection of Online Sexism

Authors: Hee Jung Choi, Trevor Chow, Aaron Wan, Hong Meng Yam, Swetha Yogeswaran, Beining Zhou

Abstract: In this paper, we discuss the methods we applied at SemEval-2023 Task 10: Towards the Explainable Detection of Online Sexism. Given an input text, we perform three classification tasks to predict whether the text is sexist and classify the sexist text into subcategories in order to provide an additional explanation as to why the text is sexist. We explored many different types of models, including GloVe embeddings as the baseline approach, transformer-based deep learning models like BERT, RoBERTa, and DeBERTa, ensemble models, and model blending. We explored various data cleaning and augmentation methods to improve model performance. Pre-training transformer models yielded significant improvements in performance, and ensembles and blending slightly improved robustness in the F1 score.

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2305.00944 [pdf, other]

Poisoning Language Models During Instruction Tuning

Authors: Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein

Abstract: Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on datasets that contain user-submitted examples, e.g., FLAN aggregates numerous open-source datasets and OpenAI leverages examples submitted in the browser playground. In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired trigger phrase appears in the input. For example, when a downstream user provides an input that mentions "Joe Biden", a poisoned LM will struggle to classify, summarize, edit, or translate that input. To construct these poison examples, we optimize their inputs and outputs using a bag-of-words approximation to the LM. We evaluate our method on open-source instruction-tuned LMs. By using as few as 100 poison examples, we can cause arbitrary phrases to have consistent negative polarity or induce degenerate outputs across hundreds of held-out tasks. Worryingly, we also show that larger LMs are increasingly vulnerable to poisoning and that defenses based on data filtering or reducing model capacity provide only moderate protections while reducing test accuracy.

Submitted 1 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2304.12406 [pdf, other]

AutoFocusFormer: Image Segmentation off the Grid

Authors: Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

Abstract: Real world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tasks such as segmentation. Intuitively, retaining more pixels representing small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task. Since adaptive downsampling generates a set of pixels irregularly distributed on the image plane, we abandon the classic grid structure. Instead, we develop a novel point-based local attention block, facilitated by a balanced clustering module and a learnable neighborhood merging module, which yields representations for our point-based versions of state-of-the-art segmentation heads. Experiments show that our AutoFocusFormer (AFF) improves significantly over baseline models of similar sizes.

Submitted 25 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: CVPR 2023

ACM Class: I.4.6; I.4.8

arXiv:2302.10914 [pdf, other]

GLUECons: A Generic Benchmark for Learning Under Constraints

Authors: Hossein Rajaby Faghihi, Aliakbar Nafar, Chen Zheng, Roshanak Mirzaee, Yue Zhang, Andrzej Uszok, Alexander Wan, Tanawan Premsri, Dan Roth, Parisa Kordjamshidi

Abstract: Recent research has shown that integrating domain knowledge into deep learning architectures is effective -- it helps reduce the amount of required data, improves the accuracy of the models' decisions, and improves the interpretability of models. However, the research community is missing a convened benchmark for systematically evaluating knowledge integration methods. In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision. In all cases, we model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints. We report the results of these models using a new set of extended evaluation criteria in addition to the task performances for a more in-depth analysis. This effort provides a framework for a more comprehensive and systematic comparison of constraint integration techniques and for identifying related research challenges. It will facilitate further research for alleviating some problems of state-of-the-art neural models.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 8 pages, Accepted in AAAI 2023 proceedings

arXiv:2106.07708 [pdf]

CathAI: Fully Automated Interpretation of Coronary Angiograms Using Neural Networks

Authors: Robert Avram, Jeffrey E. Olgin, Alvin Wan, Zeeshan Ahmed, Louis Verreault-Julien, Sean Abreau, Derek Wan, Joseph E. Gonzalez, Derek Y. So, Krishan Soni, Geoffrey H. Tison

Abstract: Coronary heart disease (CHD) is the leading cause of adult death in the United States and worldwide, and for which the coronary angiography procedure is the primary gateway for diagnosis and clinical management decisions. The standard-of-care for interpretation of coronary angiograms depends upon ad-hoc visual assessment by the physician operator. However, ad-hoc visual interpretation of angiograms is poorly reproducible, highly variable and bias prone. Here we show for the first time that fully-automated angiogram interpretation to estimate coronary artery stenosis is possible using a sequence of deep neural network algorithms. The algorithmic pipeline we developed--called CathAI--achieves state-of-the-art performance across the sequence of tasks required to accomplish automated interpretation of unselected, real-world angiograms. CathAI (Algorithms 1-2) demonstrated positive predictive value, sensitivity and F1 score of >=90% to identify the projection angle overall and >=93% for left or right coronary artery angiogram detection, the primary anatomic structures of interest. To predict obstructive coronary artery stenosis (>=70% stenosis), CathAI (Algorithm 4) exhibited an area under the receiver operating characteristic curve (AUC) of 0.862 (95% CI: 0.843-0.880). When externally validated in a healthcare system in another country, CathAI AUC was 0.869 (95% CI: 0.830-0.907) to predict obstructive coronary artery stenosis.
Our results demonstrate that multiple purpose-built neural networks can function in sequence to accomplish the complex series of tasks required for automated analysis of real-world angiograms. Deployment of CathAI may serve to increase standardization and reproducibility in coronary stenosis assessment, while providing a robust foundation to accomplish future tasks for algorithmic angiographic interpretation.

Submitted 14 June, 2021; originally announced June 2021.

Comments: 62 pages, 3 main figures, 2 main tables

ACM Class: I.4.9; I.2.10; J.3

arXiv:2006.06868 [pdf, other]

SegNBDT: Visual Decision Rules for Segmentation

Authors: Alvin Wan, Daniel Ho, Younjin Song, Henk Tillman, Sarah Adel Bargal, Joseph E. Gonzalez

Abstract: The black-box nature of neural networks limits model decision interpretability, in particular for high-dimensional inputs in computer vision and for dense pixel prediction tasks like segmentation. To address this, prior work combines neural networks with decision trees. However, such models (1) perform poorly when compared to state-of-the-art segmentation models or (2) fail to produce decision rules with spatially-grounded semantic meaning. In this work, we build a hybrid neural-network and decision-tree model for segmentation that (1) attains neural network segmentation accuracy and (2) provides semi-automatically constructed visual decision rules such as "Is there a window?". We obtain semantic visual meaning by extending saliency methods to segmentation and attain accuracy by leveraging insights from neural-backed decision trees, a deep learning analog of decision trees for image classification. Our model SegNBDT attains accuracy within ~2-4% of the state-of-the-art HRNetV2 segmentation model while also retaining explainability; we achieve state-of-the-art performance for explainable models on three benchmark datasets -- Pascal-Context (49.12%), Cityscapes (79.01%), and Look Into Person (51.64%). Furthermore, user studies suggest visual decision rules are more interpretable, particularly for incorrect predictions. Code and pretrained models can be found at https://github.com/daniel-ho/SegNBDT.

Submitted 11 June, 2020; originally announced June 2020.

Comments: 8 pages, 8 figures

arXiv:2006.03677 [pdf, other]

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

Authors: Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

Abstract: Computer vision has achieved remarkable success by (a) representing images as uniformly-arranged pixel arrays and (b) convolving highly-localized features. However, convolutions treat all image pixels equally regardless of importance; explicitly model all concepts across all images, regardless of content; and struggle to relate spatially-distant concepts. In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships. Critically, our Visual Transformer operates in a semantic token space, judiciously attending to different image parts based on context. This is in sharp contrast to pixel-space transformers that require orders-of-magnitude more compute. Using an advanced training recipe, our VTs significantly outperform their convolutional counterparts, raising ResNet accuracy on ImageNet top-1 by 4.6 to 7 points while using fewer FLOPs and parameters. For semantic segmentation on LIP and COCO-stuff, VT-based feature pyramid networks (FPN) achieve 0.35 points higher mIoU while reducing the FPN module's FLOPs by 6.5x.

Submitted 19 November, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:2006.02049 [pdf, other]

FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining

Authors: Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Bichen Wu, Zijian He, Zhen Wei, Kan Chen, Yuandong Tian, Matthew Yu, Peter Vajda, Joseph E. Gonzalez

Abstract: Neural Architecture Search (NAS) yields state-of-the-art neural networks that outperform their best manually-designed counterparts. However, previous NAS methods search for architectures under one set of training hyper-parameters (i.e., a training recipe), overlooking superior architecture-recipe combinations. To address this, we present Neural Architecture-Recipe Search (NARS) to search both (a) architectures and (b) their corresponding training recipes, simultaneously. NARS utilizes an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking. Furthermore, to compensate for the enlarged search space, we leverage "free" architecture statistics (e.g., FLOP count) to pretrain the predictor, significantly improving its sample efficiency and prediction reliability. After training the predictor via constrained iterative optimization, we run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints, called FBNetV3. FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors. For example, FBNetV3 matches both EfficientNet and ResNeSt accuracy on ImageNet with up to 2.0x and 7.1x fewer FLOPs, respectively. Furthermore, FBNetV3 yields significant performance gains for downstream object detection tasks, improving mAP despite 18% fewer FLOPs and 34% fewer parameters than EfficientNet-based equivalents.

Submitted 30 March, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

arXiv:2005.13164 [pdf, other]

CoVista: A Unified View on Privacy Sensitive Mobile Contact Tracing Effort

Authors: David Culler, Prabal Dutta, Gabe Fierro, Joseph E. Gonzalez, Nathan Pemberton, Johann Schleier-Smith, K. Shankari, Alvin Wan, Thomas Zachariah

Abstract: Governments around the world have become increasingly frustrated with tech giants dictating public health policy. The software created by Apple and Google enables individuals to track their own potential exposure through collated exposure notifications. However, the same software prohibits location tracking, denying key information needed by public health officials for robust contact tracing. This information is needed to treat and isolate COVID-19 positive people, identify transmission hotspots, and protect against continued spread of infection. In this article, we present two simple ideas: the lighthouse and the covid-commons that address the needs of public health authorities while preserving the privacy-sensitive goals of the Apple and Google exposure notification protocols.

Submitted 27 May, 2020; originally announced May 2020.

arXiv:2004.05565 [pdf, other]

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

Authors: Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, Saining Xie, Bichen Wu, Matthew Yu, Tao Xu, Kan Chen, Peter Vajda, Joseph E. Gonzalez

Abstract: Differentiable Neural Architecture Search (DNAS) has demonstrated great success in designing state-of-the-art, efficient neural networks. However, DARTS-based DNAS's search space is small when compared to other search methods', since all candidate network layers must be explicitly instantiated in memory. To address this bottleneck, we propose a memory and computationally efficient DNAS variant: DMaskingNAS. This algorithm expands the search space by up to $10^{14}\times$ over conventional DNAS, supporting searches over spatial and channel dimensions that are otherwise prohibitively expensive: input resolution and number of filters. We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands. Furthermore, we employ effective shape propagation to maximize per-FLOP or per-parameter accuracy. The searched FBNetV2s yield state-of-the-art performance when compared with all previous architectures. With up to 421$\times$ less search cost, DMaskingNAS finds models with 0.9% higher accuracy, 15% fewer FLOPs than MobileNetV3-Small; and with similar accuracy but 20% fewer FLOPs than Efficient-B0. Furthermore, our FBNetV2 outperforms MobileNetV3 by 2.6% in accuracy, with equivalent model size. FBNetV2 models are open-sourced at https://github.com/facebookresearch/mobile-vision.

Submitted 12 April, 2020; originally announced April 2020.

Comments: 8 pages, 10 figures, accepted to CVPR 2020

arXiv:2004.00221 [pdf, other]

NBDT: Neural-Backed Decision Trees

Authors: Alvin Wan, Lisa Dunlap, Daniel Ho, Jihan Yin, Scott Lee, Henry Jin, Suzanne Petryk, Sarah Adel Bargal, Joseph E. Gonzalez

Abstract: Machine learning applications such as finance and medicine demand accurate and justifiable predictions, barring most deep learning methods from use. In response, previous work combines decision trees with deep learning, yielding models that (1) sacrifice interpretability for accuracy or (2) sacrifice accuracy for interpretability. We forgo this dilemma by jointly improving accuracy and interpretability using Neural-Backed Decision Trees (NBDTs). NBDTs replace a neural network's final linear layer with a differentiable sequence of decisions and a surrogate loss. This forces the model to learn high-level concepts and lessens reliance on highly-uncertain decisions, yielding (1) accuracy: NBDTs match or outperform modern neural networks on CIFAR, ImageNet and better generalize to unseen classes by up to 16%. Furthermore, our surrogate loss improves the original model's accuracy by up to 2%. NBDTs also afford (2) interpretability: improving human trust by clearly identifying model mistakes and assisting in dataset debugging. Code and pretrained NBDTs are at https://github.com/alvinwan/neural-backed-decision-trees.

Submitted 27 January, 2021; v1 submitted 1 April, 2020; originally announced April 2020.

Comments: 8 pages, 7 figures, accepted to ICLR 2021

arXiv:1803.00101 [pdf, other]

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning

Authors: Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine

Abstract: Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by only allowing imagination to fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.

Submitted 28 February, 2018; originally announced March 2018.
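
Illustrative sketch (not part of the listing): the fixed-depth imagination described in the abstract above can be pictured as an H-step value target. A learned dynamics model is rolled forward for a fixed number of steps, the discounted imagined rewards are summed, and the estimate bootstraps from a value function at the last imagined state. This is one plausible reading under stated assumptions, not the authors' implementation; `h_step_value_target`, `policy`, `model`, `reward_fn`, and `value_fn` are hypothetical names introduced only for this example.

```python
def h_step_value_target(state, policy, model, reward_fn, value_fn,
                        horizon, gamma=0.99):
    """Return an H-step value target built from imagined transitions.

    All callables are hypothetical placeholders:
      policy(state) -> action
      model(state, action) -> next_state        (learned dynamics)
      reward_fn(state, action, next_state) -> float
      value_fn(state) -> float                  (bootstrap estimate)
    """
    total = 0.0      # running sum of discounted imagined rewards
    discount = 1.0   # gamma ** t for the current step t
    for _ in range(horizon):
        action = policy(state)
        next_state = model(state, action)
        total += discount * reward_fn(state, action, next_state)
        discount *= gamma
        state = next_state
    # Bootstrap from the value estimate at the final imagined state.
    return total + discount * value_fn(state)
```

With `horizon=0` the function reduces to the plain value estimate, so the horizon directly controls how much the target relies on the learned model.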

arXiv:1711.08141 [pdf, other]

Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions

Authors: Bichen Wu, Alvin Wan, Xiangyu Yue, Peter Jin, Sicheng Zhao, Noah Golmant, Amir Gholaminejad, Joseph Gonzalez, Kurt Keutzer

Abstract: Neural networks rely on convolutions to aggregate spatial information. However, spatial convolutions are expensive in terms of model size and computation, both of which grow quadratically with respect to kernel size. In this paper, we present a parameter-free, FLOP-free "shift" operation as an alternative to spatial convolutions. We fuse shifts and point-wise convolutions to construct end-to-end trainable shift-based modules, with a hyperparameter characterizing the tradeoff between accuracy and efficiency. To demonstrate the operation's efficacy, we replace ResNet's 3x3 convolutions with shift-based modules for improved CIFAR10 and CIFAR100 accuracy using 60% fewer parameters; we additionally demonstrate the operation's resilience to parameter reduction on ImageNet, outperforming ResNet family members. We finally show the shift operation's applicability across domains, achieving strong performance with fewer parameters on classification, face verification and style transfer.

Submitted 3 December, 2017; v1 submitted 22 November, 2017; originally announced November 2017.

Comments: Source code will be released afterwards

arXiv:1710.07368 [pdf, other]

SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud

Authors: Bichen Wu, Alvin Wan, Xiangyu Yue, Kurt Keutzer

Abstract: In this paper, we address semantic segmentation of road-objects from 3D LiDAR point clouds. In particular, we wish to detect and categorize instances of interest, such as cars, pedestrians and cyclists. We formulate this problem as a point-wise classification problem, and propose an end-to-end pipeline called SqueezeSeg based on convolutional neural networks (CNN): the CNN takes a transformed LiDAR point cloud as input and directly outputs a point-wise label map, which is then refined by a conditional random field (CRF) implemented as a recurrent layer. Instance-level labels are then obtained by conventional clustering algorithms. Our CNN model is trained on LiDAR point clouds from the KITTI dataset, and our point-wise segmentation labels are derived from 3D bounding boxes from KITTI. To obtain extra training data, we built a LiDAR simulator into Grand Theft Auto V (GTA-V), a popular video game, to synthesize large amounts of realistic training data. Our experiments show that SqueezeSeg achieves high accuracy with astonishingly fast and stable runtime (8.7 ms per frame), highly desirable for autonomous driving applications. Furthermore, additionally training on synthesized data boosts validation accuracy on real-world data. Our source code and synthesized data will be open-sourced.

Submitted 19 October, 2017; originally announced October 2017.

arXiv:1612.01051 [pdf, other]

SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving

Authors: Bichen Wu, Alvin Wan, Forrest Iandola, Peter H. Jin, Kurt Keutzer

Abstract: Object detection is a crucial task for autonomous driving. In addition to requiring high accuracy to ensure safety, object detection for autonomous driving also requires real-time inference speed to guarantee prompt vehicle control, as well as small model size and energy efficiency to enable embedded system deployment. In this work, we propose SqueezeDet, a fully convolutional neural network for object detection that aims to simultaneously satisfy all of the above constraints. In our network, we use convolutional layers not only to extract feature maps but also as the output layer to compute bounding boxes and class probabilities. The detection pipeline of our model only contains a single forward pass of a neural network, thus it is extremely fast. Our model is fully-convolutional, which leads to a small model size and better energy efficiency. While achieving the same accuracy as previous baselines, our model is 30.4x smaller, 19.7x faster, and consumes 35.2x lower energy. The code is open-sourced at https://github.com/BichenWuUCB/squeezeDet.

Submitted 11 June, 2019; v1 submitted 3 December, 2016; originally announced December 2016.

Comments: The supplementary material of this paper, which discusses the energy efficiency of SqueezeDet, is attached after the main paper. The source code of this work is open-source released at https://github.com/BichenWuUCB/squeezeDet

arXiv:1506.01055 [pdf, ps, other]

An inequality for the Fourier spectrum of parity decision trees

Authors: Eric Blais, Li-Yang Tan, Andrew Wan

Abstract: We give a new bound on the sum of the linear Fourier coefficients of a Boolean function in terms of its parity decision tree complexity. This result generalizes an inequality of O'Donnell and Servedio for regular decision trees. We use this bound to obtain the first non-trivial lower bound on the parity decision tree complexity of the recursive majority function.

Submitted 20 May, 2015; originally announced June 2015.

arXiv:1505.01072 [pdf, ps, other]

Mining Measured Information from Text

Authors: Arun S. Maiya, Dale Visser, Andrew Wan

Abstract: We present an approach to extract measured information from text (e.g., a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such extractions are critically important across a wide range of domains - especially those involving search and exploration of scientific and technical documents. We first propose a rule-based entity extractor to mine measured quantities (i.e., a numeric value paired with a measurement unit), which supports a vast and comprehensive set of both common and obscure measurement units. Our method is highly robust and can correctly recover valid measured quantities even when significant errors are introduced through the process of converting document formats like PDF to plain text. Next, we describe an approach to extracting the properties being measured (e.g., the property "pixel pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we present MQSearch: the realization of a search engine with full support for measured information.

Submitted 5 May, 2015; originally announced May 2015.

Comments: 4 pages; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15)

ACM Class: I.2.7; H.3.3

arXiv:1405.7028 [pdf, ps, other]

Pseudorandomness and Fourier Growth Bounds for Width 3 Branching Programs

Authors: Thomas Steinke, Salil Vadhan, Andrew Wan

Abstract: We present an explicit pseudorandom generator for oblivious, read-once, width-$3$ branching programs, which can read their input bits in any order. The generator has seed length $\tilde{O}( \log^3 n ).$ The previously best known seed length for this model is $n^{1/2+o(1)}$ due to Impagliazzo, Meka, and Zuckerman (FOCS '12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM '13) for \textit{permutation} branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any $f:\{0,1\}^n\rightarrow \{0,1\}$ computed by such a branching program, and $k\in [n],$ $$\sum_{s\subseteq [n]: |s|=k} \left| \widehat{f}[s] \right| \leq n^2 \cdot (O(\log n))^k,$$ where $\widehat{f}[s] = \mathbb{E}\left[f[U] \cdot (-1)^{s \cdot U}\right]$ is the standard Fourier transform over $\mathbb{Z}_2^n$. The base $O(\log n)$ of the Fourier growth is tight up to a factor of $\log \log n$.

Submitted 27 May, 2014; originally announced May 2014.

Comments: arXiv admin note: text overlap with arXiv:1306.3004

arXiv:1405.5268 [pdf, ps, other]

Approximate resilience, monotonicity, and the complexity of agnostic learning

Authors: Dana Dachman-Soled, Vitaly Feldman, Li-Yang Tan, Andrew Wan, Karl Wimmer

Abstract: A function $f$ is $d$-resilient if all its Fourier coefficients of degree at most $d$ are zero, i.e., $f$ is uncorrelated with all low-degree parities. We study the notion of $\mathit{approximate}$ $\mathit{resilience}$ of Boolean functions, where we say that $f$ is $α$-approximately $d$-resilient if $f$ is $α$-close to a $[-1,1]$-valued $d$-resilient function in $\ell_1$ distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class $C$ over the uniform distribution. Roughly speaking, if all functions in a class $C$ are far from being $d$-resilient then $C$ can be learned agnostically in time $n^{O(d)}$ and conversely, if $C$ contains a function close to being $d$-resilient then agnostic learning of $C$ in the statistical query (SQ) framework of Kearns has complexity of at least $n^{Ω(d)}$. This characterization is based on the duality between $\ell_1$ approximation by degree-$d$ polynomials and approximate $d$-resilience that we establish. In particular, it implies that $\ell_1$ approximation by low-degree polynomials, known to be sufficient for agnostic learning over product distributions, is in fact necessary. Focusing on monotone Boolean functions, we exhibit the existence of near-optimal $α$-approximately $\widetilde{Ω}(α\sqrt{n})$-resilient monotone functions for all $α>0$. Prior to our work, it was conceivable even that every monotone function is $Ω(1)$-far from any $1$-resilient function.
Furthermore, we construct simple, explicit monotone functions based on ${\sf Tribes}$ and ${\sf CycleRun}$ that are close to highly resilient functions. Our constructions are based on a fairly general resilience analysis and amplification. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas.

Submitted 9 July, 2014; v1 submitted 20 May, 2014; originally announced May 2014.

arXiv:1312.3003 [pdf, ps, other]

Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture

Authors: Andrew Wan, John Wright, Chenggang Wu

Abstract: Given $f:\{-1, 1\}^n \rightarrow \{-1, 1\}$, define the \emph{spectral distribution} of $f$ to be the distribution on subsets of $[n]$ in which the set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that there is some absolute constant $C$ such that $\operatorname{H}[\widehat{f}^2] \leq C\cdot\operatorname{Inf}[f]$. Here, $\operatorname{H}[\widehat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and $\operatorname{Inf}[f]$ is the total influence of $f$. This conjecture is one of the major open problems in the analysis of Boolean functions, and settling it would have several interesting consequences. Previous results on the FEI conjecture have been largely through direct calculation. In this paper we study a natural interpretation of the conjecture, which states that there exists a communication protocol which, given a subset $S$ of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using at most $C\cdot\operatorname{Inf}[f]$ bits in expectation. Using this interpretation, we are able to show the following results: 1. First, if $f$ is computable by a read-$k$ decision tree, then $\operatorname{H}[\widehat{f}^2] \leq 9k\cdot \operatorname{Inf}[f]$. 2. Next, if $f$ has $\operatorname{Inf}[f] \geq 1$ and is computable by a decision tree with expected depth $d$, then $\operatorname{H}[\widehat{f}^2] \leq 12d\cdot \operatorname{Inf}[f]$. 3. Finally, we give a new proof of the main theorem of O'Donnell and Tan (ICALP 2013), i.e. that their FEI$^+$ conjecture composes.
In addition, we show that natural improvements to our decision tree results would be sufficient to prove the FEI conjecture in its entirety. We believe that our methods give more illuminating proofs than previous results about the FEI conjecture.

Submitted 10 December, 2013; originally announced December 2013.

ACM Class: F.1.3

arXiv:1312.1983 [pdf, ps, other]

Satisfiability and Evolution

Authors: Adi Livnat, Christos Papadimitriou, Aviad Rubinstein, Gregory Valiant, Andrew Wan

Abstract: We show that, if truth assignments on $n$ variables reproduce through recombination so that satisfaction of a particular Boolean function confers a small evolutionary advantage, then a polynomially large population over polynomially many generations (polynomial in $n$ and the inverse of the initial satisfaction probability) will end up almost certainly consisting exclusively of satisfying truth assignments. We argue that this theorem sheds light on the problem of novelty in Evolution.

Submitted 11 August, 2014; v1 submitted 6 December, 2013; originally announced December 2013.

MSC Class: 92D15 ACM Class: F.0

arXiv:1304.3754 [pdf, ps, other]

Faster Private Release of Marginals on Small Databases

Authors: Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, Andrew Wan

Abstract: We study the problem of answering \emph{$k$-way marginal} queries on a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of the database's records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses on a dataset, and designing efficient algorithms for privately answering marginal queries has been identified as an important open problem in private data analysis. For any $k$, we give a differentially private online algorithm that runs in time $$ \min\{\exp(d^{1-Ω(1/\sqrt{k})}), \exp(d / \log^{.99} d)\} $$ per query and answers any (possibly superpolynomially long and adaptively chosen) sequence of $k$-way marginal queries up to error at most $\pm .01$ on every query, provided $n \gtrsim d^{.51} $. To the best of our knowledge, this is the first algorithm capable of privately answering marginal queries with a non-trivial worst-case accuracy guarantee on a database of size $\poly(d, k)$ in time $\exp(o(d))$. Our algorithms are a variant of the private multiplicative weights algorithm (Hardt and Rothblum, FOCS '10), but using a different low-weight representation of the database. We derive our low-weight representation using approximations to the OR function by low-degree polynomials with coefficients of bounded $L_1$-norm. We also prove a strong limitation on our approach that is of independent approximation-theoretic interest.
Specifically, we show that for any $k = o(\log d)$, any polynomial with coefficients of $L_1$-norm $\poly(d)$ that pointwise approximates the $d$-variate OR function on all inputs of Hamming weight at most $k$ must have degree $d^{1-O(1/\sqrt{k})}$.

Submitted 2 September, 2013; v1 submitted 12 April, 2013; originally announced April 2013.

arXiv:1202.6680 [pdf, other]

On the Distribution of the Fourier Spectrum of Halfspaces

Authors: Ilias Diakonikolas, Ragesh Jaiswal, Rocco A. Servedio, Li-Yang Tan, Andrew Wan

Abstract: Bourgain showed that any noise stable Boolean function $f$ can be well-approximated by a junta. In this note we give an exponential sharpening of the parameters of Bourgain's result under the additional assumption that $f$ is a halfspace.

Submitted 29 February, 2012; originally announced February 2012.

arXiv:0909.4727 [pdf, ps, other]

A regularity lemma, and low-weight approximators, for low-degree polynomial threshold functions

Authors: Ilias Diakonikolas, Rocco A. Servedio, Li-Yang Tan, Andrew Wan

Abstract: We give a "regularity lemma" for degree-d polynomial threshold functions (PTFs) over the Boolean cube {-1,1}^n. This result shows that every degree-d PTF can be decomposed into a constant number of subfunctions such that almost all of the subfunctions are close to being regular PTFs. Here a "regular" PTF is a PTF sign(p(x)) where the influence of each variable on the polynomial p(x) is a small fraction of the total influence of p. As an application of this regularity lemma, we prove that for any constants d \geq 1, \eps \geq 0, every degree-d PTF over n variables can be approximated to accuracy \eps by a constant-degree PTF that has integer weights of total magnitude O(n^d). This weight bound is shown to be optimal up to constant factors.

Submitted 5 May, 2010; v1 submitted 25 September, 2009; originally announced September 2009.

Comments: 23 pages, 0 figures

ACM Class: F.1.3

arXiv:0805.1765 [pdf, ps, other]

Efficiently Testing Sparse GF(2) Polynomials

Authors: Ilias Diakonikolas, Homin K. Lee, Kevin Matulef, Rocco A. Servedio, Andrew Wan

Abstract: We give the first algorithm that is both query-efficient and time-efficient for testing whether an unknown function $f: \{0,1\}^n \to \{0,1\}$ is an $s$-sparse GF(2) polynomial versus $\eps$-far from every such polynomial. Our algorithm makes $\poly(s,1/\eps)$ black-box queries to $f$ and runs in time $n \cdot \poly(s,1/\eps)$. The only previous algorithm for this testing problem \cite{DLM+:07} used $\poly(s,1/\eps)$ queries, but had running time exponential in $s$ and super-polynomial in $1/\eps$. Our approach significantly extends the ``testing by implicit learning'' methodology of \cite{DLM+:07}. The learning component of that earlier work was a brute-force exhaustive search over a concept class to find a hypothesis consistent with a sample of random examples. In this work, the learning component is a sophisticated exact learning algorithm for sparse GF(2) polynomials due to Schapire and Sellie \cite{SchapireSellie:96}. A crucial element of this work, which enables us to simulate the membership queries required by \cite{SchapireSellie:96}, is an analysis establishing new properties of how sparse GF(2) polynomials simplify under certain restrictions of ``low-influence'' sets of variables.

Submitted 12 May, 2008; originally announced May 2008.

Comments: Full version of ICALP 2008 paper

Showing 1–29 of 29 results for author: Wan, A