Showing 1–26 of 26 results for author: Iandola, F

Searching in archive cs.
  1. arXiv:2402.14905  [pdf, other]

    cs.LG cs.AI cs.CL

    MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

    Authors: Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra

    Abstract: This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our in…

    Submitted 26 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Code is available at https://github.com/facebookresearch/MobileLLM

  2. arXiv:2401.00909  [pdf, other]

    cs.CV cs.LG

    Taming Mode Collapse in Score Distillation for Text-to-3D Generation

    Authors: Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

    Abstract: Despite the remarkable performance of score distillation in text-to-3D generation, such techniques notoriously suffer from view inconsistency issues, also known as "Janus" artifact, where the generated objects fake each view with multiple front faces. Although empirically effective methods have approached this problem via score debiasing or prompt engineering, a more rigorous perspective to explai…

    Submitted 29 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Project page: https://vita-group.github.io/3D-Mode-Collapse/

  3. arXiv:2401.00604  [pdf, other]

    cs.CV

    SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

    Authors: Peihao Wang, Zhiwen Fan, Dejia Xu, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

    Abstract: Score distillation has emerged as one of the most prevalent approaches for text-to-3D asset synthesis. Essentially, score distillation updates 3D parameters by lifting and back-propagating scores averaged over different views. In this paper, we reveal that the gradient estimation in score distillation is inherent to high variance. Through the lens of variance reduction, the effectiveness of SDS an…

    Submitted 29 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Project page: https://vita-group.github.io/SteinDreamer/

  4. arXiv:2312.06736  [pdf, other]

    cs.CV

    SqueezeSAM: User friendly mobile interactive segmentation

    Authors: Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware…

    Submitted 20 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  5. arXiv:2312.00863  [pdf, other]

    cs.CV

    EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

    Authors: Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component that drives the impressive performance for zero-shot transfer and high versatility is a super large Transformer model trained on the extensive high-quality SA-1B dataset. While beneficial, the huge computation cost of SAM model has limited its applications to wider real-world applications.…

    Submitted 1 December, 2023; originally announced December 2023.

  6. arXiv:2311.00897  [pdf, other]

    cs.SD cs.CL eess.AS

    On The Open Prompt Challenge In Conditional Audio Generation

    Authors: Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra

    Abstract: Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compared to text descriptions used to train TTA models. In this work, we treat TTA models as a "blackbox" and address the user prompt challenge with two ke…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 4 tables

  7. arXiv:2311.00895  [pdf, other]

    cs.SD cs.CL eess.AS

    In-Context Prompt Editing For Conditional Audio Generation

    Authors: Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional au…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 2 tables

  8. arXiv:2310.16003  [pdf, other]

    cs.CV

    CVPR 2023 Text Guided Video Editing Competition

    Authors: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola

    Abstract: Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no stand…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Project page: https://sites.google.com/view/loveucvpr23/track4

  9. arXiv:2309.08804  [pdf, other]

    eess.AS cs.SD

    Stack-and-Delay: a new codebook pattern for music generation

    Authors: Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In particular, flattening the codebooks represents the highest quality decoding strategy, while being notoriously slow. To this end, we propose a novel stack-and-delay…

    Submitted 15 September, 2023; originally announced September 2023.

  10. arXiv:2309.08773  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Enhance audio generation controllability through representation similarity regularization

    Authors: Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra

    Abstract: This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regula…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages

  11. arXiv:2309.07988  [pdf, other]

    cs.LG cs.AR cs.SD eess.AS

    Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

    Authors: Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

    Abstract: Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process a limited number of tokens each time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear pr…

    Submitted 18 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  12. arXiv:2006.11316  [pdf, ps, other]

    cs.CL cs.CV cs.LG

    SqueezeBERT: What can computer vision teach NLP about efficient neural networks?

    Authors: Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer

    Abstract: Humans read and write hundreds of billions of messages every day. Further, due to the availability of large datasets, large computing systems, and better neural network models, natural language processing (NLP) technology has made significant strides in understanding, proofreading, and organizing these messages. Thus, there is a significant opportunity to deploy NLP in myriad applications to help…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 9 pages + appendix

  13. arXiv:1908.01748  [pdf, other]

    cs.CV cs.LG

    SqueezeNAS: Fast neural architecture search for faster semantic segmentation

    Authors: Albert Shaw, Daniel Hunter, Forrest Iandola, Sammy Sidhu

    Abstract: For real time applications utilizing Deep Neural Networks (DNNs), it is critical that the models achieve high-accuracy on the target task and low-latency inference on the target computing platform. While Neural Architecture Search (NAS) has been effectively used to develop low-latency networks for image classification, there has been relatively little effort to use NAS to optimize DNN architecture…

    Submitted 8 August, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: 11 pages, 10 figures, 3 tables, 3 pages of appendix; Added found networks to Appendix tables

  14. arXiv:1811.07070  [pdf, other]

    cs.CV

    DSCnet: Replicating Lidar Point Clouds with Deep Sensor Cloning

    Authors: Paden Tomasello, Sammy Sidhu, Anting Shen, Matthew W. Moskewicz, Nobie Redmon, Gayatri Joshi, Romi Phadte, Paras Jain, Forrest Iandola

    Abstract: Convolutional neural networks (CNNs) have become increasingly popular for solving a variety of computer vision tasks, ranging from image classification to image segmentation. Recently, autonomous vehicles have created a demand for depth information, which is often obtained using hardware sensors such as Light detection and ranging (LIDAR). Although it can provide precise distance measurements, mos…

    Submitted 26 November, 2018; v1 submitted 16 November, 2018; originally announced November 2018.

    Comments: V2

  15. arXiv:1710.02759  [pdf]

    cs.CV

    Keynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures

    Authors: Forrest Iandola, Kurt Keutzer

    Abstract: Over the last five years Deep Neural Nets have offered more accurate solutions to many problems in speech recognition, and computer vision, and these solutions have surpassed a threshold of acceptability for many applications. As a result, Deep Neural Networks have supplanted other approaches to solving problems in these areas, and enabled many new applications. While the design of Deep Neural Net…

    Submitted 7 October, 2017; originally announced October 2017.

    Comments: Keynote at Embedded Systems Week (ESWEEK) 2017

  16. arXiv:1612.06519  [pdf, other]

    cs.CV cs.LG cs.NE

    Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale

    Authors: Forrest Iandola

    Abstract: In recent years, the research community has discovered that deep neural networks (DNNs) and convolutional neural networks (CNNs) can yield higher accuracy than all previous solutions to a broad array of machine learning problems. To our knowledge, there is no single CNN/DNN architecture that solves all problems optimally. Instead, the "right" CNN/DNN architecture varies depending on the applicatio…

    Submitted 20 December, 2016; originally announced December 2016.

    Comments: thesis, UC Berkeley (2016)

  17. arXiv:1612.01051  [pdf, other]

    cs.CV

    SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving

    Authors: Bichen Wu, Alvin Wan, Forrest Iandola, Peter H. Jin, Kurt Keutzer

    Abstract: Object detection is a crucial task for autonomous driving. In addition to requiring high accuracy to ensure safety, object detection for autonomous driving also requires real-time inference speed to guarantee prompt vehicle control, as well as small model size and energy efficiency to enable embedded system deployment. In this work, we propose SqueezeDet, a fully convolutional neural network for…

    Submitted 11 June, 2019; v1 submitted 3 December, 2016; originally announced December 2016.

    Comments: The supplementary material of this paper, which discusses the energy efficiency of SqueezeDet, is attached after the main paper. The source code of this work is open-source released at https://github.com/BichenWuUCB/squeezeDet

  18. arXiv:1611.04581  [pdf, other]

    cs.LG

    How to scale distributed deep learning?

    Authors: Peter H. Jin, Qiaochu Yuan, Forrest Iandola, Kurt Keutzer

    Abstract: Training time on large datasets for deep neural networks is the principal workflow bottleneck in a number of important applications of deep learning, such as object classification and detection in automatic driver assistance systems (ADAS). To minimize training time, the training of a deep neural network must be scaled beyond a single machine to as many machines as possible by distributing the opt…

    Submitted 14 November, 2016; originally announced November 2016.

    Comments: Extended version of paper accepted at ML Sys 2016 (at NIPS 2016)

  19. arXiv:1606.01561  [pdf, other]

    cs.CV

    Shallow Networks for High-Accuracy Road Object-Detection

    Authors: Khalid Ashraf, Bichen Wu, Forrest N. Iandola, Matthew W. Moskewicz, Kurt Keutzer

    Abstract: The ability to automatically detect other vehicles on the road is vital to the safety of partially-autonomous and fully-autonomous vehicles. Most of the high-accuracy techniques for this task are based on R-CNN or one of its faster variants. In the research community, much emphasis has been applied to using 3D vision or complex R-CNN variants to achieve higher accuracy. However, are there more str…

    Submitted 5 June, 2016; originally announced June 2016.

    Comments: 9 pages, 5 figures

  20. arXiv:1606.00094  [pdf, other]

    cs.DC cs.MS cs.NE

    Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms

    Authors: Matthew Moskewicz, Forrest Iandola, Kurt Keutzer

    Abstract: The popularity of neural networks (NNs) spans academia, industry, and popular culture. In particular, convolutional neural networks (CNNs) have been applied to many image based machine learning tasks and have yielded strong results. The availability of hardware/software systems for efficient training and deployment of large and/or deep CNN models has been, and continues to be, an important conside…

    Submitted 13 September, 2016; v1 submitted 31 May, 2016; originally announced June 2016.

  21. arXiv:1602.07360  [pdf, other]

    cs.CV cs.AI

    SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

    Authors: Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer

    Abstract: Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs…

    Submitted 4 November, 2016; v1 submitted 23 February, 2016; originally announced February 2016.

    Comments: In ICLR Format

  22. arXiv:1511.00175  [pdf, other]

    cs.CV

    FireCaffe: near-linear acceleration of deep neural network training on compute clusters

    Authors: Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer

    Abstract: Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the traini…

    Submitted 8 January, 2016; v1 submitted 31 October, 2015; originally announced November 2015.

    Comments: Version 2: Added results on 128 GPUs

  23. arXiv:1510.02131  [pdf, other]

    cs.CV

    DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer

    Authors: Forrest N. Iandola, Anting Shen, Peter Gao, Kurt Keutzer

    Abstract: Recently, there has been a flurry of industrial activity around logo recognition, such as Ditto's service for marketers to track their brands in user-generated images, and LogoGrab's mobile app platform for logo recognition. However, relatively little academic or open-source logo recognition progress has been made in the last four years. Meanwhile, deep convolutional neural networks (DCNNs) have r…

    Submitted 7 October, 2015; originally announced October 2015.

  24. arXiv:1411.4952  [pdf, other]

    cs.CV cs.CL

    From Captions to Visual Concepts and Back

    Authors: Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig

    Abstract: This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word det…

    Submitted 14 April, 2015; v1 submitted 18 November, 2014; originally announced November 2014.

    Comments: version corresponding to CVPR15 paper

  25. arXiv:1409.5403  [pdf, other]

    cs.CV

    Deformable Part Models are Convolutional Neural Networks

    Authors: Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik

    Abstract: Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a novel synthesis of the two ideas. Our constructio…

    Submitted 1 October, 2014; v1 submitted 18 September, 2014; originally announced September 2014.

  26. arXiv:1404.1869  [pdf, other]

    cs.CV

    DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

    Authors: Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, Kurt Keutzer

    Abstract: Convolutional Neural Networks (CNNs) can provide accurate object classification. They can be extended to perform object detection by iterating over dense or selected proposed object regions. However, the runtime of such detectors scales as the total number and/or area of regions to examine per image, and training such detectors may be prohibitively slow. However, for some CNN classifier topologies…

    Submitted 7 April, 2014; originally announced April 2014.