Showing 1–9 of 9 results for author: Som, S

Searching in archive cs.
  1. arXiv:2401.06949  [pdf, other]

    cs.RO cs.AI

    ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization

    Authors: Kourosh Darvish, Marta Skreta, Yuchi Zhao, Naruki Yoshikawa, Sagnik Som, Miroslav Bogdanovic, Yang Cao, Han Hao, Haoping Xu, Alán Aspuru-Guzik, Animesh Garg, Florian Shkurti

    Abstract: Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits incurred by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still manually conducted by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapt…

    Submitted 12 January, 2024; originally announced January 2024.

  2. arXiv:2305.14218  [pdf, other]

    cs.CV cs.AI

    DUBLIN -- Document Understanding By Language-Image Network

    Authors: Kriti Aggarwal, Aditi Khandelwal, Kumar Tanmay, Owais Mohammed Khan, Qiang Liu, Monojit Choudhury, Hardik Hansrajbhai Chauhan, Subhojit Som, Vishrav Chaudhary, Saurabh Tiwary

    Abstract: Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives…

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    ACM Class: F.2.2; I.2.7

  3. arXiv:2305.00970  [pdf, other]

    cs.CV

    ArK: Augmented Reality with Knowledge Interactive Emergent Ability

    Authors: Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

    Abstract: Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite a…

    Submitted 1 May, 2023; originally announced May 2023.

    Report number: EFI-94-11

  4. arXiv:2302.14045  [pdf, other]

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  5. arXiv:2208.10442  [pdf, other]

    cs.CV cs.CL

    Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

    Authors: Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

    Abstract: A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling up. We introduce Mult…

    Submitted 30 August, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: 18 pages

  6. arXiv:2205.09256  [pdf, other]

    cs.CV cs.MM

    Training Vision-Language Transformers from Captions

    Authors: Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alex Hauptmann, Jianfeng Gao, Yonatan Bisk

    Abstract: Vision-Language Transformers can be learned without low-level human labels (e.g. class labels, bounding boxes, etc). Existing work, whether explicitly utilizing bounding boxes or patches, assumes that the visual backbone must first be trained on ImageNet class prediction before being integrated into a multimodal linguistic pipeline. We show that this is not necessary and introduce a new model Visi…

    Submitted 14 June, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

  7. arXiv:2111.02358  [pdf, other]

    cs.CV cs.CL cs.LG

    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

    Authors: Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei

    Abstract: We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. Specifically, we introduce Mixture-of-Modality-Experts (MoME) Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer. Because of the modeling flexibility of MoME, pretrained VLMo can be fine-tu…

    Submitted 27 May, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: Work in progress

  8. Compressive Imaging using Approximate Message Passing and a Markov-Tree Prior

    Authors: Subhojit Som, Philip Schniter

    Abstract: We propose a novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images. Like other recent works, we model wavelet structure using a hidden Markov tree (HMT) but, unlike other works, ours is based on loopy belief propagation (LBP). For LBP, we adopt a recently proposed "turbo" message passin…

    Submitted 12 August, 2011; originally announced August 2011.

  9. arXiv:1004.4044  [pdf, ps, other]

    cs.IT

    Sparsity Pattern Recovery in Bernoulli-Gaussian Signal Model

    Authors: Subhojit Som, Lee C Potter

    Abstract: In compressive sensing, sparse signals are recovered from underdetermined noisy linear observations. One of the interesting problems which attracted a lot of attention in recent times is the support recovery or sparsity pattern recovery problem. The aim is to identify the non-zero elements in the original sparse signal. In this article we consider the sparsity pattern recovery problem under a prob…

    Submitted 22 April, 2010; originally announced April 2010.