Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–30 of 30 results for author: Patra, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.17678  [pdf, other

    cs.CL

    Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

    Authors: Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Xia Song

    Abstract: Existing LLM training and inference frameworks struggle in boosting efficiency with sparsity while maintaining the integrity of context and model architecture. Inspired by the sharding concept in database and the fact that attention parallelizes over heads on accelerators, we propose Sparsely-Sharded (S2) Attention, an attention algorithm that allocates heterogeneous context partitions for differe… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 10 pages

  2. arXiv:2407.15229  [pdf, other

    cs.CL cs.AI

    The Hitchhiker's Guide to Human Alignment with *PO

    Authors: Kian Ahrabian, Xihui Lin, Barun Patra, Vishrav Chaudhary, Alon Benhaim, Jay Pujara, Xia Song

    Abstract: With the growing utilization of large language models (LLMs) across domains, alignment towards human preferences has become one of the most critical aspects of training models. At the forefront of state-of-the-art human alignment methods are preference optimization methods (*PO). However, prior research has often concentrated on identifying the best-performing method, typically involving a grid se… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 10 pages

  3. arXiv:2407.09879  [pdf, other

    cs.CL

    sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

    Authors: Sanchit Ahuja, Kumar Tanmay, Hardik Hansrajbhai Chauhan, Barun Patra, Kriti Aggarwal, Luciano Del Corro, Arindam Mitra, Tejas Indulal Dhamecha, Ahmed Awadallah, Monojit Choudhary, Vishrav Chaudhary, Sunayana Sitaram

    Abstract: Despite the remarkable success of LLMs in English, there is a significant gap in performance in non-English languages. In order to address this, we introduce a novel recipe for creating a multilingual synthetic instruction tuning dataset, sPhinX, which is created by selectively translating instruction response pairs from English into 50 languages. We test the effectiveness of sPhinX by using it to… ▽ More

    Submitted 16 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: 20 pages, 12 tables, 5 figures

  4. arXiv:2404.14219  [pdf, other

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset… ▽ More

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  5. arXiv:2403.17199  [pdf, other

    cs.CL

    Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model

    Authors: Braja Gopal Patra, Lauren A. Lepow, Praneet Kasi Reddy Jagadeesh Kumar, Veer Vekaria, Mohit Manoj Sharma, Prakash Adekkanattu, Brian Fennessy, Gavin Hynes, Isotta Landi, Jorge A. Sanchez-Ruiz, Euijung Ryu, Joanna M. Biernacka, Girish N. Nadkarni, Ardesheer Talati, Myrna Weissman, Mark Olfson, J. John Mann, Alexander W. Charney, Jyotishman Pathak

    Abstract: Background: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented as narrative clinical notes rather than structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of data extraction.… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 2 figures, 3 tables

  6. arXiv:2401.06398  [pdf

    cs.CL

    An approach for mistranslation removal from popular dataset for Indic MT Task

    Authors: Sudhansu Bala Das, Leo Raphael Rodrigues, Tapas Kumar Mishra, Bidyut Kr. Patra

    Abstract: The conversion of content from one language to another utilizing a computer system is known as Machine Translation (MT). Various techniques have come up to ensure effective translations that retain the contextual and lexical interpretation of the source language. End-to-end Neural Machine Translation (NMT) is a popular technique and it is now widely used in real-world MT systems. Massive amounts o… ▽ More

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: 18 pages

  7. arXiv:2312.02073  [pdf, other

    cs.CL

    A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

    Authors: Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kıcıman, Hamid Palangi, Barun Patra, Robert West

    Abstract: Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmente… ▽ More

    Submitted 10 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted at ACL 2024 (main conference)

  8. arXiv:2306.12693  [pdf

    cs.CL

    Multilingual Neural Machine Translation System for Indic to Indic Languages

    Authors: Sudhansu Bala Das, Divyajyoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra, Asif Ekbal

    Abstract: This paper gives an Indic-to-Indic (IL-IL) MNMT baseline model for 11 ILs implemented on the Samanantar corpus and analyzed on the Flores-200 corpus. All the models are evaluated using the BLEU score. In addition, the languages are classified under three groups namely East Indo- Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI). The effect of language relatedness on MNMT model efficiency is stu… ▽ More

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 38 pages, 2 figures

  9. arXiv:2302.14045  [pdf, other

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co… ▽ More

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  10. Statistical Machine Translation for Indic Languages

    Authors: Sudhansu Bala Das, Divyajoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra

    Abstract: Machine Translation (MT) system generally aims at automatic representation of source language into target language retaining the originality of context using various Natural Language Processing (NLP) techniques. Among various NLP methods, Statistical Machine Translation(SMT). SMT uses probabilistic and statistical techniques to analyze information and conversion. This paper canvasses about the dev… ▽ More

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: 32pages, 1 figure, 4 tables

  11. arXiv:2212.10554  [pdf, other

    cs.CL

    A Length-Extrapolatable Transformer

    Authors: Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then we propose two designs to improve the above metric of Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention r… ▽ More

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 9 pages

  12. arXiv:2212.03000  [pdf

    cs.CL cs.AI cs.LG

    SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies

    Authors: Zehao Yu, Xi Yang, Chong Dang, Prakash Adekkanattu, Braja Gopal Patra, Yifan Peng, Jyotishman Pathak, Debbie L. Wilson, Ching-Yuan Chang, Wei-Hsuan Lo-Ciganic, Thomas J. George, William R. Hogan, Yi Guo, Jiang Bian, Yonghui Wu

    Abstract: Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified S… ▽ More

    Submitted 18 May, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    ACM Class: I.2.7

    Journal ref: Journal of Biomedical Informatics, April 2024, 104642

  13. arXiv:2211.13184  [pdf, other

    cs.LG cs.CL

    TorchScale: Transformers at Scale

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several… ▽ More

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Work in progress

  14. arXiv:2210.14867  [pdf, other

    cs.CL cs.LG

    Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

    Authors: Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

    Abstract: In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing… ▽ More

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Work in progress

  15. arXiv:2210.07228  [pdf, other

    cs.CL cs.LG

    Language Model Decoding as Likelihood-Utility Alignment

    Authors: Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kıcıman, Boi Faltings, Robert West

    Abstract: A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific no… ▽ More

    Submitted 16 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EACL (Findings) 2023

  16. arXiv:2210.06423  [pdf, other

    cs.LG cs.CL cs.CV

    Foundation Transformers

    Authors: Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and Pre-LayerNorm for GPT and vision Transformers. We call for the development of Foundation Transformer for true general-purpose modeling, which serves… ▽ More

    Submitted 19 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Work in progress

  17. arXiv:2209.13279  [pdf

    cs.CL

    Improving Multilingual Neural Machine Translation System for Indic Languages

    Authors: Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, Bidyut Kumar Patra

    Abstract: Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another language. The need of an efficient translation system becomes obvious in a large multilingual environment like India, where English and a set of Indian Languages (ILs) are officially used. In contrast with English, ILs are still entreated as low-resource languag… ▽ More

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 26 pages, 10 figures, 7 tables

  18. arXiv:2204.09179  [pdf, other

    cs.CL cs.LG

    On the Representation Collapse of Sparse Mixture of Experts

    Authors: Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we… ▽ More

    Submitted 12 October, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  19. arXiv:2204.01016  [pdf, other

    cs.CL cs.LG

    On Efficiently Acquiring Annotations for Multilingual Models

    Authors: Joel Ruben Antony Moniz, Barun Patra, Matthew R. Gormley

    Abstract: When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by zero-shot transfer to the remaining languages. In this work, we show that the strategy of joint learning across multiple languages using a single model performs… ▽ More

    Submitted 3 April, 2022; originally announced April 2022.

    Comments: ACL 2022 (Short Paper)

  20. Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing

    Authors: Suzanna Schmeelk, Martins Samuel Dogo, Yifan Peng, Braja Gopal Patra

    Abstract: Clinical notes, which can be embedded into electronic medical records, document patient care delivery and summarize interactions between healthcare providers and patients. These clinical notes directly inform patient care and can also indirectly inform research and quality/safety metrics, among other indirect metrics. Recently, some states within the United States of America require patients to ha… ▽ More

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: 7 pages, 4 figures, published in Proceedings of the 55th Hawaii International Conference on System Sciences

  21. arXiv:2110.08413  [pdf, other

    cs.CL cs.LG

    Invariant Language Modeling

    Authors: Maxime Peyrard, Sarvjeet Singh Ghotra, Martin Josifoski, Vidhan Agarwal, Barun Patra, Dean Carignan, Emre Kiciman, Robert West

    Abstract: Large pretrained language models are critical components of modern NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize b… ▽ More

    Submitted 14 November, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Published at EMNLP 2022

  22. arXiv:2012.02594  [pdf, other

    cs.CL cs.IR cs.LG

    To Schedule or not to Schedule: Extracting Task Specific Temporal Entities and Associated Negation Constraints

    Authors: Barun Patra, Chala Fufa, Pamela Bhattacharya, Charles Lee

    Abstract: State of the art research for date-time entity extraction from text is task agnostic. Consequently, while the methods proposed in literature perform well for generic date-time extraction from texts, they don't fare as well on task specific date-time entity extraction where only a subset of the date-time entities present in the text are pertinent to solving the task. Furthermore, some tasks require… ▽ More

    Submitted 15 November, 2020; originally announced December 2020.

    Comments: Proceedings of EMNLP 2020

  23. arXiv:2003.04988  [pdf, other

    cs.CL cs.LG stat.ML

    ScopeIt: Scoping Task Relevant Sentences in Documents

    Authors: Vishwas Suryanarayanan, Barun Patra, Pamela Bhattacharya, Chala Fufa, Charles Lee

    Abstract: Intelligent assistants like Cortana, Siri, Alexa, and Google Assistant are trained to parse information when the conversation is synchronous and short; however, for email-based conversational agents, the communication is asynchronous, and often contains information irrelevant to the assistant. This makes it harder for the system to accurately detect intents, extract entities relevant to those inte… ▽ More

    Submitted 15 November, 2020; v1 submitted 22 February, 2020; originally announced March 2020.

    Comments: Accepted in COLING 2020

    ACM Class: I.2.7; I.7.5

  24. arXiv:1908.06625  [pdf, other

    cs.CL cs.LG

    Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces

    Authors: Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig

    Abstract: Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often with an assumption about the isometry of the two spaces. We propose a technique to quantitatively estimate this assumption of the isometry between two embedding spaces and empirically show that this assumption weakens as the languages in question become i… ▽ More

    Submitted 19 August, 2019; originally announced August 2019.

    Comments: ACL 2019

    Journal ref: Proceedings of the 57th Conference of the Association for Computational Linguistics (2019) 184-193

  25. arXiv:1904.09489  [pdf, other

    cs.LG cs.AI stat.ML

    Compression and Localization in Reinforcement Learning for ATARI Games

    Authors: Joel Ruben Antony Moniz, Barun Patra, Sarthak Garg

    Abstract: Deep neural networks have become commonplace in the domain of reinforcement learning, but are often expensive in terms of the number of parameters needed. While compressing deep neural networks has of late assumed great importance to overcome this drawback, little work has been done to address this problem in the context of reinforcement learning agents. This work aims at making first steps toward… ▽ More

    Submitted 20 April, 2019; originally announced April 2019.

    Comments: NeurIPS 2018 Deep Reinforcement Learning Workshop

  26. arXiv:1803.06745  [pdf, ps, other

    cs.CL

    Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017

    Authors: Braja Gopal Patra, Dipankar Das, Amitava Das

    Abstract: Sentiment analysis is essential in many real-world applications such as stance detection, review analysis, recommendation system, and so on. Sentiment analysis becomes more difficult when the data is noisy and collected from social media. India is a multilingual country; people use more than one languages to communicate within themselves. The switching in between the languages is called code-switc… ▽ More

    Submitted 18 March, 2018; originally announced March 2018.

  27. Towards Understanding and Answering Multi-Sentence Recommendation Questions on Tourism

    Authors: Danish Contractor, Barun Patra, Mausam Singla, Parag Singla

    Abstract: We introduce the first system towards the novel task of answering complex multisentence recommendation questions in the tourism domain. Our solution uses a pipeline of two modules: question understanding and answering. For question understanding, we define an SQL-like query language that captures the semantic intent of a question; it supports operators like subset, negation, preference and similar… ▽ More

    Submitted 5 January, 2018; originally announced January 2018.

  28. arXiv:1705.04009  [pdf, other

    cs.IR

    A survey of Community Question Answering

    Authors: Barun Patra

    Abstract: With the advent of numerous community forums, tasks associated with the same have gained importance in the recent past. With the influx of new questions every day on these forums, the issues of identifying methods to find answers to said questions, or even trying to detect duplicate questions, are of practical importance and are challenging in their own right. This paper aims at surveying some of… ▽ More

    Submitted 11 May, 2017; originally announced May 2017.

  29. arXiv:1510.00585  [pdf, ps, other

    cs.IR

    A Complex Network Approach for Collaborative Recommendation

    Authors: Ranveer Singh, Bidyut Kr. Patra, Bibhas Adhikari

    Abstract: Collaborative filtering (CF) is the most widely used and successful approach for personalized service recommendations. Among the collaborative recommendation approaches, neighborhood based approaches enjoy a huge amount of popularity, due to their simplicity, justifiability, efficiency and stability. Neighborhood based collaborative filtering approach finds K nearest neighbors to an active user or… ▽ More

    Submitted 2 October, 2015; originally announced October 2015.

    Comments: 22 Pages

  30. arXiv:1205.2282  [pdf, other

    stat.ML cs.DC cs.LG

    A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms

    Authors: Matthieu Durut, Benoît Patra, Fabrice Rossi

    Abstract: This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain time speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better performances than the sequential algorithm. Another distributed scheme is therefore introduced which obtains the expected speed-ups. Then, it is improved to fit implemen… ▽ More

    Submitted 10 May, 2012; originally announced May 2012.

    Journal ref: 20-th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012), Bruges : Belgium (2012)