
Showing 1–50 of 156 results for author: Shang, J

Searching in archive cs. Search in all archives.
  1. arXiv:2409.02877  [pdf, other]

    cs.AI cs.CL cs.LG

    Configurable Foundation Models: Building LLMs from a Modular Perspective

    Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

    Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendenc…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2408.06402  [pdf, other]

    q-bio.QM cs.AI cs.LG

    PhaGO: Protein function annotation for bacteriophages by integrating the genomic context

    Authors: Jiaojiao Guan, Yongxin Ji, Cheng Peng, Wei Zou, Xubo Tang, Jiayu Shang, Yanni Sun

    Abstract: Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins pre…

    Submitted 17 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 17 pages, 6 figures

  3. arXiv:2408.03675  [pdf, other]

    cs.CL

    NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

    Authors: Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu

    Abstract: Large Language Models (LLMs) have ignited an innovative surge of AI applications, marking a new era of exciting possibilities equipped with extended context windows. However, hosting these models is cost-prohibitive mainly due to the extensive memory consumption of KV Cache involving long-context modeling. Despite several works proposing to evict unnecessary tokens from the KV Cache, most of them…

    Submitted 7 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 (main conference, long paper)

  4. arXiv:2407.20454  [pdf, other]

    cs.LG cs.CL

    CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models

    Authors: Junda Wu, Xintong Li, Tong Yu, Yu Wang, Xiang Chen, Jiuxiang Gu, Lina Yao, Jingbo Shang, Julian McAuley

    Abstract: Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate a backbone LLM with a pre-trained feature encoder for downstream tasks. The major challenge is how to efficiently find the synergy through cooperative learning where LLMs adapt their reasoning abilities in downstream tasks while feature encoders adjust their encoding to provide more relevant modal information…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 9 pages

  5. arXiv:2407.20179  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Theia: Distilling Diverse Vision Foundation Models for Robot Learning

    Authors: Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant

    Abstract: Vision-based robot policy learning, which maps visual inputs to actions, necessitates a holistic understanding of diverse visual tasks beyond single-task needs like classification or segmentation. Inspired by this, we introduce Theia, a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks. Theia's rich visual repres…

    Submitted 29 July, 2024; originally announced July 2024.

  6. arXiv:2407.19056  [pdf, other]

    cs.CL

    OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

    Authors: Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill Yuchen Lin, Jingbo Shang

    Abstract: Office automation significantly enhances human productivity by automatically finishing routine tasks in the workflow. Beyond the basic information extraction studied in much of the prior document AI literature, the office automation research should be extended to more realistic office tasks which require to integrate various information sources in the office system and produce outputs through a se…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Preprint

  7. arXiv:2407.08223  [pdf, other]

    cs.CL cs.AI

    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

    Authors: Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Specul…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Preprint

  8. arXiv:2407.05778  [pdf, other]

    cs.CL cs.AI

    When is the consistent prediction likely to be a correct prediction?

    Authors: Alex Nguyen, Dheeraj Mekala, Chengyu Dong, Jingbo Shang

    Abstract: Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation i.e. longer reasoning texts, rather than simply the most consistent answer across all o…

    Submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.05609  [pdf, other]

    cs.CL

    Open-world Multi-label Text Classification with Extremely Weak Supervision

    Authors: Xintong Li, Jinya Jiang, Ria Dharmani, Jayanth Srinivasa, Gaowen Liu, Jingbo Shang

    Abstract: We study open-world multi-label text classification under extremely weak supervision (XWS), where the user only provides a brief description for classification objectives without any labels or ground-truth label space. Similar single-label XWS settings have been explored recently, however, these methods cannot be easily adapted for multi-label. We observe that (1) most documents have a dominant cl…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Preprint

  10. arXiv:2407.01065  [pdf, other]

    cs.LG

    Improve ROI with Causal Learning and Conformal Prediction

    Authors: Meng Ai, Zhuo Chen, Jibin Wang, Jing Shang, Tao Tao, Zhen Li

    Abstract: In the commercial sphere, such as operations and maintenance, advertising, and marketing recommendations, intelligent decision-making utilizing data mining and neural network technologies is crucial, especially in resource allocation to optimize ROI. This study delves into the Cost-aware Binary Treatment Assignment Problem (C-BTAP) across different industries, with a focus on the state-of-the-art…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ICDE 2024; Link: https://icde2024.github.io/papers.html

  11. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  12. arXiv:2406.18312  [pdf, other]

    cs.CL cs.AI

    AI-native Memory: A Pathway from LLMs Towards AGI

    Authors: Jingbo Shang, Zai Zheng, Jiale Wei, Xiang Ying, Felix Tao, Mindverse Team

    Abstract: Large language models (LLMs) have demonstrated the world with the sparks of artificial general intelligence (AGI). One opinion, especially from some startups working on LLMs, argues that an LLM with nearly unlimited context length can realize AGI. However, they might be too optimistic about the long-context capability of (existing) LLMs -- (1) Recent literature has shown that their effective conte…

    Submitted 28 August, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  13. arXiv:2406.13236  [pdf, other]

    cs.CL cs.AI

    Data Contamination Can Cross Language Barriers

    Authors: Feng Yao, Yufan Zhuang, Zihao Sun, Sunan Xu, Animesh Kumar, Jingbo Shang

    Abstract: The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingua…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  14. arXiv:2406.11115  [pdf, other]

    cs.CL

    Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

    Authors: Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang

    Abstract: For extremely weak-supervised text classification, pioneer research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot gene…

    Submitted 16 June, 2024; originally announced June 2024.

  15. arXiv:2406.06567  [pdf, other]

    cs.LG cs.AI cs.CL

    DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

    Authors: Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun

    Abstract: Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate subst…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures, 3 tables

  16. arXiv:2406.04460  [pdf, other]

    cs.CL

    Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs

    Authors: Shang Zhou, Feng Yao, Chengyu Dong, Zihan Wang, Jingbo Shang

    Abstract: Controlling the attribute intensity of text generation is crucial across scenarios (e.g., writing conciseness, chatting emotion, and explanation clarity). The remarkable capabilities of large language models (LLMs) have revolutionized text generation, prompting us to explore such smooth control of LLM generation. Specifically, we propose metrics to assess the range, calibration, and consist…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  17. arXiv:2406.00226  [pdf, other]

    cs.CL

    Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction

    Authors: William Hogan, Jingbo Shang

    Abstract: Recent research efforts have explored the potential of leveraging natural language inference (NLI) techniques to enhance relation extraction (RE). In this vein, we introduce MetaEntail-RE, a novel adaptation method that harnesses NLI principles to enhance RE performance. Our approach follows past works by verbalizing relation classes into class-indicative hypotheses, aligning a traditionally multi…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 17 pages, 1 figure

    ACM Class: I.2.7

  18. arXiv:2405.13397  [pdf, other]

    cs.CV

    Multi Player Tracking in Ice Hockey with Homographic Projections

    Authors: Harish Prakash, Jia Cheng Shang, Ken M. Nsiempba, Yuhao Chen, David A. Clausi, John S. Zelek

    Abstract: Multi Object Tracking (MOT) in ice hockey pursues the combined task of localizing and associating players across a given sequence to maintain their identities. Tracking players from monocular broadcast feeds is an important computer vision problem offering various downstream analytics and enhanced viewership experience. However, existing trackers encounter significant difficulties in dealing with…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted at the Conference on Robots and Vision (CRV), 2024

  19. arXiv:2405.07726  [pdf, other]

    cs.CL

    Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing

    Authors: Letian Peng, Jingbo Shang

    Abstract: Persona-driven role-playing (PRP) aims to build AI characters that can respond to user queries by faithfully sticking with all persona statements. Unfortunately, existing faithfulness criteria for PRP are limited to coarse-grained LLM-based scoring without a clear definition or formulation. This paper presents a pioneering exploration to quantify PRP faithfulness as a fine-grained and explainable…

    Submitted 13 May, 2024; originally announced May 2024.

  20. arXiv:2405.04086  [pdf, other]

    cs.CL

    Optimizing Language Model's Reasoning Abilities with Weak Supervision

    Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

    Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w…

    Submitted 7 May, 2024; originally announced May 2024.

  21. arXiv:2404.15889  [pdf, other]

    cs.CV cs.GR

    Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control

    Authors: Linzi Qu, Jiaxiang Shang, Hui Ye, Xiaoguang Han, Hongbo Fu

    Abstract: Geometry- and appearance-controlled full-body human image generation is an interesting but challenging task. Existing solutions are either unconditional or dependent on coarse conditions (e.g., pose, text), thus lacking explicit geometry and appearance control of body and garment. Sketching offers such editing ability and has been adopted in various sketch-based face generation and editing solutio…

    Submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.14372  [pdf, other]

    cs.CL cs.AI

    Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph

    Authors: Xiaochen Kev Gao, Feng Yao, Kewen Zhao, Beilei He, Animesh Kumar, Vish Krishnan, Jingbo Shang

    Abstract: Model scaling is becoming the default choice for many language tasks due to the success of large language models (LLMs). However, it can fall short in specific scenarios where simple customized methods excel. In this paper, we delve into the patent approval prediction task and unveil that simple domain-specific graph methods outperform enlarging the model, using the intrinsic dependencies within…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 Pages, Under Review

  23. arXiv:2404.10877  [pdf, other]

    cs.CL

    Incubating Text Classifiers Following User Instruction with Nothing but LLM

    Authors: Letian Peng, Jingbo Shang

    Abstract: In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a small text classifier without any human annotation or raw corpus. Compared with pioneer attempts, our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes (e.g., "TED Talk given by Educator" and "Other"). Spec…

    Submitted 20 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  24. arXiv:2404.07382  [pdf, other]

    cs.AI cs.LO

    Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

    Authors: Chenyang An, Zhibo Chen, Qihao Ye, Emily First, Letian Peng, Jiayun Zhang, Zihan Wang, Sorin Lerner, Jingbo Shang

    Abstract: Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its traini…

    Submitted 29 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted as a main conference paper at ACL 2024

  25. arXiv:2404.02931  [pdf, other]

    cs.CL cs.AI

    READ: Improving Relation Extraction from an ADversarial Perspective

    Authors: Dawei Li, William Hogan, Jingbo Shang

    Abstract: Recent works in relation extraction (RE) have achieved promising benchmark accuracy; however, our adversarial attack experiments show that these works excessively rely on entities, making their generalization capability questionable. To address this issue, we propose an adversarial training method specifically designed for RE. Our approach introduces both sequence- and token-level perturbations to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by findings of NAACL 2024

  26. arXiv:2404.00457  [pdf, other]

    cs.CL

    MetaIE: Distilling a Meta Model from LLM for All Kinds of Information Extraction Tasks

    Authors: Letian Peng, Zilong Wang, Feng Yao, Zihan Wang, Jingbo Shang

    Abstract: Information extraction (IE) is a fundamental area in natural language processing where prompting large language models (LLMs), even with in-context examples, cannot defeat small LMs tuned on very small IE datasets. We observe that IE tasks, such as named entity recognition and relation extraction, all focus on extracting important information, which can be formalized as a label-to-span matching. I…

    Submitted 30 March, 2024; originally announced April 2024.

  27. arXiv:2404.00439  [pdf, other]

    cs.CL

    DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering

    Authors: Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala

    Abstract: The application of natural language processing models to PDF documents is pivotal for various business applications yet the challenge of training models for this purpose persists in businesses due to specific hurdles. These include the complexity of working with PDF formats that necessitate parsing text and layout information for curating training data and the lack of privacy-preserving annotation…

    Submitted 30 March, 2024; originally announced April 2024.

  28. arXiv:2403.20046  [pdf, other]

    cs.CL

    Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning

    Authors: Yongqi Tong, Dawei Li, Sizhe Wang, Yujia Wang, Fei Teng, Jingbo Shang

    Abstract: Recent works have shown the benefits to LLMs from fine-tuning golden-standard Chain-of-Thought (CoT) rationales or using them as correct examples in few-shot prompting. While humans can indeed imitate correct examples, learning from our mistakes is another vital aspect of human cognition. Hence, a question naturally arises: can LLMs learn and benefit from their mistakes, especially for the…

    Submitted 7 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) - Main Conference

  29. arXiv:2402.16906  [pdf, other]

    cs.SE cs.AI cs.CL

    Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

    Authors: Li Zhong, Zilong Wang, Jingbo Shang

    Abstract: Large language models (LLMs) are leading significant progress in code generation. Beyond one-pass code generation, recent works further integrate unit tests and program verifiers into LLMs to iteratively refine the generated programs. However, these works consider the generated programs as an indivisible entity, which falls short for LLMs in debugging the programs, especially when the programs con…

    Submitted 6 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Preprint

  30. arXiv:2402.14158  [pdf, other]

    cs.CL

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    Authors: Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

    Abstract: Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models still struggle with learning how to robustly use new tools from only a few demonstrations. In this work we introduce a self-verification method which distinguish…

    Submitted 13 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  31. arXiv:2402.10430  [pdf, other]

    cs.CL

    Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models

    Authors: Dheeraj Mekala, Alex Nguyen, Jingbo Shang

    Abstract: Instruction-tuning language models has become a crucial step in aligning them for general use. Typically, this process involves extensive training on large datasets, incurring high training costs. In this paper, we introduce a novel training data selection based on the learning percentage of the samples. We assert that current language models possess the capability to autonomously select high-qual…

    Submitted 15 February, 2024; originally announced February 2024.

  32. arXiv:2402.09642  [pdf, other]

    cs.CL

    Answer is All You Need: Instruction-following Text Embedding via Answering the Question

    Authors: Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang

    Abstract: This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite its tremendous potential to deploy user-oriented embeddings, none of previous approaches provides a concrete solution for it. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the repres…

    Submitted 14 February, 2024; originally announced February 2024.

  33. arXiv:2402.04624  [pdf, other]

    cs.CL

    MEMORYLLM: Towards Self-Updatable Large Language Models

    Authors: Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian McAuley

    Abstract: Existing Large Language Models (LLMs) usually remain static after deployment, which might make it hard to inject new knowledge into the model. We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently. To this end, we introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memo…

    Submitted 26 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 13 pages, 9 figures

  34. arXiv:2402.03774  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning a Decision Tree Algorithm with Transformers

    Authors: Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

    Abstract: Decision trees are renowned for their ability to achieve high predictive performance while remaining interpretable, especially on tabular data. Traditionally, they are constructed through recursive algorithms, where they partition the data at every node in a tree. However, identifying a good partition is challenging, as decision trees optimized for local segments may not yield global generalizatio…

    Submitted 23 August, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  35. arXiv:2402.02658  [pdf, other]

    cs.AI cs.CL cs.LG

    Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision

    Authors: Zihan Wang, Yunxuan Li, Yuexin Wu, Liangchen Luo, Le Hou, Hongkun Yu, Jingbo Shang

    Abstract: Process supervision, using a trained verifier to evaluate the intermediate steps generated by reasoner, has demonstrated significant improvements in multi-step problem solving. In this paper, to avoid expensive human annotation effort on the verifier training data, we introduce Model-induced Process Supervision (MiPS), a novel method for automating data curation. MiPS annotates an intermediate ste…

    Submitted 4 February, 2024; originally announced February 2024.

  36. arXiv:2402.01801  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Models for Time Series: A Survey

    Authors: Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang

    Abstract: Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, image and graphics, LLMs present a significant potential for analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the vari…

    Submitted 6 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: GitHub repository: https://github.com/xiyuanzh/awesome-llm-time-series

  37. arXiv:2401.04398  [pdf, other]

    cs.CL

    Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

    Authors: Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires the extraction of underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and its similar approaches inco…

    Submitted 18 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  38. arXiv:2401.01003  [pdf, other]

    cs.CV eess.IV

    Rink-Agnostic Hockey Rink Registration

    Authors: Jia Cheng Shang, Yuhao Chen, Mohammad Javad Shafiee, David A. Clausi

    Abstract: Hockey rink registration is a useful tool for aiding and automating sports analysis. When combined with player tracking, it can provide location information of players on the rink by estimating a homography matrix that can warp broadcast video frames onto an overhead template of the rink, or vice versa. However, most existing techniques require accurate ground truth information, which can take man…

    Submitted 8 September, 2023; originally announced January 2024.

  39. arXiv:2312.03291  [pdf, other]

    cs.LG cs.AI

    OMNIINPUT: A Model-centric Evaluation Framework through Output Distribution

    Authors: Weitang Liu, Ying Wai Li, Tianle Wang, Yi-Zhuang You, Jingbo Shang

    Abstract: We propose a novel model-centric evaluation framework, OmniInput, to evaluate the quality of an AI/ML model's predictions on all possible inputs (including human-unrecognizable ones), which is crucial for AI safety and reliability. Unlike traditional data-centric evaluation based on pre-defined test sets, the test set in OmniInput is self-constructed by the model itself and the model quality is ev…

    Submitted 5 December, 2023; originally announced December 2023.

  40. arXiv:2312.00293  [pdf]

    cs.CL

    PsyAttention: Psychological Attention Model for Personality Detection

    Authors: Baohua Zhang, Yongyi Huang, Wenyao Cui, Huaping Zhang, Jianyun Shang

    Abstract: Work on personality detection has tended to incorporate psychological features from different personality models, such as BigFive and MBTI. There are more than 900 psychological features, each of which is helpful for personality detection. However, when used in combination, the application of different calculation standards among these features may result in interference between features calculate…

    Submitted 30 November, 2023; originally announced December 2023.

  41. arXiv:2311.06968  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Physics-Informed Data Denoising for Real-Life Sensing Systems

    Authors: Xiyuan Zhang, Xiaohan Fu, Diyan Teng, Chengyu Dong, Keerthivasan Vijayakumar, Jiayun Zhang, Ranak Roy Chowdhury, Junsheng Han, Dezhi Hong, Rashmi Kulkarni, Jingbo Shang, Rajesh Gupta

    Abstract: Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically r…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: SenSys 2023

  42. arXiv:2311.03319  [pdf, other]

    cs.CL cs.AI

    DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

    Authors: Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin wang, Xueqi Wang, William Hogan, Jingbo Shang

    Abstract: In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks. However, ICL requires high-quality annotated demonstrations which might not be available in real-world scenarios. To overcome this limitation, we propose Data Augmentation for In-Context Learning (DAIL). DAIL leverages the intui…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Course project for DSC 253 (Advanced Data-Driven Text Mining) at UCSD

  43. arXiv:2311.02861  [pdf, other]

    cs.CL

    Less than One-shot: Named Entity Recognition via Extremely Weak Supervision

    Authors: Letian Peng, Zihan Wang, Jingbo Shang

    Abstract: We study the named entity recognition (NER) problem under the extremely weak supervision (XWS) setting, where only one example entity per type is given in a context-free way. While one can see that XWS is lighter than one-shot in terms of the amount of supervision, we propose a novel method X-NER that can outperform the state-of-the-art one-shot NER methods. We first mine entity spans that are sim…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: Accepted to Findings of EMNLP 2023

  44. arXiv:2311.01751  [pdf, other]

    cs.CL

    EmojiLM: Modeling the New Emoji Language

    Authors: Letian Peng, Zilong Wang, Hang Liu, Zihan Wang, Jingbo Shang

    Abstract: With the rapid development of the internet, online social media welcomes people with different backgrounds through its diverse content. The increasing usage of emoji becomes a noticeable trend thanks to emoji's rich information beyond cultural or linguistic borders. However, the current study on emojis is limited to single emoji prediction and there are limited data resources available for further…

    Submitted 3 November, 2023; originally announced November 2023.

  45. arXiv:2310.17389  [pdf, other]

    cs.CL cs.AI

    ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

    Authors: Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang

    Abstract: Despite remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored.…

    Submitted 26 October, 2023; originally announced October 2023.

    Journal ref: EMNLP findings 2023

  46. arXiv:2310.12342  [pdf, other]

    cs.CL cs.AI

    Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking

    Authors: Yongqi Tong, Yifan Wang, Dawei Li, Sizhe Wang, Zi Lin, Simeng Han, Jingbo Shang

    Abstract: Chain-of-Thought (CoT) prompting and its variants explore equipping large language models (LLMs) with high-level reasoning abilities by emulating human-like linear cognition and logic. However, the human mind is complicated and mixed with both linear and nonlinear thinking. In this work, we propose Inferential Exclusion Prompting (IEP), a novel prompting that combines the…

    Submitted 14 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  47. arXiv:2310.07347  [pdf, other]

    cs.CL cs.AI cs.LG

    Fast-ELECTRA for Efficient Pre-training

    Authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

    Abstract: ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model. Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model. Notably, this model, which is jointly trained with the main model, only serves to assist the training of the main model and is discarded post-t…

    Submitted 11 October, 2023; originally announced October 2023.

  48. arXiv:2310.04815  [pdf, other]

    cs.LG

    Critique Ability of Large Language Models

    Authors: Liangchen Luo, Zi Lin, Yinxiao Liu, Lei Shu, Yun Zhu, Jingbo Shang, Lei Meng

    Abstract: Critical thinking is essential for rational decision-making and problem-solving. This skill hinges on the ability to provide precise and reasoned critiques and is a hallmark of human intelligence. In the era of large language models (LLMs), this study explores the ability of LLMs to deliver accurate critiques across various tasks. We are interested in this topic as a capable critic model could not…

    Submitted 7 October, 2023; originally announced October 2023.

  49. arXiv:2310.03182  [pdf, other]

    cs.CV cs.CL cs.LG

    Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models

    Authors: An Yan, Yu Wang, Yiwu Zhong, Zexue He, Petros Karypis, Zihan Wang, Chengyu Dong, Amilcare Gentili, Chun-Nan Hsu, Jingbo Shang, Julian McAuley

    Abstract: Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 18 pages, 12 figures

  50. arXiv:2308.03685  [pdf, other]

    cs.CV

    Learning Concise and Descriptive Attributes for Visual Recognition

    Authors: An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley

    Abstract: Recent advances in foundation models present new opportunities for interpretable visual recognition -- one can first query Large Language Models (LLMs) to obtain a set of attributes that describe each class, then apply vision-language models to classify images via these attributes. Pioneering work shows that querying thousands of attributes can achieve performance competitive with image features.…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: ICCV 2023