Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 292 results for author: Ji, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.00054  [pdf, other

    cs.CL cs.AI

    Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting

    Authors: Yuting Hu, Dancheng Liu, Qingyun Wang, Charles Yu, Heng Ji, Jinjun Xiong

    Abstract: To address the challenge of automating knowledge discovery from a vast volume of literature, in this paper, we introduce a novel framework based on large language models (LLMs) that combines a progressive ontology prompting (POP) algorithm with a dual-agent system, named LLM-Duo, designed to enhance the automation of knowledge extraction from scientific articles. The POP algorithm utilizes a prior… ▽ More

    Submitted 20 August, 2024; originally announced September 2024.

    Comments: in submission

  2. arXiv:2408.10120  [pdf, other

    cs.AI

    Geometry Informed Tokenization of Molecules for Language Model Generation

    Authors: Xiner Li, Limei Wang, Youzhi Luo, Carl Edwards, Shurui Gui, Yuchao Lin, Heng Ji, Shuiwang Ji

    Abstract: We consider molecule generation in 3D space using language models (LMs), which requires discrete tokenization of 3D molecular geometries. Although tokenization of molecular graphs exists, that for 3D geometries is largely unexplored. Here, we attempt to bridge this gap by proposing the Geo2Seq, which converts molecular geometries into $SE(3)$-invariant 1D discrete sequences. Geo2Seq consists of ca… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.10086  [pdf, other

    cs.AI

    ARMADA: Attribute-Based Multimodal Data Augmentation

    Authors: Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, Heng Ji

    Abstract: In Multimodal Language Models (MLMs), the cost of manually annotating high-quality image-text pair data for fine-tuning and alignment is extremely high. While existing multimodal data augmentation frameworks propose ways to augment image-text pairs, they either suffer from semantic inconsistency between texts and images, or generate unrealistic images, causing knowledge gap with real world example… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.06604  [pdf, other

    cs.CV

    MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers

    Authors: Zichao Dong, Yilin Zhang, Xufeng Huang, Hang Ji, Zhan Shi, Xin Zhan, Junbo Chen

    Abstract: We introduce a novel MV-DETR pipeline which is effective while efficient transformer based detection method. Given input RGBD data, we notice that there are super strong pretraining weights for RGB data while less effective works for depth related data. First and foremost , we argue that geometry and texture cues are both of vital importance while could be encoded separately. Secondly, we find tha… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  5. arXiv:2408.05996  [pdf, ps, other

    cs.NI

    Value-based Proactive Caching for Sensing Data in Internet of Vehicles

    Authors: Yantong Wang, Ke Liu, Hui Ji, Jiande Sun

    Abstract: Sensing data (SD) plays an important role in safe-related applications for Internet of Vehicles. Proactively caching required sensing data (SD) is a pivotal strategy for alleviating network congestion and improving data accessibility. Despite merits, existing studies predominantly address SD caching within a single time slot, which may not be scalable to scenarios involving multi-slots. Furthermor… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 14 pages,10 figures

  6. arXiv:2408.01623  [pdf, other

    cs.CL

    Dialog Flow Induction for Constrainable LLM-Based Chatbots

    Authors: Stuti Agrawal, Nishi Uppuluri, Pranav Pillai, Revanth Gangi Reddy, Zoey Li, Gokhan Tur, Dilek Hakkani-Tur, Heng Ji

    Abstract: LLM-driven dialog systems are used in a diverse set of applications, ranging from healthcare to customer service. However, given their generalization capability, it is difficult to ensure that these chatbots stay within the boundaries of the specialized domains, potentially resulting in inaccurate information and irrelevant responses. This paper introduces an unsupervised approach for automaticall… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted at SIGDIAL 2024

  7. arXiv:2408.00346  [pdf, other

    cs.LG cs.AI

    Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce

    Authors: Houye Ji, Ye Tang, Zhaoxin Chen, Lixi Deng, Jun Hu, Lei Su

    Abstract: With the rapid development of the short video industry, traditional e-commerce has encountered a new paradigm, video-driven e-commerce, which leverages attractive videos for product showcases and provides both video and item services for users. Benefitting from the dynamic and visualized introduction of items,video-driven e-commerce has shown huge potential in stimulating consumer confidence and p… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

  8. arXiv:2408.00300  [pdf, other

    cs.CV cs.MM

    Towards Flexible Evaluation for Generative Visual Question Answering

    Authors: Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang

    Abstract: Throughout rapid development of multimodal large language models, a crucial ingredient is a fair and accurate evaluation of their multimodal comprehension abilities. Although Visual Question Answering (VQA) could serve as a developed test field, limitations of VQA evaluation, like the inflexible pattern of Exact Match, have hindered MLLMs from demonstrating their real capability and discourage ric… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

  9. arXiv:2407.16741  [pdf, other

    cs.SE cs.AI cs.CL

    OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenD… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/OpenDevin/OpenDevin

  10. arXiv:2407.13048  [pdf, other

    cs.CL

    Establishing Knowledge Preference in Language Models

    Authors: Sizhe Zhou, Sha Li, Yu Meng, Yizhu Jiao, Heng Ji, Jiawei Han

    Abstract: Language models are known to encode a great amount of factual knowledge through pretraining. However, such knowledge might be insufficient to cater to user requests, requiring the model to integrate external knowledge sources and adhere to user-provided specifications. When answering questions about ongoing events, the model should use recent news articles to update its response; when asked to pro… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 27 pages, 8 figures, 23 tables, working in progress

  11. arXiv:2407.12828  [pdf, other

    cs.CL cs.AI

    Why Does New Knowledge Create Messy Ripple Effects in LLMs?

    Authors: Jiaxin Qin, Zixuan Zhang, Chi Han, Manling Li, Pengfei Yu, Heng Ji

    Abstract: Extensive previous research has focused on post-training knowledge editing (KE) for language models (LMs) to ensure that knowledge remains accurate and up-to-date. One desired property and open question in KE is to let edited LMs correctly handle ripple effects, where LM is expected to answer its logically related knowledge accurately. In this paper, we answer the question of why most KE methods s… ▽ More

    Submitted 18 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  12. arXiv:2407.08039  [pdf, other

    cs.CL

    Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

    Authors: Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi R. Fung, Jing Li, Manling Li, Heng Ji

    Abstract: Hallucination is often regarded as a major impediment for using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as ``knowledge overshadowing'': when we query knowledge from a language model wi… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  13. arXiv:2407.06985  [pdf, other

    cs.AI

    PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

    Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Yingru Lin, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

    Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PE… ▽ More

    Submitted 30 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  14. arXiv:2407.06438  [pdf, other

    cs.CV cs.CL cs.LG

    A Single Transformer for Scalable Vision-Language Modeling

    Authors: Yangyi Chen, Xingyao Wang, Hao Peng, Heng Ji

    Abstract: We present SOLO, a single transformer for Scalable visiOn-Language mOdeling. Current large vision-language models (LVLMs) such as LLaVA mostly employ heterogeneous architectures that connect pre-trained visual encoders with large language models (LLMs) to facilitate visual recognition and complex reasoning. Although achieving remarkable performance with relatively lightweight training, we identify… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/Yangyi-Chen/SOLO

  15. arXiv:2407.04929  [pdf, other

    cs.RO

    Toward Precise Robotic Weed Flaming Using a Mobile Manipulator with a Flamethrower

    Authors: Di Wang, Chengsong Hu, Shuangyu Xie, Joe Johnson, Hojun Ji, Yingtao Jiang, Muthukumar Bagavathiannan, Dezhen Song

    Abstract: Robotic weed flaming is a new and environmentally friendly approach to weed removal in the agricultural field. Using a mobile manipulator equipped with a flamethrower, we design a new system and algorithm to enable effective weed flaming, which requires robotic manipulation with a soft and deformable end effector, as the thermal coverage of the flame is affected by dynamic or unknown environmental… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: IROS 2024

  16. arXiv:2407.03040  [pdf, other

    cs.CL cs.AI

    Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

    Authors: Xia Hou, Qifeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, Jingbo Dun, Wenfeng Song

    Abstract: Instruction tuning as an effective technique aligns the outputs of large language models (LLMs) with human preference. But how to generate the seasonal multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages the CoD-Chain of Dialogue logic to guide large language models (LLMs) in generat… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

    MSC Class: 68T50 ACM Class: I.2.7

  17. arXiv:2407.01100  [pdf, other

    cs.CL cs.LG

    Eliminating Position Bias of Language Models: A Mechanistic Approach

    Authors: Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji

    Abstract: Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  18. arXiv:2406.15657  [pdf, other

    cs.IR

    FIRST: Faster Improved Listwise Reranking with Single Token Decoding

    Authors: Revanth Gangi Reddy, JaeHyeok Doo, Yifei Xu, Md Arafat Sultan, Deevya Swain, Avirup Sil, Heng Ji

    Abstract: Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approaches. However, conventional listwise LLM reranking methods lack efficiency as they provide ranking output in the form of a generated ordered sequence of candidat… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Preprint

  19. arXiv:2406.14137  [pdf, other

    cs.CL

    MACAROON: Training Vision-Language Models To Be Your Engaged Partners

    Authors: Shujin Wu, Yi R. Fung, Sha Li, Yixin Wan, Kai-Wei Chang, Heng Ji

    Abstract: Large vision-language models (LVLMs), while proficient in following instructions and responding to diverse questions, invariably generate detailed responses even when questions are ambiguous or unanswerable, leading to hallucinations and bias issues. Thus, it is essential for LVLMs to proactively engage with humans to ask for clarifications or additional information for better responses. In this s… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The code will be made public at https://github.com/ShujinWu-0814/MACAROON

  20. arXiv:2406.07067  [pdf, other

    cs.IR cs.AI

    TIM: Temporal Interaction Model in Notification System

    Authors: Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuanping Li, Wenwu Ou

    Abstract: Modern mobile applications heavily rely on the notification system to acquire daily active users and enhance user engagement. Being able to proactively reach users, the system has to decide when to send notifications to users. Although many researchers have studied optimizing the timing of sending notifications, they only utilized users' contextual features, without modeling users' behavior patter… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.02056  [pdf, other

    cs.LG cs.NE

    CAP: A Context-Aware Neural Predictor for NAS

    Authors: Han Ji, Yuqi Feng, Yanan Sun

    Abstract: Neural predictors are effective in boosting the time-consuming performance evaluation stage in neural architecture search (NAS), owing to their direct estimation of unseen architectures. Despite the effectiveness, training a powerful neural predictor with fewer annotated architectures remains a huge challenge. In this paper, we propose a context-aware neural predictor (CAP) which only needs a few… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI24

  22. arXiv:2405.20015  [pdf, other

    cs.AI cs.CL

    Efficient LLM-Jailbreaking by Introducing Visual Modality

    Authors: Zhenxing Niu, Yuyao Sun, Haodong Ren, Haoxuan Ji, Quan Wang, Xiaoke Ma, Gang Hua, Rong Jin

    Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreaks that directly orient to LLMs, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an e… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  23. arXiv:2405.15028  [pdf, other

    cs.CL cs.IR

    AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings

    Authors: Revanth Gangi Reddy, Omar Attia, Yunyao Li, Heng Ji, Saloni Potdar

    Abstract: Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain ques… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  24. arXiv:2405.14203  [pdf, other

    cs.LG cs.AI physics.chem-ph

    GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices

    Authors: Thao Nguyen, Tiara Torres-Flores, Changhyun Hwang, Carl Edwards, Ying Diao, Heng Ji

    Abstract: This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, whic… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: In progress

  25. arXiv:2405.13179  [pdf, other

    cs.CL

    RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

    Authors: Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, Daqing He

    Abstract: This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learni… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  26. arXiv:2405.13005  [pdf

    cs.CL cs.AI cs.SI

    Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data

    Authors: Nan Miles Xi, Hong-Long Ji, Lin Wang

    Abstract: Sarcoidosis is a rare inflammatory disease characterized by the formation of granulomas in various organs. The disease presents diagnostic and treatment challenges due to its diverse manifestations and unpredictable nature. In this study, we employed a Large Language Model (LLM) to analyze sarcoidosis-related discussions on the social media platform Reddit. Our findings underscore the efficacy of… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  27. arXiv:2405.04602  [pdf, other

    cs.SE

    Cross-Language Dependencies: An Empirical Study of Kotlin-Java

    Authors: Qiong Feng, Huan Ji, Xiaotian Ma, Peng Liang

    Abstract: Background: Since Google introduced Kotlin as an official programming language for developing Android apps in 2017, Kotlin has gained widespread adoption in Android development. The inter-operability of Java and Kotlin's design nature allows them to coexist and interact with each other smoothly within a project. Aims: However, there is limited research on how Java and Kotlin interact with each oth… ▽ More

    Submitted 26 July, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: The 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

  28. arXiv:2405.03446  [pdf, other

    cs.CR

    SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

    Authors: Hangyuan Ji, Jian Yang, Linzheng Chai, Chaoren Wei, Liqun Yang, Yunlong Duan, Yunli Wang, Tianzhen Sun, Hongcheng Guo, Tongliang Li, Changyu Ren, Zhoujun Li

    Abstract: To address the increasing complexity and frequency of cybersecurity incidents emphasized by the recent cybersecurity threat reports with over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability… ▽ More

    Submitted 3 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  29. arXiv:2404.16792  [pdf, other

    cs.LG cs.AI cs.CL

    Weak-to-Strong Extrapolation Expedites Alignment

    Authors: Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

    Abstract: The open-source community is experiencing a surge in the release of large language models (LLMs) that are trained to follow instructions and align with human preference. However, further training to improve them still requires expensive computational resources and data annotations. Is it possible to bypass additional training and cost-effectively acquire better-aligned models? Inspired by the lite… ▽ More

    Submitted 22 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Add theoretical explanation and more evaluation results

  30. arXiv:2404.15332  [pdf, other

    eess.SP cs.LG

    Clinical translation of machine learning algorithms for seizure detection in scalp electroencephalography: systematic review

    Authors: Nina Moutonnet, Steven White, Benjamin P Campbell, Saeid Sanei, Toshihisa Tanaka, Hong Ji, Danilo Mandic, Gregory Scott

    Abstract: Machine learning algorithms for seizure detection have shown considerable diagnostic potential, with recent reported accuracies reaching 100%. Yet, only few published algorithms have fully addressed the requirements for successful clinical translation. This is, for example, because the properties of training data may limit the generalisability of algorithms, algorithm performance may vary dependin… ▽ More

    Submitted 13 August, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 60 pages, LaTeX; Addition of co-authors, keywords alphabetically sorted, text in figure 1 changed to black, references added ([9],[56] ), abbreviations defined (CNN, RNN), added section 6.4, corrected the referencing style, added a sentence about the existence of non-epileptic attacks, added an explanation about the drawback of the 10-20 system, removed bold from Figure/Table titles

  31. arXiv:2404.12666  [pdf, other

    cs.DC cs.CR cs.ET

    A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues

    Authors: Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, Zhu Han

    Abstract: The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has restricted the traditional data analytics workflow, where the edge data are gathered by a centralized server to be further utilized by data analysts. To continue leveraging vast edge data to support various data-incentive applications, a transformative shift is promoted in com… ▽ More

    Submitted 22 July, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: This survey has been submitted to IEEE Communications Surveys & Tutorials

  32. arXiv:2404.12135  [pdf, other

    cs.MA cs.CR cs.DC

    mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture

    Authors: Wei Zhang, Hongcheng Guo, Jian Yang, Yi Zhang, Chaoran Yan, Zhoujin Tian, Hangyuan Ji, Zhoujun Li, Tongliang Li, Tieqiao Zheng, Chao Chen, Yi Liang, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI… ▽ More

    Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  33. arXiv:2404.06479  [pdf, other

    cs.CL cs.AI cs.CV

    Text-Based Reasoning About Vector Graphics

    Authors: Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji

    Abstract: While large multimodal models excel in broad vision-language benchmarks, they often struggle with tasks requiring precise perception of low-level visual details, such as comparing line lengths or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics -- images composed purely of 2D objects and shapes. To address this challenge, we propose… ▽ More

    Submitted 24 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Project page: https://mikewangwzhl.github.io/VDLM/

  34. arXiv:2404.01652  [pdf, other

    cs.CL cs.AI

    Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization

    Authors: Zixuan Zhang, Revanth Gangi Reddy, Kevin Small, Tong Zhang, Heng Ji

    Abstract: Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus. However, real-world knowledge is not static; it updates and evolves continually. Such a dynamic characteristic of knowledge poses a vital challenge for these models, as the trained models need to constantly adapt to the latest information to make sure that the answers remain a… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Findings

  35. arXiv:2403.18671  [pdf, other

    cs.CL cs.LG

    Fact Checking Beyond Training Set

    Authors: Payam Karisani, Heng Ji

    Abstract: Evaluating the veracity of everyday claims is time consuming and in some cases requires domain expertise. We empirically demonstrate that the commonly used fact checking pipeline, known as the retriever-reader, suffers from performance deterioration when it is trained on the labeled data from one domain and used in another domain. Afterwards, we delve into each component of the pipeline and propos… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  36. arXiv:2403.16823  [pdf, ps, other

    eess.SY cs.LG

    Resource and Mobility Management in Hybrid LiFi and WiFi Networks: A User-Centric Learning Approach

    Authors: Han Ji, Xiping Wu

    Abstract: Hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets) are an emerging indoor wireless communication paradigm, which combines the advantages of the capacious optical spectra of LiFi and ubiquitous coverage of WiFi. Meanwhile, load balancing (LB) becomes a key challenge in resource management for such hybrid networks. The existing LB methods are mostly network-centric, relying… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 12 pages, 12 figures, 3 tables, submitted to IEEE TWC

  37. arXiv:2403.12027  [pdf, other

    cs.CL cs.AI cs.CV

    From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

    Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

    Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increa… ▽ More

    Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2403.06093  [pdf, other

    cs.CV

    Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

    Authors: Haoxuanye Ji, Pengpeng Liang, Erkang Cheng

    Abstract: Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  39. arXiv:2403.05159  [pdf, other

    cs.CV

    LVIC: Multi-modality segmentation by Lifting Visual Info as Cue

    Authors: Zichao Dong, Bowen Pang, Xufeng Huang, Hang Ji, Xin Zhan, Junbo Chen

    Abstract: Multi-modality fusion is proven an effective method for 3d perception for autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation have complicated fusion mechanisms. Point painting is a quite straight forward method which directly bind LiDAR points with visual information. Unfortunately, previous point painting like methods suffer from projection e… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  40. arXiv:2403.00791  [pdf, other

    cs.CL cs.AI q-bio.BM q-bio.QM

    L+M-24: Building a Dataset for Language + Molecules @ ACL 2024

    Authors: Carl Edwards, Qingyun Wang, Lawrence Zhao, Heng Ji

    Abstract: Language-molecule models have emerged as an exciting direction for molecular discovery and understanding. However, training these models is challenging due to the scarcity of molecule-language pair datasets. At this point, datasets have been released which are 1) small and scraped from existing databases, 2) large but noisy and constructed by performing entity linking on the scientific literature,… ▽ More

    Submitted 4 July, 2024; v1 submitted 22 February, 2024; originally announced March 2024.

    Comments: The dataset, finetuned baselines, and evaluation code are released publicly at https://github.com/language-plus-molecules/LPM-24-Dataset through https://huggingface.co/language-plus-molecules

  41. arXiv:2402.19275  [pdf, other

    eess.SY cs.LG

    Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning

    Authors: Jingxuan Yang, Ruoxuan Bai, Haoyuan Ji, Yi Zhang, Jianming Hu, Shuo Feng

    Abstract: The assessment of safety performance plays a pivotal role in the development and deployment of connected and automated vehicles (CAVs). A common approach involves designing testing scenarios based on prior knowledge of CAVs (e.g., surrogate models), conducting tests in these scenarios, and subsequently evaluating CAVs' safety performances. However, substantial differences between CAVs and the prio… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  42. arXiv:2402.16315  [pdf, other

    cs.CV cs.CL

    Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models

    Authors: Jeonghwan Kim, Heng Ji

    Abstract: Recent advances in instruction-tuned Large Vision-Language Models (LVLMs) have imbued the models with the ability to generate high-level, image-grounded explanations with ease. While such capability is largely attributed to the rich world knowledge contained within the Large Language Models (LLMs), our work reveals their shortcomings in fine-grained visual categorization (FGVC) across six differen… ▽ More

    Submitted 11 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  43. arXiv:2402.15796  [pdf

    cs.AI cs.HC

    Construction and application of artificial intelligence crowdsourcing map based on multi-track GPS data

    Authors: Yong Wang, Yanlin Zhou, Huan Ji, Zheng He, Xinyu Shen

    Abstract: In recent years, the rapid development of high-precision map technology combined with artificial intelligence has ushered in a new development opportunity in the field of intelligent vehicles. High-precision map technology is an important guarantee for intelligent vehicles to achieve autonomous driving. However, due to the lack of research on high-precision map technology, it is difficult to ratio… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  44. arXiv:2402.14221  [pdf, other

    cs.DC

    Towards singular optimality in the presence of local initial knowledge

    Authors: Hongyan Ji, Sriram V. Pemmaraju

    Abstract: The Knowledge Till rho CONGEST model is a variant of the classical CONGEST model of distributed computing in which each vertex v has initial knowledge of the radius-rho ball centered at v. The most commonly studied variants of the CONGEST model are KT0 CONGEST in which nodes initially know nothing about their neighbors and KT1 CONGEST in which nodes initially know the IDs of all their neighbors. I… ▽ More

    Submitted 22 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  45. arXiv:2402.11943  [pdf, other

    cs.CL

    LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation

    Authors: Keyang Xuan, Li Yi, Fan Yang, Ruochen Wu, Yi R. Fung, Heng Ji

    Abstract: The rise of multimodal misinformation on social platforms poses significant challenges for individuals and societies. Its increased credibility and broader impact compared to textual misinformation make detection complex, requiring robust reasoning across diverse media types and profound knowledge for accurate verification. The emergence of Large Vision Language Model (LVLM) offers a potential sol… ▽ More

    Submitted 20 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  46. arXiv:2402.11324  [pdf, other

    cs.CL

    EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries

    Authors: Jiateng Liu, Pengfei Yu, Yuji Zhang, Sha Li, Zixuan Zhang, Heng Ji

    Abstract: The dynamic nature of real-world information necessitates efficient knowledge editing (KE) in large language models (LLMs) for knowledge updating. However, current KE approaches, which typically operate on (subject, relation, object) triples, ignore the contextual information and the relation among different knowledge. Such editing methods could thus encounter an uncertain editing boundary, leavin… ▽ More

    Submitted 17 February, 2024; originally announced February 2024.

  47. arXiv:2402.11060  [pdf, other

    cs.CL cs.AI cs.IR

    Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement

    Authors: Chenkai Sun, Ke Yang, Revanth Gangi Reddy, Yi R. Fung, Hou Pong Chan, Kevin Small, ChengXiang Zhai, Heng Ji

    Abstract: The increasing demand for personalized interactions with large language models (LLMs) calls for methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the costs from fine-tuning. Existing research, however, has largely focused on enhancing the retrie… ▽ More

    Submitted 20 August, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  48. arXiv:2402.10980  [pdf, other

    physics.chem-ph cs.AI cs.CE cs.LG

    ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

    Authors: Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, Sutanay Choudhury

    Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent… ▽ More

    Submitted 7 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 9 pages, accepted by ICML 2024, final version

  49. arXiv:2402.09369  [pdf, other

    cs.CL

    Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking

    Authors: Yi Fung, Ruining Zhao, Jae Doo, Chenkai Sun, Heng Ji

    Abstract: Pretrained large language models have revolutionized many applications but still face challenges related to cultural bias and a lack of cultural commonsense knowledge crucial for guiding cross-culture communication and interactions. Recognizing the shortcomings of existing methods in capturing the diverse and rich cultures across the world, this paper introduces a novel approach for massively mult… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: preprint

  50. arXiv:2402.07401  [pdf, other

    cs.CL

    Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

    Authors: Kyungha Kim, Sangyun Lee, Kung-Hsiang Huang, Hou Pong Chan, Manling Li, Heng Ji

    Abstract: Fact-checking research has extensively explored verification but less so the generation of natural-language explanations, crucial for user trust. While Large Language Models (LLMs) excel in text generation, their capability for producing faithful explanations in fact-checking remains underexamined. Our study investigates LLMs' ability to generate such explanations, finding that zero-shot prompts o… ▽ More

    Submitted 11 February, 2024; originally announced February 2024.