Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 963 results for author: Han, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.18198  [pdf, other

    cs.CV

    VDG: Vision-Only Dynamic Gaussian for Driving Simulation

    Authors: Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

    Abstract: Dynamic Gaussian splatting has led to impressive scene reconstruction and image synthesis advances in novel views. Existing methods, however, heavily rely on pre-computed poses and Gaussian initialization by Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised VO into our pose-free dynamic Gaussian method (V… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.17005  [pdf, other

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo , et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  3. arXiv:2406.16521  [pdf, other

    cs.CL cs.AI

    Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback

    Authors: Jimin Sohn, Jeihee Cho, Junyong Lee, Songmu Heo, Ji-Eun Han, David R. Mortensen

    Abstract: Positive thinking is thought to be an important component of self-motivation in various practical fields such as education and the workplace. Previous work, including sentiment transfer and positive reframing, has focused on the positive side of language. However, self-motivation that drives people to reach their goals has not yet been studied from a computational perspective. Moreover, negative f… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures

  4. arXiv:2406.16221  [pdf, other

    cs.LG cs.AI cs.GR econ.EM stat.ME

    F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data

    Authors: Zexing Xu, Linjun Zhang, Sitan Yang, Rasoul Etesami, Hanghang Tong, Huan Zhang, Jiawei Han

    Abstract: Demand prediction is a crucial task for e-commerce and physical retail businesses, especially during high-stake sales events. However, the limited availability of historical data from these peak periods poses a significant challenge for traditional forecasting methods. In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns f… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    MSC Class: 68T07; 68T05; 62M10; 62M20; 90C90; 91B84

  5. arXiv:2406.16148  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.15927  [pdf, other

    cs.CL cs.AI cs.LG

    Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

    Authors: Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal

    Abstract: We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: First three authors contributed equally

  7. arXiv:2406.13294  [pdf, other

    cs.MM cs.LG

    Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

    Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

    Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the o… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  8. arXiv:2406.12923  [pdf, other

    cs.LG cs.MA

    Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction

    Authors: Wenzhao Jiang, Jindong Han, Hao Liu, Tao Tao, Naiqiang Tan, Hui Xiong

    Abstract: Rapid urbanization has significantly escalated traffic congestion, underscoring the need for advanced congestion prediction services to bolster intelligent transportation systems. As one of the world's largest ride-hailing platforms, DiDi places great emphasis on the accuracy of congestion prediction to enhance the effectiveness and reliability of their real-time services, such as travel time esti… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  9. arXiv:2406.11709  [pdf, other

    cs.CL cs.MA

    Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging

    Authors: Priyanka Kargupta, Ishika Agarwal, Dilek Hakkani-Tur, Jiawei Han

    Abstract: Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving. The conversational capabilities of large language models (LLMs) show great potential for providing scalable, real-time student guidance. However, current LLMs often give away solutions directly, making them ineffective instructors. We tackle this issue in the code debugging domain with TreeIn… ▽ More

    Submitted 25 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.10833  [pdf, other

    cs.CL

    A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

    Authors: Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han

    Abstract: In many scientific fields, large language models (LLMs) have revolutionized the way with which text and other modalities of data (e.g., molecules and proteins) are dealt, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one to two fields or a single modality. In this paper,… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 33 pages (GitHub: https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models)

  11. arXiv:2406.10083  [pdf, other

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  12. arXiv:2406.08830  [pdf, other

    cs.LG cs.AI

    Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning

    Authors: Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han

    Abstract: To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained in limited computation resource in this paper. Current on-device training methods just focus on efficient training without considering the catastrophic forgetting, preventing the model getting stronger when continually exploring the world. To solve this problem, a dir… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  13. arXiv:2406.08796  [pdf, other

    cs.CL

    Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

    Authors: Janghoon Han, Changho Lee, Joongbo Shin, Stanley Jungkyu Choi, Honglak Lee, Kynghoon Bae

    Abstract: Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in ins… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024 (Camera-ready), by Janghoon Han and Changho Lee, with equal contribution

  14. arXiv:2406.08718  [pdf, other

    cs.CL

    Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations

    Authors: Jun-Woo Kim, Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang

    Abstract: We introduce a pipeline that leverages Large Language Models (LLMs) to transform single-turn psychotherapy counseling sessions into multi-turn interactions. While AI-supported online counseling services for individuals with mental disorders exist, they are often constrained by the limited availability of multi-turn training datasets and frequently fail to fully utilize therapists' expertise. Our p… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024 AI4Research workshop

  15. arXiv:2406.07043  [pdf, other

    cs.CV

    1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

    Authors: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng

    Abstract: Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeVi… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.05639  [pdf, other

    cs.SE

    A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair

    Authors: Guochang Li, Chen Zhi, Jialiang Chen, Junxiao Han, Shuiguang Deng

    Abstract: Automated Program Repair (APR) aims to fix bugs by generating patches. And existing work has demonstrated that "pre-training and fine-tuning" paradigm enables Large Language Models (LLMs) improve fixing capabilities on APR. However, existing work mainly focuses on Full-Model Fine-Tuning (FMFT) for APR and limited research has been conducted on the execution-based evaluation of Parameter-Efficient… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  17. arXiv:2406.02438  [pdf, other

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  18. arXiv:2406.01297  [pdf, other

    cs.CL

    When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

    Authors: Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang

    Abstract: Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct their own mistakes, as recent stud… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  19. arXiv:2406.01059  [pdf, other

    cs.CV

    VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model

    Authors: Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Zeke Xie, Zhong Ji, Jungong Han, Mingming Sun

    Abstract: In this paper, we focus on resolving the problem of image outpainting, which aims to extrapolate the surrounding parts given the center contents of an image. Although recent works have achieved promising performance, the lack of versatility and customization hinders their practical applications in broader scenarios. Therefore, this work presents a novel image outpainting framework that is capable… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages

  20. arXiv:2406.00914  [pdf, other

    math.OC cs.AI

    Wasserstein gradient flow for optimal probability measure decomposition

    Authors: Jiangze Han, Christopher Thomas Ryan, Xin T. Tong

    Abstract: We examine the infinite-dimensional optimization problem of finding a decomposition of a probability measure into K probability sub-measures to minimize specific loss functions inspired by applications in clustering and user grouping. We analytically explore the structures of the support of optimal sub-measures and introduce algorithms based on Wasserstein gradient flow, demonstrating their conver… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  21. arXiv:2405.19949  [pdf, other

    cs.CV

    Hyper-Transformer for Amodal Completion

    Authors: Jianxiong Gao, Xuelin Qian, Longfei Liang, Junwei Han, Yanwei Fu

    Abstract: Amodal object completion is a complex task that involves predicting the invisible parts of an object based on visible segments and background information. Learning shape priors is crucial for effective amodal completion, but traditional methods often rely on two-stage processes or additional information, leading to inefficiencies and potential error accumulation. To address these shortcomings, we… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  22. arXiv:2405.19691  [pdf, other

    cs.HC

    Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing

    Authors: Minsun Kim, SeonGyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Hyungseung Lim, Jieun Han, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong, Tak Yeon Lee

    Abstract: While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises sur… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  23. arXiv:2405.19109  [pdf, other

    cs.CL

    PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering

    Authors: Fangzhi Xu, Qika Lin, Tianzhe Zhao, Jiawei Han, Jun Liu

    Abstract: Logical reasoning task has attracted great interest since it was proposed. Faced with such a task, current competitive models, even large language models (e.g., ChatGPT and PaLM 2), still perform badly. Previous promising LMs struggle in logical consistency modeling and logical structure perception. To this end, we model the logical reasoning task by transforming each logical sample into reasoning… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024

  24. arXiv:2405.19005  [pdf, other

    cs.CV

    Auto-selected Knowledge Adapters for Lifelong Person Re-identification

    Authors: Xuelin Qian, Ruiqi Wu, Gong Cheng, Junwei Han

    Abstract: Lifelong Person Re-Identification (LReID) extends traditional ReID by requiring systems to continually learn from non-overlapping datasets across different times and locations, adapting to new identities while preserving knowledge of previous ones. Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting since they try to cram diverse… ▽ More

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  25. arXiv:2405.17890  [pdf, other

    cs.IR cs.CL cs.LG

    SLMRec: Empowering Small Language Models for Sequential Recommendation

    Authors: Wujiang Xu, Zujie Liang, Jiaojiao Han, Xuying Ning, Wenfang Lin, Linxun Chen, Feng Wei, Yongfeng Zhang

    Abstract: The sequential Recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. The SR models examine the sequence of a user's actions to discern more complex behavioral patterns and temporal dynamics. Recent research demonstrates the great impact of LLMs on sequential recommendation systems, either viewing sequential recommendation as la… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  26. arXiv:2405.16412  [pdf, other

    cs.CL cs.LG

    KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

    Authors: Pengcheng Jiang, Lang Cao, Cao Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han

    Abstract: Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-gu… ▽ More

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  27. arXiv:2405.14458  [pdf, other

    cs.CV

    YOLOv10: Real-Time End-to-End Object Detection

    Authors: Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

    Abstract: Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum sup… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/THU-MIG/yolov10

  28. arXiv:2405.14338  [pdf, other

    cs.CV

    MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

    Authors: Jiuming Liu, Jinru Han, Lihao Liu, Angelica I. Aviles-Rivero, Chaokang Jiang, Zhe Liu, Hesheng Wang

    Abstract: Point cloud videos effectively capture real-world spatial geometries and temporal dynamics, which are essential for enabling intelligent agents to understand the dynamically changing 3D world we live in. Although static 3D point cloud processing has witnessed significant advancements, designing an effective 4D point cloud video backbone remains challenging, mainly due to the irregular and unordere… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  29. arXiv:2405.14230  [pdf, other

    cs.CV cs.AI cs.CL

    Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

    Authors: Guangyu Guo, Jiawen Yao, Yingda Xia, Tony C. W. Mok, Zhilin Zheng, Junwei Han, Le Lu, Dingwen Zhang, Jian Zhou, Ling Zhang

    Abstract: The absence of adequately sufficient expert-level tumor annotations hinders the effectiveness of supervised learning based opportunistic cancer screening on medical imaging. Clinical reports (that are rich in descriptive textual details) can offer a "free lunch'' supervision information and provide tumor location as a type of weak label to cope with screening tasks, thus saving human labeling work… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.13388  [pdf, other

    cs.CV

    Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation

    Authors: Dingwen Zhang, Hao Li, Diqi He, Nian Liu, Lechao Cheng, Jingdong Wang, Junwei Han

    Abstract: In recent times, following the paradigm of DETR (DEtection TRansformer), query-based end-to-end instance segmentation (QEIS) methods have exhibited superior performance compared to CNN-based models, particularly when trained on large-scale datasets. Nevertheless, the effectiveness of these QEIS methods diminishes significantly when confronted with limited training data. This limitation arises from… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: https://github.com/lifuguan/UPLVP

  31. arXiv:2405.11841  [pdf, other

    cs.AI

    Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities

    Authors: Junqi Wang, Chunhui Zhang, Jiapeng Li, Yuxi Ma, Lixing Niu, Jiaheng Han, Yujia Peng, Yixin Zhu, Lifeng Fan

    Abstract: Facing the current debate on whether Large Language Models (LLMs) attain near-human intelligence levels (Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023), the current study introduces a benchmark for evaluating social intelligence, one of the most distinctive aspects of human cognition. We developed a comprehensive theoretical framework for s… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Also published in Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci), 2024

  32. arXiv:2405.11765  [pdf, other

    cs.CV

    DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment

    Authors: Jianhong Han, Liang Chen, Yupei Wang

    Abstract: Object detectors frequently encounter significant performance degradation when confronted with domain gaps between collected data (source domain) and data from real-world applications (target domain). To address this task, numerous unsupervised domain adaptive detectors have been proposed, leveraging carefully designed feature alignment techniques. However, these techniques primarily align instanc… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Manuscript submitted to IEEE Transactions on Image Processing

  33. arXiv:2405.08317  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  34. arXiv:2405.08295  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel… ▽ More

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 page

  35. arXiv:2405.07459  [pdf, other

    cs.CV

    DualFocus: A Unified Framework for Integrating Positive and Negative Descriptors in Text-based Person Retrieval

    Authors: Yuchuan Deng, Zhanpeng Hu, Jiakun Han, Chuang Deng, Qijun Zhao

    Abstract: Text-based person retrieval (TPR) aims to retrieve images of a person from an extensive array of candidates based on a given textual description. The core challenge lies in mapping visual and textual data into a unified latent space. While existing TPR methods concentrate on recognizing explicit and positive characteristics, they often neglect the critical influence of negative descriptors, result… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  36. arXiv:2405.07038  [pdf, other

    cs.GT cs.LG stat.ML

    Conformal Online Auction Design

    Authors: Jiale Han, Xiaowu Dai

    Abstract: This paper proposes the conformal online auction design (COAD), a novel mechanism for maximizing revenue in online auctions by quantifying the uncertainty in bidders' values without relying on assumptions about value distributions. COAD incorporates both the bidder and item features and leverages historical data to provide an incentive-compatible mechanism for online auctions. Unlike traditional m… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  37. QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment

    Authors: Jin Han, Zhe Zheng, Xin-Zheng Lu, Ke-Yin Chen, Jia-Rui Lin

    Abstract: Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities, which few studies considered. To address the problem, this study proposes the first domain-specific LLM model and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs considering t… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: International Journal of Disaster Risk Reduction, 2024

  38. arXiv:2405.06459  [pdf, other

    cs.CL cs.AI

    Are EEG-to-Text Models Working?

    Authors: Hyejeong Jo, Yiqian Yang, Juhyeok Han, Yiqun Duan, Hui Xiong, Won Hee Lee

    Abstract: This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models… ▽ More

    Submitted 13 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  39. arXiv:2405.05610  [pdf, other

    cs.CL cs.CR cs.LG

    Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM

    Authors: Xikang Yang, Xuehai Tang, Songlin Hu, Jizhong Han

    Abstract: Large language models (LLMs) have achieved remarkable performance in various natural language processing tasks, especially in dialogue systems. However, LLM may also pose security and moral threats, especially in multi round conversations where large models are more easily guided by contextual content, resulting in harmful or biased responses. In this paper, we present a novel method to attack LLM… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  40. arXiv:2405.05244  [pdf, other

    eess.AS cs.AI cs.MM cs.SD

    SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan

    Abstract: The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specializ… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Evaluation plan of the SVDD Challenge @ SLT 2024

  41. arXiv:2405.04985  [pdf

    cs.AI cs.CE

    An Artificial Intelligence Approach for Interpreting Creative Combinational Designs

    Authors: Liuqing Chen, Shuhong Xiao, Yunnong Chen, Linyun Sun, Peter R. N. Childs, Ji Han

    Abstract: Combinational creativity, a form of creativity involving the blending of familiar ideas, is pivotal in design innovation. While most research focuses on how combinational creativity in design is achieved through blending elements, this study focuses on the computational interpretation, specifically identifying the 'base' and 'additive' components that constitute a creative design. To achieve this… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  42. arXiv:2405.04812  [pdf, other

    cs.RO cs.CV

    General Place Recognition Survey: Towards Real-World Autonomy

    Authors: Peng Yin, Jianhao Jiao, Shiqi Zhao, Lingyun Xu, Guoquan Huang, Howie Choset, Sebastian Scherer, Jianda Han

    Abstract: In the realm of robotics, the quest for achieving real-world autonomy, capable of executing large-scale and long-term operations, has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields like computer vision and robotics, the development of PR methods that sufficiently support real-wo… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 20 pages, 12 figures, under review

  43. arXiv:2405.03953  [pdf, other

    cs.SD eess.AS

    Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation

    Authors: Zixing Zhang, Tao Pang, Jing Han, Björn W. Schuller

    Abstract: Heart murmurs are a common manifestation of cardiovascular diseases and can provide crucial clues to early cardiac abnormalities. While most current research methods primarily focus on the accuracy of models, they often overlook other important aspects such as the interpretability of machine learning algorithms and the uncertainty of predictions. This paper introduces a heart murmur detection meth… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  44. arXiv:2405.03952  [pdf, other

    cs.SD cs.CL eess.AS

    HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech

    Authors: Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller

    Abstract: Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches highly rely on the Transformer architectures due to its efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: publised at ICASSP 2024

  45. arXiv:2405.03613  [pdf, other

    cs.CV

    Dual Relation Mining Network for Zero-Shot Learning

    Authors: Jinwei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods have exhibited significant progress which align visual features and attributes via a spatial attention mechanism. However, these methods only explore visual-semantic relationship in the spatial dimension, w… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  46. arXiv:2405.03352  [pdf, other

    cs.CV

    Salient Object Detection From Arbitrary Modalities

    Authors: Nianchang Huang, Yang Yang, Ruida Xi, Qiang Zhang, Jungong Han, Jin Huang

    Abstract: Toward desirable saliency prediction, the types and numbers of inputs for a salient object detection (SOD) algorithm may dynamically change in many real-life applications. However, existing SOD algorithms are mainly designed or trained for one particular type of inputs, failing to be generalized to other types of inputs. Consequentially, more types of SOD algorithms need to be prepared in advance… ▽ More

    Submitted 9 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 15 Pages, 7 Figures, 8 Tables

  47. arXiv:2405.03351  [pdf, other

    cs.CV

    Modality Prompts for Arbitrary Modality Salient Object Detection

    Authors: Nianchang Huang, Yang Yang, Qiang Zhang, Jungong Han, Jin Huang

    Abstract: This paper delves into the task of arbitrary modality salient object detection (AM SOD), aiming to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD, ie more diverse modality discrepancies caused by varying modality types that need to be… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 Figures, 3 Tables

  48. arXiv:2404.19639  [pdf, other

    cs.CV

    ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud

    Authors: Jiayi Han, Zidi Cao, Weibo Zheng, Xiangguo Zhou, Xiangjian He, Yuanfang Zhang, Daisen Wei

    Abstract: In recent years, zero-shot learning has attracted the focus of many researchers, due to its flexibility and generality. Many approaches have been proposed to achieve the zero-shot classification of the point clouds for 3D object understanding, following the schema of CLIP. However, in the real world, the point clouds could be extremely sparse, dramatically limiting the effectiveness of the 3D poin… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  49. arXiv:2404.19171  [pdf, other

    cs.CV cs.AI

    Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection

    Authors: Cai Yu, Shan Jia, Xiaomeng Fu, Jin Liu, Jiahe Tian, Jiao Dai, Xi Wang, Siwei Lyu, Jizhong Han

    Abstract: With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes. While effective in their specific modalities, traditional detection methods fall short in addressing the generalizability of detection across diverse cross-modal deepfakes. This paper aims to explicitly learn potential cross-modal correlation to enhance… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: accepted by ICME 2024

  50. arXiv:2404.19146  [pdf, other

    cs.AI cs.IR

    Automated Construction of Theme-specific Knowledge Graphs

    Authors: Linyi Ding, Sizhe Zhou, Jinfeng Xiao, Jiawei Han

    Abstract: Despite widespread applications of knowledge graphs (KGs) in various tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: information granularity and deficiency in timeliness. These hinder considerably the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g.… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.