Showing 1–50 of 1,112 results for author: Chen, D

Searching in archive cs.
  1. arXiv:2408.11297  [pdf, other]

    cs.CV

    Making Large Vision Language Models to be Good Few-shot Learners

    Authors: Fan Liu, Wenwen Cai, Jian Huo, Chuanyi Zhang, Delong Chen, Jun Zhou

    Abstract: Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk lear…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.11046  [pdf, other]

    cs.CL

    Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

    Authors: Yuan Xin, Zheng Li, Ning Yu, Dingfan Chen, Mario Fritz, Michael Backes, Yang Zhang

    Abstract: Despite being prevalent in the general field of Natural Language Processing (NLP), pre-trained language models inherently carry privacy and copyright concerns due to their nature of training on large-scale web-scraped data. In this paper, we pioneer a systematic exploration of such risks associated with pre-trained language encoders, specifically focusing on the membership leakage of pre-training…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: ECAI24

  3. arXiv:2408.10826  [pdf, other]

    cs.DC

    NeuLite: Memory-Efficient Federated Learning via Elastic Progressive Training

    Authors: Yebo Wu, Li Li, Chunlin Tian, Dubing Chen, Chengzhong Xu

    Abstract: Federated Learning (FL) emerges as a new learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, intensive memory footprint during the training process severely bottlenecks the deployment of FL on resource-constrained devices in real-world cases. In this paper, we propose NeuLite, a framework that breaks the memory wall throug…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.09460  [pdf, other]

    cs.CV

    Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning

    Authors: Weijia Li, Jinhua Yu, Dairong Chen, Yi Lin, Runming Dong, Xiang Zhang, Conghui He, Haohuan Fu

    Abstract: In this work, we propose a geometry-aware semi-supervised method for fine-grained building function recognition. This method leverages the geometric relationships between multi-source data to improve the accuracy of pseudo labels in semi-supervised learning, extending the task's scope and making it applicable to cross-categorization systems of building function recognition. Firstly, we design an o…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: This paper is currently under review

  5. arXiv:2408.08769  [pdf, other]

    cs.CL

    Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused

    Authors: Dingwei Chen, Feiteng Fang, Shiwen Ni, Feng Liang, Ruifeng Xu, Min Yang, Chengming Li

    Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across various natural language processing tasks, yet they occasionally tend to yield content that factually inaccurate or discordant with the expected output, a phenomenon empirically referred to as "hallucination". To tackle this issue, recent works have investigated contrastive decoding between the original model and an amat…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures, 5 tables

  6. arXiv:2408.07703  [pdf, other]

    cs.CV

    Knowledge Distillation with Refined Logits

    Authors: Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang

    Abstract: Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods. Our approach is motivated by the observation that even high-performing teacher models can make incorrect…

    Submitted 19 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures

  7. arXiv:2408.07297  [pdf, other]

    physics.soc-ph cs.LG cs.MA

    Estimate collective cooperativeness of driving agents in mixed traffic flow

    Authors: Di Chen, Jia Li, H. Michael Zhang

    Abstract: Cooperation is a ubiquitous phenomenon in many natural, social, and engineered systems that contain multiple agents. Characterizing and quantifying cooperativeness of driving agents is of interest and significance for two reasons. Theoretically, it will enhance the understanding of micro-macro connections and emergence of cooperation in mixed traffic. Pragmatically, this understanding will benefit…

    Submitted 30 July, 2024; originally announced August 2024.

  8. arXiv:2408.05713  [pdf, other]

    cs.CV

    SSL: A Self-similarity Loss for Improving Generative Image Super-resolution

    Authors: Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang

    Abstract: Generative adversarial networks (GAN) and generative diffusion models (DM) have been widely used in real-world image super-resolution (Real-ISR) to enhance the image perceptual quality. However, these generative models are prone to generating visual artifacts and false image structures, resulting in unnatural Real-ISR results. Based on the fact that natural images exhibit high self-similarities, i…

    Submitted 18 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  9. arXiv:2408.04628  [pdf, other]

    cs.CL cs.AI cs.CV

    LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP

    Authors: Danlu Chen, Freda Shi, Aditi Agarwal, Jacobo Myerston, Taylor Berg-Kirkpatrick

    Abstract: Standard natural language processing (NLP) pipelines operate on symbolic representations of language, which typically consist of sequences of discrete tokens. However, creating an analogous representation for ancient logographic writing systems is an extremely labor intensive process that requires expert knowledge. At present, a large portion of logographic data persists in a purely visual form du…

    Submitted 8 August, 2024; originally announced August 2024.

    Journal ref: ACL 2024, long paper

  10. arXiv:2408.04594  [pdf, other]

    cs.CV cs.AI

    Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

    Authors: Qirui Jiao, Daoyuan Chen, Yilun Huang, Yaliang Li, Ying Shen

    Abstract: High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning and image difference captioning. By analyzing object differences between similar images, we challenge models to identify both matching and distinct c…

    Submitted 9 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures, 7 tables

  11. arXiv:2408.03230  [pdf, other]

    cs.CV

    Contrastive Learning for Image Complexity Representation

    Authors: Shipeng Liu, Liang Zhao, Dengfeng Chen, Zhanping Song

    Abstract: Quantifying and evaluating image complexity can be instrumental in enhancing the performance of various computer vision tasks. Supervised learning can effectively learn image complexity features from well-annotated datasets. However, creating such datasets requires expensive manual annotation costs. The models may learn human subjective biases from it. In this work, we introduce the MoCo v2 framew…

    Submitted 6 August, 2024; originally announced August 2024.

  12. arXiv:2408.03017  [pdf, other]

    cs.RO

    Closed-Loop Magnetic Control of Medical Soft Continuum Robots for Deflection

    Authors: Zhiwei Wu, Siyi Wei, Zhanxin Geng, Jinhui Zhang, Duanduan Chen

    Abstract: Magnetic soft continuum robots (MSCRs) have emerged as powerful devices in endovascular interventions owing to their hyperelastic fibre matrix and enhanced magnetic manipulability. Effective closed-loop control of tethered magnetic devices contributes to the achievement of autonomous vascular robotic surgery. In this article, we employ a magnetic actuation system equipped with a single rotatable p…

    Submitted 6 August, 2024; originally announced August 2024.

  13. arXiv:2408.01014  [pdf, other]

    cs.CV

    EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts

    Authors: Die Chen, Zhiwen Li, Mingyuan Fan, Cen Chen, Wenmeng Zhou, Yaliang Li

    Abstract: Text-to-image diffusion models have shown the ability to learn a diverse range of concepts. However, it is worth noting that they may also generate undesirable outputs, consequently giving rise to significant security concerns. Specifically, issues such as Not Safe for Work (NSFW) content and potential violations of style copyright may be encountered. Since image generation is conditioned on text,…

    Submitted 2 August, 2024; originally announced August 2024.

  14. arXiv:2408.00621  [pdf, other]

    cs.NI

    CAVE: Crowdsourcing Passing-By Vehicles for Reliable In-Vehicle Edge Computing

    Authors: Jiahe Cao, Qiang Liu, Dawei Chen, Kyungtae Han

    Abstract: In-vehicle edge computing is a much anticipated paradigm to serve ever-increasing computation demands originated from the ego vehicle, such as passenger entertainments. In this paper, we explore the unique idea of crowdsourcing passing-by vehicles to augment computing of the ego vehicle. The challenges lie in the high dynamics of passing-by vehicles, time-correlated task computation, and the strin…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: This paper is accepted by IEEE GLOBECOM 2024

  15. arXiv:2408.00601  [pdf, other]

    cs.LG

    AutoPV: Automatically Design Your Photovoltaic Power Forecasting Model

    Authors: Dayin Chen, Xiaodan Shi, Mingkun Jiang, Haoran Zhang, Dongxiao Zhang, Yuntian Chen, Jinyue Yan

    Abstract: Photovoltaic power forecasting (PVPF) is a critical area in time series forecasting (TSF), enabling the efficient utilization of solar energy. With advancements in machine learning and deep learning, various models have been applied to PVPF tasks. However, constructing an optimal predictive architecture for specific PVPF tasks remains challenging, as it requires cross-domain knowledge and signific…

    Submitted 1 August, 2024; originally announced August 2024.

  16. arXiv:2407.21475  [pdf, other]

    cs.CV cs.AI

    Fine-gained Zero-shot Video Sampling

    Authors: Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu

    Abstract: Incorporating a temporal dimension into pretrained image diffusion models for video generation is a prevalent approach. However, this method is computationally demanding and necessitates large-scale video datasets. More critically, the heterogeneity between image and video datasets often results in catastrophic forgetting of the image expertise. Recent attempts to directly extract video snippets f…

    Submitted 31 July, 2024; originally announced July 2024.

  17. arXiv:2407.21428  [pdf, other]

    cs.GR cs.AI

    Deformable 3D Shape Diffusion Model

    Authors: Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu

    Abstract: The Gaussian diffusion model, initially designed for image generation, has recently been adapted for 3D point cloud generation. However, these adaptations have not fully considered the intrinsic geometric characteristics of 3D shapes, thereby constraining the diffusion model's potential for 3D shape manipulation. To address this limitation, we introduce a novel deformable 3D shape diffusion model…

    Submitted 31 July, 2024; originally announced July 2024.

  18. arXiv:2407.21333  [pdf, other]

    cs.CV

    Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM

    Authors: Can Wang, Hongliang Zhong, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

    Abstract: Automatic furniture layout is long desired for convenient interior design. Leveraging the remarkable visual reasoning capabilities of multimodal large language models (MLLMs), recent methods address layout generation in a static manner, lacking the feedback-driven refinement essential for interactive user engagement. We introduce Chat2Layout, a novel interactive furniture layout generation system…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Main paper with supplemental materials

  19. arXiv:2407.20228  [pdf, other]

    cs.CV

    FlexAttention for Efficient High-Resolution Vision-Language Models

    Authors: Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan

    Abstract: Current high-resolution vision-language models encode images as high-resolution image tokens and exhaustively take all these tokens to compute attention, which significantly increases the computational cost. To address this problem, we propose FlexAttention, a flexible attention mechanism for efficient high-resolution vision-language models. Specifically, a high-resolution image is encoded both as…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  20. arXiv:2407.19763  [pdf, other]

    eess.IV cs.CV

    TeleOR: Real-time Telemedicine System for Full-Scene Operating Room

    Authors: Yixuan Wu, Kaiyuan Hu, Qian Shao, Jintai Chen, Danny Z. Chen, Jian Wu

    Abstract: The advent of telemedicine represents a transformative development in leveraging technology to extend the reach of specialized medical expertise to remote surgeries, a field where the immediacy of expert guidance is paramount. However, the intricate dynamics of Operating Room (OR) scene pose unique challenges for telemedicine, particularly in achieving high-fidelity, real-time scene reconstruction…

    Submitted 29 July, 2024; originally announced July 2024.

  21. arXiv:2407.19405  [pdf, other]

    cs.AI

    Logic Distillation: Learning from Code Function by Function for Planning and Decision-making

    Authors: Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu

    Abstract: Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to smaller LLMs (S-LLMs) that can be deployed on a variety of devices. Knowledge distillation (KD) aims to empower S-LLMs with the capabilities of L-LLMs, while S-LLMs…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  22. arXiv:2407.18940  [pdf, other]

    cs.IR cs.AI cs.CL cs.DL cs.LG

    LitSearch: A Retrieval Benchmark for Scientific Literature Search

    Authors: Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, Tianyu Gao

    Abstract: Literature search questions, such as "where can I find research on the evaluation of consistency in generated summaries?" pose significant challenges for modern search engines and retrieval systems. These questions often require a deep understanding of research concepts and the ability to reason over entire articles. In this work, we introduce LitSearch, a retrieval benchmark comprising 597 realis…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Dataset and code available at https://github.com/princeton-nlp/LitSearch

  23. arXiv:2407.15427  [pdf, other]

    cs.CV cs.AI

    YOLO-pdd: A Novel Multi-scale PCB Defect Detection Method Using Deep Representations with Sequential Images

    Authors: Bowen Liu, Dongjie Chen, Xiao Qi

    Abstract: With the rapid growth of the PCB manufacturing industry, there is an increasing demand for computer vision inspection to detect defects during production. Improving the accuracy and generalization of PCB defect detection models remains a significant challenge. This paper proposes a high-precision, robust, and real-time end-to-end method for PCB defect detection based on deep Convolutional Neural N…

    Submitted 22 July, 2024; originally announced July 2024.

  24. arXiv:2407.14065  [pdf, other]

    cs.LG stat.ML

    MSCT: Addressing Time-Varying Confounding with Marginal Structural Causal Transformer for Counterfactual Post-Crash Traffic Prediction

    Authors: Shuang Li, Ziyuan Pu, Nan Zhang, Duxin Chen, Lu Dong, Daniel J. Graham, Yinhai Wang

    Abstract: Traffic crashes profoundly impede traffic efficiency and pose economic challenges. Accurate prediction of post-crash traffic status provides essential information for evaluating traffic perturbations and developing effective solutions. Previous studies have established a series of deep learning models to predict post-crash traffic conditions, however, these correlation-based methods cannot accommo…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  25. arXiv:2407.12883  [pdf, other]

    cs.CL cs.AI cs.IR

    BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

    Authors: Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu

    Abstract: Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires unde…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 50 pages

  26. arXiv:2407.12687  [pdf, other]

    cs.CY cs.AI cs.LG

    Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

    Authors: Irina Jurenka, Markus Kunesch, Kevin R. McKee, Daniel Gillick, Shaojian Zhu, Sara Wiltberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhoopchand, Ankit Anand, Miruna Pîslar, Stephanie Chan, Lisa Wang, Jennifer She, Parsa Mahmoudieh, Aliya Rysbek, Wei-Jen Ko, Andrea Huber, Brett Wiltshire, Gal Elidan, Roni Rabin, Jasmin Rubinovitz, Amit Pitaru, Mac McAllister, et al. (49 additional authors not shown)

    Abstract: A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily…

    Submitted 19 July, 2024; v1 submitted 21 May, 2024; originally announced July 2024.

  27. arXiv:2407.11890  [pdf, other]

    cs.CV

    DepGAN: Leveraging Depth Maps for Handling Occlusions and Transparency in Image Composition

    Authors: Amr Ghoneim, Jiju Poovvancheri, Yasushi Akiyama, Dong Chen

    Abstract: Image composition is a complex task which requires a lot of information about the scene for an accurate and realistic composition, such as perspective, lighting, shadows, occlusions, and object interactions. Previous methods have predominantly used 2D information for image composition, neglecting the potentials of 3D spatial information. In this work, we propose DepGAN, a Generative Adversarial Ne…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 10 pages, 13 figures

  28. arXiv:2407.11784  [pdf, other]

    cs.AI cs.CV cs.LG

    Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

    Authors: Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically isolated paths of model-centric and data-centric developments, leading to suboptimal outcomes and inefficient resource utilization. In response, we pre…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 26 pages, 9 figures, 5 tables

  29. arXiv:2407.10949  [pdf, other]

    cs.CL cs.AI cs.LG

    Representing Rule-based Chatbots with Transformers

    Authors: Dan Friedman, Abhishek Panigrahi, Danqi Chen

    Abstract: Transformer-based chatbots can conduct fluent, natural-sounding conversations, but we have limited understanding of the mechanisms underlying their behavior. Prior work has taken a bottom-up approach to understanding Transformers by constructing Transformers for various synthetic and formal language tasks, such as regular expressions and Dyck languages. However, it is not obvious how to extend thi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/princeton-nlp/ELIZA-Transformer

  30. arXiv:2407.10310  [pdf, other]

    cs.CY eess.SY

    Impact of Different Infrastructures and Traffic Scenarios on Behavioral and Physiological Responses of E-scooter Users

    Authors: Dong Chen, Arman Hosseini, Arik Smith, David Xiang, Arsalan Heydarian, Omid Shoghli, Bradford Campbell

    Abstract: As micromobility devices such as e-scooters gain global popularity, emergency departments around the world have observed a rising trend in related injuries. However, the majority of current research on e-scooter safety relies heavily on surveys, news reports, and data from vendors, with a noticeable scarcity of naturalistic studies examining the effects of riders' behaviors and physiological respo…

    Submitted 5 May, 2024; originally announced July 2024.

    Comments: 6 pages, 8 figures

  31. arXiv:2407.10031  [pdf, other]

    cs.RO cs.MA

    Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments

    Authors: Siddharth Nayak, Adelmo Morrison Orozco, Marina Ten Have, Vittal Thirumalai, Jackson Zhang, Darren Chen, Aditya Kapoor, Eric Robinson, Karthik Gopalakrishnan, James Harrison, Brian Ichter, Anuj Mahajan, Hamsa Balakrishnan

    Abstract: The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base. However, LMs in t…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 27 pages, 4 figures, 5 tables

  32. arXiv:2407.09790  [pdf, other]

    cs.LG

    Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

    Authors: Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu

    Abstract: Tabular datasets play a crucial role in various applications. Thus, developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct tabular prediction tasks. However, selecting an effective mo…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024 Research Track, codes will be available at https://github.com/jyansir/tmlp

  33. arXiv:2407.08953  [pdf, ps, other]

    q-fin.CP cs.LG

    Attribution Methods in Asset Pricing: Do They Account for Risk?

    Authors: Dangxing Chen, Yuan Gao

    Abstract: Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Cons…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)

  34. arXiv:2407.08583  [pdf, other]

    cs.AI cs.CV cs.LG

    The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

    Authors: Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

    Abstract: The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the broader range of application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the impo…

    Submitted 5 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Ongoing work. 21 pages. Related materials are continually maintained and available at https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md

  35. arXiv:2407.08233  [pdf, other]

    cs.LG

    Differentially Private Neural Network Training under Hidden State Assumption

    Authors: Ding Chen, Chen Liu

    Abstract: We present a novel approach called differentially private stochastic block coordinate descent (DP-SBCD) for training neural networks with provable guarantees of differential privacy under the hidden state assumption. Our methodology incorporates Lipschitz neural networks and decomposes the training process of the neural network into sub-problems, each corresponding to the training of a specific la…

    Submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.06938  [pdf, other]

    cs.CV

    RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

    Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo

    Abstract: We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a nov…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://rodinhd.github.io/

  37. arXiv:2407.06573  [pdf, other]

    cs.SE

    LLM for Mobile: An Initial Roadmap

    Authors: Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawendé F. Bissyandé, Jacques Klein, Li Li

    Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to appl LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable nativ…

    Submitted 9 July, 2024; originally announced July 2024.

  38. arXiv:2407.04681  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

    Authors: Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr, Lu Yuan

    Abstract: In recent years, multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets, enabling them to generally understand images well. However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs, limiting their ability to answer questions requiring…

    Submitted 5 July, 2024; originally announced July 2024.

  39. arXiv:2407.03282  [pdf, other]

    cs.CL

    LLM Internal States Reveal Hallucination Risk Faced With a Query

    Authors: Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung

    Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadl…

    Submitted 3 July, 2024; originally announced July 2024.

  40. arXiv:2407.01906  [pdf, other]

    cs.CL cs.AI cs.LG

    Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

    Authors: Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu

    Abstract: Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study the PEFT method for LLMs with the Mixture-of-Experts (MoE) architecture and the contents of this work are mainly threefol…

    Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  41. arXiv:2407.01875  [pdf, ps, other]

    cs.AI

    Spatio-Temporal Graphical Counterfactuals: An Overview

    Authors: Mingyu Kang, Duxin Chen, Ziyuan Pu, Jianxi Gao, Wenwu Yu

    Abstract: Counterfactual thinking is a critical yet challenging topic for artificial intelligence to learn knowledge from data and ultimately improve their performances for new scenarios. Many research works, including Potential Outcome Model and Structural Causal Model, have been proposed to realize it. However, their modelings, theoretical foundations and application approaches are usually different. More…

    Submitted 1 July, 2024; originally announced July 2024.

  42. arXiv:2407.01505  [pdf, other]

    cs.CL cs.AI

    Self-Cognition in Large Language Models: An Exploratory Study

    Authors: Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate where an LLM exhibits self-cognition and four well-designed principles to quantify…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Large Language Models and Cognition Workshop

  43. arXiv:2407.01436  [pdf, other]

    cs.CV cs.RO

    AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction

    Authors: Dubing Chen, Wencheng Han, Jin Fang, Jianbing Shen

    Abstract: In this technical report, we present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ Dataset Challenge at CVPR 2024. Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling. Initially, we independently train the occupancy model,…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 2nd Place in the 3D Occupancy and Flow Prediction Challenge (CVPR24)

  44. arXiv:2407.00668  [pdf, other]

    cs.CL

    HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

    Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

    Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-so…

    Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  45. arXiv:2406.19485  [pdf, other]

    eess.IV cs.CV

    GAPNet: Granularity Attention Network with Anatomy-Prior-Constraint for Carotid Artery Segmentation

    Authors: Lin Zhang, Chenggang Lu, Xin-yang Shi, Caifeng Shan, Jiong Zhang, Da Chen, Laurent D. Cohen

    Abstract: Atherosclerosis is a chronic, progressive disease that primarily affects the arterial walls. It is one of the major causes of cardiovascular disease. Magnetic Resonance (MR) black-blood vessel wall imaging (BB-VWI) offers crucial insights into vascular disease diagnosis by clearly visualizing vascular structures. However, the complex anatomy of the neck poses challenges in distinguishing the carot…

    Submitted 27 June, 2024; originally announced June 2024.

  46. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  47. arXiv:2406.18521  [pdf, other]

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  48. arXiv:2406.16778  [pdf, other]

    cs.CL

    Finding Transformer Circuits with Edge Pruning

    Authors: Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen

    Abstract: The path to interpreting a language model often proceeds via analysis of circuits -- sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet, these methods have practical limitations, as they rely either on inefficient search algorithms or inaccurate approximations. In this paper, we frame automated…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/princeton-nlp/Edge-Pruning

  49. arXiv:2406.15480  [pdf, other]

    cs.CL cs.AI cs.LG

    On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

    Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Submitted; under review

  50. arXiv:2406.15479  [pdf, other]

    cs.CL cs.AI cs.LG

    Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

    Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these i…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Submitted; under review