
Showing 1–50 of 6,113 results for author: Wang, Z

Searching in archive cs.
  1. HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection

    Authors: Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang

    Abstract: With the progressive advancements in deep graph learning, out-of-distribution (OOD) detection for graph data has emerged as a critical challenge. While the efficacy of auxiliary datasets in enhancing OOD detection has been extensively studied for image and text data, such approaches have not yet been explored for graph data. Unlike Euclidean data, graph data exhibits greater diversity but lower ro…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 32nd ACM International Conference on Multimedia

  2. arXiv:2407.21530  [pdf, other]

    cs.CL cs.LG

    Data Contamination Report from the 2024 CONDA Shared Task

    Authors: Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, Pengfei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, et al. (3 additional authors not shown)

    Abstract: The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in cur…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database

  3. arXiv:2407.21381  [pdf, other]

    eess.IV cs.CV

    Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging

    Authors: Wenhua Wu, Kun Hu, Wenxi Yue, Wei Li, Milena Simic, Changyang Li, Wei Xiang, Zhiyong Wang

    Abstract: Knee osteoarthritis (KOA), a common form of arthritis that causes physical disability, has become increasingly prevalent in society. Employing computer-aided techniques to automatically assess the severity and progression of KOA can greatly benefit KOA treatment and disease management. Particularly, the advancement of X-ray technology in KOA demonstrates its potential for this purpose. Yet, existi…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  4. arXiv:2407.21217  [pdf, other]

    cs.LG physics.flu-dyn

    NeuroSEM: A hybrid framework for simulating multiphysics problems by coupling PINNs and spectral elements

    Authors: Khemraj Shukla, Zongren Zou, Chi Hin Chan, Additi Pandey, Zhicheng Wang, George Em Karniadakis

    Abstract: Multiphysics problems that are characterized by complex interactions among fluid dynamics, heat transfer, structural mechanics, and electromagnetics, are inherently challenging due to their coupled nature. While experimental data on certain state variables may be available, integrating these data with numerical solvers remains a significant challenge. Physics-informed neural networks (PINNs) have…

    Submitted 30 July, 2024; originally announced July 2024.

  5. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  6. arXiv:2407.21033  [pdf, other]

    cs.IR cs.AI cs.CL cs.CV

    Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition

    Authors: Jielong Tang, Zhenxing Wang, Ziyang Gong, Jianxing Yu, Shuang Wang, Jian Yin

    Abstract: Grounded Multimodal Named Entity Recognition (GMNER) is an emerging information extraction (IE) task, aiming to simultaneously extract entity spans, types, and entity-matched bounding box groundings in images from given sentence-image pairs data. Recent unified methods employing machine reading comprehension (MRC-based) frameworks or sequence generation-based models face challenges in understandin…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures

  7. Matting by Generation

    Authors: Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh

    Abstract: This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present novel architectural innovations that empower our model to produce mattes with superior re…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH'24, Project page: https://lightchaserx.github.io/matting-by-generation/

  8. arXiv:2407.20585  [pdf, other]

    cs.NI eess.SP

    A UAV-Enabled Time-Sensitive Data Collection Scheme for Grassland Monitoring Edge Networks

    Authors: Dongbin Jiao, Zihao Wang, Wen Fan, Weibo Yang, Peng Yang, Zhanhuan Shang, Shi Yan

    Abstract: Grassland monitoring is essential for the sustainable development of grassland resources. Traditional Internet of Things (IoT) devices generate critical ecological data, making data loss unacceptable, but the harsh environment complicates data collection. Unmanned Aerial Vehicle (UAV) and mobile edge computing (MEC) offer efficient data collection solutions, enhancing performance on resource-limit…

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.20505  [pdf, other]

    cs.CV

    Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate

    Authors: Zheng Lin, Zhenxing Niu, Zhibin Wang, Yinghui Xu

    Abstract: MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination. Previous methods focus on determining whether a generated output is hallucinated, without identifying which image region leads to the hallucination or interpreting why such hallucinations occur. In this paper, we argue that hallucination in MLLMs is partially due to a lack of slow-thinki…

    Submitted 29 July, 2024; originally announced July 2024.

  10. arXiv:2407.20251  [pdf, other]

    eess.SP cond-mat.mtrl-sci cs.LG

    An Uncertainty-aware Deep Learning Framework-based Robust Design Optimization of Metamaterial Units

    Authors: Zihan Wang, Anindya Bhaduri, Hongyi Xu, Liping Wang

    Abstract: Mechanical metamaterials represent an innovative class of artificial structures, distinguished by their extraordinary mechanical characteristics, which are beyond the scope of traditional natural materials. The use of deep generative models has become increasingly popular in the design of metamaterial units. The effectiveness of using deep generative models lies in their capacity to compress compl…

    Submitted 19 July, 2024; originally announced July 2024.

  11. arXiv:2407.20157  [pdf, other]

    cs.AI

    rLLM: Relational Table Learning with LLMs

    Authors: Weichen Li, Xiaotong Huang, Jianwu Zheng, Zheng Wang, Chaokun Wang, Li Pan, Jianhua Li

    Abstract: We introduce rLLM (relationLLM), a PyTorch library designed for Relational Table Learning (RTL) with Large Language Models (LLMs). The core idea is to decompose state-of-the-art Graph Neural Networks, LLMs, and Table Neural Networks into standardized modules, to enable the fast construction of novel RTL-type models in a simple "combine, align, and co-train" manner. To illustrate the usage of rLLM,…

    Submitted 29 July, 2024; originally announced July 2024.

  12. arXiv:2407.20018  [pdf, other]

    cs.DC

    Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

    Authors: Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun

    Abstract: Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with their sophisticated capabilities. Training these models requires vast GPU clusters and significant computing time, posing major challenges in terms of scalability, efficiency, and reliability. This survey explores recent advancements in training systems for LLMs, including innovations in training infrastructur…

    Submitted 29 July, 2024; originally announced July 2024.

  13. arXiv:2407.19841  [pdf, other]

    eess.SP cs.AR

    RRAM-Based Bio-Inspired Circuits for Mobile Epileptic Correlation Extraction and Seizure Prediction

    Authors: Hao Wang, Lingfeng Zhang, Erjia Xiao, Xin Wang, Zhongrui Wang, Renjing Xu

    Abstract: Non-invasive mobile electroencephalography (EEG) acquisition systems have been utilized for long-term monitoring of seizures, yet they suffer from limited battery life. Resistive random access memory (RRAM) is widely used in computing-in-memory (CIM) systems, which offers an ideal platform for reducing the computational energy consumption of seizure prediction algorithms, potentially solving the en…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  14. arXiv:2407.19740  [pdf, other]

    cs.CL cs.AI

    KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogical Argument Mining

    Authors: Zihao Zheng, Zhaowei Wang, Qing Zong, Yangqiu Song

    Abstract: Dialogical Argument Mining (DialAM) is an important branch of Argument Mining (AM). DialAM-2024 is a shared task focusing on dialogical argument mining, which requires us to identify argumentative relations and illocutionary relations among proposition nodes and locution nodes. To accomplish this, we propose a two-stage pipeline, which includes the Two-Step S-Node Prediction Model in Stage 1 and the…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Published at the 11th Workshop on Argument Mining

  15. arXiv:2407.19711  [pdf, other]

    cs.SE

    TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data

    Authors: Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li

    Abstract: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional fai…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 30 pages

  16. arXiv:2407.19686  [pdf, other]

    cs.CE

    Billiards Sports Analytics: Datasets and Tasks

    Authors: Qianru Zhang, Zheng Wang, Cheng Long, Siu-Ming Yiu

    Abstract: Nowadays, it has become common practice to capture data from sports games with devices such as GPS sensors and cameras and then use the data to perform various analyses on sports games, including tactics discovery, similar game retrieval, performance study, etc. While this practice has been applied to many sports such as basketball and soccer, it remains largely unexplored on the billiards spo…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 27 pages; This paper is accepted by TKDD'2024

    Journal ref: Transactions on Knowledge Discovery from Data 2024

  17. arXiv:2407.19655  [pdf, ps, other]

    cs.AI

    AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias

    Authors: Sribala Vidyadhari Chinta, Zichong Wang, Xingyu Zhang, Thang Doan Viet, Ayesha Kashif, Monique Antoinette Smith, Wenbin Zhang

    Abstract: Artificial intelligence (AI) is rapidly advancing in healthcare, enhancing the efficiency and effectiveness of services across various specialties, including cardiology, ophthalmology, dermatology, emergency medicine, etc. AI applications have significantly improved diagnostic accuracy, treatment personalization, and patient outcome predictions by leveraging technologies such as machine learning,…

    Submitted 28 July, 2024; originally announced July 2024.

  18. arXiv:2407.19485  [pdf, other]

    eess.AS cs.SD

    ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement

    Authors: Zhong-Qiu Wang

    Abstract: The current dominant approach for neural speech enhancement is via purely-supervised deep learning on simulated pairs of far-field noisy-reverberant speech (i.e., mixtures) and clean speech. The trained models, however, often exhibit limited generalizability to real-recorded mixtures. To deal with this, this paper investigates training enhancement models directly on real mixtures. However, a major…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: in submission

  19. FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning

    Authors: Jinhui Pang, Changqing Lin, Xiaoshuai Hao, Rong Yin, Zixuan Wang, Zhihui Zhang, Jinglin He, Huang Tai Sheng

    Abstract: Continual graph learning (CGL) is an important and challenging task that aims to extend static GNNs to dynamic task flow scenarios. As one of the mainstream CGL methods, the experience replay (ER) method receives widespread attention due to its superior performance. However, existing ER methods focus on identifying samples by feature significance or topological relevance, which limits their utiliz…

    Submitted 31 July, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  20. ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models

    Authors: Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen

    Abstract: Grasp generation aims to create complex hand-object interactions with a specified object. While traditional approaches for hand generation have primarily focused on visibility and diversity under scene constraints, they tend to overlook the fine-grained hand-object interactions such as contacts, resulting in inaccurate and undesired grasps. To address these challenges, we propose a controllable gr…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: ACM Multimedia 2024

  21. arXiv:2407.19323  [pdf, other]

    cs.CV

    MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo

    Authors: Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang

    Abstract: Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within a fixed patch. Although certain methods employ patch deformation to expand the receptive field, their patches mistakenly skip depth edges to calculate areas with depth discontinuity, thereby causing ambiguity. Consequently, we introduce Multi-granularity Segmentation Prior Multi-View…

    Submitted 27 July, 2024; originally announced July 2024.

  22. arXiv:2407.19244  [pdf, other]

    cs.CV cs.MM

    Radio Frequency Signal based Human Silhouette Segmentation: A Sequential Diffusion Approach

    Authors: Penghui Wen, Kun Hu, Dong Yuan, Zhiyuan Ning, Changyang Li, Zhiyong Wang

    Abstract: Radio frequency (RF) signals have been proved to be flexible for human silhouette segmentation (HSS) under complex environments. Existing studies are mainly based on a one-shot approach, which lacks a coherent projection ability from the RF domain. Additionally, the spatio-temporal patterns have not been fully explored for human motion dynamics in HSS. Therefore, we propose a two-stage Sequential…

    Submitted 27 July, 2024; originally announced July 2024.

  23. arXiv:2407.19094  [pdf, other]

    cs.AI cs.RO

    Solving Robotics Problems in Zero-Shot with Vision-Language Models

    Authors: Zidan Wang, Rui Shen, Bradly Stadie

    Abstract: We introduce Wonderful Team, a multi-agent visual LLM (VLLM) framework for solving robotics problems in the zero-shot regime. By zero-shot we mean that, for a novel environment, we feed a VLLM an image of the robot's environment and a description of the task, and have the VLLM output the sequence of actions necessary for the robot to complete the task. Prior work on VLLMs in robotics has largely f…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: aka Wonderful Team

  24. arXiv:2407.19079  [pdf, other]

    cs.CV

    UniForensics: Face Forgery Detection via General Facial Representation

    Authors: Ziyuan Fang, Hanqing Zhao, Tianyi Wei, Wenbo Zhou, Ming Wan, Zhanyi Wang, Weiming Zhang, Nenghai Yu

    Abstract: Previous deepfake detection methods mostly depend on low-level textural features vulnerable to perturbations and fall short of detecting unseen forgery methods. In contrast, high-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. Motivated by this, we propose a detection method that utilizes high-level s…

    Submitted 26 July, 2024; originally announced July 2024.

  25. arXiv:2407.19056  [pdf, other]

    cs.CL

    OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

    Authors: Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill Yuchen Lin, Jingbo Shang

    Abstract: Office automation significantly enhances human productivity by automatically finishing routine tasks in the workflow. Beyond the basic information extraction studied in much of the prior document AI literature, office automation research should be extended to more realistic office tasks that require integrating various information sources in the office system and produce outputs through a se…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Preprint

  26. arXiv:2407.18961  [pdf, other]

    cs.AI

    MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

    Authors: Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

    Abstract: Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern…

    Submitted 30 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  27. arXiv:2407.18957  [pdf, other]

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 30 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  28. arXiv:2407.18877  [pdf, other]

    cs.SE

    Code Structure-Aware through Line-level Semantic Learning for Code Vulnerability Detection

    Authors: Ziliang Wang, Ge Li, Jia Li, Yihong Dong, Yingfei Xiong, Zhi Jin

    Abstract: Different from the flow semantics of natural languages, programming languages are inherently rigid in structure and grammar. Existing fine-tuning methodologies for code vulnerability detection generally treat code as long text sequences, stripping away structural elements such as newlines ('\n') and whitespace. However, this approach inadvertently results in the loss of crucial structural informat…

    Submitted 26 July, 2024; originally announced July 2024.

  29. arXiv:2407.18745  [pdf, other]

    cs.LG

    FairAIED: Navigating Fairness, Bias, and Ethics in Educational AI Applications

    Authors: Sribala Vidyadhari Chinta, Zichong Wang, Zhipeng Yin, Nhat Hoang, Matthew Gonzalez, Tai Le Quy, Wenbin Zhang

    Abstract: The integration of Artificial Intelligence (AI) into education has transformative potential, providing tailored learning experiences and creative instructional approaches. However, the inherent biases in AI algorithms hinder this improvement by unintentionally perpetuating prejudice against specific demographics, especially in human-centered applications like education. This survey delves deeply i…

    Submitted 26 July, 2024; originally announced July 2024.

  30. arXiv:2407.18625  [pdf, other]

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  31. HICEScore: A Hierarchical Metric for Image Captioning Evaluation

    Authors: Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen

    Abstract: Image captioning evaluation metrics can be divided into two categories, reference-based metrics and reference-free metrics. However, reference-based approaches may struggle to evaluate descriptive captions with abundant visual details produced by advanced multimodal large language models, due to their heavy reliance on limited human-annotated references. In contrast, previous reference-free metric…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM2024

  32. arXiv:2407.18531  [pdf, ps, other]

    eess.SP cs.IT

    Optimal Bilinear Equalizer for Cell-Free Massive MIMO Systems over Correlated Rician Channels

    Authors: Zhe Wang, Jiayi Zhang, Emil Björnson, Dusit Niyato, Bo Ai

    Abstract: In this paper, we explore the low-complexity optimal bilinear equalizer (OBE) combining scheme design for cell-free massive multiple-input multiple-output networks with spatially correlated Rician fading channels. We provide a spectral efficiency (SE) performance analysis framework for both the centralized and distributed processing schemes with bilinear equalizer (BE)-structure combining schemes…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures, submitted to IEEE Trans

  33. arXiv:2407.18525  [pdf, other]

    cs.CL cs.AI cs.LG

    Is larger always better? Evaluating and prompting large language models for non-generative medical tasks

    Authors: Yinghao Zhu, Junyi Gao, Zixiang Wang, Weibin Liao, Xiaochen Zheng, Lifang Liang, Yasha Wang, Chengwei Pan, Ewen M. Harrison, Liantao Ma

    Abstract: The use of Large Language Models (LLMs) in medicine is growing, but their ability to handle both structured Electronic Health Record (EHR) data and unstructured clinical notes is not well-studied. This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models, for non-generative medical tasks utilizing renowned datasets. We assessed 14…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.01713

  34. arXiv:2407.18500  [pdf, other]

    cs.CV

    Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations

    Authors: Zipeng Wang, Yunfan Lu, Lin Wang

    Abstract: Reconstructing intensity frames from event data while maintaining high temporal resolution and dynamic range is crucial for bridging the gap between event-based and frame-based computer vision. Previous approaches have depended on supervised learning on synthetic data, which lacks interpretability and risks over-fitting to the setting of the event simulator. Recently, self-supervised learning (SSL)…

    Submitted 26 July, 2024; originally announced July 2024.

    Journal ref: ECCV2024

  35. arXiv:2407.18454  [pdf, other]

    cs.CL cs.AI cs.LG

    Fairness Definitions in Language Models Explained

    Authors: Thang Viet Doan, Zhibo Chu, Zichong Wang, Wenbin Zhang

    Abstract: Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairne…

    Submitted 25 July, 2024; originally announced July 2024.

  36. arXiv:2407.18242  [pdf, other]

    cs.LG cs.AI cs.CL

    LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

    Authors: Zhengbo Wang, Jian Liang

    Abstract: Low-Rank Adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning foundation models by re-parameterizing the original matrix into the product of two low-rank matrices. Despite its efficiency, LoRA often yields inferior performance compared to full fine-tuning. In this paper, we propose LoRA-Pro to bridge this performance gap. Firstly, we delve into the…

    Submitted 25 July, 2024; originally announced July 2024.

  37. arXiv:2407.18015  [pdf, other]

    cs.GR

    Uncertainty Visualization of Critical Points of 2D Scalar Fields for Parametric and Nonparametric Probabilistic Models

    Authors: Tushar M. Athawale, Zhe Wang, David Pugmire, Kenneth Moreland, Qian Gong, Scott Klasky, Chris R. Johnson, Paul Rosen

    Abstract: This paper presents a novel end-to-end framework for closed-form computation and visualization of critical point uncertainty in 2D uncertain scalar fields. Critical points are fundamental topological descriptors used in the visualization and analysis of scalar fields. The uncertainty inherent in data (e.g., observational and experimental data, approximations in simulations, and compression), howev…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 9 pages paper + 2 page references, 8 figures, IEEE VIS 2024 paper to be published as a special issue of IEEE Transactions on Visualization and Computer Graphics (TVCG)

  38. arXiv:2407.17869  [pdf, other]

    cs.LG

    EllipBench: A Large-scale Benchmark for Machine-learning based Ellipsometry Modeling

    Authors: Yiming Ma, Xinjie Li, Xin Sun, Zhiyong Wang, Lionel Z. Wang

    Abstract: Ellipsometry is used to indirectly measure the optical properties and thickness of thin films. However, solving the inverse problem of ellipsometry is time-consuming since it involves human expertise to apply the data fitting techniques. Many studies use traditional machine learning-based methods to model the complex mathematical fitting process. In our work, we approach this problem from a deep l…

    Submitted 25 July, 2024; originally announced July 2024.

  39. arXiv:2407.17734  [pdf, other]

    cs.AI cs.CL cs.CV

    Cost-effective Instruction Learning for Pathology Vision and Language Analysis

    Authors: Kaitao Chen, Mianxin Liu, Fang Yan, Lei Ma, Xiaoming Shi, Lilong Wang, Xiaosong Wang, Lifeng Zhu, Zhe Wang, Mu Zhou, Shaoting Zhang

    Abstract: The advent of vision-language models fosters interactive conversations between AI-enabled models and humans. Yet applying these models in clinics must deal with daunting challenges around large-scale training data, financial, and computational resources. Here we propose a cost-effective instruction learning framework for conversational pathology named CLOVER. CLOVER only trains a lightwei…

    Submitted 24 July, 2024; originally announced July 2024.

  40. arXiv:2407.17731  [pdf, other]

    econ.GN cs.GT cs.LG

    Optimal Trade and Industrial Policies in the Global Economy: A Deep Learning Framework

    Authors: Zi Wang, Xingcheng Xu, Yanqing Yang, Xiaodong Zhu

    Abstract: We propose a deep learning framework, DL-opt, designed to efficiently solve for optimal policies in quantifiable general equilibrium trade models. DL-opt integrates (i) a nested fixed point (NFXP) formulation of the optimization problem, (ii) automatic implicit differentiation to enhance gradient descent for solving unilateral optimal policies, and (iii) a best-response dynamics approach for findi…

    Submitted 24 July, 2024; originally announced July 2024.

  41. arXiv:2407.17460  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning

    Authors: Jianpeng Yao, Xiaopan Zhang, Yu Xia, Zejin Wang, Amit K. Roy-Chowdhury, Jiachen Li

    Abstract: Reinforcement Learning (RL) has enabled social robots to generate trajectories without human-designed rules or interventions, which makes it more effective than hard-coded systems for generalizing to complex real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians while previous RL-based solutions fall short in safety perf…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Project website: https://sonic-social-nav.github.io/

  42. arXiv:2407.17438  [pdf, other]

    cs.CV cs.AI cs.LG

    HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

    Authors: Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin

    Abstract: Human image animation involves generating videos from a character photo, allowing user control and unlocking potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance o…

    Submitted 28 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: camera controllable human image animation, a dataset and a baseline

  43. arXiv:2407.16959  [pdf, other]

    cs.LG

    Dynamic Graph Transformer with Correlated Spatial-Temporal Positional Encoding

    Authors: Zhe Wang, Sheng Zhou, Jiawei Chen, Zhen Zhang, Binbin Hu, Yan Feng, Chun Chen, Can Wang

    Abstract: Learning effective representations for Continuous-Time Dynamic Graphs (CTDGs) has garnered significant research interest, largely due to its powerful capabilities in modeling complex interactions between nodes. A fundamental and crucial requirement for representation learning in CTDGs is the appropriate estimation and preservation of proximity. However, due to the sparse and evolving characteristi…

    Submitted 23 July, 2024; originally announced July 2024.

  44. arXiv:2407.16943  [pdf]

    cs.CV

    McGAN: Generating Manufacturable Designs by Embedding Manufacturing Rules into Conditional Generative Adversarial Network

    Authors: Zhichao Wang, Xiaoliang Yan, Shreyes Melkote, David Rosen

    Abstract: Generative design (GD) methods aim to automatically generate a wide variety of designs that satisfy functional or aesthetic design requirements. However, research to date generally lacks considerations of manufacturability of the generated designs. To this end, we propose a novel GD approach by using deep neural networks to encode design for manufacturing (DFM) rules, thereby modifying part design…

    Submitted 23 July, 2024; originally announced July 2024.

  45. arXiv:2407.16837  [pdf, other]

    cs.CV cs.AI cs.CL

    CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs

    Authors: Jihyung Kil, Zheda Mai, Justin Lee, Zihe Wang, Kerrie Cheng, Lemeng Wang, Ye Liu, Arpita Chowdhury, Wei-Lun Chao

    Abstract: The ability to compare objects, scenes, or situations is crucial for effective decision-making and problem-solving in everyday life. For instance, comparing the freshness of apples enables better choices during grocery shopping, while comparing sofa designs helps optimize the aesthetics of our living space. Despite its significance, the comparative capability is largely unexplored in artificial ge…

    Submitted 23 July, 2024; originally announced July 2024.

  46. arXiv:2407.16822  [pdf, other]

    cs.CV cs.AI

    AI-Enhanced 7-Point Checklist for Melanoma Detection Using Clinical Knowledge Graphs and Data-Driven Quantification

    Authors: Yuheng Wang, Tianze Yu, Jiayue Cai, Sunil Kalia, Harvey Lui, Z. Jane Wang, Tim K. Lee

    Abstract: The 7-point checklist (7PCL) is widely used in dermoscopy to identify malignant melanoma lesions needing urgent medical attention. It assigns point values to seven attributes: major attributes are worth two points each, and minor ones are worth one point each. A total score of three or higher prompts further evaluation, often including a biopsy. However, a significant limitation of current methods…

    Submitted 23 July, 2024; originally announced July 2024.

  47. arXiv:2407.16716  [pdf, ps, other]

    cs.NE cs.CV cs.LG

    Exploring The Neural Burden In Pruned Models: An Insight Inspired By Neuroscience

    Authors: Zeyu Wang, Weichen Dai, Xiangyu Zhou, Ji Qi, Yi Zhou

    Abstract: Vision Transformer and its variants have been adopted in many visual tasks due to their powerful capabilities, which also bring significant challenges in computation and storage. Consequently, researchers have introduced various compression methods in recent years, among which the pruning techniques are widely used to remove a significant fraction of the network. Therefore, these methods can reduc…

    Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  48. arXiv:2407.16667  [pdf, other]

    cs.CR cs.AI cs.CL

    RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

    Authors: Huiyu Xu, Wenhui Zhang, Zhibo Wang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren

    Abstract: Recently, advanced Large Language Models (LLMs) such as GPT-4 have been integrated into many real-world applications like Code Copilot. These applications have significantly expanded the attack surface of LLMs, exposing them to a variety of threats. Among them, jailbreak attacks that induce toxic responses through jailbreak prompts have raised critical safety concerns. To identify these threats, a…

    Submitted 23 July, 2024; originally announced July 2024.

  49. arXiv:2407.16641  [pdf, other]

    cs.LG cs.AI

    A Geometry-Aware Algorithm to Learn Hierarchical Embeddings in Hyperbolic Space

    Authors: Zhangyu Wang, Lantian Xu, Zhifeng Kong, Weilong Wang, Xuyu Peng, Enyang Zheng

    Abstract: Hyperbolic embeddings are a class of representation learning methods that offer competitive performances when data can be abstracted as a tree-like graph. However, in practice, learning hyperbolic embeddings of hierarchical data is difficult due to the different geometry between hyperbolic space and the Euclidean space. To address such difficulties, we first categorize three kinds of illness that…

    Submitted 23 July, 2024; originally announced July 2024.

  50. arXiv:2407.16626  [pdf, other]

    cs.SE

    A Tale of Two DL Cities: When Library Tests Meet Compiler

    Authors: Qingchao Shen, Yongqiang Tian, Haoyang Ma, Junjie Chen, Lili Huang, Ruifeng Fu, Shing-Chi Cheung, Zan Wang

    Abstract: Deep Learning (DL) compilers typically load a DL model and optimize it with intermediate representation. Existing DL compiler testing techniques mainly focus on model optimization stages, but rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries, which shares a common object…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by ICSE'2025