Showing 1–50 of 1,792 results for author: Yang, H

Searching in archive cs. Search in all archives.
  1. arXiv:2408.13055  [pdf, other]

    cs.CV

    Atlas Gaussians Diffusion for 3D Generation with Infinite Number of Points

    Authors: Haitao Yang, Yuan Dong, Hanwen Jiang, Dejia Xu, Georgios Pavlakos, Qixing Huang

    Abstract: Using the latent diffusion model has proven effective in developing novel 3D generation techniques. To harness the latent diffusion model, a key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape…

    Submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:2408.12910  [pdf, other]

    cs.AI

    What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance

    Authors: Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Duan Li, Jian Gao, Li Zhang, Hao Yang, Boxing Chen, Osamu Yoshie

    Abstract: The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. Existing solutions relieve this via automatic model…

    Submitted 23 August, 2024; originally announced August 2024.

  3. arXiv:2408.12775  [pdf, other]

    cs.AI cs.AR

    Intelligent OPC Engineer Assistant for Semiconductor Manufacturing

    Authors: Guojin Chen, Haoyu Yang, Haoxing Ren, Bei Yu

    Abstract: Advancements in chip design and manufacturing have enabled the processing of complex tasks such as deep learning and natural language processing, paving the way for the development of artificial general intelligence (AGI). AI, on the other hand, can be leveraged to innovate and streamline semiconductor technology from planning and implementation to manufacturing. In this paper, we present \textit{…

    Submitted 22 August, 2024; originally announced August 2024.

  4. arXiv:2408.12056  [pdf, other]

    cs.SE cs.AI

    Enhancing LLM-Based Automated Program Repair with Design Rationales

    Authors: Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang

    Abstract: Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that, typically, engineers have design…

    Submitted 21 August, 2024; originally announced August 2024.

  5. arXiv:2408.11855  [pdf, other]

    cs.CL cs.AI cs.LG

    FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

    Authors: Zhongyu Zhao, Menghang Dong, Rongyu Zhang, Wenzhao Zheng, Yunpeng Zhang, Huanrui Yang, Dalong Du, Kurt Keutzer, Shanghang Zhang

    Abstract: Recent research has demonstrated that Feed-Forward Networks (FFNs) in Large Language Models (LLMs) play a pivotal role in storing diverse linguistic and factual knowledge. Conventional methods frequently face challenges due to knowledge confusion stemming from their monolithic and redundant architectures, which calls for more efficient solutions with minimal computational overhead, particularly fo…

    Submitted 15 August, 2024; originally announced August 2024.

  6. arXiv:2408.11334  [pdf, other]

    cs.CL cs.AI

    BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports

    Authors: Yuxuan Chen, Haoyan Yang, Hengkai Pan, Fardeen Siddiqui, Antonio Verdone, Qingyang Zhang, Sumit Chopra, Chen Zhao, Yiqiu Shen

    Abstract: Breast ultrasound is essential for detecting and diagnosing abnormalities, with radiology reports summarizing key findings like lesion characteristics and malignancy assessments. Extracting this critical information is challenging due to the unstructured nature of these reports, with varied linguistic styles and inconsistent formatting. While proprietary LLMs like GPT-4 are effective, they are cos…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted as the oral paper for the HCHM workshop, ACM Multimedia 2024

  7. arXiv:2408.10327  [pdf, other]

    cs.SE

    An Empirical Study on Package-Level Deprecation in Python Ecosystem

    Authors: Zhiqing Zhong, Shilin He, Haoxuan Wang, Boxi Yu, Haowen Yang, Pinjia He

    Abstract: Open-source software (OSS) plays a crucial role in modern software development. Utilizing OSS code can greatly accelerate software development, reduce redundancy, and enhance reliability. Python, a widely adopted programming language, is renowned for its extensive and diverse third-party package ecosystem. However, a significant number of OSS packages within the Python ecosystem are in poor mainte…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)

  8. arXiv:2408.10204  [pdf, other]

    cs.LG cs.CV

    Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency

    Authors: Bhavna Gopal, Huanrui Yang, Jingyang Zhang, Mark Horton, Yiran Chen

    Abstract: Adversarial training enhances neural network robustness but suffers from a tendency to overfit and increased generalization errors on clean data. This work introduces CLAT, an innovative approach that mitigates adversarial overfitting by introducing parameter efficiency into the adversarial training process, improving both clean accuracy and adversarial robustness. Instead of tuning the entire mod…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 9 pages + appendix/ additional experiments

  9. arXiv:2408.09511  [pdf, other]

    cs.CV

    NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality

    Authors: Chaofan Tao, Gukyeong Kwon, Varad Gunjal, Hao Yang, Zhaowei Cai, Yonatan Dukler, Ashwin Swaminathan, R. Manmatha, Colin Jon Taylor, Stefano Soatto

    Abstract: We study the capability of Video-Language (VidL) models in understanding compositions between objects, attributes, actions and their relations. Composition understanding becomes particularly challenging for video data since the compositional relations rapidly change over time in videos. We first build a benchmark named AARO to evaluate composition understanding related to actions on top of spatial…

    Submitted 18 August, 2024; originally announced August 2024.

  10. arXiv:2408.09458  [pdf, other]

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man…

    Submitted 18 August, 2024; originally announced August 2024.

  11. arXiv:2408.09278  [pdf, other]

    eess.IV cs.CV

    Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology

    Authors: Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Accurate delineation of the boundaries between the renal cortex and medulla is crucial for subsequent functional structural analysis and disease diagnosis. Training high-quality deep-learning models for layer segmentation relies on the availability of large amounts of annotated data. However, due to patient privacy constraints on medical data and scarce clinical cases, constructing pathological datasets…

    Submitted 17 August, 2024; originally announced August 2024.

  12. arXiv:2408.08969  [pdf, other]

    cs.AI physics.optics

    Differentiable Edge-based OPC

    Authors: Guojin Chen, Haoyu Yang, Haoxing Ren, Bei Yu, David Z. Pan

    Abstract: Optical proximity correction (OPC) is crucial for pushing the boundaries of semiconductor manufacturing and enabling the continued scaling of integrated circuits. While pixel-based OPC, termed inverse lithography technology (ILT), has gained research interest due to its flexibility and precision, its complexity and intricate features can lead to challenges in mask writing, increased defects, an…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by ICCAD24

  13. arXiv:2408.07759  [pdf, other]

    cs.IR

    SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis

    Authors: Shentao Yang, Haichuan Yang, Linna Du, Adithya Ganesh, Bo Peng, Boying Liu, Serena Li, Ji Liu

    Abstract: The significance of estimating video watch time has been highlighted by the rising importance of (short) video recommendation, which has become a core product of mainstream social media platforms. Modeling video watch time, however, has been challenged by the complexity of user-video interaction, such as different user behavior modes in watching the recommended videos and varying watching probabil…

    Submitted 14 August, 2024; originally announced August 2024.

  14. arXiv:2408.07444  [pdf, other]

    eess.IV cs.CV

    Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

    Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

    Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range co…

    Submitted 14 August, 2024; originally announced August 2024.

  15. arXiv:2408.06638  [pdf, other]

    cs.LG cs.CV

    COD: Learning Conditional Invariant Representation for Domain Adaptation Regression

    Authors: Hao-Ran Yang, Chuan-Xian Ren, You-Wei Luo

    Abstract: Aiming to generalize the label knowledge from a source domain with continuous outputs to an unlabeled target domain, Domain Adaptation Regression (DAR) is developed for complex practical learning problems. However, due to the continuity problem in regression, existing conditional distribution alignment theory and methods with discrete prior, which are proven to be effective in classification setti…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024 (oral)

  16. arXiv:2408.06381  [pdf, other]

    eess.IV cs.AI cs.CV

    Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology

    Authors: Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo

    Abstract: Cell nuclei instance segmentation is a crucial task in digital kidney pathology. Traditional automatic segmentation methods often lack generalizability when applied to unseen datasets. Recently, the success of foundation models (FMs) has provided a more generalizable solution, potentially enabling the segmentation of any cell type. In this study, we perform a large-scale evaluation of three widely…

    Submitted 9 August, 2024; originally announced August 2024.

  17. arXiv:2408.06357  [pdf]

    cs.CV cs.AI

    Algorithm Research of ELMo Word Embedding and Deep Learning Multimodal Transformer in Image Description

    Authors: Xiaohan Cheng, Taiyuan Mei, Yun Zi, Qi Wang, Zijun Gao, Haowei Yang

    Abstract: Zero sample learning is an effective method for data deficiency. The existing embedded zero sample learning methods only use the known classes to construct the embedded space, so there is an overfitting of the known classes in the testing process. This project uses category semantic similarity measures to classify multiple tags. This enables it to incorporate unknown classes that have the same mea…

    Submitted 25 July, 2024; originally announced August 2024.

  18. arXiv:2408.06197  [pdf, other]

    cs.CR cs.DC

    Lancelot: Towards Efficient and Privacy-Preserving Byzantine-Robust Federated Learning within Fully Homomorphic Encryption

    Authors: Siyang Jiang, Hao Yang, Qipeng Xie, Chuan Ma, Sen Wang, Guoliang Xing

    Abstract: In sectors such as finance and healthcare, where data governance is subject to rigorous regulatory requirements, the exchange and utilization of data are particularly challenging. Federated Learning (FL) has risen as a pioneering distributed machine learning paradigm that enables collaborative model training across multiple institutions while maintaining data decentralization. Despite its advantag…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 26 pages

  19. arXiv:2408.05831  [pdf, other]

    cs.CV cs.AI

    Robust Domain Generalization for Multi-modal Object Recognition

    Authors: Yuxin Qiao, Keqin Li, Junhong Lin, Rong Wei, Chufeng Jiang, Yang Luo, Haoyu Yang

    Abstract: In multi-label classification, machine learning encounters the challenge of domain generalization when handling tasks with distributions differing from the training data. Existing approaches primarily focus on vision object recognition and neglect the integration of natural language. Recent advancements in vision-language pre-training leverage supervision from extensive visual-language pairs, enab…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 6 pages, 2 figures. This is a preprint version of the article. The final version will be published in the proceedings of the IEEE conference

  20. arXiv:2408.05475  [pdf, other]

    cs.CV

    Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

    Authors: Junyan Ye, Zhutao Lv, Weijia Li, Jinhua Yu, Haote Yang, Huaping Zhong, Conghui He

    Abstract: Cross-view geolocalization identifies the geographic location of street view images by matching them with a georeferenced satellite database. Significant challenges arise due to the drastic appearance and geometry differences between views. In this paper, we propose a new approach for cross-view image geo-localization, i.e., the Panorama-BEV Co-Retrieval Network. Specifically, by utilizing the gro…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  21. arXiv:2408.04849  [pdf]

    cs.CL cs.AI

    Ensemble BERT: A student social network text sentiment classification model based on ensemble learning and BERT architecture

    Authors: Kai Jiang, Honghao Yang, Yuexian Wang, Qianru Chen, Yiming Luo

    Abstract: The mental health assessment of middle school students has always been one of the focuses in the field of education. This paper introduces a new ensemble learning network based on BERT, employing the concept of enhancing model performance by integrating multiple classifiers. We trained a range of BERT-based learners, which were combined using the majority voting method. We collect social network text d…

    Submitted 8 August, 2024; originally announced August 2024.

  22. Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion

    Authors: Xiaoyang Ji, Yuchen Zhou, Haofu Yang, Shiyue Xu, Jiahao Li

    Abstract: Graph clustering, a classical task in graph learning, involves partitioning the nodes of a graph into distinct clusters. This task has applications in various real-world scenarios, such as anomaly detection, social network analysis, and community discovery. Current graph clustering methods commonly rely on module pre-training to obtain a reliable prior distribution for the model, which is then use…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

    Journal ref: 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Tianjin, China, 2024, pp. 254-259

  23. arXiv:2408.04313  [pdf, other]

    stat.ML cs.LG stat.ME

    Better Locally Private Sparse Estimation Given Multiple Samples Per User

    Authors: Yuheng Ma, Ke Jia, Hanfang Yang

    Abstract: Previous studies yielded discouraging results for item-level locally differentially private linear regression with $s^*$-sparsity assumption, where the minimax rate for $nm$ samples is $\mathcal{O}(s^{*}d / nm\varepsilon^2)$. This can be challenging for high-dimensional data, where the dimension $d$ is extremely large. In this work, we investigate user-level locally differentially private sparse l…

    Submitted 8 August, 2024; originally announced August 2024.

    Journal ref: ICML2024 Proceedings

  24. arXiv:2408.04261  [pdf, other]

    cs.CV cs.AI cs.CR

    Unveiling Hidden Visual Information: A Reconstruction Attack Against Adversarial Visual Information Hiding

    Authors: Jonggyu Jang, Hyeonsu Lyu, Seongjin Hwang, Hyun Jong Yang

    Abstract: This paper investigates the security vulnerabilities of adversarial-example-based image encryption by executing data reconstruction (DR) attacks on encrypted images. A representative image encryption method is the adversarial visual information hiding (AVIH), which uses type-I adversarial example training to protect gallery datasets used in image recognition tasks. In the AVIH method, the type-I a…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 12 pages

  25. arXiv:2408.04193  [pdf, other]

    cs.LG cs.AI

    Uncertainty-Aware Crime Prediction With Spatial Temporal Multivariate Graph Neural Networks

    Authors: Zepu Wang, Xiaobo Ma, Huajie Yang, Weimin Lvu, Peng Sun, Sharath Chandra Guntuku

    Abstract: Crime forecasting is a critical component of urban analysis and essential for stabilizing society today. Unlike other time series forecasting problems, crime incidents are sparse, particularly in small regions and within specific time periods. Traditional spatial-temporal deep learning models often struggle with this sparsity, as they typically cannot effectively handle the non-Gaussian nature of…

    Submitted 7 August, 2024; originally announced August 2024.

  26. arXiv:2408.03806  [pdf, other]

    cs.IT cs.LG cs.NI

    Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency

    Authors: Xijun Wang, Dongshan Ye, Chenyuan Feng, Howard H. Yang, Xiang Chen, Tony Q. S. Quek

    Abstract: Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. However, existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. To address these limitations, we propose a novel trustworthy ISC framework. This approach leverages text extraction a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, 2 tables

  27. arXiv:2408.03307  [pdf, other]

    stat.ML cs.LG

    Pre-training and in-context learning IS Bayesian inference a la De Finetti

    Authors: Naimeng Ye, Hanming Yang, Andrew Siah, Hongseok Namkoong

    Abstract: Accurately gauging uncertainty on the underlying environment is a longstanding goal of intelligent systems. We characterize which latent concepts pre-trained sequence models are naturally able to reason with. We go back to De Finetti's predictive view of Bayesian reasoning: instead of modeling latent parameters through priors and likelihoods like topic models do, De Finetti has long advocated for…

    Submitted 6 August, 2024; originally announced August 2024.

  28. arXiv:2408.03249  [pdf, other]

    cs.HC

    Multi-User Mobile Augmented Reality for Cardiovascular Surgical Planning

    Authors: Pratham Mehta, Rahul O Narayanan, Harsha Karanth, Haoyang Yang, Timothy C Slesnick, Fawwaz Shaw, Duen Horng Chau

    Abstract: Collaborative planning for congenital heart diseases typically involves creating physical heart models through 3D printing, which are then examined by both surgeons and cardiologists. Recent developments in mobile augmented reality (AR) technologies have presented a viable alternative, known for their ease of use and portability. However, there is still a lack of research examining the utilization…

    Submitted 7 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  29. arXiv:2408.02074  [pdf]

    eess.IV cs.AI cs.CV

    Applying Conditional Generative Adversarial Networks for Imaging Diagnosis

    Authors: Haowei Yang, Yuxiang Hu, Shuyao He, Ting Xu, Jiajie Yuan, Xingxin Gu

    Abstract: This study introduces an innovative application of Conditional Generative Adversarial Networks (C-GAN) integrated with Stacked Hourglass Networks (SHGN) aimed at enhancing image segmentation, particularly in the challenging environment of medical imaging. We address the problem of overfitting, common in deep learning models applied to complex imaging datasets, by augmenting data through rotation a…

    Submitted 17 July, 2024; originally announced August 2024.

  30. arXiv:2408.01812  [pdf, other]

    cs.CV

    SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm

    Authors: Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Jinhua Yu, Haote Yang, Conghui He

    Abstract: Street-to-satellite image synthesis focuses on generating realistic satellite images from corresponding ground street-view images while maintaining a consistent content layout, similar to looking down from the sky. The significant differences in perspectives create a substantial domain gap between the views, making this cross-view generation task particularly challenging. In this paper, we introdu…

    Submitted 17 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  31. arXiv:2408.01589  [pdf, other]

    cs.RO

    Soil Sample Search in Partially Observable Environments

    Authors: Han Yang, Andrew Dudash

    Abstract: To work in unknown outdoor environments, autonomous sampling machines need the ability to target samples despite limited visibility and robotic arm reach distance. We design a heuristic guided search method to speed up the search process and more efficiently localize the approximate center of soil regions. Through simulation experiments, we assess the effectiveness of the proposed algorithm and di…

    Submitted 2 August, 2024; originally announced August 2024.

  32. arXiv:2408.01562  [pdf]

    cs.CY

    Welfare, sustainability, and equity evaluation of the New York City Interborough Express using spatially heterogeneous mode choice models

    Authors: Hai Yang, Hongying Wu, Lauren Whang, Xiyuan Ren, Joseph Y. J. Chow

    Abstract: The Metropolitan Transit Authority (MTA) proposed building a new light rail route called the Interborough Express (IBX) to provide a direct, fast transit linkage between Queens and Brooklyn. An open-access synthetic citywide trip agenda dataset and a block-group-level mode choice model are used to assess the potential impact IBX could bring to New York City (NYC). IBX could save 28.1 minutes to po…

    Submitted 2 August, 2024; originally announced August 2024.

  33. arXiv:2407.20657  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks

    Authors: Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon

    Abstract: Recent vision-language foundation models, such as CLIP, have demonstrated superior capabilities in learning representations that can be transferable across a diverse range of downstream tasks and domains. With the emergence of such powerful models, it has become crucial to effectively leverage their capabilities in tackling challenging vision tasks. On the other hand, only a few works have focused o…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024, Project Page: https://PDCL-Attack.github.io

  34. arXiv:2407.20653  [pdf, other]

    cs.CV cs.AI cs.LG

    FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks

    Authors: Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon

    Abstract: Deep neural networks are known to be vulnerable to security risks due to the inherent transferable nature of adversarial examples. Despite the success of recent generative model-based attacks demonstrating strong transferability, it still remains a challenge to design an efficient attack strategy in a real-world strict black-box setting, where both the target domain and model architectures are unk…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted to AAAI 2024, Project Page: https://FACL-Attack.github.io

  35. arXiv:2407.19775  [pdf, other]

    cs.AI cs.CL cs.CR cs.DC

    Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference

    Authors: Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo

    Abstract: The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet required data security and scalability needs, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decent…

    Submitted 29 July, 2024; originally announced July 2024.

  36. arXiv:2407.19401  [pdf, other]

    cs.CR cs.AI

    Complete Security and Privacy for AI Inference in Decentralized Systems

    Authors: Hongyang Zhang, Yue Zhao, Claudio Angione, Harry Yang, James Buban, Ahmad Farhan, Fielding Johnston, Patrick Colangelo

    Abstract: The need for data security and model integrity has been accentuated by the rapid adoption of AI and ML in data-driven domains including healthcare, finance, and security. Large models are crucial for tasks like diagnosing diseases and forecasting finances but tend to be delicate and not very scalable. Decentralized systems solve this issue by distributing the workload and reducing central points o…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 25 pages, 5 figures

  37. Alleviating Over-Smoothing via Aggregation over Compact Manifolds

    Authors: Dongzhuoran Zhou, Hui Yang, Bo Xiong, Yue Ma, Evgeny Kharlamov

    Abstract: Graph neural networks (GNNs) have achieved significant success in various applications. Most GNNs learn the node features with information aggregation of its neighbors and feature transformation in each layer. However, the node features become indistinguishable after many layers, leading to performance deterioration: a significant limitation known as over-smoothing. Past work adopted various techn…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by PAKDD 2024 as an oral presentation

  38. arXiv:2407.18390  [pdf, other]

    eess.IV cs.CV

    Adapting Mouse Pathological Model to Human Glomerular Lesion Segmentation

    Authors: Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Moving from animal models to human applications in preclinical research encompasses a broad spectrum of disciplines in medical science. A fundamental element in the development of new drugs, treatments, diagnostic methods, and in deepening our understanding of disease processes is the accurate measurement of kidney tissues. Past studies have demonstrated the viability of translating glomeruli segm…

    Submitted 25 July, 2024; originally announced July 2024.

  39. arXiv:2407.16370  [pdf, other]

    cs.CL cs.SD eess.AS

    Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction

    Authors: Rithik Sachdev, Zhong-Qiu Wang, Chao-Han Huck Yang

    Abstract: Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach is to leverage in-context learning to prompt LLMs so that a better hypothesis can be generated by the LLMs based on a carefully-designed prompt and…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: in submission

  40. arXiv:2407.16103  [pdf, other]

    q-fin.CP cs.LG q-fin.TR

    Reinforcement Learning Pair Trading: A Dynamic Scaling approach

    Authors: Hongshen Yang, Avinash Malik

    Abstract: Cryptocurrency is a cryptography-based digital asset with extremely volatile prices. Around $70 billion worth of crypto-currency is traded daily on exchanges. Trading crypto-currency is difficult due to the inherent volatility of the crypto-market. In this work, we want to test the hypothesis: "Can techniques from artificial intelligence help with algorithmically trading cryptocurrencies?". In ord…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 31 pages

    MSC Class: 91-08

  41. arXiv:2407.14239  [pdf, other]

    cs.AI

    KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

    Authors: Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai

    Abstract: Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges in a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in coope…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 18 figures

  42. arXiv:2407.12996  [pdf, other]

    stat.ML cs.LG

    Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

    Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

    Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and o…

    Submitted 17 July, 2024; originally announced July 2024.

  43. arXiv:2407.12973  [pdf, other]

    cs.CV cs.AI

    Temporal Label Hierachical Network for Compound Emotion Recognition

    Authors: Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Tianhua Qi, Hao Yang, Yuan Zong, Wenming Zheng

    Abstract: Emotion recognition has attracted increasing attention in recent decades. Although significant progress has been made in recognizing the seven basic emotions, existing methods still struggle with compound emotion recognition, which occurs commonly in practical applications. This article introduces our achievements in the 7th Affective Behavior Analysis in-the-wild (ABAW) competition. I…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: draft for abaw7

  44. arXiv:2407.12899  [pdf, other]

    cs.CV cs.AI cs.MM

    DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

    Authors: Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

    Abstract: Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of subject-consistent frames based solely on a story. To this end, we propose DreamStory, an automatic open-domain story visualization framework by leveragin…

    Submitted 17 July, 2024; originally announced July 2024.

  45. arXiv:2407.12703  [pdf, other]

    cs.CL

    Subgraph-Aware Training of Text-based Methods for Knowledge Graph Completion

    Authors: Youmin Ko, Hyemin Yang, Taeuk Kim, Hyunjoon Kim

    Abstract: Fine-tuning pre-trained language models (PLMs) has recently shown potential to improve knowledge graph completion (KGC). However, most PLM-based methods encode only textual information, neglecting various topological structures of knowledge graphs (KGs). In this paper, we empirically validate the significant relations between the structural properties of KGs and the performance of the PLM-based…

    Submitted 23 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  46. arXiv:2407.11663  [pdf, other]

    cs.CV

    Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network

    Authors: Xiaodong Li, Wenchao Du, Hongyu Yang

    Abstract: In this paper, we present our solution and experimental results for the Multi-Task Learning Challenge of the 7th Affective Behavior Analysis in-the-wild (ABAW7) Competition. This challenge consists of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation. We address the research problems of this challenge from three aspects: 1) For learning robust visual feat…

    Submitted 16 July, 2024; originally announced July 2024.

  47. arXiv:2407.11585  [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: accepted by ACMMM2024

  48. arXiv:2407.11242  [pdf, other]

    q-bio.GN cs.CL

    OmniGenome: Aligning RNA Sequences with Secondary Structures in Genomic Foundation Models

    Authors: Heng Yang, Ke Li

    Abstract: The structures of RNA sequences play a vital role in various cellular processes, while existing genomic foundation models (FMs) have struggled with precise sequence-structure alignment, due to the complexity of exponential combinations of nucleotide bases. In this study, we introduce OmniGenome, a foundation model that addresses this critical challenge of sequence-structure alignment in RNA FMs. O…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: submitted to NeurIPS 2024, 19 pages

  49. arXiv:2407.10548  [pdf, other]

    cs.IT

    Fluid Antenna Multiple Access Assisted Integrated Data and Energy Transfer: Outage and Multiplexing Gain Analysis

    Authors: Xiao Lin, Yizhe Zhao, Halvin Yang, Jie Hu, Kai-Kit Wong

    Abstract: Fluid antenna multiple access (FAMA) exploits the spatial opportunities in wireless channels to overcome multiuser interference by position (a.k.a. port) switching, which can achieve better performance compared to traditional fixed multiple-input multiple-output (MIMO) systems. Additionally, integrated data and energy transfer (IDET) is capable of providing both wireless data transfer (WDT) and wi…

    Submitted 1 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  50. arXiv:2407.10446  [pdf, other]

    cs.SD cs.AI cs.DB eess.AS

    DDFAD: Dataset Distillation Framework for Audio Data

    Authors: Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu

    Abstract: Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem, offering the capability to comp…

    Submitted 15 July, 2024; originally announced July 2024.