Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 408 results for author: Tan, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.20920  [pdf, other

    cs.CV

    SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition

    Authors: Hao Tan, Zichang Tan, Jun Li, Jun Wan, Zhen Lei, Stan Z. Li

    Abstract: Multi-label image recognition is a fundamental task in computer vision. Recently, Vision-Language Models (VLMs) have made notable advancements in this area. However, previous methods fail to effectively leverage the rich knowledge in language models and often incorporate label semantics into visual features unidirectionally. To overcome these problems, we propose a Split-and-Synthesize Prompting w… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures

  2. arXiv:2407.20516  [pdf, other

    cs.LG cs.AI cs.CL

    Machine Unlearning in Generative AI: A Survey

    Authors: Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang

    Abstract: Generative AI technologies have been deployed in many places, such as (multimodal) large language models and vision generative models. Their remarkable performance should be attributed to massive training data and emergent reasoning abilities. However, the models would memorize and generate sensitive, biased, or dangerous information originated from the training data especially those from web craw… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.17451  [pdf, other

    cs.SI cs.CY cs.IR

    BlueTempNet: A Temporal Multi-network Dataset of Social Interactions in Bluesky Social

    Authors: Ujun Jeong, Bohan Jiang, Zhen Tan, H. Russell Bernard, Huan Liu

    Abstract: Decentralized social media platforms like Bluesky Social (Bluesky) have made it possible to publicly disclose some user behaviors with millisecond-level precision. Embracing Bluesky's principles of open-source and open-data, we present the first collection of the temporal dynamics of user-driven social interactions. BlueTempNet integrates multiple types of networks into a single multi-network, inc… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: to appear in IEEE Data Description

  4. arXiv:2407.17436  [pdf, other

    cs.CY cs.AI

    AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

    Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.11030  [pdf, other

    cs.LG cs.AI cs.CL

    DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs

    Authors: Zhen Tan, Daize Dong, Xinyu Zhao, Jie Peng, Yu Cheng, Tianlong Chen

    Abstract: In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  6. arXiv:2407.10852  [pdf, other

    cs.DS

    Cut-Preserving Vertex Sparsifiers for Planar and Quasi-bipartite Graphs

    Authors: Yu Chen, Zihan Tan

    Abstract: We study vertex sparsification for preserving cuts. Given a graph $G$ with a subset $|T|=k$ of its vertices called terminals, a \emph{quality-$q$ cut sparsifier} is a graph $G'$ that contains $T$, such that, for any partition $(T_1,T_2)$ of $T$ into non-empty subsets, the value of the min-cut in $G'$ separating $T_1$ from $T_2$ is within factor $q$ from the value of the min-cut in $G$ separating… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  7. arXiv:2407.10468  [pdf, other

    cs.SD cs.AI eess.AS

    LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

    Authors: Zhenxiong Tan, Xinyin Ma, Gongfan Fang, Xinchao Wang

    Abstract: Latent diffusion models have shown promising results in audio generation, making notable advancements over traditional methods. However, their performance, while impressive with short audio clips, faces challenges when extended to longer audio sequences. These challenges are due to model's self-attention mechanism and training predominantly on 10-second clips, which complicates the extension to lo… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024; Code: https://github.com/Yuanshi9815/LiteFocus

  8. arXiv:2407.09333  [pdf, other

    cs.CR

    HETOCompiler: An MLIR-based crypTOgraphic Compilation Framework for HEterogeneous Devices

    Authors: Zhiyuan Tan, Liutong Han, Mingjie Xing, Yanjun Wu

    Abstract: Hash algorithms are fundamental tools in cryptography, offering irreversible and sensitive transformations of input data for various security purposes. As computing architectures evolve towards heterogeneous systems, efficiently harnessing diverse computing resources for hash encryption algorithms becomes crucial. This paper presents HETOCompiler, a novel cryptography compilation framework designe… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  9. arXiv:2407.08974  [pdf, other

    q-bio.QM cs.LG math.GN q-bio.BM

    Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

    Authors: Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia

    Abstract: Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  10. arXiv:2407.07958  [pdf, other

    cs.CV

    Bayesian Detector Combination for Object Detection with Crowdsourced Annotations

    Authors: Zhi Qin Tan, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, Yunpeng Li

    Abstract: Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise, especially in crowdsourcing scenarios. Most prior object detection methods assume accurate annotations; A few recent works have studied object detection with noisy crowdsourced annotations, with evaluation on distinct synthetic crowdsourced datasets of varying setups under… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  11. arXiv:2407.07053  [pdf, other

    cs.CV

    Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

    Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In lig… ▽ More

    Submitted 23 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

  12. arXiv:2407.02408  [pdf, other

    cs.CL cs.LG

    CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

    Authors: Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

    Abstract: As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type o… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 37 pages, 32 figures

  13. arXiv:2407.00390  [pdf, other

    cs.CL

    Advancing Process Verification for Large Language Models via Tree-Based Preference Learning

    Authors: Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales.Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, t… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  14. arXiv:2406.19417  [pdf, other

    cs.CR cs.AI

    "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models

    Authors: Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, Huan Liu

    Abstract: Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving their performance in applications like fact-checking and information searching. In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases by injecting deceptive content into the retrieval database, intentionall… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Preprint

  15. arXiv:2406.17992  [pdf, other

    cs.CL cs.AI

    Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models

    Authors: Bohan Jiang, Chengshuai Zhao, Zhen Tan, Huan Liu

    Abstract: Despite recent advancements in detecting disinformation generated by large language models (LLMs), current efforts overlook the ever-evolving nature of this disinformation. In this work, we investigate a challenging yet practical research problem of detecting evolving LLM-generated disinformation. Disinformation evolves constantly through the rapid development of LLMs and their variants. As a cons… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  16. arXiv:2406.17841  [pdf, other

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages,6 figures + 14 pages, 6 figures

  17. arXiv:2406.16562  [pdf, other

    cs.CV cs.CL

    EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

    Authors: Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li

    Abstract: The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics that can guide the optimization of the models. In this paper, we propose EvalAlign, a metric characterized by its accuracy, stability, and fine granularity. Our ap… ▽ More

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Github Repository: https://github.com/SAIS-FUXI/EvalAlign

  18. arXiv:2406.16260  [pdf, other

    cs.CV cs.AI

    Video-Infinity: Distributed Long Video Generation

    Authors: Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang

    Abstract: Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few seconds. The primary challenges in producing longer videos include the substantial memory requirements and the extended processing time required on a single GPU. A s… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  19. arXiv:2406.15992  [pdf, other

    cs.CL

    Can LLM Graph Reasoning Generalize beyond Pattern Memorization?

    Authors: Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov

    Abstract: Large language models (LLMs) demonstrate great potential for problems with implicit graphical structures, while recent works seek to enhance the graph reasoning capabilities of LLMs through specialized instruction tuning. The resulting 'graph LLMs' are evaluated with in-distribution settings only, thus it remains underexplored whether LLMs are learning generalizable graph reasoning skills or merel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures, Code and data will be publicly available at https://github.com/MatthewYZhang/NLGift

    ACM Class: I.2.7

  20. FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

    Authors: Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

    Abstract: Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability of utilizing search engine to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved, i.e., the lack of factuality and clear logic in the g… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Report number: 30th

    Journal ref: KDD 2024

  21. arXiv:2406.12313  [pdf

    cs.DB

    A framework for developing a knowledge management platform

    Authors: Marie Lisandra Zepeda Mendoza, Sonali Agarwal, James A. Blackshaw, Vanesa Bol, Audrey Fazzi, Filippo Fiorini, Amy Louise Foreman, Nancy George, Brett R. Johnson, Brian Martin, Dave McComb, Euphemia Mutasa-Gottgens, Helen Parkinson, Martin Romacker, Rolf Russell, Valérien Ségard, Shawn Zheng Kai Tan, Wei Kheng Teh, F. P. Winstanley, Benedict Wong, Adrian M. Smith

    Abstract: Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide gu… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 1 figure

  22. arXiv:2406.10471  [pdf, other

    cs.CL

    Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts

    Authors: Zhaoxuan Tan, Zheyuan Liu, Meng Jiang

    Abstract: Personalized large language models (LLMs) aim to tailor interactions, content, and recommendations to individual user preferences. While parameter-efficient fine-tuning (PEFT) methods excel in performance and generalization, they are costly and limit communal benefits when used individually. To this end, we introduce Personalized Pieces (Per-Pcs), a framework that allows users to safely share and… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  23. arXiv:2406.09899  [pdf, other

    cs.LG cs.AI

    Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

    Authors: Zhentao Tan, Yadong Mu

    Abstract: Recently various optimization problems, such as Mixed Integer Linear Programming Problems (MILPs), have undergone comprehensive investigation, leveraging the capabilities of machine learning. This work focuses on learning-based solutions for efficiently solving the Quadratic Assignment Problem (QAPs), which stands as a formidable challenge in combinatorial optimization. While many instances of sim… ▽ More

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  24. arXiv:2406.08785  [pdf, other

    cs.CV

    BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection

    Authors: Wenjie Wang, Yehao Lu, Guangcong Zheng, Shuigen Zhan, Xiaoqing Ye, Zichang Tan, Jingdong Wang, Gaoang Wang, Xi Li

    Abstract: Vision-based roadside 3D object detection has attracted rising attention in autonomous driving domain, since it encompasses inherent advantages in reducing blind spots and expanding perception range. While previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we p… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  25. arXiv:2406.06911  [pdf, other

    cs.CV cs.AI

    AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

    Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

    Abstract: Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency, thereby precluding the possibilities of parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enable… ▽ More

    Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Work in progress. Project Page: https://czg1225.github.io/asyncdiff_page/

  26. arXiv:2406.06904  [pdf, other

    cs.RO cs.HC

    Person Transfer in the Field: Examining Real World Sequential Human-Robot Interaction Between Two Robots

    Authors: Xiang Zhi Tan, Elizabeth J. Carter, Aaron Steinfeld

    Abstract: With more robots being deployed in the world, users will likely interact with multiple robots sequentially when receiving services. In this paper, we describe an exploratory field study in which unsuspecting participants experienced a ``person transfer'' -- a scenario in which they first interacted with one stationary robot before another mobile robot joined to complete the interaction. In our 7-h… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to RO-MAN 2024

  27. arXiv:2406.06295  [pdf, other

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  28. arXiv:2406.06140  [pdf, other

    cs.CL cs.LG

    Can I understand what I create? Self-Knowledge Evaluation of Large Language Models

    Authors: Zhiquan Tan, Lai Wei, Jindong Wang, Xing Xie, Weiran Huang

    Abstract: Large language models (LLMs) have achieved remarkable progress in linguistic tasks, necessitating robust evaluation frameworks to understand their capabilities and limitations. Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework that is easy to implement, evaluating models on their ability to comprehend and respond to self-generated… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.05270  [pdf

    physics.med-ph cs.CV cs.LG eess.IV

    fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

    Authors: Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

    Abstract: This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2406.03999  [pdf, other

    cs.LG cs.CV

    Unveiling the Dynamics of Information Interplay in Supervised Learning

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

    Abstract: In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to assess the interactions of… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  31. arXiv:2406.03799  [pdf

    cs.CV cs.AI

    Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge

    Authors: Nan Zhang, Xidan Zhang, Jianing Wei, Fangjun Wang, Zhiming Tan

    Abstract: This report describes the winning solution to the WeatherProof Dataset Challenge (CVPR 2024 UG2+ Track 3). Details regarding the challenge are available at https://cvpr2024ug2challenge.github.io/track3.html. We propose an enhanced semantic segmentation pipeline for this challenge. Firstly, we improve semantic segmentation models, using backbone pretrained with Depth Anything to improve UperNet mod… ▽ More

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  32. arXiv:2406.03777  [pdf, other

    cs.LG cs.AI

    Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

    Authors: Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi

    Abstract: The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become m… ▽ More

    Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Benckmarking paper

  33. arXiv:2406.02178  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations

    Authors: Sarthak Yadav, Zheng-Hua Tan

    Abstract: Despite its widespread adoption as the prominent neural architecture, the Transformer has spurred several independent lines of work to address its limitations. One such approach is selective state space models, which have demonstrated promising results for language modelling. However, their feasibility for learning self-supervised, general-purpose audio representations is yet to be investigated. T… ▽ More

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  34. arXiv:2406.01196  [pdf, other

    cs.CV cs.AI

    3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information

    Authors: Sihan Wen, Xiantan Zhu, Zhiming Tan

    Abstract: In recent years, a plethora of diverse methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both been proven to be effective and practical methods. Recognizing the strengths of those two techniques, we have developed a novel Semantic Graph Attention Network which can benefit from the ability of self-attention to capture global contex… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  35. arXiv:2406.00777  [pdf, other

    cs.CV cs.AI

    Diffusion Features to Bridge Domain Gap for Semantic Segmentation

    Authors: Yuxiang Ji, Boyong He, Chenyuan Qu, Zhuoyue Tan, Chuan Qin, Liaoni Wu

    Abstract: Pre-trained diffusion models have demonstrated remarkable proficiency in synthesizing images across a wide range of scenarios with customizable prompts, indicating their effective capacity to capture universal features. Motivated by this, our study delves into the utilization of the implicit knowledge embedded within diffusion models to address challenges in cross-domain semantic segmentation. Thi… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  36. arXiv:2406.00323  [pdf, other

    cs.IR cs.MM

    BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation

    Authors: Qile Fan, Penghang Yu, Zhiyi Tan, Bing-Kun Bao, Guanming Lu

    Abstract: Multimedia recommender systems focus on utilizing behavioral information and content information to model user preferences. Typically, it employs pre-trained feature encoders to extract content features, then fuses them with behavioral features. However, pre-trained feature encoders often extract features from the entire content simultaneously, including excessive preference-irrelevant details. We… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  37. arXiv:2405.20711  [pdf, other

    cs.CV

    Revisiting Mutual Information Maximization for Generalized Category Discovery

    Authors: Zhaorui Tan, Chengrui Zhang, Xi Yang, Jie Sun, Kaizhu Huang

    Abstract: Generalized category discovery presents a challenge in a realistic scenario, which requires the model's generalization ability to recognize unlabeled samples from known and unknown categories. This paper revisits the challenge of generalized category discovery through the lens of information maximization (InfoMax) with a probabilistic parametric classifier. Our findings reveal that ensuring indepe… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint version

  38. arXiv:2405.18756  [pdf, other

    cs.LG cs.AI cs.CV stat.AP stat.ML

    Provable Contrastive Continual Learning

    Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang

    Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  39. arXiv:2405.14278  [pdf, other

    cs.CV

    SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation

    Authors: Kai Yao, Zhaorui Tan, Zixian Su, Xi Yang, Jie Sun, Kaizhu Huang

    Abstract: Open compound domain adaptation (OCDA) aims to transfer knowledge from a labeled source domain to a mix of unlabeled homogeneous compound target domains while generalizing to open unseen domains. Existing OCDA methods solve the intra-domain gaps by a divide-and-conquer strategy, which divides the problem into several individual and parallel domain adaptation (DA) tasks. Such approaches often conta… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  40. arXiv:2405.14092  [pdf, other

    cs.CL

    Large Language Models Can Self-Correct with Minimal Effort

    Authors: Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang

    Abstract: Intrinsic self-correct was a method that instructed large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, the study concluded that the LLMs could not self-correct reasoning yet. We find that a simple yet effective verification method can unleash inherent capabilities of the LLMs. That is to mask a key condition in the question, add the current… ▽ More

    Submitted 23 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Work in Progress

  41. arXiv:2405.12914  [pdf, other

    cs.CV

    An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation

    Authors: Zhiyu Tan, Mengping Yang, Luozheng Qin, Hao Yang, Ye Qian, Qiang Zhou, Cheng Zhang, Hao Li

    Abstract: One critical prerequisite for faithful text-to-image generation is the accurate understanding of text inputs. Existing methods leverage the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can merely encode English with a maximum token length of 77. Moreover, the model capacity of the text encoder from CLIP is relatively limited compared to Large Langu… ▽ More

    Submitted 18 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: To appear in ECCV-2024, Project page: https://llm-conditioned-diffusion.github.io/

  42. arXiv:2405.11405  [pdf, ps, other

    cs.IT

    On the Rate-Distortion Function for Sampled Cyclostationary Gaussian Processes with Memory: Extended Version with Proofs

    Authors: Zikun Tan, Ron Dabora, H. Vincent Poor

    Abstract: In this work we study the rate-distortion function (RDF) for lossy compression of asynchronously-sampled continuous-time (CT) wide-sense cyclostationary (WSCS) Gaussian processes with memory. As the case of synchronous sampling, i.e., when the sampling interval is commensurate with the period of the cyclostationary statistics, has already been studied, we focus on discrete-time (DT) processes obta… ▽ More

    Submitted 23 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 11 pages, 0 figures, accepted by the 2024 IEEE International Symposium on Information Theory (ISIT 2024)

  43. arXiv:2405.07027  [pdf, other

    cs.CV cs.AI cs.RO

    TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

    Authors: Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu

    Abstract: The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncate… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  44. arXiv:2405.05413  [pdf

    cs.DB

    Digital Evolution: Novo Nordisk's Shift to Ontology-Based Data Management

    Authors: Shawn Zheng Kai Tan, Shounak Baksi, Thomas Gade Bjerregaard, Preethi Elangovan, Thrishna Kuttikattu Gopalakrishnan, Darko Hric, Joffrey Joumaa, Beidi Li, Kashif Rabbani, Santhosh Kannan Venkatesan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose

    Abstract: Biomedical data is growing exponentially, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transf… ▽ More

    Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures

  45. arXiv:2405.04819  [pdf, other

    cs.CL cs.AI

    DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

    Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

    Abstract: Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on… ▽ More

    Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: Under Review; Incorrect author name revised

  46. arXiv:2405.03723  [pdf, other

    cs.LG stat.ME stat.ML

    Generative adversarial learning with optimal input dimension and its adaptive generator architecture

    Authors: Zhiyao Tan, Ling Zhou, Huazhen Lin

    Abstract: We investigate the impact of the input dimension on the generalization error in generative adversarial networks (GANs). In particular, we first provide both theoretical and practical evidence to validate the existence of an optimal input dimension (OID) that minimizes the generalization error. Then, to identify the OID, we introduce a novel framework called generalized GANs (G-GANs), which include… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  47. Misaka: Interactive Swarm Testbed for Smart Grid Distributed Algorithm Test and Evaluation

    Authors: Tingliang Zhang, Haiwang Zhong, Zhenfei Tan, Xinfei Yan

    Abstract: In this paper, we present Misaka, a visualized swarm testbed for smart grid algorithm evaluation, also an extendable open-source open-hardware platform for developing tabletop tangible swarm interfaces. The platform consists of a collection of custom-designed 3 omni-directional wheels robots each 10 cm in diameter, high accuracy localization through a microdot pattern overlaid on top of the activi… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Journal ref: 2020 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia)

  48. arXiv:2404.16831  [pdf, other

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora , et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su… ▽ More

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  49. arXiv:2404.16339  [pdf, other

    cs.CV cs.AI

    Training-Free Unsupervised Prompt for Vision-Language Models

    Authors: Sifan Long, Linbin Wang, Zhen Zhao, Zichang Tan, Yiming Wu, Shengsheng Wang, Jingdong Wang

    Abstract: Prompt learning has become the most effective paradigm for adapting large pre-trained vision-language models (VLMs) to downstream tasks. Recently, unsupervised prompt tuning methods, such as UPL and POUF, directly leverage pseudo-labels as supervisory information to fine-tune additional adaptation modules on unlabeled data. However, inaccurate pseudo labels easily misguide the tuning process and r… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  50. arXiv:2404.15381  [pdf, other

    cs.LG cs.AI

    Advances and Open Challenges in Federated Learning with Foundation Models

    Authors: Chao Ren, Han Yu, Hongyi Peng, Xiaoli Tang, Anran Li, Yulan Gao, Alysa Ziying Tan, Bo Zhao, Xiaoxiao Li, Zengxiang Li, Qiang Yang

    Abstract: The integration of Foundation Models (FMs) with Federated Learning (FL) presents a transformative paradigm in Artificial Intelligence (AI), offering enhanced capabilities while addressing concerns of privacy, data decentralization, and computational efficiency. This paper provides a comprehensive survey of the emerging field of Federated Foundation Models (FedFM), elucidating their synergistic rel… ▽ More

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Survey of Federated Foundation Models (FedFM)