Showing 1–50 of 141 results for author: Wei, T

  1. arXiv:2407.20147

    quant-ph cs.AI cs.ET cs.LG cs.NE

    Quantum Machine Learning Architecture Search via Deep Reinforcement Learning

    Authors: Xin Dai, Tzu-Chieh Wei, Shinjae Yoo, Samuel Yen-Chi Chen

    Abstract: The rapid advancement of quantum computing (QC) and machine learning (ML) has given rise to the burgeoning field of quantum machine learning (QML), aiming to capitalize on the strengths of quantum computing to propel ML forward. Despite its promise, crafting effective QML models necessitates profound expertise to strike a delicate balance between model intricacy and feasibility on Noisy Intermedia…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE International Conference on Quantum Computing and Engineering - QCE 2024

  2. arXiv:2407.19667

    cs.AI

    Smart Language Agents in Real-World Planning

    Authors: Annabelle Miin, Timothy Wei

    Abstract: Comprehensive planning agents have been a long-term goal in the field of artificial intelligence. Recent innovations in Natural Language Processing have yielded success through the advent of Large Language Models (LLMs). We seek to improve the travel-planning capability of such LLMs by extending upon the work of the previous paper TravelPlanner. Our objective is to explore a new method of using LL…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 5 pages, 1 figure

  3. arXiv:2407.19079

    cs.CV

    UniForensics: Face Forgery Detection via General Facial Representation

    Authors: Ziyuan Fang, Hanqing Zhao, Tianyi Wei, Wenbo Zhou, Ming Wan, Zhanyi Wang, Weiming Zhang, Nenghai Yu

    Abstract: Previous deepfake detection methods mostly depend on low-level textural features vulnerable to perturbations and fall short of detecting unseen forgery methods. In contrast, high-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. Motivated by this, we propose a detection method that utilizes high-level s…

    Submitted 26 July, 2024; originally announced July 2024.

  4. arXiv:2407.15815

    cs.RO cs.AI cs.CV

    Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

    Authors: Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu

    Abstract: Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning app…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Webpage: https://gemcollector.github.io/maniwhere/

  5. arXiv:2407.08554

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl…

    Submitted 28 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 24 pages

  6. arXiv:2407.08348

    cs.AI cs.CL cs.LG

    Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

    Authors: Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.01639

    cs.LG cs.SE

    ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

    Authors: Tianhao Wei, Luca Marzari, Kai S. Yun, Hanjiang Hu, Peizhi Niu, Xusheng Luo, Changliu Liu

    Abstract: Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present ModelVerification.jl (MV), the fir…

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2406.16204

    cs.CV

    Breaking the Frame: Image Retrieval by Visual Overlap Prediction

    Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath

    Abstract: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes by shifting from traditional reliance on global image similarities and local features to image overlap prediction. The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching. By focusing on obtaining patch-level…

    Submitted 23 June, 2024; originally announced June 2024.

  9. arXiv:2406.15863

    cs.CV

    EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation

    Authors: Tianyu Wei, Shanmin Pang, Qi Guo, Yizhuo Ma, Qing Guo

    Abstract: Text-to-image diffusion models can create realistic images based on input texts. Users can describe an object to convey their opinions visually. In this work, we unveil a previously unrecognized and latent risk of using diffusion models to generate images; we utilize emotion in the input texts to introduce negative content, potentially eliciting unfavorable emotions in users. Emotions play a cruc…

    Submitted 22 June, 2024; originally announced June 2024.

  10. arXiv:2406.14056

    cs.CV

    VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

    Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

    Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in ha…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages

    MSC Class: 68-04 68-04 ACM Class: I.2.7; I.2.10

  11. arXiv:2406.13187

    cs.LG

    Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning

    Authors: Kai Gan, Tong Wei, Min-Ling Zhang

    Abstract: While long-tailed semi-supervised learning (LTSSL) has received tremendous attention in many real-world classification problems, existing LTSSL algorithms typically assume that the class distributions of labeled and unlabeled data are almost identical. Those LTSSL algorithms built upon the assumption can severely suffer when the class distributions of labeled and unlabeled data are mismatched sinc…

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.12638

    cs.CV cs.LG

    Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model

    Authors: Jiang-Xin Shi, Chi Zhang, Tong Wei, Yu-Feng Li

    Abstract: Pre-trained vision-language models like CLIP have shown powerful zero-shot inference ability via image-text matching and prove to be strong few-shot learners in various downstream tasks. However, in real-world scenarios, adapting CLIP to downstream tasks may encounter the following challenges: 1) data may exhibit long-tailed data distributions and might not have abundant samples for all the classe…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  13. arXiv:2406.10801

    cs.CV

    Saliency-guided and Patch-based Mixup for Long-tailed Skin Cancer Image Classification

    Authors: Tianyunxi Wei, Yijin Huang, Li Lin, Pujin Cheng, Sirui Li, Xiaoying Tang

    Abstract: Medical image datasets often exhibit long-tailed distributions due to the inherent challenges in medical data collection and annotation. In long-tailed contexts, some common disease categories account for most of the data, while only a few samples are available in the rare disease categories, resulting in poor performance of deep learning methods. To address this issue, previous approaches have em…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: IEEE ISBI2024

  14. arXiv:2406.08835

    cs.SD eess.AS

    A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, ca…

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.07362

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f…

    Submitted 28 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  16. arXiv:2406.06563

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…

    Submitted 2 June, 2024; originally announced June 2024.

  17. arXiv:2406.01884

    cs.CV

    Rank-based No-reference Quality Assessment for Face Swapping

    Authors: Xinghui Zhou, Wenbo Zhou, Tianyi Wei, Shen Chen, Taiping Yao, Shouhong Ding, Weiming Zhang, Nenghai Yu

    Abstract: Face swapping has become a prominent research area in computer vision and image processing due to rapid technological advancements. The metric of measuring the quality in most face swapping methods relies on several distances between the manipulated images and the source image, or the target image, i.e., there are suitable known reference face images. Therefore, there is still a gap in accurately…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  18. arXiv:2406.00605

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending the context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard…

    Submitted 1 June, 2024; originally announced June 2024.

  19. arXiv:2405.11756

    cs.LG

    Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning

    Authors: Kai Gan, Tong Wei

    Abstract: Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in the emergence of numerous method variations. However, practitioners often encounter challenges when attempting to deploy these methods due to their subpar performance. In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models. We i…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  20. arXiv:2405.11380

    cs.RO cs.AI eess.SY

    Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills

    Authors: Tianhao Wei, Liqian Ma, Rui Chen, Weiye Zhao, Changliu Liu

    Abstract: The requirements for real-world manipulation tasks are diverse and often conflicting; some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions, while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development…

    Submitted 7 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  21. arXiv:2405.11135

    cs.CR

    AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA

    Authors: Weitao Feng, Wenbo Zhou, Jiyan He, Jie Zhang, Tianyi Wei, Guanlin Li, Tianwei Zhang, Weiming Zhang, Nenghai Yu

    Abstract: Diffusion models have achieved remarkable success in generating high-quality images. Recently, the open-source models represented by Stable Diffusion (SD) are thriving and are accessible for customization, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has led to copyright concerns, like unauthorized model distribution a…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/Georgefwt/AquaLoRA

  22. arXiv:2405.05594

    cs.AI

    Expected Work Search: Combining Win Rate and Proof Size Estimation

    Authors: Owen Randall, Martin Müller, Ting Han Wei, Ryan Hayward

    Abstract: We propose Expected Work Search (EWS), a new game solving algorithm. EWS combines win rate estimation, as used in Monte Carlo Tree Search, with proof size estimation, as used in Proof Number Search. The search efficiency of EWS stems from minimizing a novel notion of Expected Work, which predicts the expected computation required to solve a position. EWS outperforms traditional solving algorithms…

    Submitted 9 May, 2024; originally announced May 2024.

  23. arXiv:2404.19141

    cs.LG

    Micro-Macro Spatial-Temporal Graph-based Encoder-Decoder for Map-Constrained Trajectory Recovery

    Authors: Tonglong Wei, Youfang Lin, Yan Lin, Shengnan Guo, Lan Zhang, Huaiyu Wan

    Abstract: Recovering intermediate missing GPS points in a sparse trajectory, while adhering to the constraints of the road network, could offer deep insights into users' moving behaviors in intelligent transportation systems. Although recent studies have demonstrated the advantages of achieving map-constrained trajectory recovery via an end-to-end manner, they still face two significant challenges. Firstly,…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted as a regular paper at IEEE TKDE

  24. arXiv:2404.12450

    cs.CV cs.AI cs.LG

    Enhancing AI Diagnostics: Autonomous Lesion Masking via Semi-Supervised Deep Learning

    Authors: Ting-Ruen Wei, Michele Hell, Dang Bich Thuy Le, Aren Vierra, Ran Pang, Mahesh Patel, Young Kang, Yuling Yan

    Abstract: This study presents an unsupervised domain adaptation method aimed at autonomously generating image masks outlining regions of interest (ROIs) for differentiating breast lesions in breast ultrasound (US) imaging. Our semi-supervised learning approach utilizes a primitive model trained on a small public breast US dataset with true annotations. This model is then iteratively refined for the domain a…

    Submitted 18 April, 2024; originally announced April 2024.

  25. arXiv:2403.10667

    cs.IR cs.AI cs.CL cs.MM

    Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond

    Authors: Tianxin Wei, Bowen Jin, Ruirui Li, Hansi Zeng, Zhengyang Wang, Jianhui Sun, Qingyu Yin, Hanqing Lu, Suhang Wang, Jingrui He, Xianfeng Tang

    Abstract: Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater t…

    Submitted 27 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  26. arXiv:2403.07279

    cs.CL

    A Survey of Explainable Knowledge Tracing

    Authors: Yanhong Bai, Jiabao Zhao, Tingjiang Wei, Qing Cai, Liang He

    Abstract: With the long-term accumulation of high-quality educational data, artificial intelligence has shown excellent performance in knowledge tracing. However, due to the lack of interpretability and transparency of some algorithms, this approach will result in reduced stakeholder trust and a decreased acceptance of intelligent decisions. Therefore, algorithms need to achieve high accuracy, and users nee…

    Submitted 11 March, 2024; originally announced March 2024.

  27. arXiv:2402.14623

    cs.RO cs.AI cs.CL cs.CV

    RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

    Authors: Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

    Abstract: Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, relatively little effort on ensuring the deployability of generated code on real robots, and other fundamental c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 10 pages of main paper, 4 pages of appendix; 10 figures in main paper, 3 figures in appendix

    ACM Class: I.2.7; I.2.8; I.2.9; I.2.10

  28. arXiv:2402.10892

    cs.CR cs.CL cs.LG

    Proving membership in LLM pretraining data via data watermarks

    Authors: Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia

    Abstract: Detecting whether copyright holders' works were used in LLM pretraining is poised to be an important problem. This work proposes using data watermarks to enable principled detection with only black-box model access, provided that the rightholder contributed multiple training documents and watermarked them before public release. By applying a randomly sampled data watermark, detection can be framed…

    Submitted 10 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  29. arXiv:2402.09173

    cs.LG

    Nearly Optimal Regret for Decentralized Online Convex Optimization

    Authors: Yuanyu Wan, Tong Wei, Mingli Song, Lijun Zhang

    Abstract: We investigate decentralized online convex optimization (D-OCO), in which a set of local learners are required to minimize a sequence of global loss functions using only local computations and communications. Previous studies have established $O(n^{5/4}\rho^{-1/2}\sqrt{T})$ and $O(n^{3/2}\rho^{-1}\log T)$ regret bounds for convex and strongly convex functions respectively, where $n$ is the number of l…

    Submitted 23 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  30. arXiv:2402.07369

    cs.LG

    Diff-RNTraj: A Structure-aware Diffusion Model for Road Network-constrained Trajectory Generation

    Authors: Tonglong Wei, Youfang Lin, Shengnan Guo, Yan Lin, Yiheng Huang, Chenyang Xiang, Yuqing Bai, Menglu Ya, Huaiyu Wan

    Abstract: Trajectory data is essential for various applications as it records the movement of vehicles. However, publicly available trajectory datasets remain limited in scale due to privacy concerns, which hinders the development of trajectory data mining and trajectory-based applications. To address this issue, some methods for generating synthetic trajectories have been proposed to expand the scale of th…

    Submitted 11 February, 2024; originally announced February 2024.

  31. Monte Carlo Tree Search in the Presence of Transition Uncertainty

    Authors: Farnaz Kohankhaki, Kiarash Aghakasiri, Hongming Zhang, Ting-Han Wei, Chao Gao, Martin Müller

    Abstract: Monte Carlo Tree Search (MCTS) is an immensely popular search-based framework used for decision making. It is traditionally applied to domains where a perfect simulation model of the environment is available. We study and improve MCTS in the context where the environment model is given but imperfect. We show that the discrepancy between the model and the actual environment can lead to significant…

    Submitted 18 December, 2023; originally announced December 2023.

  32. arXiv:2312.08939

    cs.LG cs.CV

    EAT: Towards Long-Tailed Out-of-Distribution Detection

    Authors: Tong Wei, Bo-Lin Wang, Min-Ling Zhang

    Abstract: Despite recent advancements in out-of-distribution (OOD) detection, most current studies assume a class-balanced in-distribution training dataset, which is rarely the case in real-world scenarios. This paper addresses the challenging task of long-tailed OOD detection, where the in-distribution data follows a long-tailed class distribution. The main difficulty lies in distinguishing OOD data from s…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Paper accepted by AAAI 2024

  33. arXiv:2312.07865

    cs.CV

    SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

    Authors: Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang

    Abstract: Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives. To tackle this issue, several anti-customization methods have been proposed in very recent months, predominantly grounded in adversarial attacks. Unfortunately, most of these methods adopt straightforward d…

    Submitted 30 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR2024

  34. arXiv:2312.04584

    cs.CR cs.AI cs.CV cs.LG

    Towards Sample-specific Backdoor Attack with Clean Labels via Attribute Trigger

    Authors: Yiming Li, Mingyan Zhu, Junfeng Guo, Tao Wei, Shu-Tao Xia, Zhan Qin

    Abstract: Currently, sample-specific backdoor attacks (SSBAs) are the most advanced and malicious methods since they can easily circumvent most of the current backdoor defenses. In this paper, we reveal that SSBAs are not sufficiently stealthy due to their poisoned-label nature, where users can discover anomalies if they check the image-label relationship. In particular, we demonstrate that it is ineffectiv…

    Submitted 10 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: 14 pages

  35. Robust Basket Recommendation via Noise-tolerated Graph Contrastive Learning

    Authors: Xinrui He, Tianxin Wei, Jingrui He

    Abstract: The growth of e-commerce has seen a surge in popularity of platforms like Amazon, eBay, and Taobao. This has given rise to a unique shopping behavior involving baskets - sets of items purchased together. As a less studied interaction mode in the community, the question of how shopping baskets should complement personalized recommendation systems remains under-explored. While previous attempts focus…

    Submitted 30 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: CIKM 2023

    Journal ref: In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23). Association for Computing Machinery, New York, NY, USA, 709-719 (2023)

  36. arXiv:2311.11898

    cs.RO

    Multimodal Safe Control for Human-Robot Interaction

    Authors: Ravi Pandya, Tianhao Wei, Changliu Liu

    Abstract: Generating safe behaviors for autonomous systems is important as they continue to be deployed in the real world, especially around people. In this work, we focus on developing a novel safe controller for systems where there are multiple sources of uncertainty. We formulate a novel multimodal safe control method, called the Multimodal Safe Set Algorithm (MMSSA) for the case where the agent has unce…

    Submitted 1 July, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: American Control Conference (ACC) 2024

  37. arXiv:2311.09134

    cs.IR

    Scalable and Effective Generative Information Retrieval

    Authors: Hansi Zeng, Chen Luo, Bowen Jin, Sheikh Muhammad Sarwar, Tianxin Wei, Hamed Zamani

    Abstract: Recent research has shown that transformer networks can be used as differentiable search indexes by representing each document as a sequence of document ID tokens. These generative retrieval models cast the retrieval problem to a document ID generation problem for each given query. Despite their elegant design, existing generative retrieval models only perform well on artificially-constructed and…

    Submitted 15 November, 2023; originally announced November 2023.

  38. arXiv:2311.07178

    cs.AI cs.GT cs.LG

    Game Solving with Online Fine-Tuning

    Authors: Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin, I-Chen Wu

    Abstract: Game solving is a similar, yet more difficult task than mastering a game. Solving a game typically means to find the game-theoretic value (outcome given optimal play), and optionally a full strategy to follow in order to achieve that outcome. The AlphaZero algorithm has demonstrated super-human level play, and its powerful policy and value predictions have also served as heuristics in game solving…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted by the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  39. arXiv:2310.19341

    cs.CL cs.AI

    Skywork: A More Open Bilingual Foundation Model

    Authors: Tianwen Wei, Liang Zhao, Lichang Zhang, Bo Zhu, Lijie Wang, Haihua Yang, Biye Li, Cheng Cheng, Weiwei Lü, Rui Hu, Chenxia Li, Liu Yang, Xilin Luo, Xuejie Wu, Lunan Liu, Wenjun Cheng, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Lei Lin, Xiaokun Wang, Yutuan Ma, Chuanhai Dong, Yanqi Sun, Yifu Chen, et al. (5 additional authors not shown)

    Abstract: In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general purpose tr…

    Submitted 30 October, 2023; originally announced October 2023.

  40. arXiv:2310.18816

    cs.LG

    Adaptive Test-Time Personalization for Federated Learning

    Authors: Wenxuan Bao, Tianxin Wei, Haohan Wang, Jingrui He

    Abstract: Personalized federated learning algorithms have shown promising results in adapting models to various distribution shifts. However, most of these methods require labeled data on testing clients for personalization, which is usually unavailable in real-world scenarios. In this paper, we introduce a novel setting called test-time personalized federated learning (TTPFL), where clients locally adapt a…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  41. arXiv:2310.18236

    cs.CV cs.LG

    How Re-sampling Helps for Long-Tail Learning?

    Authors: Jiang-Xin Shi, Tong Wei, Yuke Xiang, Yu-Feng Li

    Abstract: Long-tail learning has received significant attention in recent years due to the challenge it poses with extremely imbalanced datasets. In these datasets, only a few classes (known as the head classes) have an adequate number of training samples, while the rest of the classes (known as the tail classes) are infrequent in the training data. Re-sampling is a classical and widely used approach for ad…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  42. arXiv:2310.16713

    cs.CL cs.AI

    SkyMath: Technical Report

    Authors: Liu Yang, Haihua Yang, Wenjun Cheng, Lei Lin, Chenxia Li, Yifu Chen, Lunan Liu, Jianfei Pan, Tianwen Wei, Biye Li, Liang Zhao, Lijie Wang, Bo Zhu, Guoliang Li, Xuejie Wu, Xilin Luo, Rui Hu

    Abstract: Large language models (LLMs) have shown great potential to solve varieties of natural language processing (NLP) tasks, including mathematical reasoning. In this work, we present SkyMath, a large language model for mathematics with 13 billion parameters. By applying self-compare fine-tuning, we have enhanced mathematical reasoning abilities of Skywork-13B-Base remarkably. On GSM8K, SkyMath outperfo…

    Submitted 26 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

  43. arXiv:2310.13230  [pdf, other]

    cs.LG cs.AI cs.RO

    Absolute Policy Optimization

    Authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu

    Abstract: In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a nov…

    Submitted 30 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: published in ICML 2024

  44. arXiv:2310.12111  [pdf, other]

    eess.AS cs.AI

    DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification

    Authors: Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng

    Abstract: Data augmentation is vital to the generalization ability and robustness of deep neural networks (DNNs) models. Existing augmentation methods for speaker verification manipulate the raw signal, which are time-consuming and the augmented samples lack diversity. In this paper, we present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification, which can generate divers…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2023

  45. arXiv:2310.11305  [pdf, other]

    cs.AI cs.LG

    MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

    Authors: Ti-Rong Wu, Hung Guei, Pei-Chiun Peng, Po-Wei Huang, Ting Han Wei, Chung-Chin Shih, Yun-Jui Tsai

    Abstract: This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms, including AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero. While these algorithms have demonstrated super-human performance in many games, it remains unclear which among them is most suitable or efficient for specific tasks. Through MiniZero, we systematically evaluate the perfo…

    Submitted 26 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Transactions on Games

  46. arXiv:2310.10651  [pdf, other]

    cs.CV cs.GR

    HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

    Authors: Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

    Abstract: Hair editing has made tremendous progress in recent years. Early hair editing methods use well-drawn sketches or masks to specify the editing conditions. Even though they can enable very fine-grained local control, such interaction modes are inefficient for the editing conditions that can be easily specified by language descriptions or reference images. Thanks to the recent breakthrough of cross-m…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: ICCV 2023, code is available at https://github.com/wty-ustc/HairCLIPv2

  47. arXiv:2310.07815  [pdf, other]

    cs.IR cs.CL cs.LG

    Language Models As Semantic Indexers

    Authors: Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, Xianfeng Tang

    Abstract: Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential inform…

    Submitted 12 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 appendix pages

  48. arXiv:2310.04992  [pdf, other]

    eess.IV cs.CV

    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

  49. arXiv:2309.16830  [pdf, other]

    cs.RO

    Robust Safe Control with Multi-Modal Uncertainty

    Authors: Tianhao Wei, Liqian Ma, Ravi Pandya, Changliu Liu

    Abstract: Safety in dynamic systems with prevalent uncertainties is crucial. Current robust safe controllers, designed primarily for uni-modal uncertainties, may be either overly conservative or unsafe when handling multi-modal uncertainties. To address the problem, we introduce a novel framework for robust safe control, tailored to accommodate multi-modal Gaussian dynamics uncertainties and control limits.…

    Submitted 28 September, 2023; originally announced September 2023.

  50. arXiv:2309.11930  [pdf, other]

    cs.LG cs.CV

    Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning

    Authors: Bo Ye, Kai Gan, Tong Wei, Min-Ling Zhang

    Abstract: In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data while maintaining performance on seen categories from labeled data. The central challenge is the substantial learning gap between seen and novel categories, as the model learns the former faster due to accurate supervisory information. Moreover, capturing the semantics of…

    Submitted 17 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.