Showing 1–50 of 599 results for author: Lin, W

Searching in archive cs. Search in all archives.
  1. arXiv:2407.04688  [pdf, other]

    cs.CV

    Enhancing Vehicle Re-identification and Matching for Weaving Analysis

    Authors: Mei Qiu, Wei Lin, Stanley Chien, Lauren Christopher, Yaobin Chen, Shu Hu

    Abstract: Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantit… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.02098  [pdf, other]

    cs.CV

    DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection

    Authors: Kaixin Xu, Qingtian Feng, Hao Chen, Zhe Wang, Xue Geng, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Applying deep neural networks to 3D point cloud processing has attracted increasing attention due to its advanced performance in many areas, such as AR/VR, autonomous driving, and robotics. However, as neural network models and 3D point clouds expand in size, it becomes a crucial challenge to reduce the computational and memory overhead to meet latency and energy constraints in real-world applicat… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.02068  [pdf, other]

    cs.CV

    LPViT: Low-Power Semi-structured Pruning for Vision Transformers

    Authors: Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2407.01621  [pdf, other]

    cs.LG q-bio.QM stat.ME stat.ML

    Deciphering interventional dynamical causality from non-intervention systems

    Authors: Jifan Shi, Yang Li, Juan Zhao, Siyang Leng, Kazuyuki Aihara, Luonan Chen, Wei Lin

    Abstract: Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. However, causal studies on non-intervention systems attract much attention but remain extremely challenging. To address this challenge, we propose a framework named Interventional Dynamical Causality (IntDC) for such non-intervention systems, along with its computational crite… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  5. Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation

    Authors: Yuting Zhang, Yiqing Wu, Ruidong Han, Ying Sun, Yongchun Zhu, Xiang Li, Wei Lin, Fuzhen Zhuang, Zhulin An, Yongjun Xu

    Abstract: Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winte… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2407.00599  [pdf, other]

    cs.DC cs.LG

    Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules

    Authors: Xinglin Pan, Wenxiang Lin, Shaohuai Shi, Xiaowen Chu, Weinong Sun, Bo Li

    Abstract: Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, th… ▽ More

    Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  7. arXiv:2407.00435  [pdf, other]

    cs.GR

    RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering

    Authors: Weikai Lin, Yu Feng, Yuhao Zhu

    Abstract: Point-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, emerges as a promising class of rendering techniques, which are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging. This paper proposes RTGS, a PBNR system that for the firs… ▽ More

    Submitted 2 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: I.3; I.2

  8. arXiv:2407.00009  [pdf, other]

    cs.DC cs.NI

    An Open-Source Fast Parallel Routing Approach for Commercial FPGAs

    Authors: Xinshi Zang, Wenhao Lin, Shiju Lin, Jinwei Liu, Evangeline F. Y. Young

    Abstract: In the face of escalating complexity and size of contemporary FPGAs and circuits, routing emerges as a pivotal and time-intensive phase in FPGA compilation flows. In response to this challenge, we present an open-source parallel routing methodology designed to expedite routing procedures for commercial FPGAs. Our approach introduces a novel recursive partitioning ternary tree to augment the parall… ▽ More

    Submitted 25 April, 2024; originally announced July 2024.

  9. arXiv:2406.17871  [pdf, other]

    cs.DB

    Revisiting the Expressiveness Landscape of Data Graph Queries

    Authors: Michael Benedikt, Anthony Widjaja Lin, Di-De Yen

    Abstract: The study of graph queries in database theory has spanned more than three decades, resulting in a multitude of proposals for graph query languages. These languages differ in the mechanisms. We can identify three main families of languages, with the canonical representatives being: (1) regular path queries, (2) walk logic, and (3) first-order logic with transitive closure operators. This paper prov… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.14017  [pdf, other]

    cs.IR

    EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

    Authors: Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

    Abstract: Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this… ▽ More

    Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024. Code available at https://reczoo.github.io/EAGER

  11. GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

    Authors: Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning, Yunhua Hu, Siqiao Xue

    Abstract: Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g.,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.09904  [pdf, other]

    cs.LG

    QQQ: Quality Quattuor-Bit Quantization for Large Language Models

    Authors: Ying Zhang, Peng Zhang, Mincong Huang, Jingyang Xiang, Yujie Wang, Chao Wang, Yineng Zhang, Lei Yu, Chuan Liu, Wei Lin

    Abstract: Quantization is a proven effective method for compressing large language models. Although popular techniques like W8A8 and W4A16 effectively maintain model performance, they often fail to concurrently speed up the prefill and decoding stages of inference. W4A8 is a promising strategy to accelerate both of them, but it usually leads to significant performance degradation. To address these issues, w… ▽ More

    Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  13. arXiv:2406.09356  [pdf, other]

    cs.CV eess.IV

    CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

    Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which has strong potential applications. However, CMC has certain defects in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.09240  [pdf, other]

    cs.CV

    Comparison Visual Instruction Tuning

    Authors: Wei Lin, Muhammad Jehanzeb Mirza, Sivan Doveh, Rogerio Feris, Raja Giryes, Sepp Hochreiter, Leonid Karlinsky

    Abstract: Comparing two images in terms of Commonalities and Differences (CaD) is a fundamental human capability that forms the basis of advanced visual reasoning and interpretation. It is essential for the generation of detailed and contextually relevant descriptions, performing comparative analysis, novelty detection, and making informed decisions based on visual data. However, surprisingly, little attent… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://wlin-at.github.io/cad_vi ; Huggingface dataset repo: https://huggingface.co/datasets/wlin21at/CaD-Inst

  15. arXiv:2406.08444  [pdf, other]

    cs.CV

    PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

    Authors: Wei-Tung Lin, Yong-Xiang Lin, Jyun-Wei Chen, Kai-Lung Hua

    Abstract: Underwater Image Enhancement (UIE) is critical for marine research and exploration but hindered by complex color distortions and severe blurring. Recent deep learning-based methods have achieved remarkable results, yet these methods struggle with high computational costs and insufficient global modeling, resulting in locally under- or over-adjusted regions. We present PixMamba, a novel architectu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  16. arXiv:2406.08164  [pdf, other]

    cs.CV

    ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

    Authors: Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuhene, Trevor Darrel, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky

    Abstract: Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmark… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: The first three authors contributed equally

  17. arXiv:2406.06213  [pdf, ps, other]

    cs.LG cs.AI stat.AP stat.ML

    A Statistical Theory of Regularization-Based Continual Learning

    Authors: Xuyang Zhao, Huiyuan Wang, Weiran Huang, Wei Lin

    Abstract: We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously. Next, we consider a family of generalized $\ell_2$-regularization algorithms index… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  18. arXiv:2406.04755  [pdf, other]

    cs.CR cs.AI cs.HC cs.LG

    Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations

    Authors: Weiran Lin, Anna Gerchanovsky, Omer Akgul, Lujo Bauer, Matt Fredrikson, Zifan Wang

    Abstract: Large language model (LLM) users might rely on others (e.g., prompting services), to write prompts. However, the risks of trusting prompts written by others remain unstudied. In this paper, we assess the risk of using such prompts on brand recommendation tasks when shopping. First, we found that paraphrasing prompts can result in LLMs mentioning given brands with drastically different probabilitie… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  19. arXiv:2406.03243  [pdf, other]

    cs.AR cs.DC cs.LG

    Llumnix: Dynamic Scheduling for Large Language Model Serving

    Authors: Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin

    Abstract: Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamen… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: To appear at OSDI '24; open-source repo will be available in June 2024

  20. arXiv:2406.03070  [pdf, other]

    cs.CV cs.AI

    A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

    Authors: Zicheng Zhang, Haoning Wu, Chunyi Li, Yingjie Zhou, Wei Sun, Xiongkuo Min, Zijian Chen, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models. Given the high costs and extensive time commitments required for user studies, many researchers have turned towards employing large multi-modal models (LMMs) as AIGI evaluators, the precision and validity of which are still questionable. Furthermore, traditional benchmarks often… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  21. arXiv:2406.02154  [pdf, other]

    math-ph cs.LG

    Learning Hamiltonian neural Koopman operator and simultaneously sustaining and discovering conservation law

    Authors: Jingdong Zhang, Qunxi Zhu, Wei Lin

    Abstract: Accurately finding and predicting dynamics based on the observational data with noise perturbations is of paramount significance but still a major challenge presently. Here, for the Hamiltonian mechanics, we propose the Hamiltonian Neural Koopman Operator (HNKO), integrating the knowledge of mathematical physics in learning the Koopman operator, and making it automatically sustain and even discove… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.01954  [pdf, other]

    cs.CV

    Plug-and-Play Diffusion Distillation

    Authors: Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte, Wei-An Lin, Hui Qu, Mingi Kwon, Ratheesh Kalarot

    Abstract: Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We sh… ▽ More

    Submitted 14 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024 project page: https://5410tiffany.github.io/plug-and-play-diffusion-distillation.github.io/

  23. arXiv:2406.00706  [pdf, other]

    cs.RO

    MINER-RRT*: A Hierarchical and Fast Trajectory Planning Framework in 3D Cluttered Environments

    Authors: Pengyu Wang, Jiawei Tang, Hin Wang Lin, Fan Zhang, Chaoqun Wang, Jiankun Wang, Ling Shi, Max Q. -H. Meng

    Abstract: Trajectory planning for quadrotors in cluttered environments has been challenging in recent years. While many trajectory planning frameworks have been successful, there still exists potential for improvements, particularly in enhancing the speed of generating efficient trajectories. In this paper, we present a novel hierarchical trajectory planning framework to reduce computational time and memory… ▽ More

    Submitted 14 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  24. arXiv:2405.20291  [pdf, other]

    cs.CR cs.CV cs.LG

    Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

    Authors: Weilin Lin, Li Liu, Shaokui Wei, Jianze Li, Hui Xiong

    Abstract: The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, without poisoned data, unlearning models with clean data and then learning a pruning mask have contributed to backdoor defense. Additionally, vanilla fine-tuning with those clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  25. arXiv:2405.19740  [pdf, other]

    cs.CL cs.AI cs.CY

    PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

    Authors: Jiatong Li, Renjun Hu, Kunzhe Huang, Yan Zhuang, Qi Liu, Mengxiao Zhu, Xing Shi, Wei Lin

    Abstract: Expert-designed close-ended benchmarks serve as vital tools in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through k… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 12 figures, 10 tables

  26. arXiv:2405.19298  [pdf, other]

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  27. arXiv:2405.18371  [pdf, other]

    quant-ph cs.AR cs.DC

    ML-QLS: Multilevel Quantum Layout Synthesis

    Authors: Wan-Hsuan Lin, Jason Cong

    Abstract: Quantum Layout Synthesis (QLS) plays a crucial role in optimizing quantum circuit execution on physical quantum devices. As we enter the era where quantum computers have hundreds of qubits, we are faced with scalability issues using optimal approaches and degrading heuristic methods' performance due to the lack of global optimization. To this end, we introduce a hybrid design that obtains the much… ▽ More

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  28. arXiv:2405.17890  [pdf, other]

    cs.IR cs.CL cs.LG

    SLMRec: Empowering Small Language Models for Sequential Recommendation

    Authors: Wujiang Xu, Zujie Liang, Jiaojiao Han, Xuying Ning, Wenfang Lin, Linxun Chen, Feng Wei, Yongfeng Zhang

    Abstract: The sequential Recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. The SR models examine the sequence of a user's actions to discern more complex behavioral patterns and temporal dynamics. Recent research demonstrates the great impact of LLMs on sequential recommendation systems, either viewing sequential recommendation as la… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  29. arXiv:2405.16166  [pdf, other]

    cs.FL

    The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective

    Authors: Pascal Bergsträßer, Chris Köcher, Anthony Widjaja Lin, Georg Zetzsche

    Abstract: Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Language Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer. On certain kinds of data (e.g. time series), we want our transformers to be… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  30. arXiv:2405.16041  [pdf, other]

    cs.LG cs.AI

    Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

    Authors: Zhenzhong Wang, Zehui Lin, Wanyu Lin, Ming Yang, Minggang Zeng, Kay Chen Tan

    Abstract: Providing explainable molecule property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop… ▽ More

    Submitted 31 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  31. arXiv:2405.15403  [pdf, other]

    cs.LG stat.ML

    Fine-Grained Dynamic Framework for Bias-Variance Joint Optimization on Data Missing Not at Random

    Authors: Mingming Ha, Xuewen Tao, Wenfang Lin, Qionxu Ma, Wujiang Xu, Linxun Chen

    Abstract: In most practical applications such as recommendation systems, display advertising, and so forth, the collected data often contains missing values and those missing values are generally missing-not-at-random, which deteriorates the prediction performance of models. Some existing estimators and regularizers attempt to achieve unbiased estimation to improve the predictive performance. However, varia… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  32. arXiv:2405.15252  [pdf, other]

    cs.LG

    Fast 3D Molecule Generation via Unified Geometric Optimal Transport

    Authors: Haokai Hong, Wanyu Lin, Kay Chen Tan

    Abstract: This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data dist… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  33. arXiv:2405.15095  [pdf, other]

    cs.ET quant-ph

    Compilation for Dynamically Field-Programmable Qubit Arrays with Efficient and Provably Near-Optimal Scheduling

    Authors: Daniel Bochen Tan, Wan-Hsuan Lin, Jason Cong

    Abstract: Dynamically field-programmable qubit arrays based on neutral atoms have high fidelity and highly parallel gates for quantum computing. However, it is challenging for compilers to fully leverage the novel flexibility offered by such hardware while respecting its various constraints. In this study, we break down the compilation for this architecture into three tasks: scheduling, placement, and routi… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.12186  [pdf, other]

    cs.LG

    Training Data Attribution via Approximate Unrolled Differentiation

    Authors: Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse

    Abstract: Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. B… ▽ More

    Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  35. arXiv:2405.11852  [pdf, other]

    cs.CV

    Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models

    Authors: Xiyu Wang, Yufei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot

    Abstract: Diffusion-based models for story visualization have shown promise in generating content-coherent images for storytelling tasks. However, how to effectively integrate new characters into existing narratives while maintaining character consistency remains an open problem, particularly with limited data. Two major limitations hinder the progress: (1) the absence of a suitable benchmark due to potenti… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  36. arXiv:2405.11605  [pdf, other]

    cs.LG

    Switched Flow Matching: Eliminating Singularities via Switching ODEs

    Authors: Qunxi Zhu, Wei Lin

    Abstract: Continuous-time generative models, such as Flow Matching (FM), construct probability paths to transport between one distribution and another through the simulation-free learning of the neural ordinary differential equations (ODEs). During inference, however, the learned model often requires multiple neural network evaluations to accurately integrate the flow, resulting in a slow sampling speed. We… ▽ More

    Submitted 23 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted in ICML 2024

  37. arXiv:2405.11542  [pdf, other]

    cs.LG physics.ed-ph

    From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems

    Authors: Xin Li, Jingdong Zhang, Qunxi Zhu, Chengli Zhao, Xue Zhang, Xiaojun Duan, Wei Lin

    Abstract: Modeling complex systems using standard neural ordinary differential equations (NODEs) often faces some essential challenges, including high computational costs and susceptibility to local optima. To address these challenges, we propose a simulation-free framework, called Fourier NODEs (FNODEs), that effectively trains NODEs by directly matching the target vector field based on Fourier analysis. S… ▽ More

    Submitted 22 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  38. arXiv:2405.09157  [pdf, other]

    math.OC cs.CG cs.DC cs.DS

    A Primal-Dual Framework for Symmetric Cone Programming

    Authors: Jiaqi Zheng, Antonios Varvitsiotis, Tiow-Seng Tan, Wayne Lin

    Abstract: In this paper, we introduce a primal-dual algorithmic framework for solving Symmetric Cone Programs (SCPs), a versatile optimization model that unifies and extends Linear, Second-Order Cone (SOCP), and Semidefinite Programming (SDP). Our work generalizes the primal-dual framework for SDPs introduced by Arora and Kale, leveraging a recent extension of the Multiplicative Weights Update method (MWU)… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  39. arXiv:2405.08745  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  40. arXiv:2405.08555  [pdf, other]

    cs.CV cs.MM

    Dual-Branch Network for Portrait Image Quality Assessment

    Authors: Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from the degradation caused by unfavorable environmental conditions, subpar photography techniques, and inferior capturing de… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  41. arXiv:2405.06914  [pdf, other]

    cs.CV

    Non-confusing Generation of Customized Concepts in Diffusion Models

    Authors: Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang

    Abstract: We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs). It becomes even more pronounced in the generation of customized concepts, due to the scarcity of user-provided concept visual examples. By revisiting the two major stages leading to the success of TGDMs -- 1) contrastive image-language pre-training (CLIP)… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  42. arXiv:2405.00946  [pdf, other]

    cs.LG

    SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters

    Authors: Shengsheng Lin, Weiwei Lin, Wentai Wu, Haojun Chen, Junjie Yang

    Abstract: This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity… ▽ More

    Submitted 3 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  43. arXiv:2405.00308  [pdf]

    cs.CR stat.AP

    FPGA Digital Dice using Pseudo Random Number Generator

    Authors: Michael Lim Kee Hian, Ten Wei Lin, Zachary Wu Xuan, Stephanie-Ann Loy, Maoyang Xiang, T. Hui Teo

    Abstract: The goal of this project is to design a digital dice that displays dice numbers in real-time. The number is generated by a pseudo-random number generator (PRNG) using XORshift algorithm that is implemented in Verilog HDL on an FPGA. The digital dice is equipped with tilt sensor, display, power management circuit, and rechargeable battery hosted in a 3D printed dice casing. By shaking the digital d… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 15 pages, 5 figures

  44. arXiv:2404.18343  [pdf, other]

    cs.MM cs.CV

    G-Refine: A General Quality Refiner for Text-to-Image Generation

    Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro…

    Submitted 28 April, 2024; originally announced April 2024.

  45. arXiv:2404.18203  [pdf, other]

    cs.CV cs.AI

    LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

    Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs thro…

    Submitted 28 April, 2024; originally announced April 2024.

  46. arXiv:2404.17164  [pdf, other]

    cs.LG

    DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

    Authors: Xindi Zheng, Yuwei Wu, Yu Pan, Wanyu Lin, Lei Ma, Jianjun Zhao

    Abstract: Missing data imputation poses a paramount challenge when dealing with graph data. Prior works typically are based on feature propagation or graph autoencoders to address this issue. However, these methods usually encounter the over-smoothing issue when dealing with missing data, as the graph neural network (GNN) modules are not explicitly designed for handling missing data. This paper proposes a n…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages

  47. arXiv:2404.17128  [pdf, other]

    q-bio.NC cs.SI

    Network Structure Trumps Neuron Dynamics: Insights from Drosophila Connectome Simulations

    Authors: Xiaoyu Zhang, Pengcheng Yang, Jiawei Feng, Qiang Luo, Wei Lin, Xin Lu

    Abstract: Despite the success of artificial neural networks, the necessity of real network structures in simulating intelligence remains unclear. Utilizing the largest adult Drosophila connectome data set, we constructed a large-scale network communication model framework based on simple neuronal activation mechanisms to simulate the activation behavior observed in the connectome. The results demonstrate th…

    Submitted 30 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  49. arXiv:2404.15212  [pdf, other]

    cs.CV eess.IV

    Real-time Lane-wise Traffic Monitoring in Optimal ROIs

    Authors: Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu

    Abstract: In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automa…

    Submitted 28 March, 2024; originally announced April 2024.

  50. arXiv:2404.13306  [pdf, other]

    cs.CV cs.MM

    FakeBench: Uncover the Achilles' Heels of Fake Images with Large Multimodal Models

    Authors: Yixuan Li, Xuelin Liu, Xiaoyang Wang, Shiqi Wang, Weisi Lin

    Abstract: Recently, fake images generated by artificial intelligence (AI) models have become indistinguishable from the real, exerting new challenges for fake image detection models. To this extent, simple binary judgments of real or fake seem less convincing and credible due to the absence of human-understandable explanations. Fortunately, Large Multimodal Models (LMMs) bring possibilities to materialize t…

    Submitted 20 April, 2024; originally announced April 2024.