
Showing 1–50 of 224 results for author: Liao, J

Searching in archive cs.
  1. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay…

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  2. arXiv:2408.11810  [pdf, other]

    cs.CV

    Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models

    Authors: Chun-Yen Shih, Li-Xuan Peng, Jia-Wei Liao, Ernie Chu, Cheng-Fu Chou, Jun-Cheng Chen

    Abstract: Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imper…

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2408.11278  [pdf, other]

    cs.CV

    The Key of Parameter Skew in Federated Learning

    Authors: Sifan Wang, Junfeng Liao, Ye Yuan, Riquan Zhang

    Abstract: Federated Learning (FL) has emerged as an excellent solution for performing deep learning on different data owners without exchanging raw data. However, statistical heterogeneity in FL presents a key challenge, leading to a phenomenon of skewness in local model parameter distributions that researchers have largely overlooked. In this work, we propose the concept of parameter skew to describe the p…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.10136  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Robust spectral clustering with rank statistics

    Authors: Joshua Cape, Xianshi Yu, Jonquil Z. Liao

    Abstract: This paper analyzes the statistical performance of a robust spectral clustering method for latent structure recovery in noisy data matrices. We consider eigenvector-based clustering applied to a matrix of nonparametric rank statistics that is derived entrywise from the raw, original data matrix. This approach is robust in the sense that, unlike traditional spectral clustering procedures, it can pr…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 82 pages, 8 figures, 1 table

    MSC Class: 62H12; 62H30; 62G35

  5. arXiv:2408.08902  [pdf, other]

    cs.CR cs.AI

    Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection

    Authors: Chengyu Song, Linru Ma, Jianming Zheng, Jinzhi Liao, Hongyu Kuang, Lin Yang

    Abstract: Log-based insider threat detection (ITD) detects malicious user activities by auditing log entries. Recently, large language models (LLMs) with strong common sense knowledge have emerged in the domain of ITD. Nevertheless, diverse activity types and overlong log files pose a significant challenge for LLMs in directly discerning malicious ones within myriads of normal activities. Furthermore, the f…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 12 pages, 5 figures

  6. arXiv:2408.08056  [pdf, other]

    cs.LG

    DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World

    Authors: Chuyang Ye, Dongyan Wei, Zhendong Liu, Yuanyi Pang, Yixi Lin, Jiarong Liao, Qinting Jiang, Xianghua Fu, Qing Li, Jingyan Jiang

    Abstract: Test-time adaptation (TTA) effectively addresses distribution shifts between training and testing data by adjusting models on test samples, which is crucial for improving model inference in real-world applications. However, traditional TTA methods typically follow a fixed pattern to address the dynamic data patterns (low-diversity or high-diversity patterns), often leading to performance degradatio…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  7. arXiv:2408.02718  [pdf, other]

    cs.CV

    MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

    Authors: Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluatio…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Project Page: https://mmiu-bench.github.io/

  8. Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

    Authors: Yi Bin, Junrong Liao, Yujuan Ding, Haoxuan Li, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information, thereby understanding and creating content of the physical world coherently like human beings. Previous work on cross-modal coherence modeling attempted to leverage the order information from another modality to assist the coherence recovery of the target modality. Despite the…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  9. arXiv:2407.21705  [pdf, other]

    cs.CV

    Tora: Trajectory-oriented Diffusion Transformer for Video Generation

    Authors: Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang

    Abstract: Recent advancements in Diffusion Transformer (DiT) have demonstrated remarkable proficiency in producing high-quality video content. Nonetheless, the potential of transformer-based diffusion models for effectively generating videos with controllable motion remains an area of limited exploration. This paper introduces Tora, the first trajectory-oriented DiT framework that concurrently integrates te…

    Submitted 27 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  10. arXiv:2407.21333  [pdf, other]

    cs.CV

    Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM

    Authors: Can Wang, Hongliang Zhong, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

    Abstract: Automatic furniture layout has long been desired for convenient interior design. Leveraging the remarkable visual reasoning capabilities of multimodal large language models (MLLMs), recent methods address layout generation in a static manner, lacking the feedback-driven refinement essential for interactive user engagement. We introduce Chat2Layout, a novel interactive furniture layout generation system…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Main paper with supplemental materials

  11. arXiv:2407.07871  [pdf, other]

    cs.IR

    Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation

    Authors: Wentao Xiao, Yueyang Zhan, Rui Xi, Mengshu Hou, Jianming Liao

    Abstract: The approximate nearest neighbor search (ANNS) is a fundamental and essential component in data mining and information retrieval, with graph-based methodologies demonstrating superior performance compared to alternative approaches. Extensive research efforts have been dedicated to improving search efficiency by developing various graph-based indices, such as HNSW (Hierarchical Navigable Small Worl…

    Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.07410  [pdf]

    cs.CV cs.GR cs.LG

    Mutual Information calculation on different appearances

    Authors: Jiecheng Liao, Junhao Lu, Jeff Ji, Jiacheng He

    Abstract: Mutual information has many applications in image alignment and matching, mainly due to its ability to measure the statistical dependence between two images, even if the two images are from different modalities (e.g., CT and MRI). It considers not only the pixel intensities of the images but also the spatial relationships between the pixels. In this project, we apply the mutual information formula…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: demo for the work: elucidator.cn/demo-mi/

  13. arXiv:2407.07111  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Diffusion Model-Based Video Editing: A Survey

    Authors: Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Dacheng Tao

    Abstract: The rapid development of diffusion models (DMs) has significantly advanced image and video applications, making "what you want is what you see" a reality. Among these, video editing has gained substantial attention and seen a swift rise in research activity, necessitating a comprehensive and systematic review of the existing literature. This paper reviews diffusion model-based video editing techni…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 23 pages, 12 figures, a project related to this paper can be found at https://github.com/wenhao728/awesome-diffusion-v2v

  14. arXiv:2407.04923  [pdf, other]

    cs.CV cs.CL

    OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

    Authors: Tiancheng Zhao, Qianqian Zhang, Kyusong Lee, Peng Liu, Lu Zhang, Chunxin Fang, Jiajia Liao, Kelei Jiang, Yibo Ma, Ruochen Xu

    Abstract: We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision encoding process to effectively handle images of various resolutions, capturing fine details across a range of image qualities. OmChat utilizes an ac…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 14 pages

  15. arXiv:2407.00280  [pdf, other]

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

  16. arXiv:2406.18832  [pdf, other]

    cs.CL

    OutlierTune: Efficient Channel-Wise Quantization for Large Language Models

    Authors: Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao

    Abstract: Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on the per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ)…

    Submitted 26 June, 2024; originally announced June 2024.

  17. arXiv:2406.06626  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications

    Authors: Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng

    Abstract: Traditional invasive Brain-Computer Interfaces (iBCIs) typically depend on neural decoding processes conducted on workstations within laboratory settings, which prevents their everyday usage. Implementing these decoding processes on edge devices, such as the wearables, introduces considerable challenges related to computational demands, processing speed, and maintaining accuracy. This study seeks…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2405.18132  [pdf, other]

    cs.CV

    EG4D: Explicit Generation of 4D Object without Score Distillation

    Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

    Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on the score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and…

    Submitted 28 May, 2024; originally announced May 2024.

  19. RealityEffects: Augmenting 3D Volumetric Videos with Object-Centric Annotation and Dynamic Visual Effects

    Authors: Jian Liao, Kevin Van, Zhijie Xia, Ryo Suzuki

    Abstract: This paper introduces RealityEffects, a desktop authoring interface designed for editing and augmenting 3D volumetric videos with object-centric annotations and visual effects. RealityEffects enhances volumetric capture by introducing a novel method for augmenting captured physical motion with embedded, responsive visual effects, referred to as object-centric augmentation. In RealityEffects, users…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: DIS 2024

  20. arXiv:2405.16414  [pdf, other]

    cs.CV

    PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model

    Authors: Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Changbo Wang, Chenhui Li

    Abstract: Image steganography can hide information in a host image and obtain a stego image that is perceptually indistinguishable from the original one. This technique has tremendous potential in scenarios like copyright protection, information retrospection, etc. Some previous studies have proposed to enhance the robustness of the methods against image disturbances to increase their applicability. However…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 content pages

  21. arXiv:2405.10317  [pdf, other]

    cs.CV cs.GR

    Text-to-Vector Generation with Neural Path Representation

    Authors: Peiying Zhang, Nanxuan Zhao, Jing Liao

    Abstract: Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making it a time-consuming task. Recent advancements in text-to-vector (T2V) generation have aimed to make this process more accessible. However, existing T2V methods…

    Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGGRAPH 2024. Project page: https://intchous.github.io/T2V-NPR

  22. arXiv:2405.10316  [pdf, other]

    cs.CV cs.GR

    Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

    Authors: Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, Yang Gao

    Abstract: Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual IC…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://analogist2d.github.io

  23. arXiv:2405.04503  [pdf, other]

    cs.RO

    Physics-data hybrid dynamic model of a multi-axis manipulator for sensorless dexterous manipulation and high-performance motion planning

    Authors: Wu-Te Yang, Jyun-Ming Liao, Pei-Chun Lin

    Abstract: We report on the development of an implementable physics-data hybrid dynamic model for an articulated manipulator to plan and operate in various scenarios. Meanwhile, the physics-based and data-driven dynamic models are studied in this research to select the best model for planning. The physics-based model is constructed using the Lagrangian method, and the loss terms include inertia loss, viscous…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 26 pages, 16 figures

  24. arXiv:2405.00515  [pdf, other]

    cs.RO cs.CV

    GAD-Generative Learning for HD Map-Free Autonomous Driving

    Authors: Weijian Sun, Yanbo Jia, Qi Zeng, Zihao Liu, Jiang Liao, Yue Li, Xianfeng Li

    Abstract: Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic progra…

    Submitted 31 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  25. arXiv:2405.00250  [pdf, other]

    cs.CV cs.RO

    SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations

    Authors: Narayanan Elavathur Ranganatha, Hengyuan Zhang, Shashank Venkatramani, Jing-Yan Liao, Henrik I. Christensen

    Abstract: Vector maps are essential in autonomous driving for tasks like localization and planning, yet their creation and maintenance are notably costly. While recent advances in online vector map generation for autonomous vehicles are promising, current models lack adaptability to different sensor configurations. They tend to overfit to specific sensor poses, leading to decreased performance and higher re…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, Accepted to IV 2024

  26. arXiv:2404.18112  [pdf, other]

    cs.CV cs.RO

    Garbage Segmentation and Attribute Analysis by Robotic Dogs

    Authors: Nuo Xu, Jianfeng Liao, Qiwei Meng, Wei Song

    Abstract: Efficient waste management and recycling heavily rely on garbage exploration and identification. In this study, we propose GSA2Seg (Garbage Segmentation and Attribute Analysis), a novel visual approach that utilizes quadruped robotic dogs as autonomous agents to address waste management and recycling challenges in diverse indoor and outdoor environments. Equipped with advanced visual perception sy…

    Submitted 28 April, 2024; originally announced April 2024.

  27. arXiv:2404.15341  [pdf, other]

    eess.SP cs.LG

    Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise

    Authors: Jing-Xiao Liao, Chao He, Jipu Li, Jinwei Sun, Shiping Zhang, Xiaoge Zhang

    Abstract: Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional…

    Submitted 10 April, 2024; originally announced April 2024.

  28. arXiv:2404.14356  [pdf, other]

    cs.SE

    Rethinking Legal Compliance Automation: Opportunities with Large Language Models

    Authors: Shabnam Hassani, Mehrdad Sabetzadeh, Daniel Amyot, Jain Liao

    Abstract: As software-intensive systems face growing pressure to comply with laws and regulations, providing automated support for compliance analysis has become paramount. Despite advances in the Requirements Engineering (RE) community on legal compliance analysis, important obstacles remain in developing accurate and generalizable compliance automation solutions. This paper highlights some observed limita…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted for publication at the RE@Next! track of RE 2024

  29. arXiv:2404.12888  [pdf, other]

    cs.CV cs.GR cs.LG

    Learn2Talk: 3D Talking Face Learns from 2D Talking Face

    Authors: Yixiang Zhuang, Baoping Cheng, Yao Cheng, Yuntao Jin, Renshuai Liu, Chengyang Li, Xuan Cheng, Jing Liao, Juncong Lin

    Abstract: Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which have attracted considerable research attention in recent years. However, to the best of our knowledge, research on 3D talking face has not gone as deep as that on 2D talking face in terms of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we prop…

    Submitted 19 April, 2024; originally announced April 2024.

  30. arXiv:2404.12347  [pdf, other]

    cs.CV cs.GR

    AniClipart: Clipart Animation with Text-to-Video Priors

    Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao

    Abstract: Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nev…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project Page: https://aniclipart.github.io/

  31. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  32. arXiv:2404.04937  [pdf, other]

    cs.CR cs.GT

    Optimizing Information Propagation for Blockchain-empowered Mobile AIGC: A Graph Attention Network Approach

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Yang Zhang, Jianbo Du, Qihao Li, Weiting Zhang, Dong Yang

    Abstract: Artificial Intelligence-Generated Content (AIGC) is a rapidly evolving field that utilizes advanced AI algorithms to generate content. Through integration with mobile edge networks, mobile AIGC networks have gained significant attention, which can provide real-time customized and personalized AIGC services and products. Since blockchains can facilitate decentralized and transparent data management…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.13237

  33. arXiv:2404.03654  [pdf, other]

    cs.CV

    RaFE: Generative Radiance Fields Restoration

    Authors: Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

    Abstract: NeRF (Neural Radiance Fields) has demonstrated tremendous potential in novel view synthesis and 3D reconstruction, but its performance is sensitive to input image quality, and it struggles to achieve high-fidelity rendering when provided with low-quality sparse input viewpoints. Previous methods for NeRF restoration are tailored to specific degradation types, ignoring the generality of restoration.…

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://zkaiwu.github.io/RaFE

  34. arXiv:2403.15878  [pdf, other]

    cs.CV

    Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance

    Authors: Jia-Wei Liao, Winston Wang, Tzu-Sian Wang, Li-Xuan Peng, Cheng-Fu Chou, Jun-Cheng Chen

    Abstract: QR codes, prevalent in daily applications, lack visual appeal due to their conventional black-and-white design. Integrating aesthetics while maintaining scannability poses a challenge. In this paper, we introduce a novel diffusion-model-based aesthetic QR code generation pipeline, utilizing pre-trained ControlNet and guided iterative refinement via a novel classifier guidance (SRG) based on the pr…

    Submitted 23 March, 2024; originally announced March 2024.

  35. arXiv:2403.13237  [pdf, ps, other]

    cs.CR math.OC

    Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Changyan Yi, Yang Zhang, Yutao Jiao, Dusit Niyato, Dong In Kim, Shengli Xie

    Abstract: Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and…

    Submitted 8 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  36. arXiv:2402.18998  [pdf, other]

    cs.CV

    COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection

    Authors: Jingyi Liao, Xun Xu, Manh Cuong Nguyen, Adam Goodge, Chuan Sheng Foo

    Abstract: Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage; in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: IEEE Transactions on Image Processing

  37. arXiv:2402.17593  [pdf, other]

    cs.RO

    Autonomous Shuttle Operation for Vulnerable Populations: Lessons and Experiences

    Authors: Ren Zhong, Zhaofeng Tian, Jinghui Liao, Weisong Shi

    Abstract: The increasing shortage of drivers poses a significant threat to vulnerable populations, particularly seniors and disabled individuals who heavily depend on public transportation for accessing healthcare services and social events. Autonomous Vehicles (AVs) emerge as a promising alternative, offering potential improvements in accessibility and independence for these groups. However, current design…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  38. Joint Resource Allocation and Trajectory Design for Resilient Multi-UAV Communication Networks

    Authors: Linghui Ge, Xiao Liang, Hua Zhang, Peihao Dong, Jianxin Liao, Jingyu Wang

    Abstract: In contrast to terrestrial wireless networks, dynamic Unmanned Aerial Vehicle (UAV) networks are susceptible to unexpected link failures arising from UAV breakdowns or the depletion of its batteries. Drastic user rate fluctuations and sum rate drops can occur due to the unexpected UAV link failures. Previous research has focused primarily on re-establishing these links to maintain service continui…

    Submitted 20 January, 2024; originally announced February 2024.

    Journal ref: IEEE Wireless Communications Letters, 2024

  39. arXiv:2402.16379  [pdf, other]

    cs.CL cs.AI

    TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement

    Authors: Zhaopeng Feng, Yan Zhang, Hao Li, Bei Wu, Jiayu Liao, Wenqiang Liu, Jun Lang, Yang Feng, Jian Wu, Zuozhu Liu

    Abstract: Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful evaluations by humans reveal that the translations produced by LLMs still contain multiple errors. Importantly, feeding back such error information into the LLMs can lead to self-refinement and result in improved translation performance. Motivated by these insights, we introduce a systematic…

    Submitted 21 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Our code and data are available at https://github.com/fzp0424/self_correct_mt

  40. arXiv:2402.06700  [pdf, other]

    cs.LG cs.AI

    Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement

    Authors: Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen

    Abstract: Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging di…

    Submitted 6 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  41. Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

    Authors: Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao

    Abstract: Recent text-to-video diffusion models have achieved impressive progress. In practice, users often desire the ability to control object motion and camera movement independently for customized video creation. However, current methods lack the focus on separately controlling object motion and camera movement in a decoupled manner, which limits the controllability and flexibility of text-to-video mode…

    Submitted 6 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  42. arXiv:2402.03025  [pdf, other]

    cs.IR cs.LG

    Understanding and Guiding Weakly Supervised Entity Alignment with Potential Isomorphism Propagation

    Authors: Yuanyi Wang, Wei Tang, Haifeng Sun, Zirui Zhuang, Xiaoyuan Fu, Jingyu Wang, Qi Qi, Jianxin Liao

    Abstract: Weakly Supervised Entity Alignment (EA) is the task of identifying equivalent entities across diverse knowledge graphs (KGs) using only a limited number of seed alignments. Despite substantial advances in aggregation-based weakly supervised EA, the underlying mechanisms in this setting remain unexplored. In this paper, we present a propagation perspective to analyze weakly supervised EA and explai…

    Submitted 5 February, 2024; originally announced February 2024.

  43. arXiv:2401.17859  [pdf, other]

    cs.IR

    Towards Semantic Consistency: Dirichlet Energy Driven Robust Multi-Modal Entity Alignment

    Authors: Yuanyi Wang, Haifeng Sun, Jiabo Wang, Jingyu Wang, Wei Tang, Qi Qi, Shaoling Sun, Jianxin Liao

    Abstract: In Multi-Modal Knowledge Graphs (MMKGs), Multi-Modal Entity Alignment (MMEA) is crucial for identifying identical entities across diverse modal attributes. However, semantic inconsistency, mainly due to missing modal attributes, poses a significant challenge. Traditional approaches rely on attribute interpolation, but this often introduces modality noise, distorting the original semantics. Moreove…

    Submitted 19 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.16210 by other authors

  44. arXiv:2401.17807  [pdf, other]

    cs.CV cs.GR

    Advances in 3D Generation: A Survey

    Authors: Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, Ying Shan

    Abstract: Generating 3D models lies at the core of computer graphics and has been the focus of decades of research. With the emergence of advanced neural representations and generative models, the field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models. The rapid growth of this field makes it difficult to stay abreast of all recent devel…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 33 pages, 12 figures

  45. arXiv:2401.12798  [pdf, other]

    cs.IR cs.CL

    Gradient Flow of Energy: A General and Efficient Approach for Entity Alignment Decoding

    Authors: Yuanyi Wang, Haifeng Sun, Jingyu Wang, Qi Qi, Shaoling Sun, Jianxin Liao

    Abstract: Entity alignment (EA), a pivotal process in integrating multi-source Knowledge Graphs (KGs), seeks to identify equivalent entity pairs across these graphs. Most existing approaches regard EA as a graph representation learning task, concentrating on enhancing graph encoders. However, the decoding process in EA - essential for effective operation and alignment accuracy - has received limited attenti…

    Submitted 17 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  46. arXiv:2401.02143  [pdf, other]

    cs.LG cs.AI cs.IR cs.SI

    Graph Neural Networks for Tabular Data Learning: A Survey with Taxonomy and Directions

    Authors: Cheng-Te Li, Yu-Che Tsai, Chih-Yao Chen, Jay Chiehen Liao

    Abstract: In this survey, we dive into Tabular Data Learning (TDL) using Graph Neural Networks (GNNs), a domain where deep learning-based approaches have increasingly shown superior performance in both classification and regression tasks compared to traditional methods. The survey highlights a critical gap in deep neural TDL methods: the underrepresentation of latent correlations among data instances and fe…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Under review, ongoing work, Github page: https://github.com/Roytsai27/awesome-GNN4TDL

  47. arXiv:2401.01491   

    cs.CE

    A Hybrid Neural Network Model For Predicting The Nitrate Concentration In The Recirculating Aquaculture System

    Authors: Xiangyu Fan, Jiaxin Lia, Yingzhe Wang, Yingsha Qu, Hao Li, Keming Qu, Zhengguo Cui

    Abstract: This study was groundbreaking in its application of neural network models for nitrate management in the Recirculating Aquaculture System (RAS). A hybrid neural network model was proposed, which accurately predicted daily nitrate concentration and its trends using six water quality parameters. We conducted a 105-day aquaculture experiment, during which we collected 450 samples from five sets of RAS…

    Submitted 15 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: The content of this paper needs to be further filled and improved

  48. arXiv:2312.14389  [pdf, other]

    cs.CV

    StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors

    Authors: Wanchao Su, Can Wang, Chen Liu, Hangzhou Han, Hongbo Fu, Jing Liao

    Abstract: Creating fine-retouched portrait images is tedious and time-consuming even for professional artists. There exist automatic retouching methods, but they either suffer from over-smoothing artifacts or lack generalization ability. To address such issues, we present StyleRetoucher, a novel automatic portrait image retouching framework, leveraging StyleGAN's generation and generalization ability to imp…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 15 figures

  49. arXiv:2312.07539  [pdf, other]

    cs.CV

    HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

    Authors: Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

    Abstract: This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we come up with an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call such a process self score distillation (SSD). In detail, given a sampled camera pose, we first render an imag…

    Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Amazing results are shown in https://kumapowerliu.github.io/HeadArtist. Accepted by SIGGRAPH 2024

  50. arXiv:2312.06663  [pdf, other]

    cs.CV cs.GR

    CAD: Photorealistic 3D Generation via Adversarial Distillation

    Authors: Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas

    Abstract: The increased demand for 3D data in AR/VR, robotics and gaming applications, gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: http://raywzy.com/CAD/