Showing 1–50 of 104 results for author: Lai, S

Searching in archive cs.
  1. arXiv:2409.03192  [pdf, other]

    cs.CV

    PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning

    Authors: Bowen Tian, Songning Lai, Lujundong Li, Zhihao Shuai, Runwei Guan, Tian Wu, Yutao Yue

    Abstract: Fine-grained image classification has witnessed significant advancements with the advent of deep learning and computer vision technologies. However, the scarcity of detailed annotations remains a major challenge, especially in scenarios where obtaining high-quality labeled data is costly or time-consuming. To address this limitation, we introduce Precision-Enhanced Pseudo-Labeling (PEPL) approach s…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Under review

  2. arXiv:2409.01867  [pdf, other]

    cs.HC

    ASD-Chat: An Innovative Dialogue Intervention System for Children with Autism based on LLM and VB-MAPP

    Authors: Chengyun Deng, Shuzhong Lai, Chi Zhou, Mengyi Bao, Jingwen Yan, Haifeng Li, Lin Yao, Yueming Wang

    Abstract: Early diagnosis and professional intervention can help children with autism spectrum disorder (ASD) return to normal life. However, the scarcity and imbalance of professional medical resources currently prevent many autistic children from receiving the necessary diagnosis and intervention. Therefore, numerous paradigms have been proposed that use computer technology to assist or independently cond…

    Submitted 3 September, 2024; originally announced September 2024.

  3. arXiv:2409.01256  [pdf, other]

    cs.CV cs.AI

    Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Songning Lai, Zhenning Li, Zilin Bian, Jaeyoung Lee, Zhiyong Cui, Guohui Zhang, Chengzhong Xu

    Abstract: The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2408.17443  [pdf, other]

    cs.CV cs.AI cs.CL

    Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding

    Authors: Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Winston H. Hsu, Shang-Hong Lai

    Abstract: While existing research often treats long-form videos as extended short videos, we propose a novel approach that more accurately reflects human cognition. This paper introduces BREASE: BRidging Episodes And SEmantics for Long-Form Video Understanding, a model that simulates episodic memory accumulation to capture action sequences and reinforces them with semantic knowledge dispersed throughout the…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted to the EVAL-FoMo Workshop at ECCV'24. Project page: https://joslefaure.github.io/assets/html/hermes.html

  5. arXiv:2408.15996  [pdf, other]

    cs.CV cs.AI

    Spatio-Temporal Context Prompting for Zero-Shot Action Detection

    Authors: Wei-Jhe Huang, Min-Hung Chen, Shang-Hong Lai

    Abstract: Spatio-temporal action detection encompasses the tasks of localizing and classifying individual actions within a video. Recent works aim to enhance this process by incorporating interaction modeling, which captures the relationship between people and their surrounding context. However, these approaches have primarily focused on fully-supervised learning, and the current limitation lies in the lack…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: Project page: https://webber2933.github.io/ST-CLIP-project-page

  6. arXiv:2408.15628  [pdf, other]

    cs.CV

    CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection

    Authors: Yu-Hsuan Hsieh, Shang-Hong Lai

    Abstract: To improve logical anomaly detection, some previous works have integrated segmentation techniques with conventional anomaly detection methods. Although these methods are effective, they frequently lead to unsatisfactory segmentation results and require manual annotations. To address these drawbacks, we develop an unsupervised component segmentation technique that leverages foundation models to aut…

    Submitted 1 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  7. arXiv:2408.07889  [pdf, other]

    cs.CV

    MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

    Authors: Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

    Abstract: Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly adopt image-pair appearance matching and face challenges of the intrinsic high quadratic complexity of the attention mechanism, resulting in constrained exploitation of temporal informatio…

    Submitted 14 August, 2024; originally announced August 2024.

  8. arXiv:2408.02073  [pdf]

    cs.CV cs.AI cs.MM

    Case-based reasoning approach for diagnostic screening of children with developmental delays

    Authors: Zichen Song, Jiakang Li, Songning Lai, Sitan Huang

    Abstract: According to the World Health Organization, the population of children with developmental delays constitutes approximately 6% to 9% of the total population. Based on the number of newborns in Huaibei, Anhui Province, China, in 2023 (94,420), it is estimated that there are about 7,500 suspected cases of developmental delays annually. Early identification and appropriate…

    Submitted 18 July, 2024; originally announced August 2024.

  9. arXiv:2408.00794  [pdf]

    cs.NE cs.AI cs.CV

    CCSRP: Robust Pruning of Spiking Neural Networks through Cooperative Coevolution

    Authors: Zichen Song, Jiakang Li, Songning Lai, Sitan Huang

    Abstract: Spiking neural networks (SNNs) have shown promise in various dynamic visual tasks, yet those ready for practical deployment often lack the compactness and robustness essential in resource-limited and safety-critical settings. Prior research has predominantly concentrated on enhancing the compactness or robustness of artificial neural networks through strategies like network pruning and adversarial…

    Submitted 18 July, 2024; originally announced August 2024.

  10. arXiv:2407.20224  [pdf, other]

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing has been increasingly adopted to correct the false or outdated knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation wi…

    Submitted 16 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  11. arXiv:2407.10981  [pdf, other]

    cs.NI cs.CR

    Systematic Literature Review of AI-enabled Spectrum Management in 6G and Future Networks

    Authors: Bushra Sabir, Shuiqiao Yang, David Nguyen, Nan Wu, Alsharif Abuadbba, Hajime Suzuki, Shangqi Lai, Wei Ni, Ding Ming, Surya Nepal

    Abstract: Artificial Intelligence (AI) has advanced significantly in various domains like healthcare, finance, and cybersecurity, with successes such as DeepMind's medical imaging and Tesla's autonomous vehicles. As telecommunications transition from 5G to 6G, integrating AI is crucial for complex demands like data processing, network optimization, and security. Despite ongoing research, there's a gap in co…

    Submitted 12 June, 2024; originally announced July 2024.

    Comments: 35 pages

  12. arXiv:2406.19101  [pdf, other]

    cs.CV

    DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

    Authors: Jiaxin Zhang, Wentao Yang, Songxuan Lai, Zecheng Xie, Lianwen Jin

    Abstract: Current multimodal large language models (MLLMs) face significant challenges in visual document understanding (VDU) tasks due to the high resolution, dense text, and complex layouts typical of document images. These characteristics demand a high level of detail perception ability from MLLMs. While increasing input resolution improves detail perception capability, it also leads to longer sequences…

    Submitted 2 September, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  13. Detecting Frames in News Headlines and Lead Images in U.S. Gun Violence Coverage

    Authors: Isidora Chara Tourni, Lei Guo, Hengchang Hu, Edward Halim, Prakash Ishwar, Taufiq Daryanto, Mona Jalal, Boqi Chen, Margrit Betke, Fabian Zhafransyah, Sha Lai, Derry Tanti Wijaya

    Abstract: News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called "frames" in communication research. We study, for the first time, the value of combining lead i…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: published at Findings of the Association for Computational Linguistics: EMNLP 2021

  14. arXiv:2406.16529  [pdf, other]

    cs.CL

    Towards Better Graph-based Cross-document Relation Extraction via Non-bridge Entity Enhancement and Prediction Debiasing

    Authors: Hao Yue, Shaopeng Lai, Chengyi Yang, Liang Zhang, Junfeng Yao, Jinsong Su

    Abstract: Cross-document Relation Extraction aims to predict the relation between target entities located in different documents. In this regard, the dominant models commonly retain useful information for relation prediction via bridge entities, which allows the model to elaborately capture the intrinsic interdependence between target entities. However, these studies ignore the non-bridge entities, each of…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  15. arXiv:2406.12330  [pdf, other]

    cs.CR cs.DC cs.ET cs.LG cs.NI

    Security and Privacy of 6G Federated Learning-enabled Dynamic Spectrum Sharing

    Authors: Viet Vo, Thusitha Dayaratne, Blake Haydon, Xingliang Yuan, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Carsten Rudolph

    Abstract: Spectrum sharing is increasingly vital in 6G wireless communication, facilitating dynamic access to unused spectrum holes. Recently, there has been a significant shift towards employing machine learning (ML) techniques for sensing spectrum holes. In this context, federated learning (FL)-enabled spectrum sensing technology has garnered wide attention, allowing for the construction of an aggregated…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures. The paper is submitted to IEEE Networks for review

  16. arXiv:2406.12299  [pdf, other]

    cs.CR cs.NI eess.SY

    Exploiting and Securing ML Solutions in Near-RT RIC: A Perspective of an xApp

    Authors: Thusitha Dayaratne, Viet Vo, Shangqi Lai, Sharif Abuadbba, Blake Haydon, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph

    Abstract: Open Radio Access Networks (O-RAN) are emerging as a disruptive technology, revolutionising traditional mobile network architecture and deployments in the current 5G and the upcoming 6G era. Disaggregation of network architecture, inherent support for AI/ML workflows, cloud-native principles, scalability, and interoperability make O-RAN attractive to network providers for beyond-5G and 6G deployme…

    Submitted 18 June, 2024; originally announced June 2024.

  17. arXiv:2406.05036  [pdf, other]

    cs.LG cs.AI

    TimeSieve: Extracting Temporal Dynamics through Information Bottlenecks

    Authors: Ninghui Feng, Songning Lai, Jiayu Yang, Fobao Zhou, Zhenxiao Yin, Hang Zhao

    Abstract: Time series forecasting has become an increasingly popular research area due to its critical applications in various real-world domains such as traffic management, weather prediction, and financial analysis. Despite significant advancements, existing models face notable challenges, including the necessity of manual hyperparameter tuning for different datasets, and difficulty in effectively disting…

    Submitted 21 August, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2405.19647  [pdf, other]

    cs.LG

    FTS: A Framework to Find a Faithful TimeSieve

    Authors: Songning Lai, Ninghui Feng, Jiechao Gao, Hao Wang, Haochen Sui, Xin Zou, Jiayu Yang, Wenshuo Chen, Hang Zhao, Xuming Hu, Yutao Yue

    Abstract: The field of time series forecasting has garnered significant attention in recent years, prompting the development of advanced models like TimeSieve, which demonstrates impressive performance. However, an analysis reveals certain unfaithfulness issues, including high sensitivity to random seeds, input and layer noise perturbations and parametric perturbations. Recognizing these challenges, we emba…

    Submitted 10 August, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Journal ref: IJCAI2024 workshop

  19. arXiv:2405.04009  [pdf, other]

    cs.CV cs.AI

    Structured Click Control in Transformer-based Interactive Segmentation

    Authors: Long Xu, Yongquan Chen, Rui Huang, Feng Wu, Shiwu Lai

    Abstract: Click-point-based interactive segmentation has received widespread attention due to its efficiency. However, it's hard for existing algorithms to obtain precise and robust responses after multiple clicks. In this case, the segmentation results tend to have little change or are even worse than before. To improve the robustness of the response, we propose a structured click intent model based on gra…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, submitted to NeurIPS 2024

  20. arXiv:2404.14135  [pdf, other]

    cs.CV

    Text in the Dark: Extremely Low-Light Text Image Enhancement

    Authors: Che-Tsung Lin, Chun Chet Ng, Zhi Qin Tan, Wan Jun Nah, Xinyu Wang, Jie Long Kew, Pohao Hsu, Shang Hong Lai, Chee Seng Chan, Christopher Zach

    Abstract: Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often do not try to particularly address the significance of low-level features, which are crucial for optimal performance on downstream scene text t…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally to this work

  21. arXiv:2403.12999  [pdf]

    cs.RO cs.AI cs.CL cs.LG

    Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control

    Authors: On Tai Wu, Frodo Kin Sun Chan, Zunhao Zhang, Yan Nei Law, Benny Drescher, Edmond Shiao Bun Lai

    Abstract: Few-shot prompting and step-by-step reasoning have enhanced the capabilities of Large Language Models (LLMs) in tackling complex tasks including code generation. In this paper, we introduce a prompt selection and augmentation algorithm aimed at improving mathematical reasoning and robot arm operations. Our approach incorporates a multi-stage example augmentation scheme combined with an example sel…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 17 pages, 4 figures

  22. arXiv:2402.12590  [pdf, other]

    cs.CL cs.CY

    Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation

    Authors: Shiyang Lai, Yujin Potter, Junsol Kim, Richard Zhuang, Dawn Song, James Evans

    Abstract: Large language model behavior is shaped by the language of those with whom they interact. This capacity and their increasing prevalence online portend that they will intentionally or unintentionally "program" one another and form emergent AI subjectivities, relationships, and collectives. Here, we call upon the research community to investigate these "societies" of interacting artificial intellige…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  23. arXiv:2402.04557  [pdf]

    physics.chem-ph cs.LG

    An Artificial Intelligence (AI) workflow for catalyst design and optimization

    Authors: Nung Siong Lai, Yi Shen Tew, Xialin Zhong, Jun Yin, Jiali Li, Binhang Yan, Xiaonan Wang

    Abstract: In the pursuit of novel catalyst development to address pressing environmental concerns and energy demand, conventional design and optimization methods often fall short due to the complexity and vastness of the catalyst parameter space. The advent of Machine Learning (ML) has ushered in a new era in the field of catalyst optimization, offering potential solutions to the shortcomings of traditional…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 31 pages, 7 figures

    Journal ref: Ind. Eng. Chem. Res. 2023, 62, 43, 17835-17848

  24. arXiv:2401.04403  [pdf, other]

    cs.CV

    MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation

    Authors: Long Xu, Shanghong Li, Yongquan Chen, Jun Luo, Shiwu Lai

    Abstract: Interactive segmentation has gained significant attention for its application in human-computer interaction and data annotation. To address the target scale variation issue in interactive segmentation, a novel multi-scale token adaptation algorithm is proposed. By performing top-k operations across multi-scale tokens, the computational complexity is greatly simplified while ensuring performance. T…

    Submitted 2 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: 11 pages, 10 figures

  25. arXiv:2312.16044  [pdf, other]

    cs.AI

    LLMLight: Large Language Models as Traffic Signal Control Agents

    Authors: Siqi Lai, Zhao Xu, Weijia Zhang, Hao Liu, Hui Xiong

    Abstract: Traffic Signal Control (TSC) is a crucial component in urban traffic management, aiming to optimize road network efficiency and reduce congestion. Traditional methods in TSC, primarily based on transportation engineering and reinforcement learning (RL), often exhibit limitations in generalization across varied traffic scenarios and lack interpretability. This paper presents LLMLight, a novel frame…

    Submitted 5 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  26. arXiv:2312.09480  [pdf, other]

    cs.CV

    TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks

    Authors: Ho-Weng Lee, Shang-Hong Lai

    Abstract: In recent years, the focus on anomaly detection and localization in industrial inspection tasks has intensified. While existing studies have demonstrated impressive outcomes, they often rely heavily on extensive training datasets or robust features extracted from pre-trained models trained on diverse datasets like ImageNet. In this work, we propose a novel framework leveraging the visual-linguisti…

    Submitted 14 December, 2023; originally announced December 2023.

  27. arXiv:2310.12846  [pdf, other]

    math.NA cs.AI

    Physical Information Neural Networks for Solving High-index Differential-algebraic Equation Systems Based on Radau Methods

    Authors: Jiasheng Chen, Juan Tang, Ming Yan, Shuai Lai, Kun Liang, Jianguang Lu, Wenqiang Yang

    Abstract: As is well known, differential algebraic equations (DAEs), which are able to describe dynamic changes and underlying constraints, have been widely applied in engineering fields such as fluid dynamics, multi-body dynamics, mechanical systems and control theory. In practical physical modeling within these domains, the systems often generate high-index DAEs. Classical implicit numerical methods typic…

    Submitted 19 October, 2023; originally announced October 2023.

  28. arXiv:2310.05056  [pdf, other]

    cs.CV

    Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

    Authors: Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

    Abstract: Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches. The former typically relies on laborious and time-consuming manual annotations, posing considerable challenges in expanding keypoint detection to a broader range of keypoint categories and animal species. The latter, though…

    Submitted 11 December, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  29. arXiv:2309.11798  [pdf, other]

    cs.SI cs.LG

    A Comprehensive Review of Community Detection in Graphs

    Authors: Jiakang Li, Songning Lai, Zhihao Shuai, Yuan Tan, Yifan Jia, Mianyang Yu, Zichen Song, Xiaokang Peng, Ziyang Xu, Yongxin Ni, Haifeng Qiu, Jiayu Yang, Yutong Liu, Yonggang Lu

    Abstract: The study of complex networks has significantly advanced our understanding of community structures which serves as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been…

    Submitted 12 July, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  30. arXiv:2309.10641  [pdf, other]

    cs.CV

    KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning

    Authors: Jia Luo Peng, Keng Wei Chang, Shang-Hong Lai

    Abstract: Kinship verification is an emerging task in computer vision with multiple potential applications. However, there's no large enough kinship dataset to train a representative and robust model, which is a limitation for achieving better performance. Moreover, face verification is known to exhibit bias, which has not been dealt with by previous kinship verification works and sometimes even results in…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted by BMVC 2023

  31. arXiv:2308.13229  [pdf, other]

    cs.CV

    ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking

    Authors: Cheng-Che Cheng, Min-Xuan Qiu, Chen-Kuo Chiang, Shang-Hong Lai

    Abstract: Multi-Camera Multi-Object Tracking (MC-MOT) utilizes information from multiple views to better handle problems with occlusion and crowded scenes. Recently, the use of graph-based approaches to solve tracking problems has become very popular. However, many current graph-based methods do not effectively utilize information regarding spatial and temporal consistency. Instead, they rely on single-came…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023

  32. arXiv:2306.17020   

    cs.CL cs.AI

    Classifying Crime Types using Judgment Documents from Social Media

    Authors: Haoxuan Xu, Zeyu He, Mengfan Shen, Songning Lai, Ziqiang Han, Yifan Peng

    Abstract: The task of determining crime types based on criminal behavior facts has become a very important and meaningful task in social science. But the problem facing the field now is that the data samples themselves are unevenly distributed, due to the nature of the crime itself. At the same time, data sets in the judicial field are less publicly available, and it is not practical to produce large data s…

    Submitted 21 October, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: The paper has no errors; it just needs to be supplemented to become a new article

  33. A Preference-aware Meta-optimization Framework for Personalized Vehicle Energy Consumption Estimation

    Authors: Siqi Lai, Weijia Zhang, Hao Liu

    Abstract: Vehicle Energy Consumption (VEC) estimation aims to predict the total energy required for a given trip before it starts, which is of great importance to trip planning and transportation sustainability. Existing approaches mainly focus on extracting statistically significant factors from typical trips to improve the VEC estimation. However, the energy consumption of each vehicle may diverge widely…

    Submitted 26 June, 2023; originally announced June 2023.

  34. arXiv:2306.14313  [pdf, other]

    cs.CV cs.LG

    A Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing

    Authors: Chih-Jung Chang, Yaw-Chern Lee, Shih-Hsuan Yao, Min-Hung Chen, Chien-Yi Wang, Shang-Hong Lai, Trista Pei-Chun Chen

    Abstract: Face anti-spoofing (FAS) is indispensable for a face recognition system. Many texture-driven countermeasures were developed against presentation attacks (PAs), but the performance against unseen domains or unseen spoofing types is still unsatisfactory. Instead of exhaustively collecting all the spoofing variations and making binary decisions of live/spoof, we offer a new perspective on the FAS tas…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 2023 CVPR Biometrics Workshop, Best Paper Award

  35. arXiv:2306.10351  [pdf, other]

    cs.LG cs.AI cs.CR

    Bkd-FedGNN: A Benchmark for Classification Backdoor Attacks on Federated Graph Neural Network

    Authors: Fan Liu, Siqi Lai, Yansong Ning, Hao Liu

    Abstract: Federated Graph Neural Network (FedGNN) has recently emerged as a rapidly growing research topic, as it integrates the strengths of graph neural networks and federated learning to enable advanced machine learning applications without direct access to sensitive data. Despite its advantages, the distributed nature of FedGNN introduces additional vulnerabilities, particularly backdoor attacks stemmin…

    Submitted 17 June, 2023; originally announced June 2023.

  36. arXiv:2306.05310  [pdf, other]

    cs.LG

    A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: While Deep Reinforcement Learning has been widely researched in medical imaging, the training and deployment of these models usually require powerful GPUs. Since imaging environments evolve rapidly and can be generated by edge devices, the algorithm is required to continually learn and adapt to changing environments, and adjust to low-compute devices. To this end, we developed three image coreset…

    Submitted 8 June, 2023; originally announced June 2023.

  37. arXiv:2306.04032  [pdf, other]

    cs.CV cs.LG eess.IV

    BokehOrNot: Transforming Bokeh Effect with Image Transformer and Lens Metadata Embedding

    Authors: Zhihao Yang, Wenyi Lian, Siyuan Lai

    Abstract: Bokeh effect is an optical phenomenon that offers a pleasant visual experience, typically generated by high-end cameras with wide aperture lenses. The task of bokeh effect transformation aims to produce a desired effect in one set of lenses and apertures based on another combination. Current models are limited in their ability to render a specific set of bokeh effects, primarily transformations fr…

    Submitted 6 June, 2023; originally announced June 2023.

  38. arXiv:2306.00188  [pdf, other]

    cs.LG cs.CV eess.IV

    Multi-environment lifelong deep reinforcement learning for medical imaging

    Authors: Guangyao Zheng, Shuhao Lai, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh

    Abstract: Deep reinforcement learning (DRL) is increasingly being explored in medical imaging. However, the environments for medical imaging tasks are constantly evolving in terms of imaging orientations, imaging sequences, and pathologies. To that end, we developed a Lifelong DRL framework, SERIL, to continually learn new tasks in changing imaging environments without catastrophic forgetting. SERIL was devel…

    Submitted 31 May, 2023; originally announced June 2023.

  39. arXiv:2305.20055  [pdf, other]

    cs.CV

    Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism

    Authors: Haoxuan Xu, Songning Lai, Xianyang Li, Yang Yang

    Abstract: Car detection, particularly through camera vision, has become a major focus in the field of computer vision and has gained widespread adoption. While current car detection systems are capable of good detection, reliable detection can still be challenging due to factors such as proximity between the car, light intensity, and environmental visibility. To address these issues, we propose cross-domain…

    Submitted 29 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: It needs to be returned for major modifications

  40. arXiv:2305.09221  [pdf, other]

    cs.CR

    A Multi-Client Searchable Encryption Scheme for IoT Environment

    Authors: Nazatul H. Sultan, Shabnam Kasra-Kermanshahi, Yen Tran, Shangqi Lai, Vijay Varadharajan, Surya Nepal, Xun Yi

    Abstract: The proliferation of connected devices through Internet connectivity presents both opportunities for smart applications and risks to security and privacy. It is vital to proactively address these concerns to fully leverage the potential of the Internet of Things. IoT services where one data owner serves multiple clients, like smart city transportation, smart building management and healthcare can…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 22 pages, 5 figures, this version was submitted to ESORICS 2023

  41. arXiv:2305.08473  [pdf, other]

    cs.CL cs.CV

    Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning

    Authors: Songning Lai, Jiakang Li, Guinan Guo, Xifeng Hu, Yulong Li, Yuan Tan, Zichen Song, Yutong Liu, Zhaoxia Ren, Chun Wan, Danmin Miao, Zhi Liu

    Abstract: Designing an effective representation learning method for multimodal sentiment analysis tasks is a crucial research direction. The challenge lies in learning both shared and private information in a complete modal representation, which is difficult with uniform multimodal labels and a raw feature fusion approach. In this work, we propose a deep modal shared information learning module based on the…

    Submitted 19 March, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Journal ref: International Joint Conference on Neural Networks (IJCNN) 2024

  42. arXiv:2305.07611  [pdf, other]

    cs.CL cs.CV

    Multimodal Sentiment Analysis: A Survey

    Authors: Songning Lai, Xifeng Hu, Haoxuan Xu, Zhaoxia Ren, Zhi Liu

    Abstract: Multimodal sentiment analysis has become an important research area in the field of artificial intelligence. With the latest advances in deep learning, this technology has reached new heights. It has great potential for both application and research, making it a popular research topic. This review provides an overview of the definition, background, and development of multimodal sentiment analysis.…

    Submitted 3 July, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: It needs to be returned for major modifications

  43. arXiv:2304.04688  [pdf, other]

    cs.CV cs.AI

    Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

    Authors: Wei-Jhe Huang, Jheng-Hsien Yeh, Min-Hung Chen, Gueter Josmy Faure, Shang-Hong Lai

    Abstract: The goal of spatial-temporal action detection is to determine the time and place where each person's action occurs in a video and classify the corresponding action category. Most of the existing methods adopt fully-supervised learning, which requires a large amount of training data, making it very difficult to achieve zero-shot learning. In this paper, we propose to utilize a pre-trained visual-la…

    Submitted 20 September, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted by ICCV Workshop 2023 (What is Next in Multimodal Foundation Models?)

  44. arXiv:2304.04546  [pdf, other]

    cs.CV

    Kinship Representation Learning with Face Componential Relation

    Authors: Weng-Tai Su, Min-Hung Chen, Chien-Yi Wang, Shang-Hong Lai, Trista Pei-Chun Chen

    Abstract: Kinship recognition aims to determine whether the subjects in two facial images are kin or non-kin, which is an emerging and challenging problem. However, most previous methods focus on heuristic designs without considering the spatial correlation between face images. In this paper, we aim to learn discriminative kinship representations embedded with the relation information between face component…

    Submitted 29 September, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 Workshop (Analysis and Modeling of Faces and Gestures)

  45. arXiv:2303.10826  [pdf, other]

    cs.CV

    Visual Prompt Multi-Modal Tracking

    Authors: Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu

    Abstract: Visible-modal object tracking gives rise to a series of downstream multi-modal tracking tributaries. To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters. Albeit effective, this manner is not optimal due to the scarcity of downstream data and poor transferability, etc. In this paper, inspi…

    Submitted 24 March, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023

  46. arXiv:2212.05786  [pdf, other]

    cs.CV cs.AI

    Multi-scale Feature Imitation for Unsupervised Anomaly Localization

    Authors: Chao Hu, Shengxin Lai

    Abstract: The unsupervised anomaly localization task faces the challenge of missing anomaly sample training, detecting multiple types of anomalies, and dealing with the proportion of the area of multiple anomalies. A separate teacher-student feature imitation network structure and a multi-scale processing strategy combining an image and feature pyramid are proposed to solve these problems. A network module…

    Submitted 12 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: International Joint Conference on Neural Networks 2023

    Journal ref: International Joint Conference on Neural Networks 2023

  47. arXiv:2212.04611  [pdf]

    cs.LG cs.CL stat.ME

    Multidimensional Service Quality Scoring System

    Authors: Shiyang Lai

    Abstract: This supplementary paper aims to introduce the Multidimensional Service Quality Scoring System (MSQs), a review-based method for quantifying host service quality mentioned and employed in the paper Exit and transition: Exploring the survival status of Airbnb listings in a time of professionalization. MSQs is not an end-to-end implementation and is essentially composed of three pipelines, namely Da…

    Submitted 8 December, 2022; originally announced December 2022.

    Journal ref: Tourism Management, 95, 104665 (2022)

  48. arXiv:2211.15955  [pdf, other]

    cs.CV

    Generalized Face Anti-Spoofing via Multi-Task Learning and One-Side Meta Triplet Loss

    Authors: Chu-Chun Chuang, Chien-Yi Wang, Shang-Hong Lai

    Abstract: With the increasing variations of face presentation attacks, model generalization becomes an essential challenge for a practical face anti-spoofing system. This paper presents a generalized face anti-spoofing framework that consists of three tasks: depth estimation, face parsing, and live/spoof classification. With the pixel-wise supervision from the face parsing and depth estimation tasks, the re…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 2023 IEEE International Conference on Automatic Face and Gesture Recognition (FG)

  49. arXiv:2211.15940  [pdf, other]

    cs.CV cs.AI

    PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

    Authors: Zhihao Zhang, Siwen Luo, Junyi Chen, Sijia Lai, Siqu Long, Hyunsuk Chung, Soyeon Caren Han

    Abstract: We propose a PiggyBack, a Visual Question Answering platform that allows users to apply the state-of-the-art visual-language pretrained models easily. The PiggyBack supports the full stack of visual question answering tasks, specifically data processing, model fine-tuning, and result visualisation. We integrate visual-language models, pretrained by HuggingFace, an open-source API platform of deep…

    Submitted 30 November, 2022; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted by WSDM 2023

  50. arXiv:2211.15181  [pdf, other]

    cs.CV

    MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition

    Authors: Fu-En Wang, Chien-Yi Wang, Min Sun, Shang-Hong Lai

    Abstract: Although significant progress has been made in face recognition, demographic bias still exists in face recognition systems. For instance, it usually happens that the face recognition performance for a certain demographic group is lower than the others. In this paper, we propose MixFairFace framework to improve the fairness in face recognition models. First of all, we argue that the commonly used a…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted in AAAI-23; Code: https://github.com/fuenwang/MixFairFace