Showing 1–50 of 56 results for author: Hou, W

Searching in archive cs.
  1. arXiv:2408.14868

    cs.CV

    ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

    Authors: Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang

    Abstract: Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive f…

    Submitted 27 August, 2024; originally announced August 2024.

  2. arXiv:2408.01998

    cs.CV

    What Happens Without Background? Constructing Foreground-Only Data for Fine-Grained Tasks

    Authors: Yuetian Wang, Wenjin Hou, Qinmu Peng, Xinge You

    Abstract: Fine-grained recognition, a pivotal task in visual signal processing, aims to distinguish between similar subclasses based on discriminative information present in samples. However, prevailing methods often erroneously focus on background areas, neglecting the capture of genuinely effective discriminative information from the subject, thus impeding practical application. To facilitate research int…

    Submitted 4 August, 2024; originally announced August 2024.

  3. arXiv:2406.13960

    cs.CL cs.AI

    Evolving to be Your Soulmate: Personalized Dialogue Agents with Dynamically Adapted Personas

    Authors: Yi Cheng, Wenge Liu, Kaishuai Xu, Wenjun Hou, Yi Ouyang, Chak Tou Leong, Xian Wu, Yefeng Zheng

    Abstract: Previous research on persona-based dialogue agents typically preset the agent's persona before deployment, which remains static thereafter. In this paper, we take a step further and explore a new paradigm called Self-evolving Personalized Dialogue Agents (SPDA), where the agent continuously evolves during the conversation to better align with the user's anticipation by dynamically adapting its per…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Work in progress

  4. arXiv:2406.13934

    cs.CL cs.AI

    Reasoning Like a Doctor: Improving Medical Dialogue Systems via Diagnostic Reasoning Process Alignment

    Authors: Kaishuai Xu, Yi Cheng, Wenjun Hou, Qiaoyu Tan, Wenjie Li

    Abstract: Medical dialogue systems have attracted significant attention for their potential to act as medical assistants. Enabling these medical systems to emulate clinicians' diagnostic reasoning process has been the long-standing research focus. Previous studies rudimentarily realized the simulation of clinicians' diagnostic process by fine-tuning language models on high-quality dialogue datasets. Nonethe…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 Findings

  5. arXiv:2406.02147

    cs.CV

    UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

    Authors: Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang

    Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2404.14808

    cs.CV

    Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

    Authors: Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

    Abstract: Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods rely on the conditions of Gaussian noise and the predefined semantic prototype, which limit the generator only optimized on specific seen classes rather than characterizing each visual instance, resulting in poor gene…

    Submitted 23 April, 2024; originally announced April 2024.

  7. arXiv:2404.07713

    cs.CV cs.LG

    Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

    Authors: Shiming Chen, Wenjin Hou, Salman Khan, Fahad Shahbaz Khan

    Abstract: Zero-shot learning (ZSL) recognizes the unseen classes by conducting visual-semantic interactions to transfer semantic knowledge from seen classes to unseen ones, supported by semantic information (e.g., attributes). However, existing ZSL methods simply extract visual features using a pre-trained network backbone (i.e., CNN or ViT), which fail to learn matched visual-semantic correspondences for r…

    Submitted 22 July, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR'24

  8. arXiv:2404.04617

    cs.CV

    Empowering Image Recovery_ A Multi-Attention Approach

    Authors: Juan Wen, Yawei Li, Chao Zhang, Weiyan Hou, Radu Timofte, Luc Van Gool

    Abstract: We propose Diverse Restormer (DART), a novel image restoration method that effectively integrates information from various sources (long sequences, local and global regions, feature dimensions, and positional dimensions) to address restoration challenges. While Transformer models have demonstrated excellent performance in image restoration due to their self-attention mechanism, they face limitatio…

    Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, 12 tables

    MSC Class: 68T07 (Primary) 168T45 (Secondary) ACM Class: I.4.4

  9. arXiv:2403.00894

    cs.SE cs.AI cs.CL cs.PL

    A systematic evaluation of large language models for generating programming code

    Authors: Wenpin Hou, Zhicheng Ji

    Abstract: We systematically evaluated the performance of seven large language models in generating programming code using various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGe…

    Submitted 1 March, 2024; originally announced March 2024.

  10. arXiv:2402.12844

    cs.CV cs.CL

    ICON: Improving Inter-Report Consistency of Radiology Report Generation via Lesion-aware Mix-up Augmentation

    Authors: Wenjun Hou, Yi Cheng, Kaishuai Xu, Yan Hu, Wenjie Li, Jiang Liu

    Abstract: Previous research on radiology report generation has made significant progress in terms of increasing the clinical accuracy of generated reports. In this paper, we emphasize another crucial quality that it should possess, i.e., inter-report consistency, which refers to the capability of generating consistent reports for semantically equivalent radiographs. This quality is even of greater significa…

    Submitted 20 February, 2024; originally announced February 2024.

  11. arXiv:2401.06541

    cs.CL cs.AI

    Medical Dialogue Generation via Intuitive-then-Analytical Differential Diagnosis

    Authors: Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

    Abstract: Medical dialogue systems have attracted growing research attention as they have the potential to provide rapid diagnoses, treatment plans, and health consultations. In medical dialogues, a proper diagnosis is crucial as it establishes the foundation for future consultations. Clinicians typically employ both intuitive and analytic reasoning to formulate a differential diagnosis. This reasoning proc…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Work in progress

  12. arXiv:2312.11111

    cs.AI cs.CL cs.HC

    The Good, The Bad, and Why: Unveiling Emotions in Generative AI

    Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

    Abstract: Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifica…

    Submitted 7 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: International Conference on Machine Learning (ICML) 2024; an extension to EmotionPrompt (arXiv:2307.11760)

  13. arXiv:2311.16832

    cs.CL cs.AI

    CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

    Authors: Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs. On top of CharacterGLM, we can custom…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Work in progress

  14. arXiv:2311.05050

    cs.LG quant-ph

    Quantum Generative Modeling of Sequential Data with Trainable Token Embedding

    Authors: Wanda Hou, Miao Li, Yi-Zhuang You

    Abstract: Generative models are a class of machine learning models that aim to learn the underlying probability distribution of data. Unlike discriminative models, generative models focus on capturing the data's inherent structure, allowing them to generate new samples that resemble the original data. To fully exploit the potential of modeling probability distributions using quantum physics, a quantum-inspi…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures

  15. arXiv:2310.13864

    cs.CL

    RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning

    Authors: Wenjun Hou, Yi Cheng, Kaishuai Xu, Wenjie Li, Jiang Liu

    Abstract: Automating radiology report generation can significantly alleviate radiologists' workloads. Previous research has primarily focused on realizing highly concise observations while neglecting the precise attributes that determine the severity of diseases (e.g., small pleural effusion). Since incorrect attributes will lead to imprecise radiology reports, strengthening the generation process with prec…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023

  16. Using a Nearest-Neighbour, BERT-Based Approach for Scalable Clone Detection

    Authors: Muslim Chochlov, Gul Aftab Ahmed, James Vincent Patten, Guoxian Lu, Wei Hou, David Gregg, Jim Buckley

    Abstract: Code clones can detrimentally impact software maintenance and manually detecting them in very large codebases is impractical. Additionally, automated approaches find detection of Type 3 and Type 4 (inexact) clones very challenging. While the most recent artificial deep neural networks (for example BERT-based artificial neural networks) seem to be highly effective in detecting such clones, their pa…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 10 pages, 2 figures, 38th IEEE International Conference on Software Maintenance and Evolution

  17. arXiv:2308.09915

    cs.CV cs.LG

    EGANS: Evolutionary Generative Adversarial Network Search for Zero-Shot Learning

    Authors: Shiming Chen, Shihuang Chen, Wenjin Hou, Weiping Ding, Xinge You

    Abstract: Zero-shot learning (ZSL) aims to recognize the novel classes which cannot be collected for training a prediction model. Accordingly, generative models (e.g., generative adversarial network (GAN)) are typically used to synthesize the visual samples conditioned by the class semantic vectors and achieve remarkable progress for ZSL. However, existing GAN-based generative ZSL methods are based on hand-…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted to TEVC

  18. arXiv:2308.05421

    cs.CV cs.MM

    Progressive Spatio-temporal Perception for Audio-Visual Question Answering

    Authors: Guangyao Li, Wenxuan Hou, Di Hu

    Abstract: Audio-Visual Question Answering (AVQA) task aims to answer questions about different visual objects, sounds, and their associations in videos. Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, where most of which could be unrelated to the given questions, or even play as interference in answering the content of interest. Oppositely, only focusing o…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  19. arXiv:2308.04370

    cs.CV

    When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study

    Authors: Juan Wen, Shupeng Cheng, Peng Xu, Bowen Zhou, Radu Timofte, Weiyan Hou, Luc Van Gool

    Abstract: Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications. For instance, low-resolution surveillance images can be successively processed by super-resolution techniques and camouflaged object detection. However, in previous work, these two areas are always studied in isolation. In this paper, we, for the first time, conduct a…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 23 pages with 8 figures

    MSC Class: 68T45 ACM Class: I.4.3

  20. arXiv:2307.11760

    cs.CL cs.AI cs.HC

    Large Language Models Understand and Can be Enhanced by Emotional Stimuli

    Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

    Abstract: Emotional intelligence significantly impacts our daily behaviors and interactions. Although Large Language Models (LLMs) are increasingly viewed as a stride toward artificial general intelligence, exhibiting impressive performance in numerous tasks, it is still uncertain if LLMs can genuinely grasp psychological emotional stimuli. Understanding and responding to emotional cues gives humans a disti…

    Submitted 12 November, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Technical report; updated the std error for human study; short version (v1) was accepted by LLM@IJCAI'23; 32 pages; more work: https://llm-enhance.github.io/

  21. arXiv:2307.04427

    astro-ph.HE astro-ph.GA cs.LG

    Observation of high-energy neutrinos from the Galactic plane

    Authors: R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., S. W. Barwick, V. Basu, S. Baur, R. Bay, J. J. Beatty, K. -H. Becker, J. Becker Tjus, et al. (364 additional authors not shown)

    Abstract: The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrin…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Submitted on May 12th, 2022; Accepted on May 4th, 2023

    Journal ref: Science 380, 6652, 1338-1343 (2023)

  22. arXiv:2306.09431

    cs.MM

    Towards Long Form Audio-visual Video Understanding

    Authors: Wenxuan Hou, Guangyao Li, Yapeng Tian, Di Hu

    Abstract: We live in a world filled with never-ending streams of multimodal information. As a more natural recording of the real scenario, long form audio-visual videos are expected as an important bridge for better exploring and understanding the world. In this paper, we propose the multisensory temporal event localization task in long form videos and strive to tackle the associated challenges. To facilita…

    Submitted 15 June, 2023; originally announced June 2023.

  23. arXiv:2306.06931

    cs.LG cs.CV

    Evolving Semantic Prototype Improves Generative Zero-Shot Learning

    Authors: Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang

    Abstract: In zero-shot learning (ZSL), generative methods synthesize class-related sample features based on predefined semantic prototypes. They advance the ZSL performance by synthesizing unseen class sample features for better training the classifier. We observe that each class's predefined semantic prototype (also referred to as semantic embedding or condition) does not accurately match its real semantic…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML'23

  24. arXiv:2306.06466

    cs.CL

    ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning

    Authors: Wenjun Hou, Kaishuai Xu, Yi Cheng, Wenjie Li, Jiang Liu

    Abstract: This paper explores the task of radiology report generation, which aims at generating free-text descriptions for a set of radiographs. One significant challenge of this task is how to correctly maintain the consistency between the images and the lengthy report. Previous research explored solving this issue through planning-based methods, which generate reports only based on high-level plans. Howev…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023

  25. arXiv:2306.04893

    cs.CV

    Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization

    Authors: Shuo Ye, Shujian Yu, Wenjin Hou, Yu Wang, Xinge You

    Abstract: Fine-grained visual categorization (FGVC) is a challenging task due to similar visual appearances between various species. Previous studies always implicitly assume that the training and test data have the same underlying distributions, and that features extracted by modern backbone architectures remain discriminative and generalize well to unseen test data. However, we empirically justify that th…

    Submitted 9 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Manuscript accepted by CVIU, code is available at Github

  26. arXiv:2305.18109

    cs.CL cs.AI

    Medical Dialogue Generation via Dual Flow Modeling

    Authors: Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

    Abstract: Medical dialogue systems (MDS) aim to provide patients with medical services, such as diagnosis and prescription. Since most patients cannot precisely describe their symptoms, dialogue understanding is challenging for MDS. Previous studies mainly addressed this by extracting the mentioned medical entities as critical dialogue history information. In this work, we argue that it is also essential to…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted as Findings of ACL 2023

  27. arXiv:2302.12095

    cs.AI cs.CL cs.LG

    On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

    Authors: Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie

    Abstract: ChatGPT is a recent chatbot service released by OpenAI and is receiving increasing attention over the past few months. While evaluations of various aspects of ChatGPT have been done, its robustness, i.e., the performance to unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct…

    Submitted 29 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Highlighted paper at ICLR 2023 workshop on Trustworthy and Reliable Large-Scale Machine Learning Models; code is at: https://github.com/microsoft/robustlearn; more works: https://llm-eval.github.io/

  28. arXiv:2209.03042

    hep-ex astro-ph.IM cs.LG physics.data-an physics.ins-det

    Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

    Authors: R. Abbasi, M. Ackermann, J. Adams, N. Aggarwal, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, K. -H. Becker, et al. (359 additional authors not shown)

    Abstract: IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challen…

    Submitted 11 October, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: Prepared for submission to JINST

  29. arXiv:2208.10833

    cs.SE cs.AI cs.LG

    LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction

    Authors: Hongcheng Guo, Yuhui Guo, Renjie Chen, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Weichao Hou, Liangfan Zheng, Bo Zhang

    Abstract: Fully supervised log anomaly detection methods suffer the heavy burden of annotating massive unlabeled log data. Recently, many semi-supervised methods have been proposed to reduce annotation costs with the help of parsed templates. However, these methods consider each keyword independently, which disregards the correlation between keywords and the contextual relationships among log sequences. In…

    Submitted 11 April, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: 12 pages

  30. arXiv:2208.08280

    cs.CL

    Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction

    Authors: Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang

    Abstract: Target-oriented Opinion Words Extraction (TOWE) is a fine-grained sentiment analysis task that aims to extract the corresponding opinion words of a given opinion target from the sentence. Recently, deep learning approaches have made remarkable progress on this task. Nevertheless, the TOWE task still suffers from the scarcity of training data due to the expensive data annotation process. Limited la…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: Accepted by COLING 2022

  31. arXiv:2208.07204

    cs.LG cs.AI cs.CV

    USB: A Unified Semi-supervised Learning Benchmark for Classification

    Authors: Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang

    Abstract: Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issu…

    Submitted 13 October, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted by NeurIPS'22 dataset and benchmark track; code at https://github.com/microsoft/Semi-supervised-learning

  32. arXiv:2206.12169

    cs.LG cs.AI

    AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems

    Authors: Wenzheng Hou, Qianqian Xu, Zhiyong Yang, Shilong Bao, Yuan He, Qingming Huang

    Abstract: It is well-known that deep learning models are vulnerable to adversarial examples. Existing studies of adversarial training have made great progress against this challenge. As a typical trait, they often assume that the class distribution is overall balanced. However, long-tail datasets are ubiquitous in a wide spectrum of applications, where the amount of head class instances is larger than the t…

    Submitted 24 June, 2022; originally announced June 2022.

  33. arXiv:2206.09783

    eess.AS cs.CL cs.SD

    Boosting Cross-Domain Speech Recognition with Self-Supervision

    Authors: Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

    Abstract: The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (S…

    Submitted 30 July, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

  34. arXiv:2205.07246

    cs.LG cs.CV

    FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

    Authors: Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie

    Abstract: Semi-supervised Learning (SSL) has witnessed great success owing to the impressive performances brought by various methods based on pseudo labeling and consistency regularization. However, we argue that existing methods might fail to utilize the unlabeled data more effectively since they either use a pre-defined / fixed threshold or an ad-hoc threshold adjusting scheme, resulting in inferior perfo…

    Submitted 31 January, 2023; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: Accepted by ICLR 2023. Code: https://github.com/microsoft/Semi-supervised-learning

  35. arXiv:2205.04121

    cs.CV cs.HC

    Identifying Fixation and Saccades in Virtual Reality

    Authors: Xiao-lin Chen, Wen-jun Hou

    Abstract: Gaze recognition can significantly reduce the amount of eye movement data for a better understanding of cognitive and visual processing. Gaze recognition is an essential precondition for eye-based interaction applications in virtual reality. However, the three-dimensional characteristics of virtual reality environments also pose new challenges to existing recognition algorithms. Based on seven eva…

    Submitted 9 May, 2022; originally announced May 2022.

  36. arXiv:2112.08643

    cs.CV cs.AI

    TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

    Authors: Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao

    Abstract: Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Existing attention-based models have struggled to learn inferior region features in a single image by solely using unidirectional attention, which ignore the transferability and discriminative attribute localization of visual features. In this paper, we propose…

    Submitted 13 December, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: This is an extension of AAAI'22 paper (TransZero). Accepted to TPAMI. arXiv admin note: substantial text overlap with arXiv:2112.01683

  37. arXiv:2112.07225

    cs.CV cs.AI cs.LG

    Margin Calibration for Long-Tailed Visual Recognition

    Authors: Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki

    Abstract: The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. W…

    Submitted 7 October, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Accepted by Asian Conference on Machine Learning (ACML) 2022; 16 pages

  38. arXiv:2110.08263

    cs.LG cs.CV

    FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

    Authors: Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

    Abstract: The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes. To address this issue…

    Submitted 28 January, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021; camera-ready version; 16 pages with appendix; code: https://github.com/TorchSSL/TorchSSL

  39. arXiv:2105.11905

    cs.CL cs.SD eess.AS

    Exploiting Adapters for Cross-lingual Low-resource Speech Recognition

    Authors: Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki

    Abstract: Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language. Since the low-resource language has limited training data, speech recognition models can easily overfit. In this paper, we propose to use adapters to investigate the performance of multiple adapters for parameter-efficient cross-lingual speech…

    Submitted 17 December, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE Transactions on Audio, Speech, and Language Processing (TASLP) as a full paper; 12 pages; code at https://github.com/jindongwang/transferlearning/tree/master/code/ASR/Adapter

  40. arXiv:2104.07491

    cs.SD cs.LG eess.AS

    Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching

    Authors: Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki

    Abstract: End-to-end automatic speech recognition (ASR) can achieve promising performance with large-scale training data. However, it is known that domain mismatch between training and testing data often leads to a degradation of recognition accuracy. In this work, we focus on the unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method to perform fine-graine…

    Submitted 8 June, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted to INTERSPEECH 2021; code available at https://github.com/jindongwang/transferlearning/tree/master/code/ASR/CMatch

  41. arXiv:2104.05752  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

    Authors: Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais

    Abstract: A major focus of recent research in spoken language understanding (SLU) has been on the end-to-end approach where a single model can predict intents directly from speech inputs without intermediate transcripts. However, this approach presents some challenges. First, since speech can be considered as personally identifiable information, in some cases only automatic speech recognition (ASR) transcri…

    Submitted 14 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021

  42. arXiv:2012.03729  [pdf, ps, other]

    cs.LG

    TRACE: Early Detection of Chronic Kidney Disease Onset with Transformer-Enhanced Feature Embedding

    Authors: Yu Wang, Ziqiao Guan, Wei Hou, Fusheng Wang

    Abstract: Chronic kidney disease (CKD) has a poor prognosis due to excessive risk factors and comorbidities associated with it. The early detection of CKD faces challenges of insufficient medical histories of positive patients and complicated risk factors. In this paper, we propose the TRACE (Transformer-RNN Autoencoder-enhanced CKD Detector) framework, an end-to-end prediction model using patients' medical…

    Submitted 2 December, 2020; originally announced December 2020.

  43. arXiv:2012.02006  [pdf, other]

    cs.DB cs.LG cs.SI stat.ML

    AugSplicing: Synchronized Behavior Detection in Streaming Tensors

    Authors: Jiabao Zhang, Shenghua Liu, Wenting Hou, Siddharth Bhatia, Huawei Shen, Wenjian Yu, Xueqi Cheng

    Abstract: How can we track synchronized behavior in a stream of time-stamped tuples, such as mobile devices installing and uninstalling applications in the lockstep, to boost their ranks in the app store? We model such tuples as entries in a streaming tensor, which augments attribute sizes in its modes over time. Synchronized behavior tends to form dense blocks (i.e. subtensors) in such a tensor, signaling…

    Submitted 30 March, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: AAAI Conference on Artificial Intelligence (AAAI), 2021

  44. arXiv:2010.04589  [pdf]

    cs.LG cs.CY stat.ML

    Identifying Risk of Opioid Use Disorder for Patients Taking Opioid Medications with Deep Learning

    Authors: Xinyu Dong, Jianyuan Deng, Sina Rashidian, Kayley Abell-Hart, Wei Hou, Richard N Rosenthal, Mary Saltz, Joel Saltz, Fusheng Wang

    Abstract: The United States is experiencing an opioid epidemic, and there were more than 10 million opioid misusers aged 12 or older each year. Identifying patients at high risk of Opioid Use Disorder (OUD) can help to make early clinical interventions to reduce the risk of OUD. Our goal is to predict OUD patients among opioid prescription users through analyzing electronic health records with machine learn…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 20 pages, 6 figures

  45. arXiv:2009.11260  [pdf, other]

    cs.CL

    A Token-wise CNN-based Method for Sentence Compression

    Authors: Weiwei Hou, Hanna Suominen, Piotr Koniusz, Sabrina Caldwell, Tom Gedeon

    Abstract: Sentence compression is a Natural Language Processing (NLP) task aimed at shortening original sentences and preserving their key information. Its applications can benefit many fields e.g. one can build tools for language education. However, current methods are largely based on Recurrent Neural Network (RNN) models which suffer from poor processing speed. To address this issue, in this paper, we pr…

    Submitted 23 September, 2020; originally announced September 2020.

  46. arXiv:2004.04955  [pdf, other]

    cs.CV cs.LG eess.IV

    Boosting Semantic Human Matting with Coarse Annotations

    Authors: Jinlin Liu, Yuan Yao, Wendi Hou, Miaomiao Cui, Xuansong Xie, Changshui Zhang, Xian-sheng Hua

    Abstract: Semantic human matting aims to estimate the per-pixel opacity of the foreground human regions. It is quite challenging and usually requires user interactive trimaps and plenty of high quality annotated data. Annotating such kind of data is labor intensive and requires great skills beyond normal users, especially considering the very detailed hair part of humans. In contrast, coarse annotated human…

    Submitted 10 April, 2020; originally announced April 2020.

  47. Fixed-Symbol Aided Random Access Scheme for Machine-to-Machine Communications

    Authors: Zhaoji Zhang, Ying Li, Lei Liu, Wei Hou

    Abstract: The massiveness of devices in crowded Machine-to-Machine (M2M) communications brings new challenges to existing random-access (RA) schemes, such as heavy signaling overhead and severe access collisions. In order to reduce the signaling overhead, we propose a fixed-symbol aided RA scheme where active devices access the network in a grant-free method, i.e., data packets are directly transmitted in r…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: 15 pages, 9 figures

    Journal ref: IEEE Access, vol. 7, pp. 52913-52928, 2019

  48. arXiv:1904.06883  [pdf, other]

    cs.CV

    DuBox: No-Prior Box Objection Detection via Residual Dual Scale Detectors

    Authors: Shuai Chen, Jinpeng Li, Chuanqi Yao, Wenbo Hou, Shuo Qin, Wenyao Jin, Xu Tang

    Abstract: Traditional neural objection detection methods use multi-scale features that allow multiple detectors to perform detecting tasks independently and in parallel. At the same time, with the handling of the prior box, the algorithm's ability to deal with scale invariance is enhanced. However, too many prior boxes and independent detectors will increase the computational redundancy of the detection alg…

    Submitted 16 April, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

  49. arXiv:1903.12473  [pdf, other]

    cs.CV

    Shape Robust Text Detection with Progressive Scale Expansion Network

    Authors: Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao

    Abstract: Scene text detection has witnessed rapid progress especially with the recent development of convolutional neural networks. However, there still exists two challenges which prevent the algorithm into industry applications. On the one hand, most of the state-of-art algorithms require quadrangle bounding box which is in-accurate to locate the texts with arbitrary shape. On the other hand, two text in…

    Submitted 29 July, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: Accepted by CVPR 2019. arXiv admin note: substantial text overlap with arXiv:1806.02559

  50. arXiv:1901.06757  [pdf, other]

    cs.IT

    A construction of UD $k$-ary multi-user codes from $(2^m(k-1)+1)$-ary codes for MAAC

    Authors: SHAN Lu, Wei Hou, Jun Cheng, Hiroshi Kamabe

    Abstract: In this paper, we proposed a construction of a UD $k$-ary $T$-user coding scheme for MAAC. We first give a construction of $k$-ary $T^{f+g}$-user UD code from a $k$-ary $T^{f}$-user UD code and a $k^{\pm}$-ary $T^{g}$-user difference set with its two component sets $\mathcal{D}^{+}$ and $\mathcal{D}^{-}$ {\em a priori}. Based on the $k^{\pm}$-ary $T^{g}$-user difference set constructed from a…

    Submitted 20 January, 2019; originally announced January 2019.