Showing 1–48 of 48 results for author: Zong, Y

Searching in archive cs.
  1. arXiv:2407.14800  [pdf, other]

    eess.AS cs.SD eess.SP

    Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity

    Authors: Tianhua Qi, Shiyan Wang, Cheng Lu, Yan Zhao, Yuan Zong, Wenming Zheng

    Abstract: Realistic emotional voice conversion (EVC) aims to enhance emotional diversity of converted audios, making the synthesized voices more authentic and natural. To this end, we propose Emotional Intensity-aware Network (EINet), dynamically adjusting intonation and rhythm by incorporating controllable emotional intensity. To better capture nuances in emotional intensity, we go beyond mere distance mea… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH2024

  2. arXiv:2407.12973  [pdf, other]

    cs.CV cs.AI

    Temporal Label Hierachical Network for Compound Emotion Recognition

    Authors: Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Tianhua Qi, Hao Yang, Yuan Zong, Wenming Zheng

    Abstract: The emotion recognition has attracted more attention in recent decades. Although significant progress has been made in the recognition technology of the seven basic emotions, existing methods are still hard to tackle compound emotion recognition that occurred commonly in practical application. This article introduces our achievements in the 7th Field Emotion Behavior Analysis (ABAW) competition. I… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: draft for abaw7

  3. arXiv:2407.04617  [pdf, other]

    cs.LG

    Randomized Physics-Informed Neural Networks for Bayesian Data Assimilation

    Authors: Yifei Zong, David Barajas-Solano, Alexandre M. Tartakovsky

    Abstract: We propose a randomized physics-informed neural network (PINN) or rPINN method for uncertainty quantification in inverse partial differential equation (PDE) problems with noisy data. This method is used to quantify uncertainty in the inverse PDE PINN solutions. Recently, the Bayesian PINN (BPINN) method was proposed, where the posterior distribution of the PINN parameters was formulated using the… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 38 pages, 8 figures

  4. arXiv:2406.18566  [pdf, other]

    cs.CV cs.AI cs.LG

    Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted

    Authors: Ruchika Chavhan, Ondrej Bohdal, Yongshuo Zong, Da Li, Timothy Hospedales

    Abstract: Large-scale text-to-image diffusion models excel in generating high-quality images from textual inputs, yet concerns arise as research indicates their tendency to memorize and replicate training data, raising copyright infringement and privacy issues. Efforts within the te… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  5. arXiv:2406.12742  [pdf, other]

    cs.CV cs.AI cs.CL

    Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

    Authors: Bingchen Zhao, Yongshuo Zong, Letian Zhang, Timothy Hospedales

    Abstract: The advancement of large language models (LLMs) has significantly broadened the scope of applications in natural language processing, with multi-modal LLMs extending these capabilities to integrate and interpret visual data. However, existing benchmarks for visual language models (VLMs) predominantly focus on single-image inputs, neglecting the crucial aspect of multi-image understanding. In this… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: First three authors contributed equally. Dataset: https://huggingface.co/datasets/VLLMs/MIRB

  6. arXiv:2405.00574  [pdf, other]

    cs.CV cs.MM

    EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model

    Authors: Deng Li, Xin Liu, Bohao Xing, Baiqiang Xia, Yuan Zong, Bihan Wen, Heikki Kälviäinen

    Abstract: Emotion AI is the ability of computers to understand human emotional states. Existing works have achieved promising progress, but two limitations remain to be solved: 1) Previous studies have been more focused on short sequential video emotion analysis while overlooking long sequential video. However, the emotions in short sequential videos only reflect instantaneous emotions, which may be deliber… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  7. arXiv:2403.13164  [pdf, other]

    cs.LG

    VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning

    Authors: Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

    Abstract: Large language models (LLMs) famously exhibit emergent in-context learning (ICL) -- the ability to rapidly adapt to new tasks using few-shot examples provided as a prompt, without updating the model's weights. Built on top of LLMs, vision large language models (VLLMs) have advanced significantly in areas such as recognition, reasoning, and grounding. However, investigations into \emph{multimodal I… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  8. arXiv:2403.01494  [pdf, other]

    eess.AS cs.SD eess.SP

    PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

    Authors: Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

    Abstract: In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception. To improve the content naturalness of converted audio, we have developed an end-to-end EVC architecture inspired by the high audio quality of… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP2024

  9. arXiv:2402.16718  [pdf]

    physics.med-ph cs.AI

    An Overview of the Development of Stereotactic Body Radiation Therapy

    Authors: Yanqi Zong, Zhengrong Cui, Luqi Lin, Sihao Wang, Yizhi Chen

    Abstract: Stereotactic body radiation therapy (SBRT) refers to focusing high-energy rays in three-dimensional space on the tumor lesion area, reducing the dose received by surrounding normal tissues, which can effectively improve the local control rate of the tumor and reduce the probability of complications. With the comprehensive development of medical imaging, radiation biology and other disciplines, thi… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  10. arXiv:2402.15745  [pdf, other]

    cs.CL cs.AI cs.CV

    GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation

    Authors: Yi Zong, Xipeng Qiu

    Abstract: The Large Vision-Language Models (LVLMs) have demonstrated great abilities in image perception and language understanding. However, existing multimodal benchmarks focus on primary perception abilities and commonsense knowledge which are insufficient to reflect the comprehensive capabilities of LVLMs. We propose GAOKAO-MM, a multimodal benchmark based on the Chinese College Entrance Examination (GA… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  11. arXiv:2402.02207  [pdf, other]

    cs.LG

    Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

    Authors: Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, Timothy Hospedales

    Abstract: Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generate harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting of safety alignment previously learned by the un… ▽ More

    Submitted 17 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  12. arXiv:2401.12925  [pdf, other]

    cs.SD eess.AS

    Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition

    Authors: Yan Zhao, Jincen Wang, Cheng Lu, Sunan Li, Björn Schuller, Yuan Zong, Wenming Zheng

    Abstract: Cross-corpus speech emotion recognition (SER) aims to transfer emotional knowledge from a labeled source corpus to an unlabeled corpus. However, prior methods require access to source data during adaptation, which is unattainable in real-life scenarios due to data privacy protection concerns. This paper tackles a more practical task, namely source-free cross-corpus SER, where a pre-trained source… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  13. Simple Domain Adaptation for Sparse Retrievers

    Authors: Mathias Vast, Yuxuan Zong, Basile Van Cooten, Benjamin Piwowarski, Laure Soulier

    Abstract: In Information Retrieval, and more generally in Natural Language Processing, adapting models to specific domains is conducted through fine-tuning. Despite the successes achieved by this method and its versatility, the need for human-curated and labeled data makes it impractical to transfer to new tasks, domains, and/or languages when training data doesn't exist. Using the model without training (z… ▽ More

    Submitted 5 July, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted at ECIR 2024

    Journal ref: Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610

  14. arXiv:2401.10536  [pdf, other]

    cs.CL

    Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition

    Authors: Yong Wang, Cheng Lu, Hailun Lian, Yan Zhao, Björn Schuller, Yuan Zong, Wenming Zheng

    Abstract: Swin-Transformer has demonstrated remarkable success in computer vision by leveraging its hierarchical feature representation based on Transformer. In speech signals, emotional information is distributed across different scales of speech features, e.g., word, phrase, and utterance. Drawing above inspiration, this paper presents a hierarchical speech Transformer with shifted windows to aggregate… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  15. arXiv:2401.09752  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

    Authors: Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Björn Schuller, Wenming Zheng

    Abstract: In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers. Consequently, when the trained model is confronted with data from new speakers, its performance tends to degrade. To address the issue, we propose a Dynamic Joint Distribu… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  16. arXiv:2312.06466  [pdf, other]

    cs.SD eess.AS

    Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach

    Authors: Yan Zhao, Yuan Zong, Hailun Lian, Cheng Lu, Jingang Shi, Wenming Zheng

    Abstract: Cross-corpus speech emotion recognition (SER) poses a challenge due to feature distribution mismatch, potentially degrading the performance of established SER methods. In this paper, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific know… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

  17. arXiv:2312.06177  [pdf, other]

    cs.LG

    Randomized Physics-Informed Machine Learning for Uncertainty Quantification in High-Dimensional Inverse Problems

    Authors: Yifei Zong, David Barajas-Solano, Alexandre M. Tartakovsky

    Abstract: We propose a physics-informed machine learning method for uncertainty quantification in high-dimensional inverse problems. In this method, the states and parameters of partial differential equations (PDEs) are approximated with truncated conditional Karhunen-Loève expansions (CKLEs), which, by construction, match the measurements of the respective variables. The maximum a posteriori (MAP) solution… ▽ More

    Submitted 23 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    MSC Class: 60H15; 68T07; 60J10

  18. arXiv:2311.05199  [pdf, other]

    cs.CV

    BrainNetDiff: Generative AI Empowers Brain Network Generation via Multimodal Diffusion Model

    Authors: Yongcheng Zong, Shuqiang Wang

    Abstract: Brain network analysis has emerged as a pivotal method for gaining a deeper understanding of brain functions and disease mechanisms. Despite the existence of various network construction approaches, shortcomings persist in the learning of correlations between structural and functional brain imaging data. In light of this, we introduce a novel method called BrainNetDiff, which combines a multi-head T… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

  19. arXiv:2311.03205  [pdf, other]

    cs.CV

    PainSeeker: An Automated Method for Assessing Pain in Rats Through Facial Expressions

    Authors: Liu Liu, Guang Li, Dingfan Deng, Jinhua Yu, Yuan Zong

    Abstract: In this letter, we aim to investigate whether laboratory rats' pain can be automatically assessed through their facial expressions. To this end, we began by presenting a publicly available dataset called RatsPain, consisting of 1,138 facial images captured from six rats that underwent an orthodontic treatment operation. Each rat's facial images in RatsPain were carefully selected from videos record… ▽ More

    Submitted 6 November, 2023; originally announced November 2023.

  20. arXiv:2310.06627  [pdf, other]

    cs.CL cs.CV cs.LG

    What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models

    Authors: Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao

    Abstract: Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal large language models, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel… ▽ More

    Submitted 15 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  21. arXiv:2310.04664  [pdf, other]

    cs.CV

    Learning to Rank Onset-Occurring-Offset Representations for Micro-Expression Recognition

    Authors: Jie Zhu, Yuan Zong, Jingang Shi, Cheng Lu, Hongli Chang, Wenming Zheng

    Abstract: This paper focuses on the research of micro-expression recognition (MER) and proposes a flexible and reliable deep learning method called learning to rank onset-occurring-offset representations (LTR3O). The LTR3O method introduces a dynamic and reduced-size sequence structure known as 3O, which consists of onset, occurring, and offset frames, for representing micro-expressions (MEs). This structur… ▽ More

    Submitted 6 October, 2023; originally announced October 2023.

  22. arXiv:2310.03992  [pdf, other]

    cs.SD eess.AS

    Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

    Authors: Yan Zhao, Yuan Zong, Jincen Wang, Hailun Lian, Cheng Lu, Li Zhao, Wenming Zheng

    Abstract: In this paper, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDAN) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDAN), whose key contribution lies in the introduction of a novel regularization term called… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

  23. arXiv:2310.03318  [pdf]

    cs.SE

    On Metaverse Application Dependability Analysis

    Authors: Yingfan Zong, Jing Bai, Xiaolin Chang, Fumio Machida, Yingsi Zhao

    Abstract: Metaverse as-a-Service (MaaS) enables Metaverse tenants to execute their APPlications (MetaAPP) by allocating Metaverse resources in the form of Metaverse service functions (MSF). Usually, each MSF is deployed in a virtual machine (VM) for better resiliency and security. However, these MSFs along with VMs and virtual machine monitors (VMM) running them may encounter software aging after prolonged… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

  24. arXiv:2310.01651  [pdf, other]

    cs.LG

    Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

    Authors: Yongshuo Zong, Tingyang Yu, Ruchika Chavhan, Bingchen Zhao, Timothy Hospedales

    Abstract: Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a… ▽ More

    Submitted 16 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  25. arXiv:2309.14761  [pdf, other]

    eess.AS cs.SD

    Optimization Techniques for a Physical Model of Human Vocalisation

    Authors: Mateo Cámara, Zhiyuan Xu, Yisu Zong, José Luis Blanco, Joshua D. Reiss

    Abstract: We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals --yawnings. We selected and optimized the control parameters of the synthesizer to minimize the difference between rea… ▽ More

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to DAFx 2023

  26. arXiv:2309.08963  [pdf, other]

    cs.CL

    Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

    Authors: Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

    Abstract: Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging. Our study assesses LLMs' proficiency in structuring tables and introduces a novel fine-tuning method, cognizant of data structures, to bolster their performance. We unveil Struc-Bench, a comprehensive benchmark featuring prominent LLMs (GPT-NeoX-20B, GPT-3… ▽ More

    Submitted 4 April, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

  27. arXiv:2308.14568  [pdf, other]

    cs.SD eess.AS

    Time-Frequency Transformer: A Novel Time Frequency Joint Learning Method for Speech Emotion Recognition

    Authors: Yong Wang, Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Sunan Li

    Abstract: In this paper, we propose a novel time-frequency joint learning method for speech emotion recognition, called Time-Frequency Transformer. Its advantage is that the Time-Frequency Transformer can excavate global emotion patterns in the time-frequency domain of speech signal while modeling the local emotional correlations in the time domain and frequency domain respectively. For the purpose, we firs… ▽ More

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by International Conference on Neural Information Processing (ICONIP2023)

  28. arXiv:2306.01491  [pdf, other]

    cs.SD

    Learning Local to Global Feature Aggregation for Speech Emotion Recognition

    Authors: Cheng Lu, Hailun Lian, Wenming Zheng, Yuan Zong, Yan Zhao, Sunan Li

    Abstract: Transformer has emerged in speech emotion recognition (SER) at present. However, its equal patch division not only damages frequency information but also ignores local emotion correlations across frames, which are key cues to represent emotion. To handle the issue, we propose a Local to Global Feature Aggregation learning (LGFA) for SER, which can aggregate long-term emotion correlations at differe… ▽ More

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted on INTERSPEECH 2023

  29. arXiv:2305.12474  [pdf, other]

    cs.CL cs.AI

    Evaluating the Performance of Large Language Models on GAOKAO Benchmark

    Authors: Xiaotian Zhang, Chunyang Li, Yi Zong, Zhengyu Ying, Liang He, Xipeng Qiu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks; however, how to comprehensively and accurately assess their performance becomes an urgent issue to be addressed. This paper introduces GAOKAO-Bench, an intuitive benchmark that employs questions from the Chinese GAOKAO examination as test samples, including both subjective and obj… ▽ More

    Submitted 24 February, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

  30. arXiv:2305.07625  [pdf, other]

    cs.CV cs.LG stat.ML

    Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn

    Authors: Ondrej Bohdal, Yinbing Tian, Yongshuo Zong, Ruchika Chavhan, Da Li, Henry Gouk, Li Guo, Timothy Hospedales

    Abstract: Meta-learning and other approaches to few-shot learning are widely studied for image recognition, and are increasingly applied to other vision tasks such as pose estimation and dense prediction. This naturally raises the question of whether there is any few-shot meta-learning algorithm capable of generalizing across these diverse task types? To support the community in answering this question, we… ▽ More

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted at CVPR 2023. Project page: https://edi-meta-learning.github.io/meta-omnium

  31. arXiv:2304.14095  [pdf, other]

    cs.NI

    Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI

    Authors: Louise Axon, Dimitrios Panagiotakopoulos, Samuel Ayo, Carolina Sanchez-Hernandez, Yan Zong, Simon Brown, Lei Zhang, Michael Goldsmith, Sadie Creese, Weisi Guo

    Abstract: Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure… ▽ More

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: under review in IEEE

  32. arXiv:2304.01008  [pdf, other]

    cs.LG cs.AI cs.CL

    Self-Supervised Multimodal Learning: A Survey

    Authors: Yongshuo Zong, Oisin Mac Aodha, Timothy Hospedales

    Abstract: Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human annotations impedes scaling up models. Meanwhile, given the availability of large-scale unannotated data in the wild, self-supervised learning has become an attra… ▽ More

    Submitted 4 August, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

  33. arXiv:2302.08921  [pdf, other]

    cs.SD cs.CL eess.AS

    Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

    Authors: Yan Zhao, Jincen Wang, Yuan Zong, Wenming Zheng, Hailun Lian, Li Zhao

    Abstract: In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) to deal with cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. Specifically, DIDAN first adopts a simple deep regression network consisting of a set of conv… ▽ More

    Submitted 17 February, 2023; originally announced February 2023.

  34. arXiv:2210.12430  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Speech Emotion Recognition via an Attentive Time-Frequency Neural Network

    Authors: Cheng Lu, Wenming Zheng, Hailun Lian, Yuan Zong, Chuangao Tang, Sunan Li, Yan Zhao

    Abstract: Spectrogram is commonly used as the input feature of deep neural networks to learn the high(er)-level time-frequency pattern of speech signal for speech emotion recognition (SER). Generally, different emotions correspond to specific energy activations both within frequency bands and time frames on spectrogram, which indicates the frequency and time domains are both essential to r… ▽ More

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: This paper has been accepted as a regular paper on IEEE Transactions on Computational Social Systems

  35. arXiv:2210.01725  [pdf, other]

    cs.LG cs.AI eess.IV

    MEDFAIR: Benchmarking Fairness for Medical Imaging

    Authors: Yongshuo Zong, Yongxin Yang, Timothy Hospedales

    Abstract: A multitude of work has shown that machine learning-based medical diagnosis systems can be biased against certain subgroups of people. This has motivated a growing number of bias mitigation algorithms that aim to address fairness issues in machine learning. However, it is difficult to compare their effectiveness in medical imaging for two reasons. First, there is little consensus on the criteria t… ▽ More

    Submitted 17 February, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023

  36. arXiv:2209.08445  [pdf, ps, other]

    cs.CV

    SDFE-LV: A Large-Scale, Multi-Source, and Unconstrained Database for Spotting Dynamic Facial Expressions in Long Videos

    Authors: Xiaolin Xu, Yuan Zong, Wenming Zheng, Yang Li, Chuangao Tang, Xingxun Jiang, Haolin Jiang

    Abstract: In this paper, we present a large-scale, multi-source, and unconstrained database called SDFE-LV for spotting the onset and offset frames of a complete dynamic facial expression from long videos, which is known as the topic of dynamic facial expression spotting (DFES) and a vital prior step for lots of facial expression analysis tasks. Specifically, SDFE-LV consists of 1,191 long videos, each of w… ▽ More

    Submitted 17 September, 2022; originally announced September 2022.

  37. arXiv:2208.08878  [pdf, other]

    cs.LG cs.AI

    Towards Learning in Grey Spatiotemporal Systems: A Prophet to Non-consecutive Spatiotemporal Dynamics

    Authors: Zhengyang Zhou, Yang Kuo, Wei Sun, Binwu Wang, Min Zhou, Yunan Zong, Yang Wang

    Abstract: Spatiotemporal forecasting is an imperative topic in data science due to its diverse and critical applications in smart cities. Existing works mostly perform consecutive predictions of following steps with observations completely and continuously obtained, where nearest observations can be exploited as key knowledge for instantaneous status estimation. However, the practical issues of early activi… ▽ More

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 13 pages, 6 figures and 4 tables

  38. Physics-Informed Neural Network Method for Parabolic Differential Equations with Sharply Perturbed Initial Conditions

    Authors: Yifei Zong, QiZhi He, Alexandre M. Tartakovsky

    Abstract: In this paper, we develop a physics-informed neural network (PINN) model for parabolic problems with a sharply perturbed initial condition. As an example of a parabolic problem, we consider the advection-dispersion equation (ADE) with a point (Gaussian) source initial condition. In the $d$-dimensional ADE, perturbations in the initial condition decay with time $t$ as $t^{-d/2}$, which can cause a… ▽ More

    Submitted 18 August, 2022; originally announced August 2022.

    MSC Class: 35K99

  39. arXiv:2111.15361  [pdf, other]

    cs.CV cs.AI

    Seeking Salient Facial Regions for Cross-Database Micro-Expression Recognition

    Authors: Xingxun Jiang, Yuan Zong, Wenming Zheng, Jiateng Liu, Mengting Wei

    Abstract: Cross-Database Micro-Expression Recognition (CDMER) aims to develop the Micro-Expression Recognition (MER) methods with strong domain adaptability, i.e., the ability to recognize the Micro-Expressions (MEs) of different subjects captured by different imaging devices in different scenes. The development of CDMER is faced with two key problems: 1) the severe feature distribution gap between the sour… ▽ More

    Submitted 5 July, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

  40. arXiv:2010.09342  [pdf, other]

    cs.CV cs.AI

    SMA-STN: Segmented Movement-Attending Spatiotemporal Network for Micro-Expression Recognition

    Authors: Jiateng Liu, Wenming Zheng, Yuan Zong

    Abstract: Correctly perceiving micro-expression is difficult since micro-expression is an involuntary, repressed, and subtle facial expression, and efficiently revealing the subtle movement changes and capturing the significant segments in a micro-expression sequence is the key to micro-expression recognition (MER). To handle the crucial issue, in this paper, we firstly propose a dynamic segmented sparse im… ▽ More

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: 9 pages

  41. arXiv:2009.01427  [pdf, other]

    cs.CV

    Spatial Transformer Point Convolution

    Authors: Yuan Fang, Chunyan Xu, Zhen Cui, Yuan Zong, Jian Yang

    Abstract: Point clouds are unstructured and unordered in the embedded 3D space. In order to produce consistent responses under different permutation layouts, most existing methods aggregate local spatial points through maximum or summation operation. But such an aggregation essentially belongs to the isotropic filtering on all operated points therein, which tends to lose the information of geometric structu… ▽ More

    Submitted 2 September, 2020; originally announced September 2020.

  42. arXiv:2008.05924  [pdf, ps, other]

    cs.CV cs.MM

    DFEW: A Large-Scale Database for Recognizing Dynamic Facial Expressions in the Wild

    Authors: Xingxun Jiang, Yuan Zong, Wenming Zheng, Chuangao Tang, Wanchuang Xia, Cheng Lu, Jiateng Liu

    Abstract: Recently, facial expression recognition (FER) in the wild has gained a lot of researchers' attention because it is a valuable topic to enable the FER techniques to move from the laboratory to the real applications. In this paper, we focus on this challenging but interesting topic and make contributions from three aspects. First, we present a new large-scale 'in-the-wild' dynamic facial expression… ▽ More

    Submitted 13 August, 2020; originally announced August 2020.

  43. arXiv:1906.08467  [pdf, other]

    cs.CV

    GAN-Knowledge Distillation for one-stage Object Detection

    Authors: Wei Hong, Jin ke Yu Fan Zong

    Abstract: Convolutional neural networks have a significant improvement in the accuracy of Object detection. As convolutional neural networks become deeper, the accuracy of detection is also obviously improved, and more floating-point calculations are needed. Many researchers use the knowledge distillation method to improve the accuracy of student networks by transferring knowledge from a deeper and larger t… ▽ More

    Submitted 3 July, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

  44. arXiv:1812.07742  [pdf, other]

    cs.CV

    Cross-Database Micro-Expression Recognition: A Benchmark

    Authors: Yuan Zong, Tong Zhang, Wenming Zheng, Xiaopeng Hong, Chuangao Tang, Zhen Cui, Guoying Zhao

    Abstract: Cross-database micro-expression recognition (CDMER) is one of the recently emerging and interesting problems in micro-expression analysis. CDMER is more challenging than the conventional micro-expression recognition (MER), because the training and testing samples in CDMER come from different micro-expression databases, resulting in the inconsistency of the feature distributions between the training and… ▽ More

    Submitted 11 November, 2019; v1 submitted 18 December, 2018; originally announced December 2018.

    Comments: 13 pages

  45. arXiv:1811.12774  [pdf, ps, other]

    cs.CV

    Cross-database non-frontal facial expression recognition based on transductive deep transfer learning

    Authors: Keyu Yan, Wenming Zheng, Tong Zhang, Yuan Zong, Zhen Cui

    Abstract: Cross-database non-frontal expression recognition is a very meaningful but rather difficult subject in the fields of computer vision and affective computing. In this paper, we proposed a novel transductive deep transfer learning architecture based on widely used VGGface16-Net for this problem. In this framework, the VGGface16-Net is used to jointly learn a common optimal nonlinear discriminative fea… ▽ More

    Submitted 30 November, 2018; originally announced November 2018.

  46. arXiv:1710.03131  [pdf, other]

    cs.AI

    MSC: A Dataset for Macro-Management in StarCraft II

    Authors: Huikai Wu, Yanqi Zong, Junge Zhang, Kaiqi Huang

    Abstract: Macro-management is an important problem in StarCraft, which has been studied for a long time. Various datasets together with assorted methods have been proposed in the last few years. But these datasets have some defects for boosting the academic and industrial research: 1) There're neither standard preprocessing, parsing and feature extraction procedures nor predefined training, validation and t… ▽ More

    Submitted 3 April, 2023; v1 submitted 9 October, 2017; originally announced October 2017.

    Comments: Homepage: https://github.com/wuhuikai/MSC

  47. Learning a Target Sample Re-Generator for Cross-Database Micro-Expression Recognition

    Authors: Yuan Zong, Xiaohua Huang, Wenming Zheng, Zhen Cui, Guoying Zhao

    Abstract: In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples are from two different micro-expression databases. Under this setting, the training and testing samples would have different feature distributions and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we… ▽ More

    Submitted 26 July, 2017; originally announced July 2017.

    Comments: To appear at ACM Multimedia 2017

  48. Spatial-Temporal Recurrent Neural Network for Emotion Recognition

    Authors: Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, Yang Li

    Abstract: Emotion analysis is a crucial problem to endow artifact machines with real intelligence in many large potential applications. As external appearances of human emotions, electroencephalogram (EEG) signals and video face signals are widely used to track and analyze human's affective information. According to their common characteristics of spatial-temporal volumes, in this paper we propose a novel d… ▽ More

    Submitted 12 May, 2017; originally announced May 2017.