Showing 1–50 of 82 results for author: Gao, H

Searching in archive eess.
  1. arXiv:2407.20893  [pdf, other]

    cs.LG cs.AI eess.SP

    MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network

    Authors: Yinlong Xu, Xiaoqiang Liu, Zitai Kong, Yixuan Wu, Yue Wang, Yingzhou Lu, Honghao Gao, Jian Wu, Hongxia Xu

    Abstract: Cardiac arrhythmia, a condition characterized by irregular heartbeats, often serves as an early indication of various heart ailments. With the advent of deep learning, numerous innovative models have been introduced for diagnosing arrhythmias using Electrocardiogram (ECG) signals. However, recent studies solely focus on the performance of models, neglecting the interpretation of their results. Thi…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.18338  [pdf, other]

    cs.CV eess.IV q-bio.BM

    SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images

    Authors: Ching Ting Leung, Yufan Chen, Hanyu Gao

    Abstract: Optical chemical structure recognition (OCSR) systems aim to extract the molecular structure information, usually in the form of molecular graph or SMILES, from images of chemical molecules. While many tools have been developed for this purpose, challenges still exist due to different types of noises that might exist in the images. Specifically, we focus on the 'arrow-pushing' diagrams, a typical…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Under Submission

  3. arXiv:2407.08950  [pdf, other]

    cs.CV eess.IV

    Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration

    Authors: Hu Gao, Depeng Dang

    Abstract: Image restoration aims to recover high-quality images from their corrupted counterparts. Many existing methods primarily focus on the spatial domain, neglecting the understanding of frequency variations and ignoring the impact of implicit noise in skip connections. In this paper, we introduce a multi-scale frequency selection network (MSFSNet) that seamlessly integrates spatial and frequency domai…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.20106

  4. arXiv:2407.02264  [pdf, other]

    cs.CV cs.SD eess.AS

    SOAF: Scene Occlusion-aware Neural Acoustic Field

    Authors: Huiyu Gao, Jiahao Ma, David Ahmedt-Aristizabal, Chuong Nguyen, Miaomiao Liu

    Abstract: This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion to sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach…

    Submitted 2 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  5. arXiv:2406.10236  [pdf, other]

    eess.IV cs.AI

    Lightening Anything in Medical Images

    Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, Jingyi Xu, Lipeng Ma, Yatian Yang, Pinghong Zhou

    Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 23 pages, 6 figures

  6. arXiv:2406.08380  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by pro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  7. arXiv:2406.01795  [pdf, other]

    eess.IV

    Video Coding with Cross-Component Sample Offset

    Authors: Han Gao, Xin Zhao, Tianqi Liu, Shan Liu

    Abstract: Beyond the exploration of traditional spatial, temporal and subjective visual signal redundancy in image and video compression, recent research has focused on leveraging cross-color component redundancy to enhance coding efficiency. Cross-component coding approaches are motivated by the statistical correlations among different color components, such as those in the Y'CbCr color space, where luma (…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  8. arXiv:2405.10550  [pdf, other]

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad…

    Submitted 17 May, 2024; originally announced May 2024.

  9. arXiv:2405.00056  [pdf, other]

    eess.SY cs.GT

    Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation

    Authors: Yousef Emami, Hao Gao, Kai Li, Luis Almeida, Eduardo Tovar, Zhu Han

    Abstract: Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize age of information (AoI) of sensory data, where balancing the trade-off between the UAVs…

    Submitted 2 May, 2024; v1 submitted 24 April, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2312.09953

    MSC Class: 00; ACM Class: C.2

  10. arXiv:2404.10640  [pdf, other]

    eess.IV

    Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos

    Authors: Jieming Yu, Long Bai, Guankun Wang, An Wang, Xiaoxiao Yang, Huxin Gao, Hongliang Ren

    Abstract: The precise tracking and segmentation of surgical instruments have led to a remarkable enhancement in the efficiency of surgical procedures. However, the challenge lies in achieving accurate segmentation of surgical instruments while minimizing the need for manual annotation and reducing the time required for the segmentation process. To tackle this, we propose a novel framework for surgical instr…

    Submitted 8 August, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: IEEE ICRA 2024 C4SR+ Workshop

  11. arXiv:2404.09500  [pdf]

    physics.optics eess.IV

    On-chip Real-time Hyperspectral Imager with Full CMOS Resolution Enabled by Massively Parallel Neural Network

    Authors: Junren Wen, Haiqi Gao, Weiming Shi, Shuaibo Feng, Lingyun Hao, Yujie Liu, Liang Xu, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

    Abstract: Traditional spectral imaging methods are constrained by the time-consuming scanning process, limiting the application in dynamic scenarios. One-shot spectral imaging based on reconstruction has been a hot research topic recently and the primary challenges still lie in both efficient fabrication techniques suitable for mass production and the high-speed, high-accuracy reconstruction algorithm for r…

    Submitted 15 April, 2024; originally announced April 2024.

  12. arXiv:2404.08490  [pdf, other]

    eess.SP

    SemHARQ: Semantic-Aware HARQ for Multi-task Semantic Communications

    Authors: Jiangjing Hu, Fengyu Wang, Wenjun Xu, Hui Gao, Ping Zhang

    Abstract: Intelligent task-oriented semantic communications (SemComs) have witnessed great progress with the development of deep learning (DL). In this paper, we propose a semantic-aware hybrid automatic repeat request (SemHARQ) framework for the robust and efficient transmissions of semantic features. First, to improve the robustness and effectiveness of semantic coding, a multi-task semantic encoder is pr…

    Submitted 12 April, 2024; originally announced April 2024.

  13. arXiv:2403.12028  [pdf, other]

    cs.CV cs.AI eess.IV

    Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

    Authors: Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao

    Abstract: 3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and difficult to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves the re…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project Page: https://air-discover.github.io/Ultraman/

  14. arXiv:2403.03489  [pdf, other]

    eess.SY cs.CY

    Global Geolocated Realtime Data of Interfleet Urban Transit Bus Idling

    Authors: Nicholas Kunz, H. Oliver Gao

    Abstract: Urban transit bus idling is a contributor to ecological stress, economic inefficiency, and medically hazardous health outcomes due to emissions. The global accumulation of this frequent pattern of undesirable driving behavior is enormous. In order to measure its scale, we propose GRD-TRT-BUF-4I (Ground Truth Buffer for Idling), an extensible, realtime detection system that records the geolocation…

    Submitted 16 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 34 pages, 12 figures, 36 tables, 100 data sources (including links). Under Review at Nature Scientific Data

  15. arXiv:2402.16581  [pdf, other]

    eess.IV

    Rate Splitting Multiple Access-Enabled Adaptive Panoramic Video Semantic Transmission

    Authors: Haixiao Gao, Mengying Sun, Xiaodong Xu, Shujun Han, Bizhu Wang, Jingxuan Zhang, Ping Zhang

    Abstract: In this paper, we propose an adaptive panoramic video semantic transmission (APVST) framework enabled by rate splitting multiple access (RSMA). The APVST framework consists of a semantic transmitter and receiver, utilizing a deep joint source-channel coding structure to adaptively extract and encode semantic features from panoramic frames. To achieve higher spectral efficiency and conserve bandwid…

    Submitted 23 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  16. arXiv:2401.11449  [pdf, other]

    eess.SP cs.NI

    Energy Consumption Analysis for Continuous Phase Modulation in Smart-Grid Internet of Things of beyond 5G

    Authors: Hongjian Gao, Yang Lu, Shaoshi Yang, Jingsheng Tan, Longlong Nie, Xinyi Qu

    Abstract: Wireless sensor network (WSN) underpinning the smart-grid Internet of Things (SG-IoT) has been a popular research topic in recent years due to its great potential for enabling a wide range of important applications. However, the energy consumption (EC) characteristic of sensor nodes is a key factor that affects the operational performance (e.g., lifetime of sensors) and the total cost of ownership…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 7 figures, 2 tables

    Journal ref: Sensors, vol. 24, no. 2, pp. 1-14, article number 533, Jan. 2024

  17. arXiv:2310.14165  [pdf, other]

    cs.LG cs.AI eess.SP

    Graph Convolutional Network with Connectivity Uncertainty for EEG-based Emotion Recognition

    Authors: Hongxiang Gao, Xiangyao Wang, Zhenghua Chen, Min Wu, Zhipeng Cai, Lulu Zhao, Jianqing Li, Chengyu Liu

    Abstract: Automatic emotion recognition based on multichannel Electroencephalography (EEG) holds great potential in advancing human-computer interaction. However, several significant challenges persist in existing research on algorithmic emotion recognition. These challenges include the need for a robust model to effectively learn discriminative node attributes over long paths, the exploration of ambiguous…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 10 pages

  18. arXiv:2310.10856  [pdf]

    eess.SY cs.LG cs.MA

    Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning

    Authors: Xianyue Peng, Hang Gao, Gengyue Han, Hao Wang, Michael Zhang

    Abstract: Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performan…

    Submitted 16 October, 2023; originally announced October 2023.

  19. arXiv:2310.05999  [pdf]

    eess.SY

    Two stage Robust Nash Bargaining based Energy Trading between Hydrogen-enriched Gas and Active Distribution Networks

    Authors: Wenwen Zhang, Gao Qiu, Hongjun Gao, Tingjian Liu, Junyong Liu, Yaping Li, Shengchun Yang, Jiahao Yan, Wenbo Mao

    Abstract: Integration of emerging hydrogen-enriched compressed natural gas (HCNG) distribution network with active distribution network (ADN) provides huge latent flexibility on consuming renewable energies. However, paucity of energy trading mechanism risks the stable earnings of the flexibility for both entities, especially when rising highly-efficient solid oxide fuel cells (SOFCs) are pioneered to int…

    Submitted 22 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  20. arXiv:2309.02609  [pdf, other]

    cs.RO eess.SY

    Directionality-Aware Mixture Model Parallel Sampling for Efficient Linear Parameter Varying Dynamical System Learning

    Authors: Sunan Sun, Haihui Gao, Tianyu Li, Nadia Figueroa

    Abstract: The Linear Parameter Varying Dynamical System (LPV-DS) is an effective approach that learns stable, time-invariant motion policies using statistical modeling and semi-definite optimization to encode complex motions for reactive robot control. Despite its strengths, the LPV-DS learning approach faces challenges in achieving a high model accuracy without compromising the computational efficiency. To…

    Submitted 24 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

  21. arXiv:2308.10522  [pdf, other]

    cs.CV cs.LG eess.IV

    Information Theory-Guided Heuristic Progressive Multi-View Coding

    Authors: Jiangmeng Li, Hang Gao, Wenwen Qiang, Changwen Zheng

    Abstract: Multi-view representation learning aims to capture comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning to different views in a pairwise manner, which is still scalable: view-specific noise is not filtered in learning view-shared representations; the fake negative pairs, where the negative terms are actually within the same class as…

    Submitted 23 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: This paper was accepted by the journal Neural Networks (Elsevier) in 2023. arXiv admin note: substantial text overlap with arXiv:2109.02344

  22. arXiv:2308.08442  [pdf, other]

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

  23. arXiv:2308.03806  [pdf, other]

    cs.CR cs.SD eess.AS

    SoK: Acoustic Side Channels

    Authors: Ping Wang, Shishir Nagaraja, Aurélien Bourquard, Haichang Gao, Jeff Yan

    Abstract: We provide a state-of-the-art analysis of acoustic side channels, cover all the significant academic research in the area, discuss their security implications and countermeasures, and identify areas for future research. We also make an attempt to bridge side channels and inverse problems, two fields that appear to be completely isolated from each other but have deep connections.

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 16 pages

  24. arXiv:2307.09279  [pdf, other]

    cs.CV eess.IV

    Regression-free Blind Image Quality Assessment with Content-Distortion Consistency

    Authors: Xiaoqi Wang, Jian Xiong, Hao Gao, Weisi Lin

    Abstract: The optimization objective of regression-based blind image quality assessment (IQA) models is to minimize the mean prediction error across the training dataset, which can lead to biased parameter estimation due to potential training data biases. To mitigate this issue, we propose a regression-free framework for image quality evaluation, which is based upon retrieving locally similar instances by i…

    Submitted 21 October, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  25. arXiv:2306.00812  [pdf, other]

    eess.AS cs.SD

    Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model

    Authors: Xiaohuai Le, Tong Lei, Li Chen, Yiqing Guo, Chao He, Cheng Chen, Xianjun Xia, Hua Gao, Yijian Xiao, Piao Ding, Shenyi Song, Jing Lu

    Abstract: With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. In order to further enhance the coarse speech in the sub-band domain, it is necessary to apply a post-filtering for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may cause speech quality degradation due to inaccura…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: accepted by Interspeech 2023

  26. arXiv:2304.10215  [pdf]

    eess.SY

    Dynamic Security Region of Natural Gas Systems in Integrated Electricity-Gas Systems

    Authors: Han Gao, Peiyao Zhao, Zhengshuo Li

    Abstract: In an integrated electricity-gas system (IEGS), the tight coupling of power and natural gas systems is embodied by frequent changes in gas withdrawal from gas-fired units to provide regulation services for the power system to handle uncertainty, which may in turn endanger the secure operation of the natural gas system and ultimately affect the safety of the whole IEGS. Hence, it is necessary to ac…

    Submitted 20 April, 2023; originally announced April 2023.

  27. arXiv:2304.05882  [pdf, other]

    eess.SP

    Scalable Multi-task Semantic Communication System with Feature Importance Ranking

    Authors: Jiangjing Hu, Fengyu Wang, Wenjun Xu, Hui Gao, Ping Zhang

    Abstract: Semantic communications are expected to be an innovative solution to the emerging intelligent applications in the era of connected intelligence. In this paper, a novel scalable multitask semantic communication system with feature importance ranking (SMSC-FIR) is explored. Firstly, the multi-task correlations are investigated by a joint semantic encoder to extract relevant features. Then, a new sca…

    Submitted 12 April, 2023; originally announced April 2023.

  28. ECG-CL: A Comprehensive Electrocardiogram Interpretation Method Based on Continual Learning

    Authors: Hongxiang Gao, Xingyao Wang, Zhenghua Chen, Min Wu, Jianqing Li, Chengyu Liu

    Abstract: Electrocardiogram (ECG) monitoring is one of the most powerful techniques for early identification of cardiovascular disease (CVD), and the introduction of intelligent wearable ECG devices has enabled daily monitoring. However, due to the need for professional expertise in ECG interpretation, general public access has once again been restricted, prompting the need for the development of advanced d…

    Submitted 21 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: 10 pages

  29. arXiv:2303.15206  [pdf, other]

    cs.CV eess.IV

    Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views

    Authors: Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli

    Abstract: Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS metho…

    Submitted 24 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  30. arXiv:2303.04643  [pdf]

    eess.SY

    Robust Adaptive Control of STATCOMs to Mitigate Inverter-Based-Resource (IBR)-Induced Oscillations

    Authors: Hui Yuan, Linbin Huang, Huisheng Gao, Jikui Xing, Di Zheng, Ruisheng Diao

    Abstract: The interaction among inverter-based resources (IBRs) and power network may cause small-signal stability issues, especially in low short-circuit-level grids. Besides, integrating static synchronous compensators (STATCOMs) in a multi-IBR system for voltage support can deteriorate small-signal stability. However, it is still challenging to fully understand the impact mechanism of STATCOMs on IBR-ind…

    Submitted 4 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

  31. arXiv:2301.13283  [pdf, other]

    cs.RO eess.SY

    Online Learning Based Mobile Robot Controller Adaptation for Slip Reduction

    Authors: Huidong Gao, Rui Zhou, Masayoshi Tomizuka, Zhuo Xu

    Abstract: Slip is a very common phenomenon in wheeled mobile robotic systems. It has undesirable consequences such as wasting energy and impeding system stability. To tackle the challenge of mobile robot trajectory tracking under slippery conditions, we propose a hierarchical framework that learns and adapts gains of the tracking controllers simultaneously online. Concretely, a reinforcement learning…

    Submitted 30 January, 2023; originally announced January 2023.

  32. arXiv:2212.13587  [pdf, other]

    cs.LG eess.SY

    Variance Reduction for Score Functions Using Optimal Baselines

    Authors: Ronan Keane, H. Oliver Gao

    Abstract: Many problems involve the use of models which learn probability distributions or incorporate randomness in some way. In such problems, because computing the true expected gradient may be intractable, a gradient estimator is used to update the model parameters. When the model parameters directly affect a probability distribution, the gradient estimator will involve score function terms. This paper…

    Submitted 27 December, 2022; originally announced December 2022.

  33. arXiv:2211.05910  [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  34. arXiv:2211.00700  [pdf]

    eess.IV

    MHITNet: a minimize network with a hierarchical context-attentional filter for segmenting medical ct images

    Authors: Hongyang He, Feng Ziliang, Yuanhang Zheng, Shudong Huang, HaoBing Gao

    Abstract: In the field of medical CT image processing, convolutional neural networks (CNNs) have been the dominant technique. Encoder-decoder CNNs utilise locality for efficiency, but they cannot simulate distant pixel interactions properly. Recent research indicates that self-attention or transformer layers can be stacked to efficiently learn long-range dependencies. By constructing and processing picture pat…

    Submitted 1 November, 2022; originally announced November 2022.

  35. arXiv:2210.14974  [pdf, other]

    eess.IV cs.CV

    SINCO: A Novel structural regularizer for image compression using implicit neural representations

    Authors: Harry Gao, Weijie Gan, Zhixin Sun, Ulugbek S. Kamilov

    Abstract: Implicit neural representations (INR) have been recently proposed as deep learning (DL) based solutions for image compression. An image can be compressed by training an INR model with fewer weights than the number of image pixels to map the coordinates of the image to corresponding pixel values. While traditional training approaches for INRs are based on enforcing pixel-wise image consistency, we…

    Submitted 26 October, 2022; originally announced October 2022.

  36. arXiv:2209.09018  [pdf, other]

    eess.SP cs.LG stat.AP

    A Causal Intervention Scheme for Semantic Segmentation of Quasi-periodic Cardiovascular Signals

    Authors: Xingyao Wang, Yuwen Li, Hongxiang Gao, Xianghong Cheng, Jianqing Li, Chengyu Liu

    Abstract: Precise segmentation is a vital first step to analyze semantic information of cardiac cycle and capture anomaly with cardiovascular signals. However, in the field of deep semantic segmentation, inference is often unilaterally confounded by the individual attribute of data. Towards cardiovascular signals, quasi-periodicity is the essential characteristic to be learned, regarded as the synthesize of…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: submitted to IEEE Journal of Biomedical and Health Informatics (J-BHI)

  37. arXiv:2208.00165  [pdf]

    eess.IV cs.CV

    Temporal extrapolation of heart wall segmentation in cardiac magnetic resonance images via pixel tracking

    Authors: Arash Rabbani, Hao Gao, Dirk Husmeier

    Abstract: In this study, we have tailored a pixel tracking method for temporal extrapolation of the ventricular segmentation masks in cardiac magnetic resonance images. The pixel tracking process starts from the end-diastolic frame of the heart cycle using the available manually segmented images to predict the end-systolic segmentation mask. The superpixels approach is used to divide the raw images into sma…

    Submitted 30 July, 2022; originally announced August 2022.

  38. arXiv:2204.09278  [pdf]

    eess.IV

    Bone marrow sparing for cervical cancer radiotherapy on multimodality medical images

    Authors: Yuening Wang, Ying Sun, Jie Yuan, Kexin Gan, Hanzi Xu, Han Gao, Xiuming Zhang

    Abstract: Cervical cancer seriously threatens the health of women. Radiotherapy is one of the main therapy methods but carries a high risk of acute hematologic toxicity. Delineating the bone marrow (BM) for sparing using computed tomography (CT) images to plan before radiotherapy can effectively avoid this risk. Compared with magnetic resonance (MR) images, CT lacks the ability to express the activity of BM. Th…

    Submitted 20 April, 2022; originally announced April 2022.

  39. arXiv:2204.09224  [pdf, other]

    cs.SD cs.AI eess.AS

    ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

    Authors: Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

    Abstract: Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted va…

    Submitted 23 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  40. arXiv:2204.01645  [pdf, other]

    eess.IV cs.CV

    Three-dimensional Microstructural Image Synthesis from 2D Backscattered Electron Image of Cement Paste

    Authors: Xin Zhao, Lin Wang, Qinfei Li, Heng Chen, Shuangrong Liu, Pengkun Hou, Xu Wu, Jianfeng Yuan, Haozhong Gao, Bo Yang

    Abstract: This paper proposes a deep learning-based method for generating 3D microstructures from a single two-dimensional (2D) image, capable of producing high-quality, realistic 3D images at low cost. In the method, a framework (CEM3DMG) is designed to synthesize 3D images by learning microstructural information from a 2D backscattered electron (BSE) image. Experimental results show that CEM3DMG can gener…

    Submitted 11 July, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

  41. arXiv:2203.15863  [pdf, other]

    eess.AS cs.AI cs.CL

    WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

    Authors: Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

    Abstract: Large-scale auto-regressive language models pretrained on massive text have demonstrated their impressive ability to perform new natural language tasks with only a few text examples, without the need for fine-tuning. Recent studies further show that such a few-shot learning ability can be extended to the text-image setting by training an encoder to encode the images into embeddings functioning lik…

    Submitted 13 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: submitted to INTERSPEECH 2022

  42. arXiv:2203.15796  [pdf, other]

    eess.AS cs.AI

    Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

    Authors: Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

    Abstract: An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech. Developing such a system can significantly improve the availability of speech techno…

    Submitted 15 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: INTERSPEECH 2022

  43. arXiv:2203.04402  [pdf]

    eess.SP physics.comp-ph

    High Noise Immune Time-domain Inversion via Cascade Network (TICaN) for Complex Scatterers

    Authors: Hongyu Gao, Yinpeng Wang, Qiang Ren, Zixi Wang, Liangcheng Deng, Chenyu Shi

    Abstract: In this paper, a high noise immune time-domain inversion cascade network (TICaN) is proposed to reconstruct scatterers from the measured electromagnetic fields. The TICaN is comprised of a denoising block aiming at improving the signal-to-noise ratio, and an inversion block to reconstruct the electromagnetic properties from the raw time-domain measurements. The scatterers investigated in this stud…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 9 pages, 11 figures

  44. Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression

    Authors: A. Burakhan Koyuncu, Han Gao, Atanas Boev, Georgii Gaikov, Elena Alshina, Eckehard Steinbach

    Abstract: Entropy modeling is a key component for high-performance image compression algorithms. Recent developments in autoregressive context modeling helped learning-based methods to surpass their classical counterparts. However, the performance of those models can be further improved due to the underexploited spatio-channel dependencies in latent space, and the suboptimal implementation of context adapti…

    Submitted 20 July, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted at ECCV 2022; 31 pages (14 main paper + References + 13 Appendix)

  45. Shallow Network Based on Depthwise Over-Parameterized Convolution for Hyperspectral Image Classification

    Authors: Hongmin Gao, Zhonghao Chen, Chenming Li

    Abstract: Recently, convolutional neural network (CNN) techniques have gained popularity as a tool for hyperspectral image classification (HSIC). To improve the feature extraction efficiency of HSIC under the condition of limited samples, the current methods generally use deep models with plenty of layers. However, deep network models are prone to overfitting and gradient vanishing problems when samples are…

    Submitted 30 November, 2021; originally announced December 2021.

  46. Experimental Investigation on the Friction-induced Vibration with Periodic Characteristics in a Running-in Process under Lubrication

    Authors: Di Sun, Pengfei Xing, Guobin Li, Hongtao Gao, Sifan Yang, Honglin Gao, Hongpeng Zhang

    Abstract: This paper investigated the friction-induced vibration (FIV) behavior under the running-in process with oil lubrication. The FIV signal with periodic characteristics under lubrication was identified with the help of the squeal signal induced in an oil-free wear experiment and then extracted by the harmonic wavelet packet transform (HWPT). The variation of the FIV signal from running-in wear stage…

    Submitted 23 November, 2021; v1 submitted 15 November, 2021; originally announced November 2021.

  47. arXiv:2108.00190  [pdf, ps, other]

    cs.SD cs.HC eess.AS

    Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language

    Authors: Huiyan Li, Haohong Lin, You Wang, Hengyang Wang, Ming Zhang, Han Gao, Qing Ai, Zhiyuan Luo, Guang Li

    Abstract: Silent Speech Decoding (SSD), based on articulatory neuromuscular activities, has become a prevalent task of Brain-Computer Interface (BCI) in recent years. Many works have been devoted to decoding surface electromyography (sEMG) from articulatory neuromuscular activities. However, restoring silent speech in tonal languages such as Mandarin Chinese is still difficult. This paper proposes an optimi…

    Submitted 1 June, 2022; v1 submitted 31 July, 2021; originally announced August 2021.

  48. arXiv:2104.01884  [pdf]

    eess.SY

    Nodal Frequency Performance of Power Networks

    Authors: Huisheng Gao, Hui Yuan, Huanhai Xin, Linbin Huang, Chaoyou Feng

    Abstract: This paper investigates how a disturbance in the power network affects the nodal frequencies of certain network buses. To begin with, we show that the inertia of a single generator is in inverse proportion to the initial rate of change of frequency (RoCoF) under disturbances. Then, we present how the initial RoCoF of the nodal frequencies are related to the inertia constants of multiple generators…

    Submitted 5 April, 2021; originally announced April 2021.

  49. arXiv:2103.06549  [pdf, other]

    eess.IV

    Advanced Geometry Surface Coding for Dynamic Point Cloud Compression

    Authors: Jian Xiong, Hao Gao, Miaohui Wang, Hongliang Li, King Ngi Ngan, Weisi Lin

    Abstract: In video-based dynamic point cloud compression (V-PCC), 3D point clouds are projected onto 2D images for compressing with the existing video codecs. However, the existing video codecs are originally designed for natural visual signals, and it fails to account for the characteristics of point clouds. Thus, there are still problems in the compression of geometry information generated from the point…

    Submitted 11 March, 2021; originally announced March 2021.

  50. arXiv:2102.03540  [pdf, other]

    eess.SY cs.RO

    Practical Fractional-Order Variable-Gain Super-Twisting Control with Application to Wafer Stages of Photolithography Systems

    Authors: Zhian Kuang, Liting Sun, Huijun Gao, Masayoshi Tomizuka

    Abstract: In this paper, a practical fractional-order variable-gain super-twisting algorithm (PFVSTA) is proposed to improve the tracking performance of wafer stages for semiconductor manufacturing. Based on the sliding mode control (SMC), the proposed PFVSTA enhances the tracking performance from three aspects: 1) alleviating the chattering phenomenon via super-twisting algorithm and a novel fractional-ord…

    Submitted 6 February, 2021; originally announced February 2021.

    Comments: This paper has been accepted by IEEE Trans. Mechatronics