
Showing 1–50 of 102 results for author: Tang, J

Searching in archive eess.
  1. arXiv:2407.19224  [pdf, other]

    cs.SD cs.MM eess.AS

    RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

    Authors: Tianrui Pan, Jie Liu, Bohan Wang, Jie Tang, Gangshan Wu

    Abstract: While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS methods employ guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts ac…

    Submitted 29 July, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by MM 2024

  2. arXiv:2407.15903  [pdf, other]

    eess.IV

    Semantics Guided Disentangled GAN for Chest X-ray Image Rib Segmentation

    Authors: Lili Huang, Dexin Ma, Xiaowei Zhao, Chenglong Li, Haifeng Zhao, Jin Tang, Chuanfu Li

    Abstract: The label annotations for chest X-ray image rib segmentation are time consuming and laborious, and the labeling quality heavily relies on medical knowledge of annotators. To reduce the dependency on annotated data, existing works often utilize generative adversarial network (GAN) to generate training data. However, GAN-based methods overlook the nuanced information specific to individual organs, w…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.14553  [pdf, other]

    physics.comp-ph eess.IV

    Machine Learning for Improved Current Density Reconstruction from 2D Vector Magnetic Images

    Authors: Niko R. Reed, Danyal Bhutto, Matthew J. Turner, Declan M. Daly, Sean M. Oliver, Jiashen Tang, Kevin S. Olsson, Nicholas Langellier, Mark J. H. Ku, Matthew S. Rosen, Ronald L. Walsworth

    Abstract: The reconstruction of electrical current densities from magnetic field measurements is an important technique with applications in materials science, circuit design, quality control, plasma physics, and biology. Analytic reconstruction methods exist for planar currents, but break down in the presence of high spatial frequency noise or large standoff distance, restricting the types of systems that…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 17 pages, 10 figures. Includes Supplemental Information

  4. arXiv:2406.13979  [pdf, other]

    eess.IV cs.CV cs.LG

    Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning

    Authors: Yupei Zhang, Xiaofei Wang, Fangliangzi Meng, Jin Tang, Chao Li

    Abstract: Multi-modal learning plays a crucial role in cancer diagnosis and prognosis. Current deep learning based multi-modal approaches are often limited by their abilities to model the complex correlations between genomics and histology data, addressing the intrinsic complexity of tumour ecosystem where both tumour and microenvironment contribute to malignancy. We propose a biologically interpretative an…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2405.19685  [pdf]

    eess.IV

    Identifying Functional Brain Networks of Spatiotemporal Wide-Field Calcium Imaging Data via a Long Short-Term Memory Autoencoder

    Authors: Xiaohui Zhang, Eric C Landsness, Lindsey M Brier, Wei Chen, Michelle J. Tang, Hanyang Miao, Jin-Moo Lee, Mark A. Anastasio, Joseph P. Culver

    Abstract: Wide-field calcium imaging (WFCI) that records neural calcium dynamics allows for identification of functional brain networks (FBNs) in mice that express genetically encoded calcium indicators. Estimating FBNs from WFCI data is commonly achieved by use of seed-based correlation (SBC) analysis and independent component analysis (ICA). These two methods are conceptually distinct and each possesses l…

    Submitted 30 May, 2024; originally announced May 2024.

  6. arXiv:2404.17926  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Pre-training on High Definition X-ray Images: An Experimental Study

    Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang

    Abstract: Existing X-ray based pre-trained vision models are usually conducted on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficul…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Technology Report

  7. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  8. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  9. arXiv:2404.08366  [pdf, other]

    eess.SP

    Intelligent Reflecting Surface-Enabled Anti-Detection for Secure Sensing and Communications

    Authors: Beixiong Zheng, Xue Xiong, Tiantian Ma, Jie Tang, Derrick Wing Kwan Ng, A. Lee Swindlehurst, Rui Zhang

    Abstract: The ever-increasing reliance on wireless communication and sensing has led to growing concerns over the vulnerability of sensitive information to unauthorized detection and interception. Traditional anti-detection methods are often inadequate, suffering from limited adaptability and diminished effectiveness against advanced detection technologies. To overcome these challenges, this article present…

    Submitted 21 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures

  10. arXiv:2404.03253  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  11. arXiv:2403.19996  [pdf, other]

    cs.LG eess.SP

    DeepHeteroIoT: Deep Local and Global Learning over Heterogeneous IoT Sensor Data

    Authors: Muhammad Sakib Khan Inan, Kewen Liao, Haifeng Shen, Prem Prakash Jayaraman, Dimitrios Georgakopoulos, Ming Jian Tang

    Abstract: Internet of Things (IoT) sensor data or readings evince variations in timestamp range, sampling frequency, geographical location, unit of measurement, etc. Such presented sequence data heterogeneity makes it difficult for traditional time series classification algorithms to perform well. Therefore, addressing the heterogeneity challenge demands learning not only the sub-patterns (local features) b…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted for Publication and Presented in EAI MobiQuitous 2023 - 20th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services

  12. arXiv:2403.12352  [pdf, other]

    eess.SP cs.IT

    A New Intelligent Reflecting Surface-Aided Electromagnetic Stealth Strategy

    Authors: Xue Xiong, Beixiong Zheng, A. Lee Swindlehurst, Jie Tang, Wen Wu

    Abstract: Electromagnetic wave absorbing material (EWAM) plays an essential role in manufacturing stealth aircraft, which can achieve the electromagnetic stealth (ES) by reducing the strength of the signal reflected back to the radar system. However, the stealth performance is limited by the coating thickness, incident wave angles, and working frequencies. To tackle these limitations, we propose a new intel…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 5 pages, 4 figures

  13. arXiv:2403.07317  [pdf, other]

    eess.SY

    GMPC: Geometric Model Predictive Control for Wheeled Mobile Robot Trajectory Tracking

    Authors: Jiawei Tang, Shuang Wu, Bo Lan, Yahui Dong, Yuqiang Jin, Guangjian Tian, Wen-An Zhang, Ling Shi

    Abstract: The configuration of most robotic systems lies in continuous transformation groups. However, in mobile robot trajectory tracking, many recent works still naively utilize optimization methods for elements in vector space without considering the manifold constraint of the robot configuration. In this letter, we propose a geometric model predictive control (MPC) framework for wheeled mobile robot tra…

    Submitted 12 March, 2024; originally announced March 2024.

  14. arXiv:2403.06167  [pdf, other]

    eess.SY

    Direct Shooting Method for Numerical Optimal Control: A Modified Transcription Approach

    Authors: Jiawei Tang, Yuxing Zhong, Pengyu Wang, Xingzhou Chen, Shuang Wu, Ling Shi

    Abstract: Direct shooting is an efficient method to solve numerical optimal control. It utilizes the Runge-Kutta scheme to discretize a continuous-time optimal control problem making the problem solvable by nonlinear programming solvers. However, conventional direct shooting raises a contradictory dynamics issue when using an augmented state to handle high-order systems. This paper fills the research gap…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by ECC24

  15. arXiv:2403.04269  [pdf, other]

    cs.IT eess.SP

    Secure MIMO Communication Relying on Movable Antennas

    Authors: Jun Tang, Cunhua Pan, Yang Zhang, Hong Ren, Kezhi Wang

    Abstract: This paper considers a movable antenna (MA)-aided secure multiple-input multiple-output (MIMO) communication system consisting of a base station (BS), a legitimate information receiver (IR) and an eavesdropper (Eve), where the BS is equipped with MAs to enhance the system's physical layer security (PLS). Specifically, we aim to maximize the secrecy rate (SR) by jointly optimizing the transmit prec…

    Submitted 7 March, 2024; originally announced March 2024.

  16. arXiv:2401.14130  [pdf, other]

    eess.IV cs.CV cs.LG

    Attention-based Efficient Classification for 3D MRI Image of Alzheimer's Disease

    Authors: Yihao Lin, Ximeng Li, Yan Zhang, Jinshan Tang

    Abstract: Early diagnosis of Alzheimer's disease (AD) is a challenging task due to its subtle and complex clinical symptoms. Deep learning-assisted medical diagnosis using image recognition techniques has become an important research topic in this field. The features have to accurately capture main variations of anatomical brain structures. However, time-consuming is expensive for feature extraction by de…

    Submitted 25 January, 2024; originally announced January 2024.

  17. arXiv:2401.08835  [pdf, other]

    cs.CL eess.AS

    Improving ASR Contextual Biasing with Guided Attention

    Authors: Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe

    Abstract: In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the effectiveness and robustness of automatic speech recognition (ASR) contextual biasing without introducing additional parameters. A common challenge in previous literature is that the word error rate (WER) reduction brought by contextual biasing diminishes as the number of bias phrases increases. To addres…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  18. arXiv:2401.04976  [pdf, other]

    eess.AS cs.SD

    Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection

    Authors: Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang

    Abstract: Recently, 2D convolution has been found unqualified in sound event detection (SED). It enforces translation equivariance on sound events along frequency axis, which is not a shift-invariant dimension. To address this issue, dynamic convolution is used to model the frequency dependency of sound events. In this paper, we proposed the first full-dynamic method named \emph{full-frequency dynamic convo…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 6 pages, 4 figures, submitted to ICME2024

  19. arXiv:2401.03664  [pdf]

    eess.IV cs.CV cs.LG

    Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

    Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

    Abstract: This paper focuses on the classification task of breast ultrasound images and researches on the reliability measurement of classification results. We proposed a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature a…

    Submitted 7 January, 2024; originally announced January 2024.

  20. arXiv:2401.03078  [pdf, other]

    eess.AS cs.LG cs.SD

    StreamVC: Real-Time Low-Latency Voice Conversion

    Authors: Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann

    Abstract: We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing,…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  21. arXiv:2312.05930  [pdf, other]

    eess.IV cs.CV cs.LG

    A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis

    Authors: Linxi Zhao, Jiankai Tang, Dongyu Chen, Xiaohong Liu, Yong Zhou, Yuanchun Shi, Guangyu Wang, Yuntao Wang

    Abstract: Nailfold capillaroscopy is widely used in assessing health conditions, highlighting the pressing need for an automated nailfold capillary analysis system. In this study, we present a pioneering effort in constructing a comprehensive nailfold capillary dataset (321 images, 219 videos from 68 subjects, with clinic reports and expert annotations) that serves as a crucial resource for training deep-lear…

    Submitted 14 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Dataset, code, pretrained models: https://github.com/THU-CS-PI-LAB/ANFC-Automated-Nailfold-Capillary

  22. arXiv:2312.01940  [pdf, ps, other]

    eess.SP

    Intelligent Reflecting Surface-Aided Electromagnetic Stealth Against Radar Detection

    Authors: Beixiong Zheng, Xue Xiong, Jie Tang, Rui Zhang

    Abstract: While traditional electromagnetic stealth materials/metasurfaces can render a target virtually invisible to some extent, they lack flexibility and adaptability, and can only operate within a limited frequency and angle/direction range, making it challenging to ensure the expected stealth performance. In view of this, we propose in this paper a new intelligent reflecting surface (IRS)-aided electro…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 13 pages (double-column), 10 figures, submitted in October

  23. arXiv:2312.01662  [pdf]

    cond-mat.mes-hall cs.LG eess.IV eess.SY

    Universal Deoxidation of Semiconductor Substrates Assisted by Machine-Learning and Real-Time-Feedback-Control

    Authors: Chao Shen, Wenkang Zhan, Jian Tang, Zhaofeng Wu, Bo Xu, Chao Zhao, Zhanguo Wang

    Abstract: Thin film deposition is an essential step in the semiconductor process. During preparation or loading, the substrate is exposed to the air unavoidably, which has motivated studies of the process control to remove the surface oxide before thin film deposition. Optimizing the deoxidation process in molecular beam epitaxy (MBE) for a random substrate is a multidimensional challenge and sometimes cont…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 5 figures

  24. arXiv:2310.17171  [pdf, other]

    eess.SY cs.SI math.DS math.OC

    Estimating True Beliefs in Opinion Dynamics with Social Pressure

    Authors: Jennifer Tang, Aviv Adler, Amir Ajorlou, Ali Jadbabaie

    Abstract: Social networks often exert social pressure, causing individuals to adapt their expressed opinions to conform to their peers. An agent in such systems can be modeled as having a (true and unchanging) inherent belief while broadcasting a declared opinion at each time step based on her inherent belief and the past declared opinions of her neighbors. An important question in this setting is parameter…

    Submitted 26 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  25. arXiv:2310.14044  [pdf, other]

    cs.SD cs.AI eess.AS

    Composer Style-specific Symbolic Music Generation Using Vector Quantized Discrete Diffusion Models

    Authors: Jincheng Zhang, Jingjing Tang, Charalampos Saitis, György Fazekas

    Abstract: Emerging Denoising Diffusion Probabilistic Models (DDPM) have become increasingly utilised because of promising results they have achieved in diverse generative tasks with continuous data, such as image and sound synthesis. Nonetheless, the success of diffusion models has not been fully extended to discrete symbolic music. We propose to combine a vector quantized variational autoencoder (VQ-VAE) a…

    Submitted 21 October, 2023; originally announced October 2023.

  26. arXiv:2310.07649  [pdf, other]

    cs.RO eess.SY

    Automated Layout Design and Control of Robust Cooperative Grasped-Load Aerial Transportation Systems

    Authors: Carlo Bosio, Jerry Tang, Ting-Hao Wang, Mark W. Mueller

    Abstract: We present a novel approach to cooperative aerial transportation through a team of drones, using optimal control theory and a hierarchical control strategy. We assume the drones are connected to the payload through rigid attachments, essentially transforming the whole system into a larger flying object with "thrust modules" at the attachment locations of the drones. We investigate the optimal arra…

    Submitted 28 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 7 pages, 7 figures, conference paper

  27. arXiv:2310.00699  [pdf, other]

    cs.SD eess.AS

    Pianist Identification Using Convolutional Neural Networks

    Authors: Jingjing Tang, Geraint Wiggins, Gyorgy Fazekas

    Abstract: This paper presents a comprehensive study of automatic performer identification in expressive piano performances using convolutional neural networks (CNNs) and expressive features. Our work addresses the challenging multi-class classification task of identifying virtuoso pianists, which has substantial implications for building dynamic musical instruments with intelligence and smart musical system…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: 6 pages, 3 figures, accepted by the 4th International Symposium on the Internet of Sounds, IS2 2023

  28. arXiv:2309.14367  [pdf, other]

    eess.IV

    Design of Novel Loss Functions for Deep Learning in X-ray CT

    Authors: Obaidullah Rahman, Ken D. Sauer, Madhuri Nagare, Charles A. Bouman, Roman Melnyk, Jie Tang, Brian Nett

    Abstract: Deep learning (DL) shows promise of advantages over conventional signal processing techniques in a variety of imaging applications. The networks' being trained from examples of data rather than explicitly designed allows them to learn signal and noise characteristics to most effectively construct a mapping from corrupted data to higher quality representations. In inverse problems, one has options…

    Submitted 23 September, 2023; originally announced September 2023.

  29. arXiv:2309.13399  [pdf, other]

    eess.IV

    MBIR Training for a 2.5D DL network in X-ray CT

    Authors: Obaidullah Rahman, Madhuri Nagare, Ken D. Sauer, Charles A. Bouman, Roman Melnyk, Brian Nett, Jie Tang

    Abstract: In computed tomographic imaging, model based iterative reconstruction methods have generally shown better image quality than the more traditional, faster filtered backprojection technique. The cost we have to pay is that MBIR is computationally expensive. In this work we train a 2.5D deep learning (DL) network to mimic MBIR quality image. The network is realized by a modified Unet, and trained usi…

    Submitted 23 September, 2023; originally announced September 2023.

  30. arXiv:2308.12481  [pdf]

    eess.SP cs.LG

    Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices

    Authors: Hannah Zhou, Allison Chen, Celine Buer, Emily Chen, Kayleen Tang, Lauryn Gong, Zhiqi Liu, Jianbin Tang

    Abstract: This paper presents a cost-effective, low-power approach to unintentional fall detection using knowledge distillation-based LSTM (Long Short-Term Memory) models to significantly improve accuracy. With a primary focus on analyzing time-series data collected from various sensors, the solution offers real-time detection capabilities, ensuring prompt and reliable identification of falls. The authors i…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 4 pages

  31. arXiv:2308.09275  [pdf, other]

    eess.SY cs.SI math.DS math.OC

    Stochastic Opinion Dynamics under Social Pressure in Arbitrary Networks

    Authors: Jennifer Tang, Aviv Adler, Amir Ajorlou, Ali Jadbabaie

    Abstract: Social pressure is a key factor affecting the evolution of opinions on networks in many types of settings, pushing people to conform to their neighbors' opinions. To study this, the interacting Polya urn model was introduced by Jadbabaie et al., in which each agent has two kinds of opinion: inherent beliefs, which are hidden from the other agents and fixed; and declared opinions, which are randoml…

    Submitted 25 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: fixed typos

  32. arXiv:2307.16426  [pdf, other]

    eess.IV cs.AI cs.CV

    High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation

    Authors: Jiaqi Tang, Xiaogang Xu, Sixing Hu, Ying-Cong Chen

    Abstract: Due to limited camera capacities, digital images usually have a narrower dynamic illumination range than real-world scene radiance. To resolve this problem, High Dynamic Range (HDR) reconstruction is proposed to recover the dynamic range to better represent real-world scenes. However, due to different physical imaging parameters, the tone-mapping functions between images and real radiance are high…

    Submitted 31 July, 2023; originally announced July 2023.

  33. arXiv:2307.09670  [pdf, other]

    cs.SD cs.LG eess.AS

    JAZZVAR: A Dataset of Variations found within Solo Piano Performances of Jazz Standards for Music Overpainting

    Authors: Eleanor Row, Jingjing Tang, George Fazekas

    Abstract: Jazz pianists often uniquely interpret jazz standards. Passages from these interpretations can be viewed as sections of variation. We manually extracted such variations from solo jazz piano performances. The JAZZVAR dataset is a collection of 502 pairs of Variation and Original MIDI segments. Each Variation in the dataset is accompanied by a corresponding Original segment containing the melody and…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Pre-print accepted for publication at CMMR2023, 12 pages, 4 figures

  34. arXiv:2306.12898  [pdf]

    cond-mat.mes-hall cs.LG eess.IV

    Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots

    Authors: Chao Shen, Wenkang Zhan, Kaiyao Xin, Manyang Li, Zhenyu Sun, Hui Cong, Chi Xu, Jian Tang, Zhaofeng Wu, Bo Xu, Zhongming Wei, Chunlai Xue, Chao Zhao, Zhanguo Wang

    Abstract: Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single photon sources. The applications strongly rely on the density and quality of these dots, which has motivated studies of the growth process control to realize high-quality epi-wafers and devices. Establishing the process parameters in molecular beam…

    Submitted 11 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: 5 figures

  35. arXiv:2306.07505  [pdf]

    q-bio.TO eess.IV

    Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

    Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

    Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with…

    Submitted 12 June, 2023; originally announced June 2023.

  36. arXiv:2306.06040  [pdf, other]

    cs.SD cs.LG eess.AS

    Reconstructing Human Expressiveness in Piano Performances with a Transformer Network

    Authors: Jingjing Tang, Geraint Wiggins, Gyorgy Fazekas

    Abstract: Capturing intricate and subtle variations in human expressiveness in music performance using computational approaches is challenging. In this paper, we propose a novel approach for reconstructing human expressiveness in piano performance with a multi-layer bi-directional Transformer encoder. To address the needs for large amounts of accurately captured and score-aligned performance data in trainin…

    Submitted 1 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 12 pages, 5 figures, accepted by CMMR2023, the 16th International Symposium on Computer Music Multidisciplinary Research

  37. arXiv:2305.13331  [pdf, ps, other]

    eess.AS cs.CL

    A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

    Authors: Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney

    Abstract: Aphasia is a language disorder that affects the speaking ability of millions of patients. This paper presents a new benchmark for Aphasia speech recognition and detection tasks using state-of-the-art speech recognition techniques with the AphasiaBank dataset. Specifically, we introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously. Our…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023. Code: https://github.com/espnet/espnet

  38. arXiv:2305.11073  [pdf, other]

    cs.CL cs.SD eess.AS

    A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

    Authors: Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

    Abstract: Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer in the LibriSpeech ASR benchmark, making it…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023. Code: https://github.com/espnet/espnet

  39. arXiv:2304.11267  [pdf, other]

    cs.CV cs.LG eess.IV

    Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

    Authors: Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann

    Abstract: The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, commo…

    Submitted 16 June, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 4 pages (not including references), 2 figures, 2 tables. Accepted to Efficient Deep Learning for Computer Vision workshop 2023

  40. arXiv:2304.08345  [pdf, other]

    cs.LG cs.CL cs.CV cs.MM eess.AS

    VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

    Authors: Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu

    Abstract: In this paper, we propose a Vision-Audio-Language Omni-peRception pretraining model (VALOR) for multi-modal understanding and generation. Different from widely-studied vision-language pretraining models, VALOR jointly models relationships of vision, audio and language in an end-to-end manner. It contains three separate encoders for single modality representations, and a decoder for multimodal cond…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Preprint version without audio files embedded in the PDF. The audio-embedded version can be found on the project page or GitHub

  41. arXiv:2211.16791  [pdf, other]

    cs.CV cs.LG eess.IV

    Adaptive adversarial training method for improving multi-scale GAN based on generalization bound theory

    Authors: Jing Tang, Bo Tao, Zeyu Gong, Zhouping Yin

    Abstract: In recent years, multi-scale generative adversarial networks (GANs) have been proposed to build generalized image processing models based on single sample. Constraining on the sample size, multi-scale GANs have much difficulty converging to the global optimum, which ultimately leads to limitations in their capabilities. In this paper, we pioneered the introduction of PAC-Bayes generalized bound th…

    Submitted 30 November, 2022; originally announced November 2022.

  42. arXiv:2211.14313  [pdf, other]

    eess.IV cs.AI cs.CY

    AICOM-MP: an AI-based Monkeypox Detector for Resource-Constrained Environments

    Authors: Tim Tianyi Yang, Tom Tianze Yang, Andrew Liu, Jie Tang, Na An, Shaoshan Liu, Xue Liu

    Abstract: Under the Autonomous Mobile Clinics (AMCs) initiative, we are developing, open sourcing, and standardizing health AI technologies to enable healthcare access in least developed countries (LDCs). We deem AMCs as the next generation of health care delivery platforms, whereas health AI engines are applications on these platforms, similar to how various applications expand the usage scenarios of smart…

    Submitted 21 November, 2022; originally announced November 2022.

  43. arXiv:2211.10287  [pdf, other]

    eess.IV

    Generative Model Based Highly Efficient Semantic Communication Approach for Image Transmission

    Authors: Tianxiao Han, Jiancheng Tang, Qianqian Yang, Yiping Duan, Zhaoyang Zhang, Zhiguo Shi

    Abstract: Deep learning (DL) based semantic communication methods have been explored to transmit images efficiently in recent years. In this paper, we propose a generative model based semantic communication to further improve the efficiency of image transmission and protect private information. In particular, the transmitter extracts the interpretable latent representation from the original image by a gener…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  44. arXiv:2210.06298  [pdf, other]

    eess.SP cs.AI cs.LG

    Cross Task Neural Architecture Search for EEG Signal Classifications

    Authors: Yiqun Duan, Zhen Wang, Yi Li, Jianhang Tang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: Electroencephalograms (EEGs) are brain dynamics measured outside the brain, which have been widely utilized in non-invasive brain-computer interface applications. Recently, various neural network approaches have been proposed to improve the accuracy of EEG signal recognition. However, these approaches severely rely on manually designed network structures for different tasks which generally are not…

    Submitted 1 October, 2022; originally announced October 2022.

  45. Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals

    Authors: Alejandro Lancho, Amir Weiss, Gary C. F. Lee, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge on the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge on the generation process of the second signal, referred to as interference. This form of the single-channel source separati…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: 9 pages, 6 figures, accepted at IEEE GLOBECOM 2022 (this version contains extended proofs)

  46. arXiv:2208.14784  [pdf, other]

    eess.IV cs.CV cs.LG math.OC

    Accelerating Deep Unrolling Networks via Dimensionality Reduction

    Authors: Junqi Tang, Subhadip Mukherjee, Carola-Bibiane Schönlieb

    Abstract: In this work we propose a new paradigm for designing efficient deep unrolling networks using dimensionality reduction schemes, including minibatch gradient approximation and operator sketching. The deep unrolling networks are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially X-ray CT and MRI imaging, the deep unrolling sc…

    Submitted 31 August, 2022; originally announced August 2022.

  47. arXiv:2208.11184  [pdf, other]

    eess.IV cs.CV

    AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

    Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

    Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 3…

    Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Camera-ready version

  48. arXiv:2208.10933  [pdf]

    cond-mat.mtrl-sci eess.SY

    Large-Scale Integrated Flexible Tactile Sensor Array for Sensitive Smart Robotic Touch

    Authors: Zhenxuan Zhao, Jianshi Tang, Jian Yuan, Yijun Li, Yuan Dai, Jian Yao, Qingtian Zhang, Sanchuan Ding, Tingyu Li, Ruirui Zhang, Yu Zheng, Zhengyou Zhang, Song Qiu, Qingwen Li, Bin Gao, Ning Deng, He Qian, Fei Xing, Zheng You, Huaqiang Wu

    Abstract: In the long pursuit of smart robotics, it has been envisioned to empower robots with human-like senses, especially vision and touch. While tremendous progress has been made in image sensors and computer vision over the past decades, the tactile sense abilities are lagging behind due to the lack of large-scale flexible tactile sensor array with high sensitivity, high spatial resolution, and fast re…

    Submitted 3 November, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Correction in Methods: The weight ratio of TPU:DMF was set to be 1:5

    Journal ref: ACS Nano 2022, 16, 16784

  49. Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation

    Authors: Gary C. F. Lee, Amir Weiss, Alejandro Lancho, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We study the problem of single-channel source separation (SCSS), and focus on cyclostationary signals, which are particularly suitable in a variety of application domains. Unlike classical SCSS approaches, we consider a setting where only examples of the sources are available rather than their models, inspiring a data-driven approach. For source models with underlying cyclostationary Gaussian cons…

    Submitted 22 August, 2022; originally announced August 2022.

  50. arXiv:2208.01631  [pdf, ps, other]

    math.OC cs.LG eess.IV

    Stochastic Primal-Dual Three Operator Splitting with Arbitrary Sampling and Preconditioning

    Authors: Junqi Tang, Matthias Ehrhardt, Carola-Bibiane Schönlieb

    Abstract: In this work we propose a stochastic primal-dual preconditioned three-operator splitting algorithm for solving a class of convex three-composite optimization problems. Our proposed scheme is a direct three-operator splitting extension of the SPDHG algorithm [Chambolle et al. 2018]. We provide theoretical convergence analysis showing ergodic O(1/K) convergence rate, and demonstrate the effectivenes…

    Submitted 2 August, 2022; originally announced August 2022.