
Showing 1–20 of 20 results for author: Ju, Y

Searching in archive eess. Search in all archives.
  1. arXiv:2405.10272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  2. arXiv:2404.10643  [pdf, other]

    cs.NI eess.SP

    A Calibrated and Automated Simulator for Innovations in 5G

    Authors: Conrado Boeira, Antor Hasan, Khaleda Papry, Yue Ju, Zhongwen Zhu, Israat Haque

    Abstract: The rise of 5G deployments has created the environment for many emerging technologies to flourish. Self-driving vehicles, Augmented and Virtual Reality, and remote operations are examples of applications that leverage 5G networks' support for extremely low latency, high bandwidth, and increased throughput. However, the complex architecture of 5G hinders innovation due to the lack of accessibility…

    Submitted 16 April, 2024; originally announced April 2024.

  3. arXiv:2401.07532  [pdf, other]

    cs.SD cs.AI eess.AS

    Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation

    Authors: Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng

    Abstract: Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still encounter issues with overly long feature sequences and generated results lack contextual coherence, thus the challenge of modeling long multi-track symbolic music still re…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  4. arXiv:2309.07293  [pdf]

    cs.CV eess.IV

    GAN-based Algorithm for Efficient Image Inpainting

    Authors: Zhengyang Han, Zehao Jiang, Yuan Ju

    Abstract: Global pandemic due to the spread of COVID-19 has post challenges in a new dimension on facial recognition, where people start to wear masks. Under such condition, the authors consider utilizing machine learning in image inpainting to tackle the problem, by complete the possible face that is originally covered in mask. In particular, autoencoder has great potential on retaining important, general…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures

    MSC Class: 68U10

    Journal ref: The 3rd International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022)

  5. arXiv:2306.16250  [pdf, other]

    cs.SD eess.AS

    MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

    Authors: Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu

    Abstract: The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still encounters inadequate utilization of multi-scale information and speaker embedding. To this end, this paper proposes a new effective speaker extraction system with multi-scale interfusion and conditional speaker modulation (ConSM), which is called MC-SpEx. First of all, we d…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by InterSpeech 2023

  6. arXiv:2303.07704  [pdf, other]

    eess.AS cs.SD

    TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

    Authors: Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang

    Abstract: This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) struct…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  7. arXiv:2303.06404  [pdf, other]

    eess.AS

    Multi-Task Sub-Band Network For Deep Residual Echo Suppression

    Authors: Jiayao Sun, Dawei Luo, Zhaoxia Li, Jindong Li, Yukai Ju, Yang Li

    Abstract: This paper introduces the SWANT team entry to the ICASSP 2023 AEC Challenge. We submit a system that cascades a linear filter with a neural post-filter. Particularly, we adopt sub-band processing to handle full-band signals and shape the network with multi-task learning, where dual signal voice activity detection (DSVAD) and echo estimation are adopted as auxiliary tasks. Moreover, we particularly…

    Submitted 11 March, 2023; originally announced March 2023.

  8. arXiv:2302.14370  [pdf, other]

    cs.SD cs.AI eess.AS eess.SP

    CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

    Authors: Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim

    Abstract: While recent text-to-speech (TTS) systems have made remarkable strides toward human-level quality, the performance of cross-lingual TTS lags behind that of intra-lingual TTS. This gap is mainly rooted from the speaker-language entanglement problem in cross-lingual TTS. In this paper, we propose CrossSpeech which improves the quality of cross-lingual speech by effectively disentangling speaker and…

    Submitted 12 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  9. Personalized local heating neutralizing individual, spatial and temporal thermo-physiological variances in extreme cold environments

    Authors: Yi Ju, Xinyuan Ju, Hui Zhang, Bin Cao, Bin Liu, Yingxin Zhu

    Abstract: In this paper, we investigate the feasibility, robustness and optimization of introducing personal comfort systems (PCS), apparatuses that promises in energy saving and comfort improvement, into a broader range of environments. We report a series of laboratory experiments systematically examining the effect of personalized heating in neutralizing individual, spatial and temporal variations of ther…

    Submitted 27 December, 2022; v1 submitted 11 December, 2022; originally announced December 2022.

    Journal ref: Building and Environment, 109950 (2022)

  10. Robo-Chargers: Optimal Operation and Planning of a Robotic Charging System to Alleviate Overstay

    Authors: Yi Ju, Teng Zeng, Zaid Allybokus, Scott Moura

    Abstract: Charging infrastructure availability is a major concern for plug-in electric vehicle users. Nowadays, the limited public chargers are commonly occupied by vehicles which have already been fully charged. Such phenomenon, known as overstay, hinders other vehicles' accessibility to charging resources. In this paper, we analyze a charging facility innovation to tackle the challenge of overstay, levera…

    Submitted 18 June, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Journal ref: IEEE Transactions on Smart Grid

  11. arXiv:2210.15853  [pdf, other]

    cs.SD eess.AS

    Speech Enhancement with Intelligent Neural Homomorphic Synthesis

    Authors: Shulin He, Wei Rao, Jinjiang Liu, Jun Chen, Yukai Ju, Xueliang Zhang, Yannan Wang, Shidong Shang

    Abstract: Most neural network speech enhancement models ignore speech production mathematical models by directly mapping Fourier transform spectrums or waveforms. In this work, we propose a neural source filter network for speech enhancement. Specifically, we use homomorphic signal processing and cepstral analysis to obtain noisy speech's excitation and vocal tract. Unlike traditional signal processing, we…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  12. arXiv:2210.15849  [pdf, ps, other]

    cs.SD eess.AS

    Hierarchical speaker representation for target speaker extraction

    Authors: Shulin He, Huaiwen Zhang, Wei Rao, Kanghao Zhang, Yukai Ju, Yang Yang, Xueliang Zhang

    Abstract: Target speaker extraction aims to isolate a specific speaker's voice from a composite of multiple sound sources, guided by an enrollment utterance or called anchor. Current methods predominantly derive speaker embeddings from the anchor and integrate them into the separation network to separate the voice of the target speaker. However, the representation of the speaker embedding is too simplistic,…

    Submitted 4 January, 2024; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2024

  13. arXiv:2210.03027  [pdf, other]

    cs.SD eess.AS

    AnimeTAB: A new guitar tablature dataset of anime and game music

    Authors: Yuecheng Zhou, Yaolong Ju, Lingyun Xie

    Abstract: While guitar tablature has become a popular topic in MIR research, there exists no such a guitar tablature dataset that focuses on the soundtracks of anime and video games, which have a surprisingly broad and growing audience among the youths. In this paper, we present AnimeTAB, a fingerstyle guitar tablature dataset in MusicXML format, which provides more high-quality guitar tablature for both re…

    Submitted 6 October, 2022; originally announced October 2022.

  14. arXiv:2209.12565  [pdf, other]

    eess.SY

    An Efficient Implementation for Spatial-Temporal Gaussian Process Regression and Its Applications

    Authors: Junpeng Zhang, Yue Ju, Biqiang Mu, Renxin Zhong, Tianshi Chen

    Abstract: Spatial-temporal Gaussian process regression is a popular method for spatial-temporal data modeling. Its state-of-art implementation is based on the state-space model realization of the spatial-temporal Gaussian process and its corresponding Kalman filter and smoother, and has computational complexity $\mathcal{O}(NM^3)$, where $N$ and $M$ are the number of time instants and spatial input location…

    Submitted 26 September, 2022; originally announced September 2022.

  15. arXiv:2209.12231  [pdf, other]

    eess.SY math.ST

    Asymptotic Theory for Regularized System Identification Part I: Empirical Bayes Hyper-parameter Estimator

    Authors: Yue Ju, Biqiang Mu, Lennart Ljung, Tianshi Chen

    Abstract: Regularized system identification is the major advance in system identification in the last decade. Although many promising results have been achieved, it is far from complete and there are still many key problems to be solved. One of them is the asymptotic theory, which is about convergence properties of the model estimators as the sample size goes to infinity. The existing related results for re…

    Submitted 4 April, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

  16. arXiv:2205.15195  [pdf, other]

    cs.SD eess.AS

    Personalized Acoustic Echo Cancellation for Full-duplex Communications

    Authors: Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie

    Abstract: Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background n…

    Submitted 29 June, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: submitted to INTERSPEECH 22

  17. arXiv:2112.10319  [pdf, ps, other]

    math.ST eess.SY

    Tutorial on Asymptotic Properties of Regularized Least Squares Estimator for Finite Impulse Response Model

    Authors: Yue Ju, Tianshi Chen, Biqiang Mu, Lennart Ljung

    Abstract: In this paper, we give a tutorial on asymptotic properties of the Least Square (LS) and Regularized Least Squares (RLS) estimators for the finite impulse response model with filtered white noise inputs. We provide three perspectives: the almost sure convergence, the convergence in distribution and the boundedness in probability. On one hand, these properties deepen our understanding of the LS and…

    Submitted 30 December, 2021; v1 submitted 19 December, 2021; originally announced December 2021.

  18. arXiv:2110.07840  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet2-TTS: Extending the Edge of TTS Research

    Authors: Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling, which simplify the training pipeline and further enhance T…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP2022. Demo HP: https://espnet.github.io/icassp2022-tts/

  19. arXiv:2003.13435  [pdf, other]

    math.ST eess.SY

    Supplementary Material for CDC Submission No. 1461

    Authors: Yue Ju, Tianshi Chen, Biqiang Mu, Lennart Ljung

    Abstract: In this paper, we focus on the influences of the condition number of the regression matrix upon the comparison between two hyper-parameter estimation methods: the empirical Bayes (EB) and the Stein's unbiased estimator with respect to the mean square error (MSE) related to output prediction (SUREy). We firstly show that the greatest power of the condition number of the regression matrix of SUREy c…

    Submitted 21 April, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

  20. arXiv:1906.11330  [pdf, ps, other]

    eess.SP

    Sparsity-Assisted Signal Denoising and Pattern Recognition in Time-Series Data

    Authors: G. V. Prateek, Yo-El Ju, Arye Nehorai

    Abstract: We address the problem of signal denoising and pattern recognition in processing batch-mode time-series data by combining linear time-invariant filters, orthogonal multiresolution representations, and sparsity-based methods. We propose a novel approach to designing higher-order zero-phase low-pass, high-pass, and band-pass infinite impulse response filters as matrices, using spectral transformatio…

    Submitted 26 June, 2019; originally announced June 2019.

    Comments: 22 pages, 16 figures, submitted to IEEE Transactions on Signal Processing