
Showing 1–50 of 138 results for author: Yang, C H

  1. arXiv:2407.16370  [pdf, other]

    cs.CL cs.SD eess.AS

    Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction

    Authors: Rithik Sachdev, Zhong-Qiu Wang, Chao-Han Huck Yang

    Abstract: Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach is to leverage in-context learning to prompt LLMs so that a better hypothesis can be generated by the LLMs based on a carefully-designed prompt and…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: in submission

  2. arXiv:2407.15778  [pdf, other]

    cond-mat.mes-hall quant-ph

    Violating Bell's inequality in gate-defined quantum dots

    Authors: Paul Steinacker, Tuomo Tanttu, Wee Han Lim, Nard Dumoulin Stuyck, MengKe Feng, Santiago Serrano, Ensar Vahapoglu, Rocky Y. Su, Jonathan Y. Huang, Cameron Jones, Kohei M. Itoh, Fay E. Hudson, Christopher C. Escott, Andrea Morello, Andre Saraiva, Chih Hwan Yang, Andrew S. Dzurak, Arne Laucht

    Abstract: The superior computational power promised by quantum computers utilises the fundamental quantum mechanical principle of entanglement. However, achieving entanglement and verifying that the generated state does not follow the principle of local causality has proven difficult for spin qubits in gate-defined quantum dots, as it requires simultaneously high concurrence values and readout fidelities to…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 main figures, 9 extended data figures

    MSC Class: 81P68; 81-05

  3. arXiv:2407.15151  [pdf, other]

    quant-ph cond-mat.mes-hall

    Spin Qubits with Scalable milli-kelvin CMOS Control

    Authors: Samuel K. Bartee, Will Gilbert, Kun Zuo, Kushal Das, Tuomo Tanttu, Chih Hwan Yang, Nard Dumoulin Stuyck, Sebastian J. Pauka, Rocky Y. Su, Wee Han Lim, Santiago Serrano, Christopher C. Escott, Fay E. Hudson, Kohei M. Itoh, Arne Laucht, Andrew S. Dzurak, David J. Reilly

    Abstract: A key virtue of spin qubits is their sub-micron footprint, enabling a single silicon chip to host the millions of qubits required to execute useful quantum algorithms with error correction. With each physical qubit needing multiple control lines however, a fundamental barrier to scale is the extreme density of connections that bridge quantum devices to their external control and readout hardware.…

    Submitted 21 July, 2024; originally announced July 2024.

  4. arXiv:2407.06103  [pdf, other]

    quant-ph

    QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train

    Authors: Chen-Yu Liu, Chu-Hsuan Abraham Lin, Chao-Han Huck Yang, Kuan-Cheng Chen, Min-Hsiu Hsieh

    Abstract: Quantum reinforcement learning utilizes quantum layers to process information within a machine learning model. However, both pure and hybrid quantum reinforcement learning face challenges such as data encoding and the use of quantum computers during the inference stage. We apply the Quantum-Train method to reinforcement learning tasks, called QTRL, training the classical policy network model using…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 6 pages, 1 figure

  5. arXiv:2406.13912  [pdf, other]

    cs.CV

    From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

    Authors: Yusuke Hirota, Ryo Hachiuma, Chao-Han Huck Yang, Yuta Nakashima

    Abstract: Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual text. This generative approach to image caption enrichment further makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-form…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2405.14161  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

    Abstract: We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifica…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 23 pages, Preprint

  7. arXiv:2405.06573  [pdf, other]

    cs.SD cs.AI eess.AS

    An Investigation of Incorporating Mamba for Speech Enhancement

    Authors: Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao

    Abstract: This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the properties of Mamba by integrating it as the core model in both basic and advanced SE systems, along with utilizing signal-level distances as well as metric…

    Submitted 10 May, 2024; originally announced May 2024.

  8. arXiv:2404.14716  [pdf, other]

    cs.CL cs.AI cs.CV cs.SD eess.AS

    Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

    Authors: Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

    Abstract: Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update. Despite such convenience, the performance of ICL heavily depends on the quality of the in-context examples presented, which makes the in-context example selection approach a critical choice. This paper proposes a novel Bayes…

    Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures

  9. arXiv:2402.06894  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

    Abstract: Recent advances in large language models (LLMs) have stepped forward the development of multilingual speech and machine translation by its reduced representation errors and incorporated external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the divers…

    Submitted 16 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: 18 pages, Accepted by ACL 2024. This work is open sourced at: https://github.com/YUCHEN005/GenTranslate

  10. arXiv:2402.05457  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

    Authors: Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang

    Abstract: Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct mapping from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introd…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license

  11. arXiv:2402.01496  [pdf]

    cond-mat.mes-hall

    Constructing 100 MΩ and 1 GΩ Resistance Standards via Star-Mesh Transformations

    Authors: Dean G. Jarrett, Albert F. Rigosi, Dominick S. Scaletta, Ngoc Thanh Mai Tran, Heather M. Hill, Alireza R. Panna, Cheng Hsueh Yang, Yanfei Yang, Randolph E. Elmquist, David B. Newell

    Abstract: A recent mathematical framework for optimizing resistor networks to achieve values in the MΩ through GΩ levels was employed for two specific cases. Objectives here include proof of concept and identification of possible apparatus limitations for future experiments involving graphene-based quantum Hall array resistance standards. Using fractal-like, or recursive, features of the framework allows on…

    Submitted 2 February, 2024; originally announced February 2024.

  12. arXiv:2401.10447  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE cs.SD eess.AS

    Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

    Authors: Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke

    Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dat…

    Submitted 18 January, 2024; originally announced January 2024.

  13. arXiv:2401.10446  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcription by e…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, Spotlight top 5%, 24 pages. This work will be open sourced at: https://github.com/YUCHEN005/RobustGER under MIT license

  14. arXiv:2312.15316  [pdf, other]

    cs.CL eess.AS

    Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

    Authors: Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-yi Lee, Ivan Bulyko

    Abstract: Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore pro…

    Submitted 17 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024. Camera-ready version

  15. arXiv:2312.14378  [pdf, other]

    cs.LG cs.SD eess.AS

    Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

    Authors: Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

    Abstract: Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowle…

    Submitted 9 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 5 pages, 1 figure, ICASSP 2024 Workshop on Self-supervision in Audio, Speech and Beyond

  16. arXiv:2311.12159  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG cs.MM

    Conditional Modeling Based Automatic Video Summarization

    Authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring

    Abstract: The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story. Video summarization methods mainly rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video. There are other non-visual factors, such as interestingness, representativeness,…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: This work has been submitted to the IEEE for possible publication. arXiv admin note: substantial text overlap with arXiv:2305.00455

  17. arXiv:2311.09567  [pdf, other]

    cond-mat.mes-hall quant-ph

    Entangling gates on degenerate spin qubits dressed by a global field

    Authors: Ingvild Hansen, Amanda E. Seedhouse, Santiago Serrano, Andreas Nickl, MengKe Feng, Jonathan Y. Huang, Tuomo Tanttu, Nard Dumoulin Stuyck, Wee Han Lim, Fay E. Hudson, Kohei M. Itoh, Andre Saraiva, Arne Laucht, Andrew S. Dzurak, Chih Hwan Yang

    Abstract: Coherently dressed spins have shown promising results as building blocks for future quantum computers owing to their resilience to environmental noise and their compatibility with global control fields. This mode of operation allows for more amenable qubit architecture requirements and simplifies signal routing on the chip. However, multi-qubit operations, such as qubit addressability and two-qubi…

    Submitted 30 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

  18. arXiv:2310.13013  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Generative error correction for code-switching speech recognition using large language models

    Authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng

    Abstract: Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite the recent advances in automatic speech recognition (ASR), CS-ASR is still a challenging task owing to the grammatical structure complexity of the phenomenon and the data scarcity of specific training corpus. In this work, we propose to leverage large language models (LLMs) and lis…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP2024

  19. arXiv:2310.06434  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

    Authors: Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

    Abstract: We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts. This marks a step towards a fresh paradigm in generative error correction within the realm of n-best hypotheses. Unlike the exis…

    Submitted 16 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 as main paper. 10 pages. Revised math notations. GitHub: https://github.com/Srijith-rkr/Whispering-LLaMA

  20. arXiv:2309.15701  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

    Authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng

    Abstract: Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuit…

    Submitted 16 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted to NeurIPS 2023, 24 pages. Datasets and Benchmarks Track. Added the first Mandarin and code-switching (zh-cn and en-us) results from the LLM-based generative ASR error correction to Table 8 on Page 21

  21. arXiv:2309.15649  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

    Authors: Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

    Abstract: We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting to let LLMs perform these tasks without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task activation prompting method that combines caus…

    Submitted 10 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2023. 8 pages. 2nd version revised from Sep 29th's version

    Journal ref: Proc. IEEE ASRU Workshop, Dec. 2023

  22. arXiv:2309.15463  [pdf, other]

    quant-ph cond-mat.mes-hall

    Tomography of entangling two-qubit logic operations in exchange-coupled donor electron spin qubits

    Authors: Holly G. Stemp, Serwan Asaad, Mark R. van Blankenstein, Arjen Vaartjes, Mark A. I. Johnson, Mateusz T. Mądzik, Amber J. A. Heskes, Hannes R. Firgau, Rocky Y. Su, Chih Hwan Yang, Arne Laucht, Corey I. Ostrove, Kenneth M. Rudinger, Kevin Young, Robin Blume-Kohout, Fay E. Hudson, Andrew S. Dzurak, Kohei M. Itoh, Alexander M. Jakob, Brett C. Johnson, David N. Jamieson, Andrea Morello

    Abstract: Scalable quantum processors require high-fidelity universal quantum logic operations in a manufacturable physical platform. Donors in silicon provide atomic size, excellent quantum coherence and compatibility with standard semiconductor processing, but no entanglement between donor-bound electron spins has been demonstrated to date. Here we present the experimental demonstration and tomography of…

    Submitted 2 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

  23. arXiv:2309.15223  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE cs.SD eess.AS

    Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

    Authors: Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko

    Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we p…

    Submitted 10 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE ASRU 2023. Internal Review Approved. Revised 2nd version with Andreas and Huck. The first version is in Sep 29th. 8 pages

    Journal ref: Proc. IEEE ASRU Workshop, Dec. 2023

  24. arXiv:2309.12542  [pdf, other]

    quant-ph cond-mat.mes-hall

    Spatio-temporal correlations of noise in MOS spin qubits

    Authors: Amanda E. Seedhouse, Nard Dumoulin Stuyck, Santiago Serrano, Tuomo Tanttu, Will Gilbert, Jonathan Yue Huang, Fay E. Hudson, Kohei M. Itoh, Arne Laucht, Wee Han Lim, Chih Hwan Yang, Andrew S. Dzurak, Andre Saraiva

    Abstract: In quantum computing, characterising the full noise profile of qubits can aid the efforts towards increasing coherence times and fidelities by creating error mitigating techniques specific to the type of noise in the system, or by completely removing the sources of noise. Spin qubits in MOS quantum dots are exposed to noise originating from the complex glassy behaviour of two-level fluctuators, lea…

    Submitted 24 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: updated reference

  25. arXiv:2309.12541  [pdf, other]

    quant-ph cond-mat.mes-hall

    Real-time feedback protocols for optimizing fault-tolerant two-qubit gate fidelities in a silicon spin system

    Authors: Nard Dumoulin Stuyck, Amanda E. Seedhouse, Santiago Serrano, Tuomo Tanttu, Will Gilbert, Jonathan Yue Huang, Fay Hudson, Kohei M. Itoh, Arne Laucht, Wee Han Lim, Chih Hwan Yang, Andre Saraiva, Andrew S. Dzurak

    Abstract: Recently, several groups have demonstrated two-qubit gate fidelities in semiconductor spin qubit systems above 99%. Achieving this regime of fault-tolerant compatible high fidelities is nontrivial and requires exquisite stability and precise control over the different qubit parameters over an extended period of time. This can be done by efficiently calibrating qubit control parameters against diff…

    Submitted 21 September, 2023; originally announced September 2023.

  26. arXiv:2309.07081  [pdf, other]

    eess.AS cs.CL cs.SD

    Can Whisper perform speech-based in-context learning?

    Authors: Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

    Abstract: This paper investigates the in-context learning abilities of the Whisper automatic speech recognition (ASR) models released by OpenAI. A novel speech-based in-context learning (SICL) approach is proposed for test-time adaptation, which can reduce the word error rates (WERs) with only a small number of labelled speech samples without gradient descent. Language-level adaptation experiments using Chi…

    Submitted 19 March, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  27. arXiv:2309.01849  [pdf, other]

    cond-mat.mes-hall quant-ph

    Impact of electrostatic crosstalk on spin qubits in dense CMOS quantum dot arrays

    Authors: Jesus D. Cifuentes, Tuomo Tanttu, Paul Steinacker, Santiago Serrano, Ingvild Hansen, James P. Slack-Smith, Will Gilbert, Jonathan Y. Huang, Ensar Vahapoglu, Ross C. C. Leon, Nard Dumoulin Stuyck, Kohei Itoh, Nikolay Abrosimov, Hans-Joachim Pohl, Michael Thewalt, Arne Laucht, Chih Hwan Yang, Christopher C. Escott, Fay E. Hudson, Wee Han Lim, Rajib Rahman, Andrew S. Dzurak, Andre Saraiva

    Abstract: Quantum processors based on integrated nanoscale silicon spin qubits are a promising platform for highly scalable quantum computation. Current CMOS spin qubit processors consist of dense gate arrays to define the quantum dots, making them susceptible to crosstalk from capacitive coupling between a dot and its neighbouring gates. Small but sizeable spin-orbit interactions can transfer this electros…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 9 pages, 4 figures

  28. High-fidelity operation and algorithmic initialisation of spin qubits above one kelvin

    Authors: Jonathan Y. Huang, Rocky Y. Su, Wee Han Lim, MengKe Feng, Barnaby van Straaten, Brandon Severin, Will Gilbert, Nard Dumoulin Stuyck, Tuomo Tanttu, Santiago Serrano, Jesus D. Cifuentes, Ingvild Hansen, Amanda E. Seedhouse, Ensar Vahapoglu, Nikolay V. Abrosimov, Hans-Joachim Pohl, Michael L. W. Thewalt, Fay E. Hudson, Christopher C. Escott, Natalia Ares, Stephen D. Bartlett, Andrea Morello, Andre Saraiva, Arne Laucht, Andrew S. Dzurak, et al. (1 additional author not shown)

    Abstract: The encoding of qubits in semiconductor spin carriers has been recognised as a promising approach to a commercial quantum computer that can be lithographically produced and integrated at scale. However, the operation of the large number of qubits required for advantageous quantum applications will produce a thermal load exceeding the available cooling power of cryostats at millikelvin temperatures…

    Submitted 18 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Journal ref: Nature 627, 772-777 (2024)

  29. arXiv:2307.12452  [pdf, other]

    quant-ph cond-mat.mes-hall

    Characterizing non-Markovian Quantum Process by Fast Bayesian Tomography

    Authors: R. Y. Su, J. Y. Huang, N. Dumoulin. Stuyck, M. K. Feng, W. Gilbert, T. J. Evans, W. H. Lim, F. E. Hudson, K. W. Chan, W. Huang, Kohei M. Itoh, R. Harper, S. D. Bartlett, C. H. Yang, A. Laucht, A. Saraiva, T. Tanttu, A. S. Dzurak

    Abstract: To push gate performance to levels beyond the thresholds for quantum error correction, it is important to characterize the error sources occurring on quantum gates. However, the characterization of non-Markovian error poses a challenge to current quantum process tomography techniques. Fast Bayesian Tomography (FBT) is a self-consistent gate set tomography protocol that can be bootstrapped from ear…

    Submitted 4 October, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

  30. arXiv:2307.01947  [pdf, other]

    cs.CV cs.AI cs.IR

    Causal Video Summarizer for Video Exploration

    Authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring

    Abstract: Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: This paper is accepted by IEEE International Conference on Multimedia and Expo (ICME), 2022

  31. arXiv:2306.03741   

    quant-ph cs.LG

    Classical-to-Quantum Transfer Learning Facilitates Machine Learning with Variational Quantum Circuit

    Authors: Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh, Hector Zenil, Jesper Tegner

    Abstract: While Quantum Machine Learning (QML) is an exciting emerging area, the accuracy of the loss function still needs to be improved by the number of available qubits. Here, we reformulate the QML problem such that the approximation error (representation power) does not depend on the number of qubits. We prove that a classical-to-quantum transfer learning architecture using a Variational Quantum Circui…

    Submitted 18 June, 2024; v1 submitted 17 May, 2023; originally announced June 2023.

    Comments: The paper needs a major revision before it could be submitted to a new journal, and the authors agree that the latest version could not be open to public at the moment

  32. arXiv:2306.01015  [pdf, other]

    cs.CL cs.NE cs.SD eess.AS

    How to Estimate Model Transferability of Pre-Trained Speech Models?

    Authors: Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath

    Abstract: In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates using the extracted representations. Our framework efficiently computes transferability…

    Submitted 5 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech. Code is available at: https://github.com/virginiakm1988/LogME-CTC. Fixed a typo

  33. arXiv:2306.00331  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP eess.SY

    A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models

    Authors: Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a multi-dimensional structured state space (S4) approach to speech enhancement. To better capture the spectral dependencies across the frequency axis, we focus on modifying the multi-dimensional S4 layer with whitening transformation to build new small-footprint models that also achieve good performance. We explore several S4-based deep architectures in time (T) and time-frequency (TF)…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023. Code will be released at https://github.com/Kuray107/S4ND-U-Net_speech_enhancement

  34. arXiv:2305.16932  [pdf, other]

    cs.SD cs.CL eess.AS

    A Neural State-Space Model Approach to Efficient Speech Separation

    Authors: Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

    Abstract: In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM). Motivated by linear time-invariant systems for sequence modeling, our SSM-based approach can efficiently model input signals into a format of linear ordinary differential equations (ODEs) for representation learning. To extend the SSM technique into speech separation tasks, we firs…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by InterSpeech 2023

  35. arXiv:2305.11360  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Differentially Private Adapters for Parameter Efficient Acoustic Modeling

    Authors: Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi

    Abstract: In this work, we devise a parameter-efficient solution to bring differential privacy (DP) guarantees into adaptation of a cross-lingual speech classifier. We investigate a new frozen pre-trained adaptation framework for DP-preserving speech modeling without full model fine-tuning. First, we introduce a noisy teacher-student ensemble into a conventional adaptation scheme leveraging a frozen pre-tra…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023. Code will be available at: https://github.com/Chun-wei-Ho/Private-Speech-Adapter. The authors would like to express their gratitude to Prof. Chin-Hui Lee from Georgia Tech for providing helpful insights and suggestions

  36. arXiv:2305.11320  [pdf, other]

    cs.SD cs.AI cs.NE eess.AS eess.SP

    Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

    Authors: Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien

    Abstract: This paper presents a parameter-efficient learning (PEL) to develop a low-resource accent adaptation for text-to-speech (TTS). A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of original trainable parameters to achieve competitive performance in voice synthesis. Motivated by a theoretical foundation of optimal transport (OT), this study…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  37. arXiv:2305.11244  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

    Authors: Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

    Abstract: In this work, we explore Parameter-Efficient-Learning (PEL) techniques to repurpose a General-Purpose-Speech (GSM) model for Arabic dialect identification (ADI). Specifically, we investigate different setups to incorporate trainable features into a multi-layer encoder-decoder GSM formulation under frozen pre-trained settings. Our architecture includes residual adapter and model reprogramming (inpu…

    Submitted 3 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023, 5 pages. Code is available at: https://github.com/Srijith-rkr/KAUST-Whisper-Adapter under MIT license

  38. arXiv:2305.00455  [pdf, other]

    cs.CV cs.AI

    Causalainer: Causal Explainer for Automatic Video Summarization

    Authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring

    Abstract: The goal of video summarization is to automatically shorten videos such that they convey the overall story without losing relevant information. In many application scenarios, improper video summarization can have a large impact. For example, in forensics, the quality of the generated video summary will affect an investigator's judgment, while in journalism it might yield undesired bias. Because of th…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: The paper has been accepted by the CVPR Workshop on New Frontiers in Visual Language Reasoning: Compositionality, Prompts, and Causality, 2023

  39. arXiv:2303.14864  [pdf, other]

    quant-ph cond-mat.mes-hall cond-mat.mtrl-sci

    Bounds to electron spin qubit variability for scalable CMOS architectures

    Authors: Jesús D. Cifuentes, Tuomo Tanttu, Will Gilbert, Jonathan Y. Huang, Ensar Vahapoglu, Ross C. C. Leon, Santiago Serrano, Dennis Otter, Daniel Dunmore, Philip Y. Mai, Frédéric Schlattner, MengKe Feng, Kohei Itoh, Nikolay Abrosimov, Hans-Joachim Pohl, Michael Thewalt, Arne Laucht, Chih Hwan Yang, Christopher C. Escott, Wee Han Lim, Fay E. Hudson, Rajib Rahman, Andrew S. Dzurak, Andre Saraiva

    Abstract: Spins of electrons in CMOS quantum dots combine exquisite quantum properties and scalable fabrication. In the age of quantum technology, however, the metrics that crowned Si/SiO$_2$ as the microelectronics standard need to be reassessed with respect to their impact upon qubit performance. We chart the spin qubit variability due to the unavoidable atomic-scale roughness of the Si/SiO$_2$ interface, co…

    Submitted 5 July, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

    Comments: 20 pages, 8 figures

    Journal ref: Nat Commun 15, 4299 (2024)

  40. arXiv:2303.04090  [pdf, other]

    quant-ph cond-mat.mes-hall

    Assessment of error variation in high-fidelity two-qubit gates in silicon

    Authors: Tuomo Tanttu, Wee Han Lim, Jonathan Y. Huang, Nard Dumoulin Stuyck, Will Gilbert, Rocky Y. Su, MengKe Feng, Jesus D. Cifuentes, Amanda E. Seedhouse, Stefan K. Seritan, Corey I. Ostrove, Kenneth M. Rudinger, Ross C. C. Leon, Wister Huang, Christopher C. Escott, Kohei M. Itoh, Nikolay V. Abrosimov, Hans-Joachim Pohl, Michael L. W. Thewalt, Fay E. Hudson, Robin Blume-Kohout, Stephen D. Bartlett, Andrea Morello, Arne Laucht, Chih Hwan Yang, et al. (2 additional authors not shown)

    Abstract: Achieving high-fidelity entangling operations between qubits consistently is essential for the performance of multi-qubit systems and is a crucial factor in achieving fault-tolerant quantum processors. Solid-state platforms are particularly exposed to errors due to materials-induced variability between qubits, which leads to performance inconsistencies. Here we study the errors in a spin qubit pro…

    Submitted 15 March, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

  41. Accessing the Full Capabilities of Filter Functions: A Tool for Detailed Noise and Control Susceptibility Analysis

    Authors: Ingvild Hansen, Amanda E. Seedhouse, Andre Saraiva, Andrew S. Dzurak, Chih Hwan Yang

    Abstract: The filter function formalism from quantum control theory is typically used to determine the noise susceptibility of pulse sequences by looking at the overlap between the filter function of the sequence and the noise power spectral density. Importantly, the square modulus of the filter function is used for this method, hence directional and phase information is lost. In this work, we take advantag…

    Submitted 2 March, 2023; originally announced March 2023.

    Journal ref: Phys. Rev. A 108, 012426 (2023)

  42. arXiv:2301.07851  [pdf, other]

    cs.SD cs.AI cs.LG cs.NE eess.AS

    From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

    Authors: Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

    Abstract: In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: Submitted to ICASSP 2023. The project was initiated in May 2022 during a research internship at Google Research

  43. arXiv:2211.01317  [pdf, other]

    cs.SD cs.AI cs.LG cs.NE eess.AS

    Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming

    Authors: Yun-Ning Hung, Chao-Han Huck Yang, Pin-Yu Chen, Alexander Lerch

    Abstract: Transfer learning (TL) approaches have shown promising results when handling tasks with limited training data. However, considerable memory and computational resources are often required for fine-tuning pre-trained neural networks with target domain data. In this work, we introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neur…

    Submitted 3 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE ICASSP 2023. The implementation is available at https://github.com/biboamy/music-repro

  44. arXiv:2211.01263  [pdf, other]

    cs.SD cs.LG eess.AS quant-ph

    A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

    Authors: Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios. We project acoustic features based on classical-to-quantum feature encoding. Different from existing quantum convolution techniques, we utilize QKL with features in the quantum space to design kernel-based classifiers…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  45. arXiv:2211.01189  [pdf, other]

    eess.AS cs.AI cs.LG cs.NE cs.SD

    Inference and Denoise: Causal Inference-based Neural Speech Enhancement

    Authors: Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

    Abstract: This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention. Based on the potential outcome framework, the proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement module…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  46. arXiv:2211.00887  [pdf, other]

    quant-ph cs.LG cs.NE eess.SP

    Certified Robustness of Quantum Classifiers against Adversarial Examples through Quantum Noise

    Authors: Jhih-Cing Huang, Yu-Lin Tsai, Chao-Han Huck Yang, Cheng-Fang Su, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo

    Abstract: Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification. In this paper, we propose the first theoretical study demonstrating that adding quantum random rotation noise can improve robustness in quantum classifiers against adversarial attacks. We link the definition of diffe…

    Submitted 28 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE ICASSP 2023

  47. arXiv:2210.06382  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition

    Authors: Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models to issue some differential privacy (DP) guarantee for training data. Through boosting under DP, a student model derived from the training data suffers little model degradation from the models trained with no privacy protection. Our proposed solution leverages upon two mechanisms,…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to ISCA, ISCSLP 2022, Singapore. 5 Pages

  48. arXiv:2210.05614  [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition

    Authors: Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models by imposing noisy distortion on privacy data. Such a noise perturbation often results in a severe performance degradation in automatic speech recognition (ASR) in order to meet a privacy budget $\varepsilon$. Private aggregation of teacher ensemble (PATE) utilizes ensemble probabilit…

    Submitted 13 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 5 pages. Accepted to IEEE SLT 2022. A first version draft was finished in Aug 2021

  49. arXiv:2208.14671  [pdf, other]

    quant-ph cond-mat.mes-hall

    High Fidelity Control of a Nitrogen-Vacancy Spin Qubit at Room Temperature using the SMART Protocol

    Authors: Hyma H. Vallabhapurapu, Ingvild Hansen, Chris Adambukulam, Rainer Stohr, Andrej Denisenko, Chih Hwan Yang, Arne Laucht

    Abstract: A practical implementation of a quantum computer requires robust qubits that are protected against their noisy environment. Dynamical decoupling techniques have been successfully used in the past to offer protected high-fidelity gate operations in negatively-charged Nitrogen-Vacancy (NV-) centers in diamond, albeit under specific conditions with the intrinsic nitrogen nuclear spin initialised. In…

    Submitted 9 September, 2022; v1 submitted 31 August, 2022; originally announced August 2022.

    Comments: Minor changes. Updated figures, some text and added more references

  50. arXiv:2208.04724  [pdf, other]

    cond-mat.mes-hall physics.chem-ph quant-ph

    Jellybean quantum dots in silicon for qubit coupling and on-chip quantum chemistry

    Authors: Zeheng Wang, MengKe Feng, Santiago Serrano, William Gilbert, Ross C. C. Leon, Tuomo Tanttu, Philip Mai, Dylan Liang, Jonathan Y. Huang, Yue Su, Wee Han Lim, Fay E. Hudson, Christopher C. Escott, Andrea Morello, Chih Hwan Yang, Andrew S. Dzurak, Andre Saraiva, Arne Laucht

    Abstract: The small size and excellent integrability of silicon metal-oxide-semiconductor (SiMOS) quantum dot spin qubits make them an attractive system for mass-manufacturable, scaled-up quantum processors. Furthermore, classical control electronics can be integrated on-chip, in-between the qubits, if an architecture with sparse arrays of qubits is chosen. In such an architecture qubits are either transpor…

    Submitted 8 August, 2022; originally announced August 2022.