
Showing 1–22 of 22 results for author: Ao, J

Searching in archive cs.
  1. arXiv:2406.13340  [pdf, other]

    cs.CL cs.SD eess.AS

    SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, includin…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2405.05791  [pdf, other]

    cs.CV

    Sequential Amodal Segmentation via Cumulative Occlusion Learning

    Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

    Abstract: To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. Ideally, the system should be able to handle any object and not be restricted to segmenting a limited set of object classes, especially in robotic applications. Addressing this need, we introduce a diffusion model wi…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  4. arXiv:2312.16002  [pdf, other]

    eess.AS cs.AI

    The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

    Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

    Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for the ICMC-ASR Challenge include multi-channel front-end enhancement and diarization, training data augmentation, and speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 sets, our best system achieves…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Technical Report. 2 pages. For ICMC-ASR-2023 Challenge

  5. arXiv:2309.10674  [pdf, other]

    cs.SD eess.AS

    USED: Universal Speaker Extraction and Diarization

    Authors: Junyi Ao, Mehmet Sinan Yıldırım, Ruijie Tao, Meng Ge, Shuai Wang, Yanmin Qian, Haizhou Li

    Abstract: Speaker extraction and diarization are two enabling techniques for real-world speech applications. Speaker extraction aims to extract a target speaker's voice from a speech mixture, while speaker diarization demarcates speech segments by speaker, annotating 'who spoke when'. Previous studies have typically treated the two tasks independently. In practical applications, it is more meaningful to hav…

    Submitted 9 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  6. arXiv:2303.06596  [pdf, other]

    cs.CV cs.LG

    Amodal Intra-class Instance Segmentation: Synthetic Datasets and Benchmark

    Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

    Abstract: Images of realistic scenes often contain intra-class objects that are heavily occluded from each other, making the amodal perception task that requires parsing the occluded parts of the objects challenging. Although important for downstream tasks such as robotic grasping systems, the lack of large-scale amodal datasets with detailed annotations makes it difficult to model intra-class occlusions ex…

    Submitted 7 November, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: Accepted at WACV 2024. Datasets are available at https://github.com/saraao/amodal-dataset

  7. arXiv:2302.12074  [pdf, other]

    cs.LG cs.AI stat.AP stat.CO

    Active learning for structural reliability analysis with multiple limit state functions through variance-enhanced PC-Kriging surrogate models

    Authors: J. Moran A., P. G. Morato, P. Rigo

    Abstract: Existing active strategies for training surrogate models yield accurate structural reliability estimates by aiming at design space regions in the vicinity of a specified limit state function. In many practical engineering applications, various damage conditions, e.g. repair, failure, should be probabilistically characterized, thus demanding the estimation of multiple performance functions. In this…

    Submitted 23 February, 2023; originally announced February 2023.

  8. arXiv:2210.16755  [pdf, other]

    cs.CL cs.SD eess.AS

    token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

    Authors: Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

    Abstract: Self-supervised pre-training has been successful in both text and speech processing. Speech and text offer different but complementary information. The question is whether we are able to perform a speech-text joint pre-training on unpaired speech and text. In this paper, we take the idea of self-supervised pre-training one step further and propose token2vec, a novel joint pre-training framework fo…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  9. arXiv:2210.04062  [pdf, other]

    cs.SD eess.AS

    CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

    Authors: Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

    Abstract: Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance to a sequence of discrete codes, and perform code representation learning, where we predict the code representations based on a masked view of the original speech…

    Submitted 5 July, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Accepted by Interspeech 2023

  10. arXiv:2210.03730  [pdf, other]

    cs.CL eess.AS

    SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

    Authors: Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. Leveraging hidden-unit as an interface to align speech and text, we can decomp…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 14 pages, accepted by EMNLP 2022

  11. Image Amodal Completion: A Survey

    Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

    Abstract: Existing computer vision systems can compete with humans in understanding the visible parts of objects, but still fall far short of humans when it comes to depicting the invisible parts of partially occluded objects. Image amodal completion aims to equip computers with human-like amodal completion functions to understand an intact object despite it being partially occluded. The main purpose of thi…

    Submitted 7 November, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted at Computer Vision and Image Understanding. See https://doi.org/10.1016/j.cviu.2023.103661 for the final version

  12. arXiv:2206.05777  [pdf, other]

    cs.CL eess.AS

    The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

    Authors: Ziqiang Zhang, Junyi Ao, Long Zhou, Shujie Liu, Furu Wei, Jinyu Li

    Abstract: This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese. The YiTrans system is built on large-scale pre-trained encoder-decoder models. More specifically, we first design a multi-stage pre-training strategy to build a multi-modality model with a large amount of labe…

    Submitted 13 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: 11 pages

  13. arXiv:2203.17113  [pdf, other]

    cs.SD cs.LG eess.AS

    Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

    Authors: Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

    Abstract: This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. One is to predict the pseudo codes via masked language mode…

    Submitted 20 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  14. arXiv:2203.15610  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

    Authors: Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

    Abstract: Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pr…

    Submitted 18 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, accepted to Interspeech 2022

  15. arXiv:2110.07205  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

    Authors: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

    Abstract: Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After prepro…

    Submitted 24 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by ACL 2022 main conference

  16. arXiv:2110.05036  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Multi-View Self-Attention Based Transformer for Speaker Recognition

    Authors: Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

    Abstract: Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, due to its powerful sequence modeling capabilities. However, conventional self-attention mechanisms were originally designed for modeling textual sequences without considering the characteristics of speech and speaker modeling. Besides, different Tr…

    Submitted 27 January, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Paper to appear at ICASSP 2022

  17. arXiv:2110.01094  [pdf, other]

    cs.CL

    Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models

    Authors: Wenqian Ye, Fei Xu, Yaojia Huang, Cassie Huang, Ji A

    Abstract: Over the last few years, Contextualized Pre-trained Neural Language Models, such as BERT and GPT, have shown significant gains in various NLP tasks. One way to enhance the robustness of existing pre-trained models is to generate and evaluate adversarial examples for data augmentation or adversarial learning. Meanwhile, gender bias embedded in the models seems to be a serious probl…

    Submitted 3 October, 2021; originally announced October 2021.

  18. arXiv:2103.05457  [pdf, other]

    cs.IR

    Rudder: A Cross Lingual Video and Text Retrieval Dataset

    Authors: Jayaprakash A, Abhishek, Rishabh Dabral, Ganesh Ramakrishnan, Preethi Jyothi

    Abstract: Video retrieval using natural language queries requires learning semantically meaningful joint embeddings between the text and the audio-visual input. Often, such joint embeddings are learnt using pairwise (or triplet) contrastive loss objectives which cannot give enough attention to 'difficult-to-retrieve' samples during training. This problem is especially pronounced in data-scarce settings wher…

    Submitted 9 March, 2021; originally announced March 2021.

  19. NIT COVID-19 at WNUT-2020 Task 2: Deep Learning Model RoBERTa for Identify Informative COVID-19 English Tweets

    Authors: Jagadeesh M S, Alphonse P J A

    Abstract: This paper presents the model submitted by the NIT_COVID-19 team for identifying informative COVID-19 English tweets at WNUT-2020 Task 2. This shared task addresses the problem of automatically identifying whether an English tweet related to the novel coronavirus is informative or not. These informative tweets provide information about recovered, confirmed, suspected, and death cases as well as the locat…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 5 pages, 1 figure, conference

  20. arXiv:2002.05097  [pdf, other]

    cs.CR

    EncDBDB: Searchable Encrypted, Fast, Compressed, In-Memory Database using Enclaves

    Authors: Benny Fuhry, Jayanth Jain H A, Florian Kerschbaum

    Abstract: Data confidentiality is an important requirement for clients when outsourcing databases to the cloud. Trusted execution environments, such as Intel SGX, offer an efficient, hardware-based solution to this cryptographic problem. Existing solutions are not optimized for column-oriented, in-memory databases and pose impractical memory requirements on the enclave. We present EncDBDB, a novel approach…

    Submitted 12 February, 2020; originally announced February 2020.

  21. arXiv:1810.05502  [pdf]

    eess.SP cs.DC

    Asynchronous Wi-Fi Control Interface (AWCI) Using Socket IO Technology

    Authors: Devipriya T K, Jovita Franci A, Deepa R, Godwin Sam Josh

    Abstract: The Internet of Things (IoT) is a system of interrelated computing devices connected to the Internet that are provided with unique identifiers and have the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction. Raspberry Pi 3, a popular, cheap, small and powerful computer with built-in Wi-Fi, can be used to make any devices smart by connecting to that par…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: 5 pages, 5 figures, published in Global Research and Development Journal for Engineering

    Journal ref: Global Research and Development Journal for Engineering, 1(3), pp.66-70, 2017

  22. arXiv:1804.10651  [pdf]

    cs.HC

    5PEN TECHNOLOGY: A New Dawn in Homogeneous and Heterogeneous Computing

    Authors: Osagie Scale Uwadia Maxwell, K. O. Obahiagbon, Osagie Joy Amenze, John-Otumu M. A

    Abstract: This research work is a review of the conceptual framework and innovation of the Pen-style Personal Network Gadget Package (P-ISM) as an inevitable tool for easy, fast and convenient access to the internet. Computing activities have increased the number of people using personal computers (PCs); complicated packages and all forms of social media applications (Apps.) have emerged within this short…

    Submitted 5 April, 2018; originally announced April 2018.