Showing 1–16 of 16 results for author: Seo, S

Searching in archive eess.
  1. arXiv:2402.05706  [pdf, other]

    cs.CL cs.SD eess.AS

    Integrating Paralinguistics in Speech-Empowered Large Language Models for Natural Conversation

    Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo

    Abstract: Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken respons…

    Submitted 26 August, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  2. ExPECA: An Experimental Platform for Trustworthy Edge Computing Applications

    Authors: Samie Mostafavi, Vishnu Narayanan Moothedath, Stefan Rönngren, Neelabhro Roy, Gourav Prateek Sharma, Sangwon Seo, Manuel Olguín Muñoz, James Gross

    Abstract: This paper presents ExPECA, an edge computing and wireless communication research testbed designed to tackle two pressing challenges: comprehensive end-to-end experimentation and high levels of experimental reproducibility. Leveraging OpenStack-based Chameleon Infrastructure (CHI) framework for its proven flexibility and ease of operation, ExPECA is located in a unique, isolated underground facili…

    Submitted 2 November, 2023; originally announced November 2023.

  3. arXiv:2307.13241  [pdf, other]

    eess.IV

    A Visual Quality Assessment Method for Raster Images in Scanned Document

    Authors: Justin Yang, Peter Bauer, Todd Harris, Changhyung Lee, Hyeon Seok Seo, Jan P Allebach, Fengqing Zhu

    Abstract: Image quality assessment (IQA) is an active research area in the field of image processing. Most prior works focus on visual quality of natural images captured by cameras. In this paper, we explore visual quality of scanned documents, focusing on raster image areas. Different from many existing works which aim to estimate a visual quality score, we propose a machine learning based classification m…

    Submitted 25 July, 2023; originally announced July 2023.

  4. Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

    Authors: Hanbyul Kim, Seunghyun Seo, Lukas Lee, Seolki Baek

    Abstract: Punctuated text prediction is crucial for automatic speech recognition as it enhances readability and impacts downstream natural language processing tasks. In streaming scenarios, the ability to predict punctuation in real-time is particularly desirable but presents a difficult technical challenge. In this work, we propose a method for predicting punctuated text from input speech using a chunk-bas…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

    Journal ref: Proc. INTERSPEECH 2023, 1653-1657

  5. arXiv:2306.00680  [pdf, other]

    cs.SD cs.AI eess.AS

    Encoder-decoder multimodal speaker change detection

    Authors: Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

    Abstract: The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model are bui…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted for presentation at INTERSPEECH 2023

  6. arXiv:2210.17017  [pdf, other]

    cs.CL cs.SD eess.AS

    Blank Collapse: Compressing CTC emission for the faster decoding

    Authors: Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo, Soonshin Seo

    Abstract: Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially for speech data. In order to use CTC model as an Automatic Speech Recognition (ASR) task, the beam search decoding with an external language model like n-gram LM is necessary to obtain reasonable results. In this paper we analyze the blank label in CTC beam search deeply and propose a ve…

    Submitted 26 June, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted in Interspeech 2023

  7. Machine learning based lens-free imaging technique for field-portable cytometry

    Authors: Rajkumar Vaghashiya, Sanghoon Shin, Varun Chauhan, Kaushal Kapadiya, Smit Sanghavi, Sungkyu Seo, Mohendra Roy

    Abstract: Lens-free Shadow Imaging Technique (LSIT) is a well-established technique for the characterization of microparticles and biological cells. Due to its simplicity and cost-effectiveness, various low-cost solutions have been evolved, such as automatic analysis of complete blood count (CBC), cell viability, 2D cell morphology, 3D cell tomography, etc. The developed auto characterization algorithm so f…

    Submitted 2 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: Published in Biosensors Journal

    Journal ref: https://www.mdpi.com/2079-6374/12/3/144

  8. arXiv:2112.15399  [pdf, other]

    cs.CV cs.GR eess.IV

    InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering

    Authors: Mijeong Kim, Seonguk Seo, Bohyung Han

    Abstract: We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation. The proposed approach minimizes potential reconstruction inconsistency that happens due to insufficient viewpoints by imposing the entropy constraint of the density in each ray. In addition, to alleviate the potential degenerate issue when all training images are…

    Submitted 10 April, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, Website: http://cv.snu.ac.kr/research/InfoNeRF

  9. arXiv:2105.02400  [pdf, other]

    cs.CV eess.IV

    SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment for Satellite Imagery

    Authors: Jaehyup Lee, Soomin Seo, Munchurl Kim

    Abstract: Pan-sharpening is a process of merging a high-resolution (HR) panchromatic (PAN) image and its corresponding low-resolution (LR) multi-spectral (MS) image to create an HR-MS and pan-sharpened image. However, due to the different sensors' locations, characteristics and acquisition time, PAN and MS image pairs often tend to have various amounts of misalignment. Conventional deep-learning-based metho…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  10. arXiv:2104.07253  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding

    Authors: Seunghyun Seo, Donghyun Kwak, Bowon Lee

    Abstract: Most End-to-End (E2E) SLU networks leverage the pre-trained ASR networks but still lack the capability to understand the semantics of utterances, crucial for the SLU task. To solve this, recently proposed studies use pre-trained NLU networks. However, it is not trivial to fully utilize both pre-trained networks; many solutions were proposed, such as Knowledge Distillation, cross-modal shared embed…

    Submitted 16 February, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted for ICASSP 2022

  11. arXiv:2101.11469  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    VOTE400(Voide Of The Elderly 400 Hours): A Speech Dataset to Study Voice Interface for Elderly-Care

    Authors: Minsu Jang, Sangwon Seo, Dohyung Kim, Jaeyeon Lee, Jaehong Kim, Jun-Hwan Ahn

    Abstract: This paper introduces a large-scale Korean speech dataset, called VOTE400, that can be used for analyzing and recognizing voices of the elderly people. The dataset includes about 300 hours of continuous dialog speech and 100 hours of read speech, both recorded by the elderly people aged 65 years or over. A preliminary experiment showed that speech recognition system trained with VOTE400 can outper…

    Submitted 20 January, 2021; originally announced January 2021.

    Comments: 3 pages, 7 tables

  12. arXiv:2007.13350  [pdf]

    eess.AS cs.LG cs.SD

    Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

    Authors: Soonshin Seo, Ji-Hwan Kim

    Abstract: One of the most important parts of an end-to-end speaker verification system is the speaker embedding generation. In our previous paper, we reported that shortcut connections-based multi-layer aggregation improves the representational power of the speaker embedding. However, the number of model parameters is relatively large and the unspecified variations increase in the multi-layer aggregation. T…

    Submitted 28 July, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: 5 pages, 1 figures, 4 tables

  13. arXiv:2006.16583  [pdf, other]

    eess.IV

    Pan-Sharpening with Color-Aware Perceptual Loss and Guided Re-Colorization

    Authors: Juan Luis Gonzalez Bello, Soomin Seo, Munchurl Kim

    Abstract: We present a novel color-aware perceptual (CAP) loss for learning the task of pan-sharpening. Our CAP loss is designed to focus on the deep features of a pre-trained VGG network that are more sensitive to spatial details and ignore color information to allow the network to extract the structural information from the PAN image while keeping the color from the lower resolution MS image. Additionally…

    Submitted 30 June, 2020; originally announced June 2020.

  14. arXiv:2004.09239  [pdf]

    eess.IV

    Firefly-Algorithm Supported Scheme to Detect COVID-19 Lesion in Lung CT Scan Images using Shannon Entropy and Markov-Random-Field

    Authors: Venkatesan Rajinikanth, Seifedine Kadry, Krishnan Palani Thanaraj, Krishnamurthy Kamalanand, Sanghyun Seo

    Abstract: The pneumonia caused by Coronavirus disease (COVID-19) is one of major global threat and a number of detection and treatment procedures are suggested by the researchers for COVID-19. The proposed work aims to suggest an automated image processing scheme to extract the COVID-19 lesion from the lung CT scan images (CTI) recorded from the patients. This scheme implements the following procedures; (i)…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: 12 pages

  15. arXiv:2001.10817  [pdf]

    eess.AS cs.LG cs.SD stat.ML

    MCSAE: Masked Cross Self-Attentive Encoding for Speaker Embedding

    Authors: Soonshin Seo, Ji-Hwan Kim

    Abstract: In general, a self-attention mechanism has been applied for speaker embedding encoding. Previous studies focused on training the self-attention in a high-level layer, such as the last pooling layer. However, the effect of low-level features was reduced in the speaker embedding encoding. Therefore, we propose masked cross self-attentive encoding (MCSAE) using ResNet. It focuses on the features of b…

    Submitted 28 July, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

    Comments: 5 pages, 3 figures, 4 tables

  16. arXiv:1904.03814  [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

    Authors: Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha

    Abstract: Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, th…

    Submitted 18 November, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: In INTERSPEECH 2019