
Showing 1–50 of 886 results for author: Kim, M

Searching in archive cs. Search in all archives.
  1. arXiv:2409.03093  [pdf, other]

    cs.SE

    Multi-language Unit Test Generation using LLMs

    Authors: Rangeet Pan, Myeongsoo Kim, Rahul Krishna, Raju Pavuluri, Saurabh Sinha

    Abstract: Implementing automated unit tests is an important but time-consuming activity in software development. Developers dedicate substantial time to writing tests for validating an application and preventing regressions. To support developers in this task, software engineering research over the past few decades has developed many techniques for automating unit test generation. However, despite this effo…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.00355  [pdf, other]

    cs.CL

    YA-TA: Towards Personalized Question-Answering Teaching Assistants using Instructor-Student Dual Retrieval-augmented Knowledge Fusion

    Authors: Dongil Yang, Suyeon Lee, Minjin Kim, Jungsoo Won, Namyoung Kim, Dongha Lee, Jinyoung Yeo

    Abstract: Engagement between instructors and students plays a crucial role in enhancing students' academic performance. However, instructors often struggle to provide timely and personalized support in large classes. To address this challenge, we propose a novel Virtual Teaching Assistant (VTA) named YA-TA, designed to offer responses to students that are grounded in lectures and are easy to understand. To f…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 9 pages, 5 figures

  3. arXiv:2408.17006  [pdf, other]

    cs.CV

    Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering

    Authors: Su Hyeon Lim, Minkuk Kim, Hyeon Bae Kim, Seong Tae Kim

    Abstract: Visual Question Answering with Natural Language Explanation (VQA-NLE) task is challenging due to its high demand for reasoning-based inference. Recent VQA-NLE studies focus on enhancing model networks to amplify the model's reasoning capability but this approach is resource-consuming and unstable. In this work, we introduce a new VQA-NLE model, ReRe (Retrieval-augmented natural language Reasoning)…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: ICIP Workshop 2024

  4. arXiv:2408.16208  [pdf, other]

    cs.LG cs.CL

    ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics

    Authors: Oishi Banerjee, Agustina Saenz, Kay Wu, Warren Clements, Adil Zia, Dominic Buensalido, Helen Kavnoudias, Alain S. Abi-Ghanem, Nour El Ghawi, Cibele Luna, Patricia Castillo, Khaled Al-Surimi, Rayyan A. Daghistani, Yuh-Min Chen, Heng-sheng Chao, Lars Heiliger, Moon Kim, Johannes Haubold, Frederic Jonske, Pranav Rajpurkar

    Abstract: Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First,…

    Submitted 28 August, 2024; originally announced August 2024.

  5. arXiv:2408.14611  [pdf]

    cs.DC cs.DB

    Scalable, reproducible, and cost-effective processing of large-scale medical imaging datasets

    Authors: Michael E. Kim, Karthik Ramadass, Chenyu Gao, Praitayini Kanakaraj, Nancy R. Newlin, Gaurav Rudravaram, Kurt G. Schilling, Blake E. Dewey, Derek Archer, Timothy J. Hohman, Zhiyuan Li, Shunxing Bao, Bennett A. Landman, Nazirah Mohd Khairi

    Abstract: Curating, processing, and combining large-scale medical imaging datasets from national studies is a non-trivial task due to the intense computation and data throughput required, variability of acquired data, and associated financial overhead. Existing platforms or tools for large-scale data curation, processing, and storage have difficulty achieving a viable cost-to-scale ratio of computation spee…

    Submitted 26 August, 2024; originally announced August 2024.

  6. arXiv:2408.14068  [pdf, other]

    cond-mat.mtrl-sci cs.CE math.NA

    Variable offsets and processing of implicit forms toward the adaptive synthesis and analysis of heterogeneous conforming microstructure

    Authors: Q. Y. Hong, P. Antolin, G. Elber, M. -S. Kim

    Abstract: The synthesis of porous, lattice, or microstructure geometries has captured the attention of many researchers in recent years. Implicit forms, such as triply periodic minimal surfaces (TPMS), have captured significant attention recently as tiles in lattices, partially because implicit forms have the potential for synthesizing with ease more complex topologies of tiles, compared to parametric for…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 15 pages, 17 figures

  7. arXiv:2408.12150  [pdf, other]

    eess.IV cs.AI cs.LG

    DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding

    Authors: Jooyoung Lee, Se Yoon Jeong, Munchurl Kim

    Abstract: Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency compared to simulcast compression. Research on neural network (NN)-based PIC is in its early stages, mainly focusing on applying varying quantization step sizes…

    Submitted 22 August, 2024; originally announced August 2024.

  8. arXiv:2408.12134  [pdf, other]

    cs.IT eess.SP

    Machine Learning-based Channel Prediction in Wideband Massive MIMO Systems with Small Overhead for Online Training

    Authors: Beomsoo Ko, Hwanjin Kim, Minje Kim, Junil Choi

    Abstract: Channel prediction compensates for outdated channel state information in multiple-input multiple-output (MIMO) systems. Machine learning (ML) techniques have recently been implemented to design channel predictors by leveraging the temporal correlation of wireless channels. However, most ML-based channel prediction techniques have only considered offline training when generating channel predictors,…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 16 pages, 16 figures, 4 tables

  9. arXiv:2408.07648  [pdf, other]

    cs.CV cs.CL

    See It All: Contextualized Late Aggregation for 3D Dense Captioning

    Authors: Minjung Kim, Hyung Suk Lim, Seung Hwan Kim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim

    Abstract: 3D dense captioning is a task to localize objects in a 3D scene and generate descriptive sentences for each object. Recent approaches in 3D dense captioning have adopted transformer encoder-decoder frameworks from object detection to build an end-to-end pipeline without hand-crafted components. However, these approaches struggle with contradicting objectives where a single query attention has to s…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024 Findings

  10. arXiv:2408.07326  [pdf, other]

    cs.AR

    LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference

    Authors: Seungjae Moon, Jung-Hoon Kim, Junsoo Kim, Seongmin Hong, Junseo Cha, Minsu Kim, Sukbin Lim, Gyubin Choi, Dongjin Seo, Jongho Kim, Hunjong Lee, Hyunjun Park, Ryeowook Ko, Soongyu Choi, Jongse Park, Jinwon Lee, Joo-Young Kim

    Abstract: The explosive arrival of OpenAI's ChatGPT has fueled the globalization of large language models (LLMs), which consist of billions of pretrained parameters that embody the aspects of syntax and semantics. HyperAccel introduces the latency processing unit (LPU), a latency-optimized and highly scalable processor architecture for the acceleration of LLM inference. LPU perfectly balances the memory bandwi…

    Submitted 14 August, 2024; originally announced August 2024.

  11. arXiv:2408.06954  [pdf, other]

    cs.SD cs.AI eess.AS eess.SP

    Neural Speech and Audio Coding

    Authors: Minje Kim, Jan Skoglund

    Abstract: This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs and discusses the limitations of purely data-driven approaches, which often require inefficiently large architectures to match the performance of model-based met…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication in IEEE Signal Processing Magazine

  12. arXiv:2408.06673  [pdf]

    cs.CL

    Pragmatic inference of scalar implicature by LLMs

    Authors: Ye-eun Cho, Seong mook Kim

    Abstract: This study investigates how Large Language Models (LLMs), particularly BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019), engage in pragmatic inference of scalar implicature, such as some. Two sets of experiments were conducted using cosine similarity and next sentence/token prediction as experimental methods. The results in experiment 1 showed that both models interpret some as pragmat…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: This research was presented at the Association for Computational Linguistics conference, held on August 11-16

  13. arXiv:2408.06662  [pdf, other]

    cs.CV

    Bi-directional Contextual Attention for 3D Dense Captioning

    Authors: Minjung Kim, Hyung Suk Lim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim

    Abstract: 3D dense captioning is a task involving the localization of objects and the generation of descriptions for each object in a 3D scene. Recent approaches have attempted to incorporate contextual information by modeling relationships with object pairs or aggregating the nearest neighbor features of an object. However, the contextual information constructed in these scenarios is limited in two aspects…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024 (Oral)

  14. arXiv:2408.05499  [pdf, other]

    cs.DC cs.AI

    LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

    Authors: Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park

    Abstract: Recently, there has been an extensive research effort in building efficient large language model (LLM) inference serving systems. These efforts not only include innovations in the algorithm and software domains but also constitute developments of various hardware acceleration techniques. Nevertheless, there is a lack of simulation infrastructure capable of accurately modeling versatile hardware-so…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 15 pages, 11 figures

  15. arXiv:2408.00351  [pdf, other]

    cs.CV

    Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos

    Authors: Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim

    Abstract: We propose a new framework for creating and easily manipulating 3D models of arbitrary objects using casually captured videos. Our core ingredient is a novel hierarchy deformation model, which captures motions of objects with tree-structured bones. Our hierarchy system decomposes motions based on the granularity and reveals the correlations between parts without exploiting any prior structural k…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024 accepted

  16. arXiv:2407.18141  [pdf, other]

    cs.HC cs.ET cs.LG eess.IV

    IRIS: Wireless Ring for Vision-based Smart Home Interaction

    Authors: Maruchi Kim, Antonio Glenn, Bandhav Veluri, Yunseo Lee, Eyoel Gebre, Aditya Bagaria, Shwetak Patel, Shyamnath Gollakota

    Abstract: Integrating cameras into wireless smart rings has been challenging due to size and power constraints. We introduce IRIS, the first wireless vision-enabled smart ring system for smart home interactions. Equipped with a camera, Bluetooth radio, inertial measurement unit (IMU), and an onboard battery, IRIS meets the small size, weight, and power (SWaP) requirements for ring devices. IRIS is context-a…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 15 pages, 17 figures, 6 tables, to be published in UIST 2024

  17. arXiv:2407.16133  [pdf, other]

    cs.CV

    Open-Set Biometrics: Beyond Good Closed-Set Models

    Authors: Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu

    Abstract: Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believ…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Published at ECCV 2024

  18. arXiv:2407.15154  [pdf, other]

    cs.CL

    Fine-grained Gender Control in Machine Translation with Large Language Models

    Authors: Minwoo Lee, Hyukhun Koh, Minsung Kim, Kyomin Jung

    Abstract: In machine translation, the problem of ambiguously gendered input has been pointed out, where the gender of an entity is not available in the source sentence. To address this ambiguity issue, the task of controlled translation that takes the gender of the ambiguous entity as additional input has been proposed. However, most existing works have only considered a simplified setup of one target gend…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: NAACL 2024 Main track long paper

  19. arXiv:2407.12330  [pdf, other]

    cs.LG cs.AI

    Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset

    Authors: Mijoo Kim, Junseok Kwon

    Abstract: With the rapid advancement in the performance of deep neural networks (DNNs), there has been significant interest in deploying and incorporating artificial intelligence (AI) systems into real-world scenarios. However, many DNNs lack the ability to represent uncertainty, often exhibiting excessive confidence even when making incorrect predictions. To ensure the reliability of AI systems, particular…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  20. arXiv:2407.12325  [pdf, other]

    cs.IR

    Optimizing Query Generation for Enhanced Document Retrieval in RAG

    Authors: Hamin Koo, Minseon Kim, Sung Ju Hwang

    Abstract: Large Language Models (LLMs) excel in various language tasks but they often generate incorrect information, a phenomenon known as "hallucinations". Retrieval-Augmented Generation (RAG) aims to mitigate this by using document retrieval for accurate responses. However, RAG still faces hallucinations due to vague queries. This study aims to improve RAG by optimizing query generation with a query-docu…

    Submitted 17 July, 2024; originally announced July 2024.

  21. arXiv:2407.11347  [pdf, other]

    cs.CV

    I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM

    Authors: Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim

    Abstract: We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines for casually captured scenarios. Casual video captures often suffer from motion blur and varying appearances, which degrade the final quality of coherent 3D visual representation. We propose integrating the physical imaging into the SLAM system, which employs linear HDR radiance maps to c…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  22. arXiv:2407.10910  [pdf, other]

    cs.CV cs.LG

    DataDream: Few-shot Guided Dataset Generation

    Authors: Jae Myung Kim, Jessica Bader, Stephan Alaniz, Cordelia Schmid, Zeynep Akata

    Abstract: While text-to-image diffusion models have been shown to achieve state-of-the-art results in image synthesis, they have yet to prove their effectiveness in downstream applications. Previous work has proposed to generate data for image classifier training given limited real data access. However, these methods struggle to generate in-distribution images or depict fine-grained features, thereby hinder…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  23. arXiv:2407.09184  [pdf, other]

    cs.CL

    Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers

    Authors: Jong Myoung Kim, Young-Jun Lee, Yong-jin Han, Sangkeun Jung, Ho-Jin Choi

    Abstract: Syntactic elements, such as word order and case markers, are fundamental in natural language processing. Recent studies show that syntactic information boosts language model performance and offers clues for people to understand their learning mechanisms. Unlike languages with a fixed word order such as English, Korean allows for varied word sequences, despite its canonical structure, due to case m…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: COLM 2024; code and dataset are available at https://github.com/grayapple-git/SIKO

  24. arXiv:2407.09012  [pdf, other]

    cs.CV cs.AI

    TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

    Authors: Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo

    Abstract: Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  25. arXiv:2407.08476  [pdf, other]

    cs.CV

    VideoMamba: Spatio-Temporal Selective State Space Model

    Authors: Jinyoung Park, Hee-Seon Kim, Kangwook Ko, Minbeom Kim, Changick Kim

    Abstract: We introduce VideoMamba, a novel adaptation of the pure Mamba architecture, specifically designed for video recognition. Unlike transformers that rely on self-attention mechanisms leading to high computational costs by quadratic complexity, VideoMamba leverages Mamba's linear complexity and selective SSM mechanism for more efficient processing. The proposed Spatio-Temporal Forward and Backward SSM…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Code available at http://github.com/jinyjelly/VideoMamba

  26. arXiv:2407.05844  [pdf, other]

    cs.CV

    Anatomy-guided Pathology Segmentation

    Authors: Alexander Jaus, Constantin Seibold, Simon Reiß, Lukas Heine, Anton Schily, Moon Kim, Fin Hendrik Bahnsen, Ken Herrmann, Rainer Stiefelhagen, Jens Kleesiek

    Abstract: Pathological structures in medical images are typically deviations from the expected anatomy of a patient. While clinicians consider this interplay between anatomy and pathology, recent deep learning algorithms specialize in recognizing either one of the two, rarely considering the patient's body from such a joint perspective. In this paper, we develop a generalist segmentation model that combines…

    Submitted 8 July, 2024; originally announced July 2024.

  27. arXiv:2407.04597  [pdf, other]

    cs.CV cs.AI

    Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi

    Abstract: In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NN) for detection performance or pursue unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that av…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures, 5 tables

  28. arXiv:2407.03280  [pdf, other]

    cs.IT

    Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks

    Authors: Mintae Kim, Hoon Lee, Sangwon Hwang, Merouane Debbah, Inkyu Lee

    Abstract: This paper presents a cooperative multi-agent deep reinforcement learning (MADRL) approach for unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC) networks. A UAV with computing capability can provide task offloading services to ground internet-of-things devices (IDs). With partial observation of the entire network state, the UAV and the IDs individually determine their MEC strategies…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures

  29. arXiv:2407.03103  [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  30. arXiv:2407.03051  [pdf, other]

    cs.CL

    Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

    Authors: Janghwan Lee, Seongmin Park, Sukjin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi

    Abstract: The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved throu…

    Submitted 18 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ACL 2024 Main

  31. arXiv:2407.02945  [pdf, other]

    cs.CV

    VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

    Authors: Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, Jaegul Choo

    Abstract: Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize from views similar to training camera trajectory, directing the novel view outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolate…

    Submitted 13 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. Project Page: https://vegs3d.github.io/

  32. arXiv:2406.19596  [pdf, other]

    cs.CR cs.AI cs.LG

    Optimizing Cyber Defense in Dynamic Active Directories through Reinforcement Learning

    Authors: Diksha Goel, Kristen Moore, Mingyu Guo, Derui Wang, Minjune Kim, Seyit Camtepe

    Abstract: This paper addresses a significant gap in Autonomous Cyber Operations (ACO) literature: the absence of effective edge-blocking ACO strategies in dynamic, real-world networks. It specifically targets the cybersecurity vulnerabilities of organizational Active Directory (AD) systems. Unlike the existing literature on edge-blocking defenses which considers AD systems as static entities, our study coun…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: The manuscript has been accepted as full paper at European Symposium on Research in Computer Security (ESORICS) 2024

  33. arXiv:2406.18925  [pdf, other]

    cs.CL cs.CV

    Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

    Authors: Jiwan Chung, Sungjae Lee, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu

    Abstract: Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  34. arXiv:2406.18675  [pdf, other]

    cs.HC cs.AI cs.CL

    Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants

    Authors: Minhwa Lee, Zae Myung Kim, Vivek Khetan, Dongyeop Kang

    Abstract: Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific wri…

    Submitted 15 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to CHI 2024 In2Writing Workshop

  35. arXiv:2406.16716  [pdf, other]

    eess.AS cs.CR cs.SD

    One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection

    Authors: Hyun Myung Kim, Kangwook Jang, Hoirin Kim

    Abstract: As speech synthesis systems continue to make remarkable advances in recent years, the importance of robust deepfake detection systems that perform well in unseen systems has grown. In this paper, we propose a novel adaptive centroid shift (ACS) method that updates the centroid representation by continually shifting as the weighted average of bonafide representations. Our approach uses only bonafid…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  36. arXiv:2406.15225  [pdf, other]

    cs.AI cs.RO eess.SP

    Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

    Authors: Jiyong Oh, Syed M. Raza, Lusungu J. Mwasinga, Moonseong Kim, Hyunseung Choo

    Abstract: Unmanned Aerial Vehicle (UAV) services with 5G connectivity is an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption of scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network and maintaining it is challenging in predetermined flight paths. T…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, Published in the 2024 IEEE Network Operations and Management Symposium (NOMS 2024)

  37. arXiv:2406.14703  [pdf, other]

    cs.CL cs.AI

    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

    Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

    Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliabilit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint; Under review

  38. arXiv:2406.14277  [pdf, other]

    cs.CL cs.AI

    Augmenting Query and Passage for Retrieval-Augmented Generation using LLMs for Open-Domain Question Answering

    Authors: Minsang Kim, Cheoneum Park, Seungjun Baek

    Abstract: Retrieval-augmented generation (RAG) has received much attention for Open-domain question-answering (ODQA) tasks as a means to compensate for the parametric knowledge of large language models (LLMs). While previous approaches focused on processing retrieved passages to remove irrelevant context, they still rely heavily on the quality of retrieved passages which can degrade if the question is ambig…

    Submitted 20 June, 2024; originally announced June 2024.

  39. arXiv:2406.14124  [pdf, other]

    cs.AI cs.LG

    Measuring Sample Importance in Data Pruning for Training LLMs from a Data Compression Perspective

    Authors: Minsang Kim, Seungjun Baek

    Abstract: Compute-efficient training of large language models (LLMs) has become an important research problem. In this work, we consider data pruning as a method of data-efficient training of LLMs, where we take a data compression view on data pruning. We argue that the amount of information of a sample, or the achievable compression on its description length, represents its sample importance. The key idea…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  40. arXiv:2406.12430  [pdf, other]

    cs.CL cs.AI cs.LG

    PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

    Authors: Myeonghwa Lee, Seonho An, Min-Soo Kim

    Abstract: In this paper, we conduct a study to utilize LLMs as a solution for decision making that requires complex data analysis. We define Decision QA as the task of answering the best decision, $d_{best}$, for a decision-making question $Q$, business rules $R$ and a database $D$. Since there is no benchmark that can examine Decision QA, we propose Decision QA benchmark, DQA. It has two scenarios, Locatin…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: NAACL 2024

    ACM Class: I.2.7

  41. arXiv:2406.12254  [pdf, other]

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta…

    Submitted 12 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  42. arXiv:2406.09246  [pdf, other]

    cs.RO cs.LG

    OpenVLA: An Open-Source Vision-Language-Action Model

    Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

    Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has be…

    Submitted 3 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Website: https://openvla.github.io/

  43. arXiv:2406.08702  [pdf, other]

    cs.AI cs.CL cs.CV

    VLind-Bench: Measuring Language Priors in Large Vision-Language Models

    Authors: Kang-il Lee, Minbeom Kim, Seunghyun Yoon, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with im…

    Submitted 10 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  44. arXiv:2406.08292  [pdf, other]

    cs.CV

    Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

    Authors: Dongsu Zhang, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler, Young Min Kim, Amlan Kar

    Abstract: We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). Contrary to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled and beyond spatial limits of LiDAR scans, taking a step towards generating realistic, high-resolution simulation-ready 3D street environments. We propose hierarchical Gener…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 as highlight

  45. arXiv:2406.07867  [pdf, other]

    cs.CV cs.AI cs.HC

    Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    Authors: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

    Abstract: In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp…

    Submitted 2 August, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (Oral)

  46. arXiv:2406.05965  [pdf, other]

    eess.AS cs.AI

    MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

    Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung Jin Choi, Nam Soo Kim

    Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancin…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  47. arXiv:2406.05963  [pdf, other]

    cs.CV cs.AI

    Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

    Authors: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, Eun-Sol Kim

    Abstract: In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two m…

    Submitted 9 June, 2024; originally announced June 2024.

  48. arXiv:2406.02803  [pdf, other]

    cs.DC

    DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

    Authors: Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

    Abstract: Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunit…

    Submitted 27 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  49. arXiv:2406.02756  [pdf, other]

    cs.CL cs.AI cs.LG

    Aligning Large Language Models via Fine-grained Supervision

    Authors: Dehong Xu, Liang Qiu, Minseok Kim, Faisal Ladhak, Jaeyoung Do

    Abstract: Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences of LLM outputs into a feedback signal that guides the model learn…

    Submitted 4 June, 2024; originally announced June 2024.

  50. Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

    Authors: Jungmin Yun, Mihyeon Kim, Youngbin Kim

    Abstract: Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning a…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2023 Findings