Showing 1–50 of 1,293 results for author: Park, J

Searching in archive cs. Search in all archives.
  1. arXiv:2407.20980  [pdf, other]

    cs.DC cs.CR

    Impact of Conflicting Transactions in Blockchain: Detecting and Mitigating Potential Attacks

    Authors: Faisal Haque Bappy, Kamrul Hasan, Joon S. Park, Carlos Caicedo, Tariqul Islam

    Abstract: Conflicting transactions within blockchain networks not only pose performance challenges but also introduce security vulnerabilities, potentially facilitating malicious attacks. In this paper, we explore the impact of conflicting transactions on blockchain attack vectors. Through modeling and simulation, we delve into the dynamics of four pivotal attacks - block withholding, double spending, balan…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.20643  [pdf]

    cs.CV

    Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer

    Authors: Biagio Brattoli, Mohammad Mostafavi, Taebum Lee, Wonkyung Jung, Jeongun Ryu, Seonwook Park, Jongchan Park, Sergio Pereira, Seunghwan Shin, Sangjoon Choi, Hyojin Kim, Donggeun Yoo, Siraj M. Ali, Kyunghyun Paeng, Chan-Young Ock, Soo Ick Cho, Seokhwi Kim

    Abstract: Despite advancements in methodologies, immunohistochemistry (IHC) remains the most utilized ancillary test for histopathologic and companion diagnostics in targeted therapies. However, objective IHC assessment poses challenges. Artificial intelligence (AI) has emerged as a potential solution, yet its development requires extensive training for each cancer and IHC type, limiting versatility. We dev…

    Submitted 30 July, 2024; originally announced July 2024.

  3. arXiv:2407.19156  [pdf, other]

    cs.CV

    Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

    Authors: Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim

    Abstract: Recent advancements in 3D object detection have benefited from multi-modal information from the multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi-modal 3D object detection methods heavily rely on the LiDAR sensor, treating the camera as an auxiliary modality for augmenting semantic details. Thi…

    Submitted 26 July, 2024; originally announced July 2024.

  4. arXiv:2407.18879  [pdf, other]

    cs.SD cs.LG eess.AS

    Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: This paper explores the use of TTS synthesized training data for the KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce cost and time f…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  5. arXiv:2407.18034  [pdf, other]

    cs.CV cs.AI

    AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

    Authors: Junho Park, Kyeongbo Kong, Suk-Ju Kang

    Abstract: Recently, there has been a significant amount of research conducted on 3D hand reconstruction to use various forms of human-computer interaction. However, 3D hand reconstruction in the wild is challenging due to extreme lack of in-the-wild 3D hand datasets. Especially, when hands are in complex pose such as interacting hands, the problems like appearance similarity, self-handed occlusion and dept…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  6. arXiv:2407.16840  [pdf, other]

    eess.AS cs.AI

    Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

    Authors: Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: One of the challenges in developing a high quality custom keyword spotting (KWS) model is the lengthy and expensive process of collecting training data covering a wide range of languages, phrases and speaking styles. We introduce Synth4Kws - a framework to leverage Text to Speech (TTS) synthesized data for custom KWS in different resource settings. With no real data, we found increasing TTS phrase…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures, 2 tables. The paper is accepted in Interspeech SynData4GenAI 2024 Workshop - https://syndata4genai.org/#call-for-papers

  7. arXiv:2407.16181  [pdf, other]

    cs.CL

    Structural Optimization Ambiguity and Simplicity Bias in Unsupervised Neural Grammar Induction

    Authors: Jinwook Park, Kangil Kim

    Abstract: Neural parameterization has significantly advanced unsupervised grammar induction. However, training these models with a traditional likelihood loss for all possible parses exacerbates two issues: 1) $\textit{structural optimization ambiguity}$ that arbitrarily selects one among structurally ambiguous optimal grammars despite the specific preference of gold parses, and 2)…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted in ACL2024 Findings, 16 pages, 10 figures

  8. arXiv:2407.15264  [pdf, other]

    cs.DC cs.LG

    LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

    Authors: Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Quresh, Scott Mahlke, Wen-mei Hwu

    Abstract: Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings that often exceed the memory capacities of the target GPUs used for training. To address limited memory capacities, traditional GNN training approaches use…

    Submitted 21 July, 2024; originally announced July 2024.

  9. arXiv:2407.15143  [pdf, other]

    cs.CV cs.AI

    Rethinking Feature Backbone Fine-tuning for Remote Sensing Object Detection

    Authors: Yechan Kim, JongHyun Park, SooYeon Kim, Moongu Jeon

    Abstract: Recently, numerous methods have achieved impressive performance in remote sensing object detection, relying on convolution or transformer architectures. Such detectors typically have a feature backbone to extract useful features from raw input images. For the remote sensing domain, a common practice among current detectors is to initialize the backbone with pre-training on ImageNet consisting of n…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Under Review

  10. arXiv:2407.15131  [pdf, other]

    cs.AR cs.LG

    Token-Picker: Accelerating Attention in Text Generation with Minimized Memory Transfer via Probability Estimation

    Authors: Junyoung Park, Myeonggu Kang, Yunki Han, Yanggon Kim, Jaekang Shin, Lee-Sup Kim

    Abstract: The attention mechanism in text generation is memory-bounded due to its sequential characteristics. Therefore, off-chip memory accesses should be minimized for faster execution. Although previous methods addressed this by pruning unimportant tokens, they fall short in selectively removing tokens with near-zero attention probabilities in each instance. Our method estimates the probability before th…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: To appear in the proceedings of 61st Design Automation Conference (DAC)

  11. arXiv:2407.13524  [pdf, other]

    cs.CV cs.AI

    Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation

    Authors: Ilhoon Yoon, Hyeongjun Kwon, Jin Kim, Junyoung Park, Hyunsung Jang, Kwanghoon Sohn

    Abstract: Source-Free domain adaptive Object Detection (SFOD) is a promising strategy for deploying trained detectors to new, unlabeled domains without accessing source data, addressing significant concerns around data privacy and efficiency. Most SFOD methods leverage a Mean-Teacher (MT) self-training paradigm relying heavily on High-confidence Pseudo Labels (HPL). However, these HPL often overlook small i…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  12. arXiv:2407.12363  [pdf, other]

    cs.CL

    Conversational Query Reformulation with the Guidance of Retrieved Documents

    Authors: Jeonghyun Park, Hwanhee Lee

    Abstract: Conversational search seeks to retrieve relevant passages for the given questions in Conversational QA (ConvQA). Questions in ConvQA face challenges such as omissions and coreferences, making it difficult to obtain desired search results. Conversational Query Reformulation (CQR) transforms these current queries into de-contextualized forms to resolve these issues. However, existing CQR methods foc…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 tables

  13. arXiv:2407.12331  [pdf, other]

    cs.CV cs.AI

    I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps

    Authors: Junseo Park, Hyeryung Jang

    Abstract: Large-scale diffusion models have made significant advancements in the field of image generation, especially through the use of cross-attention mechanisms that guide image formation based on textual descriptions. While the analysis of text-guided cross-attention in diffusion models has been extensively studied in recent years, its application in image-to-image diffusion models remains underexplore…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 9 pages, 9 figures

  14. arXiv:2407.12329  [pdf, other]

    cs.CV

    Label-Efficient 3D Brain Segmentation via Complementary 2D Diffusion Models with Orthogonal Views

    Authors: Jihoon Cho, Suhyun Ahn, Beomju Kim, Hyungjoon Bae, Xiaofeng Liu, Fangxu Xing, Kyungeun Lee, Georges Elfakhri, Van Wedeen, Jonghye Woo, Jinah Park

    Abstract: Deep learning-based segmentation techniques have shown remarkable performance in brain segmentation, yet their success hinges on the availability of extensive labeled training data. Acquiring such vast datasets, however, poses a significant challenge in many clinical applications. To address this issue, in this work, we propose a novel 3D brain segmentation approach using complementary 2D diffusio…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Extended version of "3D Segmentation of Subcortical Brain Structure with Few Labeled Data using 2D Diffusion Models" (ISMRM 2024 oral)

  15. arXiv:2407.11370  [pdf, other]

    cs.SD cs.CL eess.AS

    A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora

    Authors: Kentaro Onda, Joonyong Park, Nobuaki Minematsu, Daisuke Saito

    Abstract: We propose a method of simulating the human process of foreign accentuation using Generative Spoken Language Model (GSLM) only with native speech corpora. When one listens to spoken words of a foreign language and repeats them, the repeated speech is often with the accent of that listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH2024

  16. arXiv:2407.10542  [pdf, other]

    cs.CV cs.AI

    3D Geometric Shape Assembly via Efficient Point Cloud Matching

    Authors: Nahyuk Lee, Juhong Min, Junha Lee, Seungwook Kim, Kanghee Lee, Jaesik Park, Minsu Cho

    Abstract: Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes in both coarse- and fine-levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matchin…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

  17. arXiv:2407.10461  [pdf, ps, other]

    cs.IT

    Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights

    Authors: Seyong Kim, Jinseok Choi, Wonjae Shin, Namyoon Lee, Jeonghun Park

    Abstract: To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by…

    Submitted 15 July, 2024; originally announced July 2024.

  18. arXiv:2407.09005  [pdf, other]

    cs.CV cs.AI eess.IV

    Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset

    Authors: Yongjin Kim, Jinbum Park, Sanha Kang, Hanguen Kim

    Abstract: The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light r…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 11 pages, 9 figures, whitepaper

  19. arXiv:2407.08923  [pdf, other]

    cs.IT cs.NI

    A Bistatic ISAC Framework for LEO Satellite Systems: A Rate-Splitting Approach

    Authors: Juha Park, Jaehyup Seong, Jaehak Ryu, Yijie Mao, Wonjae Shin

    Abstract: Aiming to achieve ubiquitous global connectivity and target detection on the same platform with improved spectral/energy efficiency and reduced onboard hardware cost, low Earth orbit (LEO) satellite systems capable of simultaneously performing communications and radar have attracted significant attention. Designing such a joint system should address not only the challenges of integrating two funct…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 33 pages, 8 figures, 2 tables

  20. arXiv:2407.08476  [pdf, other]

    cs.CV

    VideoMamba: Spatio-Temporal Selective State Space Model

    Authors: Jinyoung Park, Hee-Seon Kim, Kangwook Ko, Minbeom Kim, Changick Kim

    Abstract: We introduce VideoMamba, a novel adaptation of the pure Mamba architecture, specifically designed for video recognition. Unlike transformers that rely on self-attention mechanisms leading to high computational costs by quadratic complexity, VideoMamba leverages Mamba's linear complexity and selective SSM mechanism for more efficient processing. The proposed Spatio-Temporal Forward and Backward SSM…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Code available at http://github.com/jinyjelly/VideoMamba

  21. arXiv:2407.07982  [pdf, other]

    cs.LG

    Automating Weak Label Generation for Data Programming with Clinicians in the Loop

    Authors: Jean Park, Sydney Pugh, Kaustubh Sridhar, Mengyu Liu, Navish Yarna, Ramneet Kaur, Souradeep Dutta, Elena Bernardis, Oleg Sokolsky, Insup Lee

    Abstract: Large Deep Neural Networks (DNNs) are often data hungry and need high-quality labeled data in copious amounts for learning to converge. This is a challenge in the field of medicine since high quality labeled data is often scarce. Data programming has been the ray of hope in this regard, since it allows us to label unlabeled data using multiple weak labeling functions. Such functions are often supp…

    Submitted 10 July, 2024; originally announced July 2024.

  22. arXiv:2407.06551  [pdf, other]

    cs.CL

    OffsetBias: Leveraging Debiased Data for Tuning Evaluators

    Authors: Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, Sanghyuk Choi

    Abstract: Employing Large Language Models (LLMs) to assess the quality of generated responses, such as prompting instruct-tuned models or fine-tuning judge models, has become a widely adopted evaluation method. It is also known that such evaluators are vulnerable to biases, such as favoring longer responses. While it is important to overcome this problem, the specifics of these biases remain under-explored.…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Work in Progress

  23. arXiv:2407.06149  [pdf, other]

    cs.SI

    What Is Being Argued (WIBA)? An Application to Legislative Deliberation in the U.S. Congress

    Authors: Arman Irani, Ju Yeon Park, Kevin Esterling, Michalis Faloutsos

    Abstract: How can we utilize state-of-the-art NLP tools to better understand legislative deliberation? Committee hearings are a core feature of any legislature, and they offer an institutional setting which promotes the exchange of arguments and reasoning that directly impact and shape legislation. We apply What Is Being Argued (WIBA), which is an argument extraction and analysis framework that we previousl…

    Submitted 15 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Paper for presentation at Polmeth 2024. Updated version with additional methodology sections and analysis results

  24. arXiv:2407.05530  [pdf, other]

    cs.RO cs.AI cs.CV

    This&That: Language-Gesture Controlled Video Generation for Robot Planning

    Authors: Boyang Wang, Nikhil Sridhar, Chao Feng, Mark Van der Merwe, Adam Fishman, Nima Fazeli, Jeong Joon Park

    Abstract: We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communicat…

    Submitted 7 July, 2024; originally announced July 2024.

  25. arXiv:2407.05516  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation

    Authors: Jin Woo Lee, Jaehyun Park, Min Jun Choi, Kyogu Lee

    Abstract: While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling wit…

    Submitted 7 July, 2024; originally announced July 2024.

  26. arXiv:2407.03627  [pdf, other]

    cs.CL

    DSLR: Document Refinement with Sentence-Level Re-ranking and Reconstruction to Enhance Retrieval-Augmented Generation

    Authors: Taeho Hwang, Soyeong Jeong, Sukmin Cho, SeungYoon Han, Jong C. Park

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly improved their performance across various Natural Language Processing (NLP) tasks. However, LLMs still struggle with generating non-factual responses due to limitations in their parametric memory. Retrieval-Augmented Generation (RAG) systems address this issue by incorporating external knowledge with a retrieval module. Despite…

    Submitted 7 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Journal ref: KnowledgeNLP@ACL 2024

  27. arXiv:2407.02854  [pdf, other]

    cs.CL cs.CV

    Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production

    Authors: Eui Jun Hwang, Sukmin Cho, Huije Lee, Youngwoo Yoon, Jong C. Park

    Abstract: Sign language, essential for the deaf and hard-of-hearing, presents unique challenges in translation and production due to its multimodal nature and the inherent ambiguity in mapping sign language motion to spoken language words. Previous methods often rely on gloss annotations, requiring time-intensive labor and specialized expertise in sign language. Gloss-free methods have emerged to address th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures

  28. arXiv:2407.02403  [pdf, other]

    cs.CV cs.AI

    Face Reconstruction Transfer Attack as Out-of-Distribution Generalization

    Authors: Yoon Gyo Jung, Jaewoo Park, Xingbo Dong, Hojin Park, Andrew Beng Jin Teoh, Octavia Camps

    Abstract: Understanding the vulnerability of face recognition systems to malicious attacks is of critical importance. Previous works have focused on reconstructing face images that can penetrate a targeted verification system. Even in the white-box scenario, however, naively reconstructed images misrepresent the identity information, hence the attacks are easily neutralized once the face system is updated o…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024

  29. arXiv:2407.02286  [pdf, other]

    cs.CV cs.AI

    Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather

    Authors: Junsung Park, Kyungmin Kim, Hyunjung Shim

    Abstract: Existing LiDAR semantic segmentation methods often struggle with performance declines in adverse weather conditions. Previous work has addressed this issue by simulating adverse weather or employing universal data augmentation during training. However, these methods lack a detailed analysis and understanding of how adverse weather negatively affects LiDAR semantic segmentation performance. Motivat…

    Submitted 17 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 29 pages, 11 figures, accepted in ECCV 2024

  30. arXiv:2407.02004  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    SAVE: Segment Audio-Visual Easy way using Segment Anything Model

    Authors: Khanh-Binh Nguyen, Chae Jung Park

    Abstract: The primary aim of Audio-Visual Segmentation (AVS) is to precisely identify and locate auditory elements within visual scenes by accurately predicting segmentation masks at the pixel level. Achieving this involves comprehensively considering data and model aspects to address this task effectively. This study presents a lightweight approach, SAVE, which efficiently adapts the pre-trained segment an…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  31. arXiv:2407.01942  [pdf, other]

    cs.AI cs.CL cs.CV

    Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

    Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

    Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages

  32. arXiv:2407.01624  [pdf, other]

    cs.LG cs.AI

    Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization

    Authors: Taeyoung Yun, Sujin Yun, Jaewoo Lee, Jinkyoo Park

    Abstract: Optimizing complex and high-dimensional black-box functions is ubiquitous in science and engineering fields. Unfortunately, the online evaluation of these functions is restricted due to time and safety constraints in most cases. In offline model-based optimization (MBO), we aim to find a design that maximizes the target function using only a pre-existing offline dataset. While prior methods consid…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 29 pages, 11 figures, 17 tables

  33. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  34. arXiv:2406.19638  [pdf, other]

    cs.CV cs.AI

    Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation

    Authors: Junsung Park, Hyunjung Shim

    Abstract: Weakly Supervised Semantic Segmentation (WSSS) employs weak supervision, such as image-level labels, to train the segmentation model. Despite the impressive achievement in recent WSSS methods, we identify that introducing weak labels with high mean Intersection of Union (mIoU) does not guarantee high segmentation performance. Existing studies have emphasized the importance of prioritizing precisio…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 5 pages, 5 figures, accepted in AAAI 2024 Edge Intelligence Workshop

  35. arXiv:2406.19502  [pdf, other]

    cs.CL cs.AI

    Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning

    Authors: Miyoung Ko, Sue Hyun Park, Joonsuk Park, Minjoon Seo

    Abstract: Despite significant advancements, there is a limited understanding of how large language models (LLMs) utilize knowledge for reasoning. To address this, we propose a method that deconstructs complex real-world questions into a graph, representing each question as a node with parent nodes of background knowledge needed to solve the question. We develop the DepthQA dataset, deconstructing questions…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Work in progress; code is available at https://github.com/kaistAI/knowledge-reasoning

  36. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  37. arXiv:2406.19113  [pdf, other]

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  38. arXiv:2406.18898  [pdf, other]

    cs.CV cs.AI

    360 in the Wild: Dataset for Depth Prediction and View Synthesis

    Authors: Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

    Abstract: The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale…

    Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  39. arXiv:2406.18388  [pdf, other]

    cs.RO cs.AI

    SAM: Semi-Active Mechanism for Extensible Continuum Manipulator and Real-time Hysteresis Compensation Control Algorithm

    Authors: Junhyun Park, Seonghyeok Jang, Myeongbo Park, Hyojae Park, Jeonghyeon Yoon, Minho Hwang

    Abstract: Cable-Driven Continuum Manipulators (CDCMs) enable scar-free procedures via natural orifices and improve target lesion accessibility through curved paths. However, CDCMs face limitations in workspace and control accuracy due to non-linear cable effects causing hysteresis. This paper introduces an extensible CDCM with a Semi-active Mechanism (SAM) to expand the workspace via translational motion wi…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 12 pages, 14 figures, 6 tables

  40. arXiv:2406.17763  [pdf, other]

    cs.LG cs.AI cs.CV math.NA

    DiffusionPDE: Generative PDE-Solving Under Partial Observation

    Authors: Jiahe Huang, Guandao Yang, Zichen Wang, Jeong Joon Park

    Abstract: We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project page: https://jhhuangchloe.github.io/Diffusion-PDE/

  41. arXiv:2406.16013  [pdf, other]

    cs.CL cs.AI cs.IR

    Database-Augmented Query Representation for Information Retrieval

    Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

    Abstract: Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user…

    Submitted 23 June, 2024; originally announced June 2024.

  42. arXiv:2406.15275  [pdf, other]

    cs.CL

    Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model

    Authors: Doyoung Kim, Jongwon Lee, Jinho Park, Minjoon Seo

    Abstract: Language models have demonstrated impressive capabilities across various natural language processing tasks, yet they struggle with planning tasks requiring multi-step simulations. Inspired by human cognitive processes, this paper investigates the optimal planning power of language models that can construct a cognitive map of a given environment. Our experiments demonstrate that cognitive map signi…

    Submitted 21 June, 2024; originally announced June 2024.

  43. arXiv:2406.15007  [pdf, other]

    cs.AI

    RouteFinder: Towards Foundation Models for Vehicle Routing Problems

    Authors: Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels Wouda, Leon Lan, Kevin Tierney, Jinkyoo Park

    Abstract: Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder,…

    Submitted 21 June, 2024; originally announced June 2024.

  44. arXiv:2406.14954  [pdf, other]

    eess.IV cs.CV

    A Unified Framework for Synthesizing Multisequence Brain MRI via Hybrid Fusion

    Authors: Jihoon Cho, Jonghye Woo, Jinah Park

    Abstract: Multisequence Magnetic Resonance Imaging (MRI) provides a reliable diagnosis in clinical applications through complementary information within sequences. However, in practice, the absence of certain MR sequences is a common problem that can lead to inconsistent analysis results. In this work, we propose a novel unified framework for synthesizing multisequence MR images, called Hybrid Fusion GAN (H…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 11 pages, 7 figures

  45. arXiv:2406.13633  [pdf, ps, other]

    cs.LG math.OC

    Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

    Authors: Jaehyun Park, Dabeen Lee

    Abstract: We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an…

    Submitted 19 June, 2024; originally announced June 2024.

  46. arXiv:2406.12904  [pdf, other]

    cs.LG physics.comp-ph physics.optics

    Meent: Differentiable Electromagnetic Simulator for Machine Learning

    Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

    Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: under review

  47. arXiv:2406.12721  [pdf]

    eess.AS cs.SD

    Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

    Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

    Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de…

    Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 challenge Task4, 4 pages

  48. arXiv:2406.12233  [pdf, other]

    cs.AI cs.CL cs.CV

    SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

    Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

    Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes, visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel…

    Submitted 17 June, 2024; originally announced June 2024.

  49. arXiv:2406.11813  [pdf, other]

    cs.CL

    How Do Large Language Models Acquire Factual Knowledge During Pretraining?

    Authors: Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo

    Abstract: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge ac…

    Submitted 17 June, 2024; originally announced June 2024.

    ACM Class: I.2.7

  50. arXiv:2406.10935  [pdf, other]

    cs.CV

    Pick-or-Mix: Dynamic Channel Sampling for ConvNets

    Authors: Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

    Abstract: Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions, which account for a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for d…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Published in Computer Vision and Pattern Recognition (CVPR 2024)

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024