Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 1,249 results for author: Park, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.13633  [pdf, ps, other

    cs.LG math.OC

    Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

    Authors: Jaehyun Park, Dabeen Lee

    Abstract: We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.12904  [pdf, other

    cs.LG physics.comp-ph physics.optics

    Meent: Differentiable Electromagnetic Simulator for Machine Learning

    Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

    Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: under review

  3. arXiv:2406.12721  [pdf

    eess.AS cs.SD

    Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

    Authors: Sangwon Son, Jongyeon Park, Hongkook Kim, Sulaiman Vesal, Jeongeun Lim

    Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 challenge Task4, 4 pages

  4. arXiv:2406.12233  [pdf, other

    cs.AI cs.CL cs.CV

    SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

    Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

    Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.11813  [pdf, other

    cs.CL

    How Do Large Language Models Acquire Factual Knowledge During Pretraining?

    Authors: Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo

    Abstract: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge ac… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    ACM Class: I.2.7

  6. arXiv:2406.10935  [pdf, other

    cs.CV

    Pick-or-Mix: Dynamic Channel Sampling for ConvNets

    Authors: Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

    Abstract: Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for d… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Published in Computer Vision and Pattern Recognition (CVPR 2024)

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  7. arXiv:2406.09877  [pdf, other

    cs.LG cs.AI cs.DC

    Federated Learning with Flexible Architectures

    Authors: Jong-Ik Park, Carlee Joe-Wong

    Abstract: Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To addre… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.09719  [pdf, other

    cs.CL cs.AI

    Self-Knowledge Distillation for Learning Ambiguity

    Authors: Hancheol Park, Soyeong Jeong, Sukmin Cho, Jong C. Park

    Abstract: Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without consideration for its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models t… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures

  9. arXiv:2406.09396  [pdf, other

    cs.CV

    Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA

    Authors: Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo

    Abstract: Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely-related. Therefore, when performing long-form video question answering (LVQA),all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explore the use of large lan… ▽ More

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.09047  [pdf, other

    cs.CG

    DeepJEB: 3D Deep Learning-based Synthetic Jet Engine Bracket Dataset

    Authors: Seongjun Hong, Yongmin Kwon, Dongju Shin, Jangseop Park, Namwoo Kang

    Abstract: Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.08719  [pdf, other

    cs.CR

    TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution

    Authors: Juhee Kim, Jinbum Park, Sihyeon Roh, Jaeyoung Chung, Youngjoo Lee, Taesoo Kim, Byoungyoung Lee

    Abstract: ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential secur… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.08051  [pdf, other

    cs.AR cs.PF

    ONNXim: A Fast, Cycle-level Multi-core NPU Simulator

    Authors: Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim

    Abstract: As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, and/or different de… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.07867  [pdf, other

    cs.CV cs.AI cs.HC

    Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    Authors: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

    Abstract: In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  14. arXiv:2406.06448  [pdf, other

    cs.HC

    How is the Pilot Doing: VTOL Pilot Workload Estimation by Multimodal Machine Learning on Psycho-physiological Signals

    Authors: Jong Hoon Park, Lawrence Chen, Ian Higgins, Zhaobo Zheng, Shashank Mehrotra, Kevin Salubre, Mohammadreza Mousaei, Steven Willits, Blain Levedahl, Timothy Buker, Eliot Xing, Teruhisa Misu, Sebastian Scherer, Jean Oh

    Abstract: Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a cr… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 7 figures

  15. arXiv:2406.05963  [pdf, other

    cs.CV cs.AI

    Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

    Authors: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, Eun-Sol Kim

    Abstract: In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two m… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  16. arXiv:2406.04064  [pdf, other

    cs.CL cs.AI cs.CY

    Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models

    Authors: Jisu Shin, Hoyun Song, Huije Lee, Soyeong Jeong, Jong C. Park

    Abstract: Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities. To fully understand such social bias in large language models (LLMs), it is essential to consider the composite of social perceptions from diverse perspectives among identities. Previous studies have either evaluated biases in LLMs by indirectly assessing the presence of sentiment… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  17. arXiv:2406.03411  [pdf, other

    cs.CV

    Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach

    Authors: Saehyung Lee, Sangwon Yu, Junsung Park, Jihun Yi, Sungroh Yoon

    Abstract: In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: To appear in ACL 2024 Main

  18. arXiv:2406.03202  [pdf, other

    cs.CL cs.AI

    ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction

    Authors: Jeiyoon Park, Chanjun Park, Heuiseok Lim

    Abstract: We explore and improve the capabilities of LLMs to generate data for grammatical error correction (GEC). When merely producing parallel sentences, their patterns are too simplistic to be valuable as a corpus. To address this issue, we propose an automated framework that includes a Subject Selector, Grammar Selector, Prompt Manager, and Evaluator. Additionally, we introduce a new dataset for GEC ta… ▽ More

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: preprint

  19. arXiv:2406.02989  [pdf, other

    cs.RO cs.AI

    Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy

    Authors: Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo

    Abstract: For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves m… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Robotics and Automation Letters (RA-L), First two authors contributed equally

  20. arXiv:2406.02893  [pdf, other

    cs.CL

    Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task

    Authors: Unggi Lee, Jiyeong Bae, Dohee Kim, Sookbun Lee, Jaekwon Park, Taekyung Ahn, Gunho Lee, Damji Stratton, Hyeoncheol Kim

    Abstract: Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that… ▽ More

    Submitted 9 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures, 3 tables

  21. arXiv:2406.02867  [pdf, ps, other

    cs.LG cs.AI cs.NE

    Oscillations enhance time-series prediction in reservoir computing with feedback

    Authors: Yuji Kawai, Takashi Morita, Jihoon Park, Minoru Asada

    Abstract: Reservoir computing, a machine learning framework used for modeling the brain, can predict temporal data with little observations and minimal computational resources. However, it is difficult to accurately reproduce the long-term target time series because the reservoir system becomes unstable. This predictive capability is required for a wide variety of time-series processing, including predictio… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.02477  [pdf, other

    eess.IV cs.CV cs.LG

    Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion

    Authors: Colin Hansen, Simas Glinskis, Ashwin Raju, Micha Kornreich, JinHyeong Park, Jayashri Pawar, Richard Herzog, Li Zhang, Benjamin Odry

    Abstract: Data driven models for automated diagnosis in radiology suffer from insufficient and imbalanced datasets due to low representation of pathology in a population and the cost of expert annotations. Datasets can be bolstered through data augmentation. However, even when utilizing a full suite of transformations during model training, typical data augmentations do not address variations in human anato… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  23. arXiv:2406.02331  [pdf, other

    cs.CL

    Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering

    Authors: ChaeHun Park, Koanho Lee, Hyesu Lim, Jaeseok Kim, Junmo Park, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo

    Abstract: Building a reliable visual question answering~(VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings Accepted

  24. arXiv:2406.01996  [pdf, other

    cs.LG cs.AI cs.CV cs.GR

    Bayesian Mesh Optimization for Graph Neural Networks to Enhance Engineering Performance Prediction

    Authors: Jangseop Park, Namwoo Kang

    Abstract: In engineering design, surrogate models are widely employed to replace computationally expensive simulations by leveraging design variables and geometric parameters from computer-aided design (CAD) models. However, these models often lose critical information when simplified to lower dimensions and face challenges in parameter definition, especially with the complex 3D shapes commonly found in ind… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, 3 tables

  25. arXiv:2406.00798  [pdf, other

    cs.CV cs.AI

    PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency

    Authors: Yeonsung Jung, Heecheol Yun, Joonhyung Park, Jin-Hwa Kim, Eunho Yang

    Abstract: Neural Radiance Fields (NeRF) have shown remarkable performance in learning 3D scenes. However, NeRF exhibits vulnerability when confronted with distractors in the training images -- unexpected objects are present only within specific views, such as moving entities like pedestrians or birds. Excluding distractors during dataset construction is a straightforward solution, but without prior knowledg… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  26. arXiv:2405.20610  [pdf, other

    cs.CV

    Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation

    Authors: Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Sung Won Han

    Abstract: In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatc… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, submitted to IEEE TPAMI. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2405.20542  [pdf, ps, other

    cs.LG stat.ML

    On the Connection Between Non-negative Matrix Factorization and Latent Dirichlet Allocation

    Authors: Benedikt Geiger, Peter J. Park

    Abstract: Non-negative matrix factorization with the generalized Kullback-Leibler divergence (NMF) and latent Dirichlet allocation (LDA) are two popular approaches for dimensionality reduction of non-negative data. Here, we show that NMF with $\ell_1$ normalization constraints on the columns of both matrices of the decomposition and a Dirichlet prior on the columns of one matrix is equivalent to LDA. To sho… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 9 pages

    MSC Class: 62H30; 15A23; 62H22; 62F15; 68W40

  28. arXiv:2405.19778  [pdf, other

    cs.CL cs.AI

    Enhancing Consistency and Role-Specific Knowledge Capturing by Rebuilding Fictional Character's Persona

    Authors: Jeiyoon Park, Chanjun Park, Heuiseok Lim

    Abstract: With the recent introduction of Assistants API, it is expected that document-based language models will be actively used in various domains, especially Role-playing. However, a key challenge lies in utilizing protagonist's persona: Assistants API often fails to achieve with its search because the information extraction part is different each time and it often omits important information such as pr… ▽ More

    Submitted 4 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: preprint

  29. arXiv:2405.19570  [pdf, other

    cs.MA cs.RO

    Distributed Online Planning for Min-Max Problems in Networked Markov Games

    Authors: Alexandros E. Tzikas, Jinkyoo Park, Mykel J. Kochenderfer, Ross E. Allen

    Abstract: Min-max problems are important in multi-agent sequential decision-making because they improve the performance of the worst-performing agent in the network. However, solving the multi-agent min-max problem is challenging. We propose a modular, distributed, online planning-based algorithm that is able to approximate the solution of the min-max objective in networked Markov games, assuming that the a… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to appear in the IEEE Robotics and Automation Letters

  30. arXiv:2405.18762  [pdf, other

    cs.CV cs.AI

    Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation

    Authors: Jiyoon Myung, Jihyeon Park

    Abstract: This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts which are scarcely represented or absent in their training datasets. We identify how these limitations not only confine the creative potential of these models but also pose risks of reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, whic… ▽ More

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Paper accepted in CVPRW 2024

  31. arXiv:2405.18400  [pdf, other

    cs.CL cs.LG

    Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

    Authors: Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

    Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 22 pages, 15 figures

  32. arXiv:2405.18012  [pdf, other

    cs.CV eess.IV

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

    Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  33. arXiv:2405.17918  [pdf, other

    cs.LG cs.AI

    Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation

    Authors: Dong Bok Lee, Aoxuan Silvia Zhang, Byungjoo Kim, Junhyeon Park, Juho Lee, Sung Ju Hwang, Hae Beom Lee

    Abstract: In this paper, we address the problem of cost-sensitive multi-fidelity Bayesian Optimization (BO) for efficient hyperparameter optimization (HPO). Specifically, we assume a scenario where users want to early-stop the BO when the performance improvement is not satisfactory with respect to the required computational cost. Motivated by this scenario, we introduce utility, which is a function predefin… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  34. arXiv:2405.16907  [pdf, other

    cs.AI cs.LG

    GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

    Authors: Jaewoo Lee, Sujin Yun, Taeyoung Yun, Jinkyoo Park

    Abstract: Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the qualit… ▽ More

    Submitted 12 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted (Spotlight) to ICLR 2024 Workshop on Generative Models for Decision Making. Jaewoo Lee and Sujin Yun are equal contribution authors

  35. arXiv:2405.16877  [pdf, other

    cs.LG cs.AI

    Are Self-Attentions Effective for Time Series Forecasting?

    Authors: Dongbin Kim, Jinseong Park, Jaewook Lee, Hoki Kim

    Abstract: Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically shifted the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In t… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 20 pages, 14 figures, 13 tables. Submitted to NeurIPS 2024 (under review)

  36. arXiv:2405.16823  [pdf, other

    cs.CV cs.AI

    Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

    Authors: Gihyun Kwon, Jangho Park, Jong Chul Ye

    Abstract: While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing method of single image editing with self attention injection and video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approache… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://unifyediting.github.io/

  37. arXiv:2405.16012  [pdf, other

    cs.LG

    Pessimistic Backward Policy for GFlowNets

    Authors: Hyosoon Jang, Yunhui Jang, Minsu Kim, Jinkyoo Park, Sungsoo Ahn

    Abstract: This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through the trajectory of state transitions. In this work, we observe that GFlowNets tend to under-exploit the high-reward objects due to training on insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward valu… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  38. arXiv:2405.15311  [pdf, other

    cs.CV cs.AI

    Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning

    Authors: Khanh-Binh Nguyen, Chae Jung Park

    Abstract: Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurate… ▽ More

    Submitted 26 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  39. arXiv:2405.14150  [pdf, other

    cs.CL

    jp-evalb: Robust Alignment-based PARSEVAL Measures

    Authors: Jungyeul Park, Junrui Wang, Eunkyul Leah Jo, Angela Yoonseo Park

    Abstract: We introduce an evaluation system designed to compute PARSEVAL measures, offering a viable alternative to \texttt{evalb} commonly used for constituency parsing evaluation. The widely used \texttt{evalb} script has traditionally been employed for evaluating the accuracy of constituency parsing results, albeit with the requirement for consistent tokenization and sentence boundaries. In contrast, our… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: To appear in The system demonstration track at NAACL-HLT 2024

  40. arXiv:2405.12386  [pdf, other

    stat.ML cs.LG stat.AP stat.CO

    Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression

    Authors: Sisi Shao, Junhyung Park, Weng Kee Wong

    Abstract: General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  41. arXiv:2405.11905  [pdf, other

    cs.CV

    CSTA: CNN-based Spatiotemporal Attention for Video Summarization

    Authors: Jaewon Son, Jaehun Park, Kwangsu Kim

    Abstract: Video summarization aims to generate a concise representation of a video, capturing its essential content and key moments while reducing its overall length. Although several methods employ attention mechanisms to handle long-term dependencies, they often fail to capture the visual significance inherent in frames. To address this limitation, we propose a CNN-based SpatioTemporal Attention (CSTA) me… ▽ More

    Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024

  42. arXiv:2405.11867  [pdf, other

    cs.CV cs.LG cs.RO

    Depth Prompting for Sensor-Agnostic Depth Estimation

    Authors: Jin-Hwi Park, Chanhwi Jeong, Junoh Lee, Hae-Gon Jeon

    Abstract: Dense depth maps have been used as a key element of visual perception tasks. There have been tremendous efforts to enhance the depth quality, ranging from optimization-based to learning-based methods. Despite the remarkable progress for a long time, their applicability in the real world is limited due to systematic measurement biases such as density, sensing pattern, and scan range. It is well-kno… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPR 2024

  43. arXiv:2405.11855  [pdf, other

    cs.RO

    Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments

    Authors: Jooyong Park, Jungwoo Lee, Euncheol Choi, Younggun Cho

    Abstract: In urban environments for delivery robots, particularly in areas such as campuses and towns, many custom features defy standard road semantic categorizations. Addressing this challenge, our paper introduces a method leveraging Salient Object Detection (SOD) to extract these unique features, employing them as pivotal factors for enhanced robot loop closure and localization. Traditional geometric fe… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 8 pages, 9 figures, 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  44. arXiv:2405.11807  [pdf, other

    cs.HC cs.RO eess.SY

    Dual-sided Peltier Elements for Rapid Thermal Feedback in Wearables

    Authors: Seongjun Kang, Gwangbin Kim, Seokhyun Hwang, Jeongju Park, Ahmed Elsharkawy, SeungJun Kim

    Abstract: This paper introduces a motor-driven Peltier device designed to deliver immediate thermal sensations within extended reality (XR) environments. The system incorporates eight motor-driven Peltier elements, facilitating swift transitions between warm and cool sensations by rotating preheated or cooled elements to opposite sides. A multi-layer structure, comprising aluminum and silicone layers, ensur… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 3 pages, 4 figures, ICRA Wearable Workshop 2024 - 1st Workshop on Advancing Wearable Devices and Applications through Novel Design, Sensing, Actuation, and AI

  45. arXiv:2405.11297  [pdf, other

    cs.CL

    Unveiling Key Aspects of Fine-Tuning in Sentence Embeddings: A Representation Rank Analysis

    Authors: Euna Jung, Jaeill Kim, Jungmin Ko, Jinwoo Park, Wonjong Rhee

    Abstract: The latest advancements in unsupervised learning of sentence embeddings predominantly involve employing contrastive learning-based (CL-based) fine-tuning over pre-trained language models. In this study, we analyze the latest sentence embedding methods by adopting representation rank as the primary tool of analysis. We first define Phase 1 and Phase 2 of fine-tuning based on when representation ran… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  46. arXiv:2405.09976  [pdf, other

    cs.CV eess.SP

    Language-Oriented Semantic Latent Representation for Image Transmission

    Authors: Giordano Cicchetti, Eleonora Grassucci, Jihong Park, Jinho Choi, Sergio Barbarossa, Danilo Comminiello

    Abstract: In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too c… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under review at IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024

  47. arXiv:2405.09866  [pdf, other

    eess.SP cs.LG

    Rethinking Multi-User Semantic Communications with Deep Generative Models

    Authors: Eleonora Grassucci, Jinho Choi, Jihong Park, Riccardo F. Gramaccioni, Giordano Cicchetti, Danilo Comminiello

    Abstract: In recent years, novel communication strategies have emerged to face the challenges that the increased number of connected devices and the higher quality of transmitted information are posing. Among them, semantic communication obtained promising results especially when combined with state-of-the-art deep generative models, such as large language or diffusion models, able to regenerate content fro… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under review in IEEE Journal on Selected Areas in Communications

  48. arXiv:2405.05967  [pdf, other

    cs.CV cs.GR cs.LG

    Distilling Diffusion Models into Conditional GANs

    Authors: Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

    Abstract: We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose… ▽ More

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Project page: https://mingukkang.github.io/Diffusion2GAN/

  49. arXiv:2405.03958  [pdf, other

    cs.CV cs.AI cs.LG

    Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

    Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

    Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional la… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  50. arXiv:2405.03929  [pdf, other

    cs.AI physics.ao-ph

    Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations

    Authors: Jaesung Park, Sungchul Hong, Yoonseo Cho, Jong-June Jeon

    Abstract: Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interaction among multiple variables. Leveraging the capability to integrate multiple inputs and powerful performances seamlessly, many studies have turned to neural networks for sea ice forecasting. This paper introduces a novel deep architectur… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.