Showing 1–50 of 426 results for author: Lee, W

Searching in archive cs.
  1. arXiv:2408.13779  [pdf, other]

    cs.PL cs.DC

    Concurrent Data Structures Made Easy (Extended Version)

    Authors: Callista Le, Kiran Gopinathan, Koon Wen Lee, Seth Gilbert, Ilya Sergey

    Abstract: Design of an efficient thread-safe concurrent data structure is a balancing act between its implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workload. At the same time, lock-free concurrent structures allow for…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Extended version of the OOPSLA'24 paper

  2. arXiv:2408.11294  [pdf, other]

    cs.CL

    RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining

    Authors: Anh-Dung Vo, Minseong Jung, Wonbeen Lee, Daewoo Choi

    Abstract: The field of Natural Language Processing (NLP) has seen significant advancements with the development of Large Language Models (LLMs). However, much of this research remains focused on English, often overlooking low-resource languages like Korean. This oversight presents challenges due to the unique non-alphabetic token structure of Korean and the substantial memory and computational demands requi…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10356  [pdf, other]

    cs.CV physics.data-an physics.soc-ph

    Diversity and stylization of the contemporary user-generated visual arts in the complexity-entropy plane

    Authors: Seunghwan Kim, Byunghwee Lee, Wonjae Lee

    Abstract: The advent of computational and numerical methods in recent times has provided new avenues for analyzing art historiographical narratives and tracing the evolution of art styles therein. Here, we investigate an evolutionary process underpinning the emergence and stylization of contemporary user-generated visual art styles using the complexity-entropy (C-H) plane, which quantifies local structures…

    Submitted 21 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: 18 pages, 3 figures, 1 table, SI(4 figures, 3 tables)

  4. arXiv:2408.09111  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    Measuring Visual Sycophancy in Multimodal Models

    Authors: Jaehyuk Lim, Bruce W. Lee

    Abstract: This paper introduces and examines the phenomenon of "visual sycophancy" in multimodal language models, a term we propose to describe these models' tendency to disproportionately favor visually presented information, even when it contradicts their prior knowledge or responses. Our study employs a systematic methodology to investigate this phenomenon: we present models with images of multiple-choic…

    Submitted 17 August, 2024; originally announced August 2024.

  5. arXiv:2408.09049  [pdf, other]

    cs.CL cs.AI cs.HC

    Language Models Show Stable Value Orientations Across Diverse Role-Plays

    Authors: Bruce W. Lee, Yeongheon Lee, Hyunsoo Cho

    Abstract: We demonstrate that large language models (LLMs) exhibit consistent value orientations despite adopting diverse personas, revealing a persistent inertia in their responses that remains stable across the variety of roles they are prompted to assume. To systematically explore this phenomenon, we introduce the role-play-at-scale methodology, which involves prompting LLMs with randomized, diverse pers…

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.06201  [pdf, other]

    cs.HC cs.IR

    Investigating Characteristics of Media Recommendation Solicitation in r/ifyoulikeblank

    Authors: Md Momen Bhuiyan, Donghan Hu, Andrew Jelson, Tanushree Mitra, Sang Won Lee

    Abstract: Despite the existence of search-based recommender systems like Google, Netflix, and Spotify, online users sometimes may turn to crowdsourced recommendations in places like the r/ifyoulikeblank subreddit. In this exploratory study, we probe why users go to r/ifyoulikeblank, how they look for recommendation, and how the subreddit users respond to recommendation requests. To answer, we collected samp…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: page 23

  7. arXiv:2408.06065  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    An Investigation Into Explainable Audio Hate Speech Detection

    Authors: Jinmyeong An, Wonjun Lee, Yejin Jeon, Jungseul Ok, Yunsu Kim, Gary Geunbae Lee

    Abstract: Research on hate speech has predominantly revolved around detection and interpretation from textual inputs, leaving verbal content largely unexplored. While there has been limited exploration into hate speech detection within verbal acoustic speech inputs, the aspect of interpretability has been overlooked. Therefore, we introduce a new task of explainable audio hate speech detection. Specifically…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to SIGDIAL 2024

  8. arXiv:2408.06043  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning

    Authors: Wonjun Lee, San Kim, Gary Geunbae Lee

    Abstract: Recent dialogue systems rely on turn-based spoken interactions, requiring accurate Automatic Speech Recognition (ASR). Errors in ASR can significantly impact downstream dialogue tasks. To address this, using dialogue context from user and agent interactions for transcribing subsequent utterances has been proposed. This method incorporates the transcription of the user's speech and the agent's resp…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 2 figures, Accepted to SIGDIAL2024

  9. arXiv:2408.05694  [pdf, other]

    cs.CR

    ICSFuzz: Collision Detector Bug Discovery in Autonomous Driving Simulators

    Authors: Weiwei Fu, Heqing Huang, Yifan Zhang, Ke Zhang, Jin Huang, Wei-Bin Lee, Jianping Wang

    Abstract: With the increasing adoption of autonomous vehicles, ensuring the reliability of autonomous driving systems (ADSs) deployed on autonomous vehicles has become a significant concern. Driving simulators have emerged as crucial platforms for testing autonomous driving systems, offering realistic, dynamic, and configurable environments. However, existing simulation-based ADS testers have largely overlo…

    Submitted 11 August, 2024; originally announced August 2024.

  10. arXiv:2408.03601  [pdf, other]

    cs.RO

    DRAMA: An Efficient End-to-end Motion Planner for Autonomous Driving with Mamba

    Authors: Chengran Yuan, Zhanqi Zhang, Jiawei Sun, Shuo Sun, Zefan Huang, Christina Dao Wen Lee, Dongen Li, Yuhang Han, Anthony Wong, Keng Peng Tee, Marcelo H. Ang Jr

    Abstract: Motion planning is a challenging task to generate safe and feasible trajectories in highly dynamic and complex environments, forming a core capability for autonomous vehicles. In this paper, we propose DRAMA, the first Mamba-based end-to-end motion planner for autonomous vehicles. DRAMA fuses camera, LiDAR Bird's Eye View images in the feature space, as well as ego status information, to generate…

    Submitted 14 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  11. Hierarchical Neural Constructive Solver for Real-world TSP Scenarios

    Authors: Yong Liang Goh, Zhiguang Cao, Yining Ma, Yanfei Dong, Mohammed Haroon Dupty, Wee Sun Lee

    Abstract: Existing neural constructive solvers for routing problems have predominantly employed transformer architectures, conceptualizing the route construction as a set-to-sequence learning task. However, their efficacy has primarily been demonstrated on entirely random problem instances that inadequately capture real-world scenarios. In this paper, we introduce realistic Traveling Salesman Problem (TSP)…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to KDD 2024

  12. arXiv:2408.02944  [pdf, ps, other]

    eess.SP cs.AI eess.SY

    LLM-Empowered Resource Allocation in Wireless Communications Systems

    Authors: Woongsup Lee, Jeonghun Park

    Abstract: The recent success of large language models (LLMs) has spurred their application in various fields. In particular, there have been efforts to integrate LLMs into various aspects of wireless communication systems. The use of LLMs in wireless communication systems has the potential to realize artificial general intelligence (AGI)-enabled wireless networks. In this paper, we investigate an LLM-based…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: submitted to possible IEEE journal

  13. arXiv:2407.12614  [pdf]

    cs.CV

    Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm

    Authors: Shiyu Liu, Congliang Zhou, Won Suk Lee

    Abstract: The strawberry industry yields significant economic benefits for Florida, yet the process of monitoring strawberry growth and yield is labor-intensive and costly. The development of machine learning-based detection and tracking methodologies has been used for helping automated monitoring and prediction of strawberry yield, still, enhancement has been limited as previous studies only applied the de…

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.10495  [pdf, other]

    cs.LG cs.CV

    Improving Hyperbolic Representations via Gromov-Wasserstein Regularization

    Authors: Yifei Yang, Wonjun Lee, Dongmian Zou, Gilad Lerman

    Abstract: Hyperbolic representations have shown remarkable efficacy in modeling inherent hierarchies and complexities within data structures. Hyperbolic neural networks have been commonly applied for learning such representations from data, but they often fall short in preserving the geometric structures of the original feature spaces. In response to this challenge, our work applies the Gromov-Wasserstein (…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  15. arXiv:2407.09303  [pdf, other]

    cs.CV

    ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

    Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee

    Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework calle…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://sungmin-woo.github.io/prodepth/

  16. arXiv:2407.06372  [pdf, other]

    cs.LG cs.CV

    Non-Robust Features are Not Always Useful in One-Class Classification

    Authors: Matthew Lau, Haoran Wang, Alec Helbling, Matthew Hul, ShengYun Peng, Martin Andreoni, Willian T. Lunardi, Wenke Lee

    Abstract: The robustness of machine learning models has been questioned by the existence of adversarial examples. We examine the threat of adversarial examples in practical applications that require lightweight models for one-class classification. Building on Ilyas et al. (2019), we investigate the vulnerability of lightweight one-class classifiers to adversarial attacks and possible reasons for it. Our res…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR Visual and Anomaly Detection (VAND) Workshop 2024

    MSC Class: 68T45 ACM Class: I.2.10; I.4.10; I.5.4

  17. arXiv:2407.05516  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation

    Authors: Jin Woo Lee, Jaehyun Park, Min Jun Choi, Kyogu Lee

    Abstract: While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling wit…

    Submitted 7 July, 2024; originally announced July 2024.

  18. arXiv:2406.19707  [pdf, other]

    cs.LG cs.DC

    InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

    Authors: Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim

    Abstract: Transformer-based large language models (LLMs) demonstrate impressive performance across various natural language processing tasks. Serving LLM inference for generating long contents, however, poses a challenge due to the enormous memory footprint of the transient state, known as the key-value (KV) cache, which scales with the sequence length and batch size. In this paper, we present InfiniGen, a…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: OSDI 2024

  19. arXiv:2406.18109  [pdf, other]

    cs.DC

    Composing Distributed Computations Through Task and Kernel Fusion

    Authors: Rohan Yadav, Shiv Sundram, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the necessary analyses for the fusion of distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler to fuse together the kernels within…

    Submitted 26 June, 2024; originally announced June 2024.

  20. arXiv:2406.15723  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

    Authors: Heejin Do, Wonjun Lee, Gary Geunbae Lee

    Abstract: In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly inte…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  21. arXiv:2406.12930  [pdf, other]

    cs.LG cs.AR

    Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

    Authors: Jungi Lee, Wonbeom Lee, Jaewoong Sim

    Abstract: Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning and have thus become one of the most important workloads in today's computing landscape. However, deploying LLM inference poses challenges due to the high compute and memory requirements stemming from the enormous model size and the difficulty of running it in the integer pipelines. In this paper,…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: To appear at the 51st International Symposium on Computer Architecture (ISCA 2024)

  22. arXiv:2406.11707  [pdf, other]

    cs.CR cs.CV cs.LG

    A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

    Authors: Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jianping Wang

    Abstract: Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack ap…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 33rd USENIX Security Symposium 2024

  23. Eliciting New Perspectives in RtD Studies through Annotated Portfolios: A Case Study of Robotic Artefacts

    Authors: Marius Hoggenmuller, Wen-Ying Lee, Luke Hespanhol, Malte Jung, Martin Tomitsch

    Abstract: In this paper, we investigate how to elicit new perspectives in research-through-design (RtD) studies through annotated portfolios. Situating the usage in human-robot interaction (HRI), we used two robotic artefacts as a case study: we first created our own annotated portfolio and subsequently ran online workshops during which we asked HRI experts to annotate our robotic artefacts. We report on th…

    Submitted 16 June, 2024; originally announced June 2024.

  24. arXiv:2406.07485  [pdf, other]

    cs.HC

    PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction

    Authors: Adnan Abbas, Sang Won Lee

    Abstract: Efficient task planning is essential for productivity and mental well-being, yet individuals often struggle to create realistic plans and reflect upon their productivity. Leveraging the advancement in artificial intelligence (AI), conversational agents have emerged as a promising tool for enhancing productivity. Our work focuses on externalizing plans through conversation, aiming to solidify inten…

    Submitted 11 June, 2024; originally announced June 2024.

  25. arXiv:2406.03867  [pdf, other]

    quant-ph cs.ET

    A Comprehensive Study of Quantum Arithmetic Circuits

    Authors: Siyi Wang, Xiufan Li, Wei Jie Bryan Lee, Suman Deb, Eugene Lim, Anupam Chattopadhyay

    Abstract: In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention.…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Under review at the Royal Society's Philosophical Transactions A

  26. arXiv:2406.01512  [pdf, other]

    cs.CL

    MAD: Multi-Alignment MEG-to-Text Decoding

    Authors: Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Won Hee Lee, Renjing Xu, Hui Xiong

    Abstract: Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works under-investigated three points: 1) a predomi…

    Submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2405.16731  [pdf, other]

    cs.LG cs.NE

    Pretraining with Random Noise for Fast and Robust Learning without Weight Transport

    Authors: Jeonghwan Cheon, Sang Wan Lee, Se-Bum Paik

    Abstract: The brain prepares for learning even before interacting with the environment, by refining and optimizing its structures through spontaneous neural activity that resembles random noise. However, the mechanism of such a process has yet to be thoroughly understood, and it is unclear whether this process can benefit the algorithm of machine learning. Here, we study this issue using a neural network wi…

    Submitted 26 May, 2024; originally announced May 2024.

  28. arXiv:2405.16450  [pdf, other]

    cs.LG cs.AI cs.PL

    Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

    Authors: Max Liu, Chan-Hung Yu, Wei-Hsu Lee, Cheng-Wei Hung, Yen-Chun Chen, Shao-Hua Sun

    Abstract: Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search fra…

    Submitted 26 May, 2024; originally announced May 2024.

  29. arXiv:2405.16185  [pdf, other]

    cs.LG cs.AI

    Differentiable Cluster Graph Neural Network

    Authors: Yanfei Dong, Mohammed Haroon Dupty, Lambert Deng, Zhuanghua Liu, Yong Liang Goh, Wee Sun Lee

    Abstract: Graph Neural Networks often struggle with long-range information propagation and in the presence of heterophilous neighborhoods. We address both challenges with a unified framework that incorporates a clustering inductive bias into the message passing mechanism, using additional cluster-nodes. Central to our approach is the formulation of an optimal transport based implicit clustering objective fu…

    Submitted 25 May, 2024; originally announced May 2024.

  30. arXiv:2405.14782  [pdf, other]

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.13968  [pdf, other]

    cs.HC

    TaleMate: Exploring the use of Voice Agents for Parent-Child Joint Reading Experiences

    Authors: Daniel Vargas-Diaz, Jisun Kim, Sulakna Karunaratna, Maegan Reinhardt, Caroline Hornburg, Koeun Choi, Sang Won Lee

    Abstract: Joint reading is a key activity for early learners, with caregiver-child interactions such as questioning and feedback playing an essential role in children's cognitive and linguistic development. However, for some parents, actively engaging children in storytelling can be challenging. To address this, we introduce TaleMate a platform designed to enhance shared reading by leveraging conversational…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 4 pages, 2 figures, CHI 2024 Workshop on Child-centred AI Design

  32. arXiv:2405.13890  [pdf, other]

    cs.HC

    An empirical study to understand how students use ChatGPT for writing essays and how it affects their ownership

    Authors: Andrew Jelson, Sang Won Lee

    Abstract: As large language models (LLMs) become more powerful and ubiquitous, systems like ChatGPT are increasingly used by students to help them with writing tasks. To better understand how these tools are used, we investigate how students might use an LLM for essay writing, for example, to study the queries asked to ChatGPT and the responses that ChatGPT gives. To that end, we plan to conduct a user stud…

    Submitted 4 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 5 pages, 2 figures, submitted and accepted to ACM CHI Workshop In2Writing in 2024

  33. arXiv:2405.13154  [pdf, other]

    cs.HC

    Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming

    Authors: Xinyi Zhang, Pamela J. Wisniewski, Jin-hee Cho, Lifu Huang, Sang Won Lee

    Abstract: Cybergrooming emerges as a growing threat to adolescent safety and mental health. One way to combat cybergrooming is to leverage predictive artificial intelligence (AI) to detect predatory behaviors in social media. However, these methods can encounter challenges like false positives and negative implications such as privacy concerns. Another complementary strategy involves using generative artifi…

    Submitted 21 May, 2024; originally announced May 2024.

  34. arXiv:2405.09713  [pdf, other]

    cs.CV cs.AI cs.CL

    SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

    Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

    Abstract: Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically withi…

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR

  35. Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

    Authors: Yanfei Dong, Lambert Deng, Jiazheng Zhang, Xiaodong Yu, Ting Lin, Francesco Gelli, Soujanya Poria, Wee Sun Lee

    Abstract: Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinator…

    Submitted 8 May, 2024; originally announced May 2024.

  36. arXiv:2405.06459  [pdf, other]

    cs.CL cs.AI

    Are EEG-to-Text Models Working?

    Authors: Hyejeong Jo, Yiqian Yang, Juhyeok Han, Yiqun Duan, Hui Xiong, Won Hee Lee

    Abstract: This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models…

    Submitted 13 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  37. arXiv:2405.01028  [pdf, other]

    cs.CV

    Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores

    Authors: Kiyoon Jeong, Woojun Lee, Woongchan Nam, Minjeong Ma, Pilsung Kang

    Abstract: This report presents the ECO (Ensembled Clip score and cOnsensus score) pipeline from team DSBA LAB, which is a new framework used to evaluate and rank captions for a given image. ECO selects the most accurate caption describing image. It is made possible by combining an Ensembled CLIP score, which considers the semantic alignment between the image and captions, with a Consensus score that account…

    Submitted 13 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  38. Reactive Composition of UAV Delivery Services in Urban Environments

    Authors: Woojin Lee, Babar Shahzaad, Balsam Alkouz, Athman Bouguettaya

    Abstract: We propose a novel failure-aware reactive UAV delivery service composition framework. A skyway network infrastructure is presented for the effective provisioning of services in urban areas. We present a formal drone delivery service model and a system architecture for reactive drone delivery services. We develop radius-based, cell density-based, and two-phased algorithms to reduce the search space…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 14 pages, 18 figures. This is an accepted paper and it is going to appear in IEEE Transactions on Intelligent Transportation Systems (T-ITS)

    Journal ref: IEEE Transactions on Intelligent Transportation Systems 2024

  39. arXiv:2404.17020  [pdf, other]

    cs.SE cs.AI

    Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach

    Authors: Cristopher McIntyre-Garcia, Adrien Heymans, Beril Borali, Won-Sook Lee, Shiva Nejati

    Abstract: Deep Learning (DL) models excel in computer vision tasks but can be susceptible to adversarial examples. This paper introduces Triple-Metric EvoAttack (TM-EVO), an efficient algorithm for evaluating the robustness of object-detection DL models against adversarial attacks. TM-EVO utilizes a multi-metric fitness function to guide an evolutionary search efficiently in creating effective adversarial t…

    Submitted 25 April, 2024; originally announced April 2024.

  40. arXiv:2404.11041  [pdf, other]

    cs.AI cs.LG

    On the Empirical Complexity of Reasoning and Planning in LLMs

    Authors: Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee

    Abstract: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimente…

    Submitted 17 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  41. arXiv:2404.10633  [pdf, other]

    cs.CV

    Contextrast: Contextual Contrastive Learning for Semantic Segmentation

    Authors: Changki Sung, Wanhee Kim, Jungho An, Wooju Lee, Hyungtae Lim, Hyun Myung

    Abstract: Despite great improvements in semantic segmentation, challenges persist because of the lack of local/global contexts and the relationship between them. In this paper, we propose Contextrast, a contrastive learning-based semantic segmentation method that allows to capture local/global contexts and comprehend their relationships. Our proposed method comprises two parts: a) contextual contrastive lea…

    Submitted 16 April, 2024; originally announced April 2024.

  42. arXiv:2404.03887  [pdf, other]

    cs.CL cs.AI

    SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

    Authors: Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park

    Abstract: This study presents a novel learning approach designed to enhance both mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability. Thus, the…

    Submitted 24 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  43. arXiv:2404.02754  [pdf, ps, other]

    cs.LG

    Continual Learning of Numerous Tasks from Long-tail Distributions

    Authors: Liwei Kang, Wee Sun Lee

    Abstract: Continual learning, an important aspect of artificial intelligence and machine learning research, focuses on developing models that learn and adapt to new tasks while retaining previously acquired knowledge. Existing continual learning algorithms usually involve a small number of tasks with uniform sizes and may not accurately represent real-world learning scenarios. In this paper, we investigate…

    Submitted 3 April, 2024; originally announced April 2024.

  44. arXiv:2404.02135  [pdf, other]

    cs.CV eess.IV

    Enhancing Ship Classification in Optical Satellite Imagery: Integrating Convolutional Block Attention Module with ResNet for Improved Performance

    Authors: Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Junseob Shin, Hyerin Cha, Seung Won Lee

    Abstract: In this study, we present an advanced convolutional neural network (CNN) architecture for ship classification based on optical satellite imagery, which significantly enhances performance through the integration of a convolutional block attention module (CBAM) and additional architectural innovations. Building upon the foundational ResNet50 model, we first incorporated a standard CBAM to direct the…

    Submitted 20 August, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE Access on August 16, 2024

  45. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  46. arXiv:2404.00385  [pdf, other]

    cs.CV cs.AI cs.LG

    Constrained Layout Generation with Factor Graphs

    Authors: Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng, Guoji Fu, Yong Liang Goh, Wei Lu, Wee Sun Lee

    Abstract: This paper addresses the challenge of object-centric layout generation under spatial constraints, seen in multiple domains including floorplan design process. The design process typically involves specifying a set of spatial constraints that include object attributes like size and inter-object relations such as relative positioning. Existing works, which typically represent objects as single nodes…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: To be published at IEEE/CVF CVPR 2024

  47. arXiv:2403.18406  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM

    Authors: Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee

    Abstract: Stimulated by the sophisticated reasoning capabilities of recent Large Language Models (LLMs), a variety of strategies for bridging video modality have been devised. A prominent strategy involves Video Language Models (VideoLMs), which train a learnable interface with video data to connect advanced vision encoders with LLMs. Recently, an alternative strategy has surfaced, employing readily availab…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Our code is available at https://github.com/imagegridworth/IG-VLM

  48. arXiv:2403.15517  [pdf, other]

    cs.LG cs.CV

    Improving Forward Compatibility in Class Incremental Learning by Increasing Representation Rank and Feature Richness

    Authors: Jaeill Kim, Wonseok Lee, Moonjung Eo, Wonjong Rhee

    Abstract: Class Incremental Learning (CIL) constitutes a pivotal subfield within continual learning, aimed at enabling models to progressively learn new classification tasks while retaining knowledge obtained from prior tasks. Although previous studies have predominantly focused on backward compatible approaches to mitigate catastrophic forgetting, recent investigations have introduced forward compatible me…

    Submitted 22 March, 2024; originally announced March 2024.

  49. arXiv:2403.15249  [pdf, other]

    cs.CV cs.AI cs.LG

    Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

    Authors: Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee, Jong Chul Ye

    Abstract: The evolution of diffusion models has greatly impacted video generation and understanding. Particularly, text-to-video diffusion models (VDMs) have significantly facilitated the customization of input video with target appearance, motion, etc. Despite these advances, challenges persist in accurately distilling motion information from video frames. While existing works leverage the consecutive fram…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://geonyeong-park.github.io/spectral-motion-alignment/

  50. arXiv:2403.11330  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

    Authors: Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency

    Abstract: We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals. At a high level, our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to…

    Submitted 22 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: 10 pages, 3 figures, 2 tables