Showing 1–50 of 383 results for author: Lee, B

Searching in archive cs.
  1. arXiv:2406.16042  [pdf, other]

    cs.CV

    Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

    Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data a…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: The project page is available at https://ku-cvlab.github.io/Diff-ID/

  2. arXiv:2406.12246  [pdf, other]

    cs.LG cs.CL cs.CV

    TroL: Traversal of Layers for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro

    Abstract: Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with scaling them up directly, these models enable LLVMs to showcase powerful vision language (VL) performances by covering diverse tasks via natural language instructions. However, existing open-source LLVMs that perform comparabl…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Code is available in https://github.com/ByungKwanLee/TroL

  3. arXiv:2406.08719  [pdf, other]

    cs.CR

    TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution

    Authors: Juhee Kim, Jinbum Park, Sihyeon Roh, Jaeyoung Chung, Youngjoo Lee, Taesoo Kim, Byoungyoung Lee

    Abstract: ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential secur…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.06316  [pdf, other]

    cs.CL cs.AI cs.CE cs.LG

    Tx-LLM: A Large Language Model for Therapeutics

    Authors: Juan Manuel Zambrano Chaves, Eric Wang, Tao Tu, Eeshit Dhaval Vaishnav, Byron Lee, S. Sara Mahdavi, Christopher Semturs, David Fleet, Vivek Natarajan, Shekoofeh Azizi

    Abstract: Developing therapeutics is a lengthy and expensive process that requires the satisfaction of many different criteria, and AI models capable of expediting the process would be invaluable. However, the majority of current AI approaches address only a narrowly defined set of tasks, often circumscribed within a particular domain. To bridge this gap, we introduce Tx-LLM, a generalist large language mod…

    Submitted 10 June, 2024; originally announced June 2024.

  5. arXiv:2406.06072  [pdf, other]

    cs.CV cs.LG cs.RO

    Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

    Authors: Dongyoon Hwang, Byungkun Lee, Hojoon Lee, Hyunseung Kim, Jaegul Choo

    Abstract: Vision Transformers (ViT), when paired with large-scale pretraining, have shown remarkable performance across various computer vision tasks, primarily due to their weak inductive bias. However, while such weak inductive bias aids in pretraining scalability, this may hinder the effective adaptation of ViTs for visuo-motor control tasks as a result of the absence of control-centric inductive biases.…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  6. arXiv:2406.05431  [pdf]

    cs.CL

    MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

    Authors: Gyeong Hoon Yi, Jiwoo Choi, Hyeongyun Song, Olivia Miano, Jaewoong Choi, Kihoon Bang, Byungju Lee, Seok Su Sohn, David Buttler, Anna Hiszpanski, Sang Soo Han, Donghun Kim

    Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extractions are an ineffective approach. To overcome this challenge, we present MaTableGPT, which is a GPT-based table data extractor from the materials science literature. MaTabl…

    Submitted 8 June, 2024; originally announced June 2024.

  7. arXiv:2406.03867  [pdf, other]

    quant-ph cs.ET

    A Comprehensive Study of Quantum Arithmetic Circuits

    Authors: Siyi Wang, Xiufan Li, Wei Jie Bryan Lee, Suman Deb, Eugene Lim, Anupam Chattopadhyay

    Abstract: In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention.…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Under review at the Royal Society's Philosophical Transactions A

  8. arXiv:2406.02562  [pdf, other]

    eess.AS cs.AI cs.CL

    Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

    Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

    Abstract: In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model in the on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents the weights separation method to minimize on-device model weights using parameter…

    Submitted 23 April, 2024; originally announced June 2024.

    Comments: Table 2 is revised

    Journal ref: ICASSP 2024 Workshop(HSCMA 2024) paper

  9. arXiv:2406.01570  [pdf, ps, other]

    cs.LG eess.SY stat.ML

    Single Trajectory Conformal Prediction

    Authors: Brian Lee, Nikolai Matni

    Abstract: We study the performance of risk-controlling prediction sets (RCPS), an empirical risk minimization-based formulation of conformal prediction, with a single trajectory of temporally correlated data from an unknown stochastic dynamical system. First, we use the blocking technique to show that RCPS attains performance guarantees similar to those enjoyed in the iid setting whenever data is generated…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 16 pages

  10. arXiv:2406.00324  [pdf, other]

    cs.LG cs.AI

    Do's and Don'ts: Learning Desirable Skills with Instruction Videos

    Authors: Hyunseung Kim, Byungkun Lee, Hojoon Lee, Dongyoon Hwang, Donghu Kim, Jaegul Choo

    Abstract: Unsupervised skill discovery is a learning paradigm that aims to acquire diverse behaviors without explicit rewards. However, it faces challenges in learning complex behaviors and often leads to learning unsafe or undesirable behaviors. For instance, in various continuous control tasks, current unsupervised skill discovery methods succeed in learning basic locomotions like standing but struggle wi…

    Submitted 1 June, 2024; originally announced June 2024.

  11. arXiv:2405.17918  [pdf, other]

    cs.LG cs.AI

    Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation

    Authors: Dong Bok Lee, Aoxuan Silvia Zhang, Byungjoo Kim, Junhyeon Park, Juho Lee, Sung Ju Hwang, Hae Beom Lee

    Abstract: In this paper, we address the problem of cost-sensitive multi-fidelity Bayesian Optimization (BO) for efficient hyperparameter optimization (HPO). Specifically, we assume a scenario where users want to early-stop the BO when the performance improvement is not satisfactory with respect to the required computational cost. Motivated by this scenario, we introduce utility, which is a function predefin…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.15574  [pdf, other]

    cs.CV

    Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yong Man Ro

    Abstract: The rapid development of large language and vision models (LLVMs) has been driven by advances in visual instruction tuning. Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models in order to narrow the performance gap with powerful closed-source LLVMs. These advancements are attributed to m…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Code is available in https://github.com/ByungKwanLee/Meteor

  13. arXiv:2405.13858  [pdf, other]

    cs.DC cs.AR cs.ET cs.LG

    Carbon Connect: An Ecosystem for Sustainable Computing

    Authors: Benjamin C. Lee, David Brooks, Arthur van Benthem, Udit Gupta, Gage Hills, Vincent Liu, Benjamin Pierce, Christopher Stewart, Emma Strubell, Gu-Yeon Wei, Adam Wierman, Yuan Yao, Minlan Yu

    Abstract: Computing is at a moment of profound opportunity. Emerging applications -- such as capable artificial intelligence, immersive virtual realities, and pervasive sensor systems -- drive unprecedented demand for computer. Despite recent advances toward net zero carbon emissions, the computing industry's gross energy usage continues to rise at an alarming rate, outpacing the growth of new energy instal…

    Submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2405.00260  [pdf, other]

    cs.CV

    CREPE: Coordinate-Aware End-to-End Document Parser

    Authors: Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee

    Abstract: In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named as Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OC…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024) main conference

  15. Cost-Driven Data Replication with Predictions

    Authors: Tianyu Zuo, Xueyan Tang, Bu Sung Lee

    Abstract: This paper studies an online replication problem for distributed data access. The goal is to dynamically create and delete data copies in a multi-server system as time passes to minimize the total storage and network cost of serving access requests. We study the problem in the emergent learning-augmented setting, assuming simple binary predictions about inter-request times at individual servers. W…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: The formal version of this draft will appear in ACM SPAA'24 conference

  16. arXiv:2404.09030  [pdf, other]

    eess.SY cs.LG

    Active Learning for Control-Oriented Identification of Nonlinear Systems

    Authors: Bruce D. Lee, Ingvar Ziemann, George J. Pappas, Nikolai Matni

    Abstract: Model-based reinforcement learning is an effective approach for controlling an unknown system. It is based on a longstanding pipeline familiar to the control community in which one performs experiments on the environment to collect a dataset, uses the resulting dataset to identify a model of the system, and finally performs control synthesis using the identified model. As interacting with the syst…

    Submitted 13 April, 2024; originally announced April 2024.

  17. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  18. arXiv:2404.01636  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SY

    Learning to Control Camera Exposure via Reinforcement Learning

    Authors: Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee

    Abstract: Adjusting camera exposure in arbitrary lighting conditions is the first step to ensure the functionality of computer vision applications. Poorly adjusted camera exposure often leads to critical failure and performance degradation. Traditional camera exposure control methods require multiple convergence steps and time-consuming processes, making them unsuitable for dynamic lighting conditions. In t…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024, *First two authors contributed equally to this work. Project page link: https://sites.google.com/view/drl-ae

  19. arXiv:2403.19985  [pdf, other]

    cs.CV

    Stable Surface Regularization for Fast Few-Shot NeRF

    Authors: Byeongin Joung, Byeong-Uk Lee, Jaesung Choe, Ukcheol Shin, Minjun Kang, Taeyeop Lee, In So Kweon, Kuk-Jin Yoon

    Abstract: This paper proposes an algorithm for synthesizing novel views under few-shot setup. The main concept is to develop a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence speed. We observe that the Eikonal loss - which is a widely known geometric regularization - requires dense traini…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 3DV 2024

  20. arXiv:2403.18222  [pdf, other]

    cs.RO cs.LG

    Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies

    Authors: Bo Wu, Bruce D. Lee, Kostas Daniilidis, Bernadette Bucher, Nikolai Matni

    Abstract: Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifical…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures

  21. arXiv:2403.15692  [pdf, other]

    cs.IT eess.SP

    Block Orthogonal Sparse Superposition Codes for $ \sf{L}^3 $ Communications: Low Error Rate, Low Latency, and Low Power Consumption

    Authors: Donghwa Han, Bowhyung Lee, Min Jang, Donghun Lee, Seho Myung, Namyoon Lee

    Abstract: Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth n…

    Submitted 22 March, 2024; originally announced March 2024.

  22. Visual Highlighting for Situated Brushing and Linking

    Authors: Nina Doerr, Benjamin Lee, Katarina Baricova, Dieter Schmalstieg, Michael Sedlmair

    Abstract: Brushing and linking is widely used for visual analytics in desktop environments. However, using this approach to link many data items between situated (e.g., a virtual screen with data) and embedded views (e.g., highlighted objects in the physical environment) is largely unexplored. To this end, we study the effectiveness of visual highlighting techniques in helping users identify and link physic…

    Submitted 11 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: published at EuroVis 2024

  23. Putting Our Minds Together: Iterative Exploration for Collaborative Mind Mapping

    Authors: Ying Yang, Tim Dwyer, Zachari Swiecki, Benjamin Lee, Michael Wybrow, Maxime Cordeil, Teresa Wulandari, Bruce H. Thomas, Mark Billinghurst

    Abstract: We delineate the development of a mind-mapping system designed concurrently for both VR and desktop platforms. Employing an iterative methodology with groups of users, we systematically examined and improved various facets of our system, including interactions, communication mechanisms and gamification elements, to streamline the mind-mapping process while augmenting situational awareness and prom…

    Submitted 23 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted at AHs 2024

  24. arXiv:2403.07508  [pdf, other]

    cs.CV

    MoAI: Mixture of All Intelligence for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

    Abstract: The rise of large language models (LLMs) and instruction tuning has led to the current trend of instruction-tuned large language and vision models (LLVMs). This trend involves either meticulously curating numerous instruction tuning datasets tailored to specific objectives or enlarging LLVMs to manage vast amounts of vision language (VL) data. However, current LLVMs have disregarded the detailed a…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Code available: https://github.com/ByungKwanLee/MoAI

  25. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  26. arXiv:2403.02568  [pdf, other]

    cs.HC

    Designing Born-Accessible Courses in Data Science and Visualization: Challenges and Opportunities of a Remote Curriculum Taught by Blind Instructors to Blind Students

    Authors: JooYoung Seo, Sile O'Modhrain, Yilin Xia, Sanchita Kamath, Bongshin Lee, James M. Coughlan

    Abstract: While recent years have seen a growing interest in accessible visualization tools and techniques for blind people, little attention is paid to the learning opportunities and teaching strategies of data science and visualization tailored for blind individuals. Whereas the former focuses on the accessibility issues of data visualization tools, the latter is concerned with the learnability of concept…

    Submitted 22 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  27. arXiv:2403.01827  [pdf]

    cs.NE cs.AI

    Analysis and Fully Memristor-based Reservoir Computing for Temporal Data Classification

    Authors: Ankur Singh, Sanghyeon Choi, Gunuk Wang, Maryaradhiya Daimari, Byung-Geun Lee

    Abstract: Reservoir computing (RC) offers a neuromorphic framework that is particularly effective for processing spatiotemporal signals. Known for its temporal processing prowess, RC significantly lowers training costs compared to conventional recurrent neural networks. A key component in its hardware deployment is the ability to generate dynamic reservoir states. Our research introduces a novel dual-memory…

    Submitted 16 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 22 pages, 20 figures, Journal, Typo corrected and updated reference

  28. MAIDR: Making Statistical Visualizations Accessible with Multimodal Data Representation

    Authors: JooYoung Seo, Yilin Xia, Bongshin Lee, Sean McCurry, Yu Jun Yam

    Abstract: This paper investigates new data exploration experiences that enable blind users to interact with statistical data visualizations (bar plots, heat maps, box plots, and scatter plots), leveraging multimodal data representations. In addition to sonification and textual descriptions that are commonly employed by existing accessible visualizations, our MAIDR (multimodal access and interactive data re…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CHI 2024. Source code is available at https://github.com/xability/maidr

  29. arXiv:2402.11349  [pdf, other]

    cs.CL cs.AI

    Language Models Don't Learn the Physical Manifestation of Language

    Authors: Bruce W. Lee, JaeHyuk Lim

    Abstract: We argue that language-only models don't learn the physical manifestation of language. We present an empirical investigation of visual-auditory properties of language through a series of tasks, termed H-Test. These tasks highlight a fundamental gap between human linguistic understanding and the sensory-deprived linguistic understanding of LLMs. In support of our hypothesis, 1. deliberate reasoning…

    Submitted 6 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Main

  30. arXiv:2402.11248  [pdf, other]

    cs.CV

    CoLLaVO: Crayon Large Language and Vision mOdel

    Authors: Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

    Abstract: The remarkable success of Large Language Models (LLMs) and instruction tuning drives the evolution of Vision Language Models (VLMs) towards a versatile general-purpose model. Yet, it remains unexplored whether current VLMs genuinely possess quality object-level image understanding capabilities determined from 'what objects are in the image?' or 'which object corresponds to a specified bounding box…

    Submitted 2 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings. Code available: https://github.com/ByungKwanLee/CoLLaVO

  31. Design Space of Visual Feedforward And Corrective Feedback in XR-Based Motion Guidance Systems

    Authors: Xingyao Yu, Benjamin Lee, Michael Sedlmair

    Abstract: Extended reality (XR) technologies are highly suited in assisting individuals in learning motor skills and movements -- referred to as motion guidance. In motion guidance, the "feedforward" provides instructional cues of the motions that are to be performed, whereas the "feedback" provides cues which help correct mistakes and minimize errors. Designing synergistic feedforward and feedback is vital…

    Submitted 16 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: To appear in ACM CHI 2024

  32. arXiv:2402.07381  [pdf, other]

    cs.IT

    RIS-Empowered LEO Satellite Networks for 6G: Promising Usage Scenarios and Future Directions

    Authors: Mesut Toka, Byungju Lee, Jaehyup Seong, Aryan Kaushik, Juhwan Lee, Jungwoo Lee, Namyoon Lee, Wonjae Shin, H. Vincent Poor

    Abstract: Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rate, and scalability, thus enabling truly ubiquitous connectivity. On the other hand, another promising technology, reconfigurable intelligent surfaces (RISs), has eme…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: 18 pages, 5 figures, Paper accepted by IEEE Communications Magazine

  33. arXiv:2402.05330  [pdf, other]

    stat.ML cs.LG

    Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

    Authors: Luca Masserano, Alex Shen, Michele Doro, Tommaso Dorigo, Rafael Izbicki, Ann B. Lee

    Abstract: An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data…

    Submitted 1 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 26 pages, 19 figures, code available at https://github.com/lee-group-cmu/lf2i

  34. arXiv:2402.01969  [pdf, other]

    cs.LG eess.SP

    Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction

    Authors: Ahmed P. Mohamed, Byunghyun Lee, Yaguang Zhang, Max Hollingsworth, C. Robert Anderson, James V. Krogmeier, David J. Love

    Abstract: Machine learning (ML) offers a promising solution to pathloss prediction. However, its effectiveness can be degraded by the limited availability of data. To alleviate these challenges, this paper introduces a novel simulation-enhanced data augmentation method for ML pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected re…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 6 pages, 5 figures, Accepted at ICC 2024

  35. arXiv:2402.01915  [pdf, other]

    cs.CV stat.CO

    Robust Inverse Graphics via Probabilistic Inference

    Authors: Tuan Anh Le, Pavel Sountsov, Matthew D. Hoffman, Ben Lee, Brian Patton, Rif A. Saurous

    Abstract: How do we infer a 3D scene from a single image in the presence of corruptions like rain, snow or fog? Straightforward domain randomization relies on knowing the family of corruptions ahead of time. Here, we propose a Bayesian approach, dubbed robust inverse graphics (RIG), that relies on a strong scene prior and an uninformative uniform corruption prior, making it applicable to a wide range of corru…

    Submitted 11 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML submission. Reworked main body, new appendix figures

  36. arXiv:2401.09728  [pdf, other]

    cs.LG

    Offline Imitation Learning by Controlling the Effective Planning Horizon

    Authors: Hee-Jun Ahn, Seong-Woong Shim, Byung-Jun Lee

    Abstract: In offline imitation learning (IL), we generally assume only a handful of expert trajectories and a supplementary offline dataset from suboptimal behaviors to learn the expert policy. While it is now common to minimize the divergence between state-action visitation distributions so that the agent also considers the future consequences of an action, a sampling error in an offline dataset may lead t…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Preprint

  37. arXiv:2401.00834  [pdf, other]

    cs.CV

    Deblurring 3D Gaussian Splatting

    Authors: Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park

    Abstract: Recent studies in Radiance Fields have paved the robust way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately 3D Gaussians splatting-based approach has been proposed to model…

    Submitted 26 May, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: 29 pages, 16 figures

  38. arXiv:2401.00825  [pdf, other]

    cs.CV cs.GR eess.IV

    Sharp-NeRF: Grid-based Fast Deblurring Neural Radiance Fields Using Sharpness Prior

    Authors: Byeonghyeon Lee, Howoong Lee, Usman Ali, Eunbyung Park

    Abstract: Neural Radiance Fields (NeRF) have shown remarkable performance in neural rendering-based novel view synthesis. However, NeRF suffers from severe visual quality degradation when the input images have been captured under imperfect conditions, such as poor illumination, defocus blurring, and lens aberrations. Especially, defocus blur is quite common in the images when they are normally captured usin…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted to WACV 2024

  39. arXiv:2401.00073  [pdf, other]

    eess.SY cs.LG

    Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification

    Authors: Bruce D. Lee, Anders Rantzer, Nikolai Matni

    Abstract: The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-tr…

    Submitted 21 May, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

  40. arXiv:2312.14492  [pdf, other]

    cs.CV

    Context Enhanced Transformer for Single Image Object Detection

    Authors: Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-bas…

    Submitted 26 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Project page: https://ku-cvlab.github.io/CETR

  41. arXiv:2312.13027  [pdf, other]

    cs.LG cs.CV

    Doubly Perturbed Task Free Continual Learning

    Authors: Byung Hyun Lee, Min-hwan Oh, Se Young Chun

    Abstract: Task Free online continual learning (TF-CL) is a challenging problem where the model incrementally learns tasks without explicit task information. Although training with entire data from the past, present as well as future is considered as the gold standard, naive approaches in TF-CL with the current samples may be conflicted with learning with samples in the future, leading to catastrophic forget…

    Submitted 18 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024 (Oral)

  42. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  43. arXiv:2312.11144  [pdf, other]

    cs.HC

    LSDvis: Hallucinatory Data Visualisations in Real World Environments

    Authors: Ari Kouts, Lonni Besançon, Michael Sedlmair, Benjamin Lee

    Abstract: We propose the concept of "LSDvis": the (highly exaggerated) visual blending of situated visualisations and the real-world environment to produce data representations that resemble hallucinations. Such hallucinatory visualisations incorporate elements of the physical environment, twisting and morphing their appearance such that they become part of the visualisation itself. We demonstrate LSDvis in…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Presented at the alt.VIS workshop at IEEE VIS 2023: https://altvis.github.io/

  44. arXiv:2312.09603  [pdf, other]

    cs.SD cs.LG eess.AS

    Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification

    Authors: June-Woo Kim, Sangmin Bae, Won-Yang Cho, Byungjo Lee, Ho-Young Jung

    Abstract: Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, the respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within t…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024

  45. arXiv:2312.00902  [pdf]

    cs.NI cond-mat.mtrl-sci

    Lennard Jones Token: a blockchain solution to scientific data curation

    Authors: Brian H. Lee, Alejandro Strachan

    Abstract: Data science and artificial intelligence have become an indispensable part of scientific research. While such methods rely on high-quality and large quantities of machine-readable scientific data, the current scientific data infrastructure faces significant challenges that limit effective data curation and sharing. These challenges include insufficient return on investment for researchers to share…

    Submitted 1 December, 2023; originally announced December 2023.

  46. arXiv:2311.18061  [pdf, other]

    cs.LG cs.NE

    TransNAS-TSAD: Harnessing Transformers for Multi-Objective Neural Architecture Search in Time Series Anomaly Detection

    Authors: Ijaz Ul Haq, Byung Suk Lee, Donna M. Rizzo

    Abstract: The surge in real-time data collection across various industries has underscored the need for advanced anomaly detection in both univariate and multivariate time series data. This paper introduces TransNAS-TSAD, a framework that synergizes the transformer architecture with neural architecture search (NAS), enhanced through NSGA-II algorithm optimization. This approach effectively tackles the compl…

    Submitted 4 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 32 pages, 4 figures. It will be submitted to a journal

  47. arXiv:2311.10224  [pdf, other]

    eess.IV cs.CV cs.LG

    CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images

    Authors: Syed Farhan Abbas, Nguyen Thanh Duc, Yoonguu Song, Kyungwon Kim, Ekta Srivastava, Boreom Lee

    Abstract: Due to the lack of automated methods, to diagnose cerebrovascular disease, time-of-flight magnetic resonance angiography (TOF-MRA) is assessed visually, making it time-consuming. The commonly used encoder-decoder architectures for cerebrovascular segmentation utilize redundant features, eventually leading to the extraction of low-level features multiple times. Additionally, convolutional neural ne…

    Submitted 19 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  48. arXiv:2311.08589  [pdf, other]

    cs.DC cs.AR

    Carbon Responder: Coordinating Demand Response for the Datacenter Fleet

    Authors: Jiali Xing, Bilge Acun, Aditya Sundarrajan, David Brooks, Manoj Chakkaravarthy, Nikky Avila, Carole-Jean Wu, Benjamin C. Lee

    Abstract: The increasing integration of renewable energy sources results in fluctuations in carbon intensity throughout the day. To mitigate their carbon footprint, datacenters can implement demand response (DR) by adjusting their load based on grid signals. However, this presents challenges for private datacenters with diverse workloads and services. One of the key challenges is efficiently and fairly allo…

    Submitted 14 November, 2023; originally announced November 2023.

  49. arXiv:2311.07079  [pdf, other]

    cs.LG cs.AI eess.SP

    Sample Dominance Aware Framework via Non-Parametric Estimation for Spontaneous Brain-Computer Interface

    Authors: Byeong-Hoo Lee, Byoung-Hee Kwon, Seong-Whan Lee

    Abstract: Deep learning has shown promise in decoding brain signals, such as electroencephalogram (EEG), in the field of brain-computer interfaces (BCIs). However, the non-stationary characteristics of EEG signals pose challenges for training neural networks to acquire appropriate knowledge. Inconsistent EEG signals resulting from these non-stationary characteristics can lead to poor performance. Therefore,…

    Submitted 14 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures

  50. arXiv:2311.04783  [pdf, other]

    cs.CV

    VioLA: Aligning Videos to 2D LiDAR Scans

    Authors: Jun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle, Bhoram Lee, Volkan Isler

    Abstract: We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the re…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 8 pages