Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 497 results for author: Lee, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.09260  [pdf, other

    cs.CV cs.AI cs.RO eess.IV

    Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

    Authors: Maneesha Wickramasuriya, Taeyoung Lee, Murray Snyder

    Abstract: This paper introduces a deep transformer network for estimating the relative 6D pose of a Unmanned Aerial Vehicle (UAV) with respect to a ship using monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer Neural Network model is trained to detect these keypoints and estimate the 6D pose of each part. The estimates are in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 23 pages, 25 figures, 3 tables

  2. arXiv:2406.08989  [pdf, other

    eess.AS cs.SD

    ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

    Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

    Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrec… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.08176  [pdf, other

    cs.CV cs.RO

    Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment

    Authors: Taekbeom Lee, Youngseok Jang, H. Jin Kim

    Abstract: Neural implicit representation has attracted attention in 3D reconstruction through various success cases. For further applications such as scene understanding or editing, several works have shown progress towards object compositional reconstruction. Despite their superior performance in observed regions, their performance is still limited in reconstructing objects that are partially observed. To… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: RA-L. 8 pages, 8 figures, 4 tables

  4. arXiv:2406.07843  [pdf, other

    cs.CV q-bio.NC

    Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification

    Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

    Abstract: Convolutional neural networks (CNNs) have been shown to be the state-of-the-art approach for modeling the transfer functions of visual cortical neurons. Cortical neurons in the primary visual cortex are are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs can integrate global spatial image information to model such contextual modulation v… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint NeurIPS 2024

  5. arXiv:2406.06329  [pdf, other

    cs.CL eess.AS

    A Parameter-efficient Language Extension Framework for Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framewo… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.00636  [pdf, other

    cs.CV

    T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

    Authors: Taeryung Lee, Fabien Baradel, Thomas Lucas, Kyoung Mu Lee, Gregory Rogez

    Abstract: In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., paragraph). Previous long-term motion generating approaches were mostly based on recurrent methods, using previously generated motion chunks as input for the next step. However, this appr… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 HuMoGen Workshop

  7. arXiv:2406.00247  [pdf, other

    cs.IR cs.AI

    Large Language Models for Relevance Judgment in Product Search

    Authors: Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao

    Abstract: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 11 tables - SIGIR 2024, LLM4Eval

    ACM Class: H.3.3; I.2.7

  8. arXiv:2405.20954  [pdf, other

    cs.LG stat.ML

    Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score

    Authors: Nathan Tsoi, Deyuan Li, Taesoo Daniel Lee, Marynel Vázquez

    Abstract: Multiclass neural network classifiers are typically trained using cross-entropy loss. Following training, the performance of this same neural network is evaluated using an application-specific metric based on the multiclass confusion matrix, such as the Macro $F_β$-Score. It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  9. arXiv:2405.19691  [pdf, other

    cs.HC

    Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing

    Authors: Minsun Kim, SeonGyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Hyungseung Lim, Jieun Han, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong, Tak Yeon Lee

    Abstract: While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises sur… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  10. arXiv:2405.18027  [pdf, other

    cs.CL

    TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

    Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

    Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurat… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

  11. arXiv:2405.16021  [pdf, other

    cs.RO

    VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

    Authors: Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson, Nikhil J Joshi, Daniel Lam, Tsang-Wei Edward Lee, Alex Luong, Sharath Maddineni, Harsh Patel, Jodilyn Peralta, Jornell Quiambao, Diego Reyes, Rosario M Jauregui Ruano, Dorsa Sadigh, Pannag Sanketi, Leila Takayama, Pavel Vodenski, Fei Xia

    Abstract: Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon ta… ▽ More

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  12. arXiv:2405.12701  [pdf, other

    cs.CL cs.AI

    OLAPH: Improving Factuality in Biomedical Long-form Question Answering

    Authors: Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang

    Abstract: In the medical domain, numerous scenarios necessitate the long-form generation ability of large language models (LLMs). Specifically, when addressing patients' questions, it is essential that the model's response conveys factual claims, highlighting the need for an automated method to evaluate those claims. Thus, we introduce MedLFQA, a benchmark dataset reconstructed using long-form question-answ… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  13. arXiv:2405.12648  [pdf, other

    cs.CV cs.AI

    Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency

    Authors: Hyeongjin Kim, Sangwon Kim, Dasom Ahn, Jong Taek Lee, Byoung Chul Ko

    Abstract: Scene graph generation (SGG) is an important task in image understanding because it represents the relationships between objects in an image as a graph structure, making it possible to understand the semantic relationships between objects intuitively. Previous SGG studies used a message-passing neural networks (MPNN) to update features, which can effectively reflect information about surrounding o… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  14. arXiv:2405.11817  [pdf

    cs.ET cs.CL

    Systematic Review on Healthcare Systems Engineering utilizing ChatGPT

    Authors: Jungwoo Kim, Ji-Su Lee, Huijae Kim, Taesik Lee

    Abstract: This paper presents an analytical framework for conducting academic reviews in the field of Healthcare Systems Engineering, employing ChatGPT, a state-of-the-art tool among recent language models. We utilized 9,809 abstract paragraphs from conference presentations to systematically review the field. The framework comprises distinct analytical processes, each employing tailored prompts and the syst… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  15. arXiv:2405.10725  [pdf, other

    cs.CL cs.IR

    INDUS: Effective and Efficient Language Models for Scientific Applications

    Authors: Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kaylin Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grezes, Megan Ansdell, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis , et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics,… ▽ More

    Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  16. arXiv:2405.09879  [pdf, other

    cs.CV cs.AI

    Generative Unlearning for Any Identity

    Authors: Juwon Seo, Sung-Hoon Lee, Tae-Young Lee, Seungjun Moon, Gyeong-Moon Park

    Abstract: Recent advances in generative models trained on large-scale datasets have made it possible to synthesize high-quality samples across various domains. Moreover, the emergence of strong inversion networks enables not only a reconstruction of real-world images but also the modification of attributes through various editing methods. However, in certain domains related to privacy issues, e.g., human fa… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 15 pages, 17 figures, 10 tables, CVPR 2024 Poster

  17. arXiv:2405.06424  [pdf, other

    cs.CL cs.AI cs.LG

    Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

    Authors: JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, JuYoun Son, Sima Didari, Baruch Gutow, Heng Hao, Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min

    Abstract: Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for t… ▽ More

    Submitted 19 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  18. arXiv:2405.03141  [pdf, other

    eess.IV cs.AI cs.CV physics.med-ph

    Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

    Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-Ping Zheng

    Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea… ▽ More

    Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  19. arXiv:2404.17709  [pdf, other

    stat.ML cs.LG

    Low-rank Matrix Bandits with Heavy-tailed Rewards

    Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

    Abstract: In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $Θ^*$ with rank $r \ll d_1\wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of \underline{low}-rank… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: The 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

  20. arXiv:2404.14219  [pdf, other

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset… ▽ More

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  21. arXiv:2404.11104  [pdf, other

    cs.CV

    Object Remover Performance Evaluation Methods using Class-wise Object Removal Images

    Authors: Changsuk Oh, Dongseok Shim, Taekbeom Lee, H. Jin Kim

    Abstract: Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current work… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  22. arXiv:2404.06466  [pdf, other

    cs.LG stat.ML

    Hyperparameter Selection in Continual Learning

    Authors: Thomas L. Lee, Sigrid Passano Hellan, Linus Ericsson, Elliot J. Crowley, Amos Storkey

    Abstract: In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparam… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Preprint, 9 pages

  23. arXiv:2404.03188  [pdf

    eess.IV cs.CV cs.LG

    Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture

    Authors: W. S. H. M. W. Ahmad, M. F. A. Fauzi, M. K. Abdullahi, Jenny T. H. Lee, N. S. A. Basry, A Yahaya, A. M. Ismail, A. Adam, Elaine W. L. Chan, F. S. Abas

    Abstract: Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the ethnic of Bidayuh. NPC is often late-diagnosed because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This article has been accepted in the Journal of Engineering Science and Technology (JESTEC) and awaiting publication

  24. arXiv:2404.01351  [pdf, other

    cs.LG cs.AI cs.CV

    AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation

    Authors: Taeckyung Lee, Sorn Chottananurak, Taesik Gong, Sung-Ju Lee

    Abstract: Test-time adaptation (TTA) has emerged as a viable solution to adapt pre-trained models to domain shifts using unlabeled test data. However, TTA faces challenges of adaptation failures due to its reliance on blind adaptation to unknown test samples in dynamic scenarios. Traditional methods for out-of-distribution performance estimation are limited by unrealistic assumptions in the TTA context, suc… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  25. arXiv:2404.00420  [pdf, other

    cs.SE cs.LG

    Learning Service Selection Decision Making Behaviors During Scientific Workflow Development

    Authors: Xihao Xie, Jia Zhang, Rahul Ramachandran, Tsengdar J. Lee, Seungwon Lee

    Abstract: Increasingly, more software services have been published onto the Internet, making it a big challenge to recommend services in the process of a scientific workflow composition. In this paper, a novel context-aware approach is proposed to recommending next services in a workflow development process, through learning service representation and service selection decision making behaviors from workflo… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 14 pages, 8 figures. arXiv admin note: text overlap with arXiv:2205.11771

  26. arXiv:2404.00376  [pdf, other

    cs.CL

    Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

    Authors: Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang

    Abstract: While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving co… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  27. arXiv:2404.00234  [pdf, other

    cs.CV

    Grid Diffusion Models for Text-to-Video Generation

    Authors: Taegyeong Lee, Soyeong Kwon, Taehwan Kim

    Abstract: Recent advances in the diffusion models have significantly improved text-to-image generation. However, generating videos from text is a more challenging task than generating images from text, due to the much larger dataset and higher computational cost required. Most existing video generation methods use either a 3D U-Net architecture that considers the temporal dimension or autoregressive generat… ▽ More

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  28. arXiv:2403.19985  [pdf, other

    cs.CV

    Stable Surface Regularization for Fast Few-Shot NeRF

    Authors: Byeongin Joung, Byeong-Uk Lee, Jaesung Choe, Ukcheol Shin, Minjun Kang, Taeyeop Lee, In So Kweon, Kuk-Jin Yoon

    Abstract: This paper proposes an algorithm for synthesizing novel views under few-shot setup. The main concept is to develop a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence speed. We observe that the Eikonal loss - which is a widely known geometric regularization - requires dense traini… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 3DV 2024

  29. arXiv:2403.19456  [pdf, other

    cs.CV cs.GR cs.MM

    Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization

    Authors: Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Oliver Deussen, Weiming Dong, Jintao Li, Tong-Yee Lee

    Abstract: Personalized generation paradigms empower designers to customize visual intellectual properties with the help of textual descriptions by tuning or adapting pre-trained text-to-image models on a few images. Recent works explore approaches for concurrently customizing both content and detailed visual style appearance. However, these existing approaches often generate images where the content and sty… ▽ More

    Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  30. arXiv:2403.19146  [pdf, ps, other

    cs.DS cs.DC math.OC

    Improving the Bit Complexity of Communication for Distributed Convex Optimization

    Authors: Mehrdad Ghadiri, Yin Tat Lee, Swati Padmanabhan, William Swartworth, David Woodruff, Guanghao Ye

    Abstract: We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression, $p$-norm regression (for $1\leq p\leq 2$), linear programming, minimizing the sum of finitely many convex nonsmooth functions with varying supports, and low rank app… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear in STOC '24. Abstract shortened to meet the arXiv limits. Comments welcome!

  31. arXiv:2403.18421  [pdf, other

    cs.CL cs.AI

    BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

    Authors: Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, Christopher D. Manning

    Abstract: Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build an… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages

  32. arXiv:2403.16510  [pdf, other

    cs.CV

    Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

    Authors: Ziyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee

    Abstract: Despite the remarkable process of talking-head-based avatar-creating solutions, directly generating anchor-style videos with full-body motions remains challenging. In this study, we propose Make-Your-Anchor, a novel system necessitating only a one-minute video clip of an individual for training, subsequently enabling the automatic generation of anchor-style videos with precise torso and hand movem… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: accepted at CVPR2024

  33. arXiv:2403.15341  [pdf, other

    cs.AI cs.MA

    Collaborative AI Teaming in Unknown Environments via Active Goal Deduction

    Authors: Zuyuan Zhang, Hanhan Zhou, Mahdi Imani, Taeyoung Lee, Tian Lan

    Abstract: With the advancements of artificial intelligence (AI), we're seeing more scenarios that require AI to work closely with other agents, whose goals and strategies might not be known beforehand. However, existing approaches for training collaborative agents often require defined and known reward signals and cannot address the problem of teaming with unknown agents that often have latent objectives/re… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  34. arXiv:2403.10041  [pdf, other

    cs.RO cs.AI

    Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)

    Authors: Jeongeun Park, Taemoon Jeong, Hyeonseong Kim, Taehyun Byun, Seungyoon Shin, Keunjun Choi, Jaewoon Kwon, Taeyoon Lee, Matthew Pan, Sungjoon Choi

    Abstract: This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent application to the physical realm, employing robots to provide a more immersive and interactive experience. The proposed system, named the Masquerading Animate… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 4 pages, 3 figures

  35. arXiv:2403.08272  [pdf, other

    cs.CL

    RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education

    Authors: Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh

    Abstract: The integration of generative AI in education is expanding, yet empirical analyses of large-scale and real-world interactions between students and AI systems still remain limited. Addressing this gap, we present RECIPE4U (RECIPE for University), a dataset sourced from a semester-long experiment with 212 college students in English as Foreign Language (EFL) writing courses. During the study, studen… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.13243

  36. arXiv:2403.08133  [pdf, other

    eess.SP cs.AI cs.IT

    Physics-Inspired Deep Learning Anti-Aliasing Framework in Efficient Channel State Feedback

    Authors: Yu-Chien Lin, Yan Xin, Ta-Sung Lee, Charlie, Zhang, Zhi Ding

    Abstract: Acquiring downlink channel state information (CSI) at the base station is vital for optimizing performance in massive Multiple input multiple output (MIMO) Frequency-Division Duplexing (FDD) systems. While deep learning architectures have been successful in facilitating UE-side CSI feedback and gNB-side recovery, the undersampling issue prior to CSI feedback is often overlooked. This issue, which… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  37. arXiv:2403.01749  [pdf, other

    cs.CL

    Differentially Private Synthetic Data via Foundation Model APIs 2: Text

    Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

    Abstract: Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalab… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  38. arXiv:2402.12071  [pdf, other

    cs.CL cs.AI

    EmoBench: Evaluating the Emotional Intelligence of Large Language Models

    Authors: Sahand Sabour, Siyang Liu, Zheyuan Zhang, June M. Liu, Jinfeng Zhou, Alvionna S. Sunaryo, Juanzi Li, Tatia M. C. Lee, Rada Mihalcea, Minlie Huang

    Abstract: Recent advances in Large Language Models (LLMs) have highlighted the need for robust, comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional Intelligence (EI) is considerably limited. Existing benchmarks have two major shortcomings: first, they mainly focus on emotion recognition, neglecting essential EI capabilities such as emotion regulation and thought facilitati… ▽ More

    Submitted 7 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Main Conference

  39. arXiv:2402.11450  [pdf, other

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore , et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o… ▽ More

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  40. arXiv:2402.07872  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

    Authors: Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

    Abstract: Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we ena… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  41. arXiv:2401.17800  [pdf, other

    cs.SD cs.MM eess.AS

    Dance-to-Music Generation with Encoder-based Textual Inversion of Diffusion Models

    Authors: Sifei Li, Weiming Dong, Yuxin Zhang, Fan Tang, Chongyang Ma, Oliver Deussen, Tong-Yee Lee, Changsheng Xu

    Abstract: The harmonious integration of music with dance movements is pivotal in vividly conveying the artistic essence of dance. This alignment also significantly elevates the immersive quality of gaming experiences and animation productions. While there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly concentrate on modulating overarch… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures

  42. arXiv:2401.16731  [pdf, other

    cs.CL cs.AI

    Towards Generating Informative Textual Description for Neurons in Language Models

    Authors: Shrayani Mondal, Rishabh Garodia, Arbaaz Qureshi, Taesung Lee, Youngja Park

    Abstract: Recent developments in transformer-based language models have allowed them to capture a wide variety of world knowledge that can be adapted to downstream tasks with limited resources. However, what pieces of information are understood in these models is unclear, and neuron-level contributions in identifying them are largely unknown. Conventional approaches in neuron explainability either depend on… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: AAAI 2024

  43. arXiv:2401.08962  [pdf, other

    cs.HC cs.LG cs.SD eess.AS

    DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

    Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

    Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

  44. arXiv:2401.07298  [pdf, other

    stat.ML cs.LG

    Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

    Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

    Abstract: In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown $d_1$ by $d_2$ matrix $Θ^*$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized lo… ▽ More

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Revision of the paper accepted by NeurIPS 2022

  45. arXiv:2401.03816  [pdf, other

    eess.AS cs.SD

    Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss

    Authors: Yusheng Tian, Jingyu Li, Tan Lee

    Abstract: This research is about the creation of personalized synthetic voices for head and neck cancer survivors. It is focused particularly on tongue cancer patients whose speech might exhibit severe articulation impairment. Our goal is to restore normal articulation in the synthesized speech, while maximally preserving the target speaker's individuality in terms of both the voice timbre and speaking styl… ▽ More

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  46. arXiv:2401.03689  [pdf, other

    eess.AS cs.SD

    LUPET: Incorporating Hierarchical Information Path into Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Toward high-performance multilingual automatic speech recognition (ASR), various types of linguistic information and model design have demonstrated their effectiveness independently. They include language identity (LID), phoneme information, language-specific processing modules, and cross-lingual self-supervised speech representation. It is expected that leveraging their benefits synergistically i… ▽ More

    Submitted 10 June, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech 2024

  47. Image Collage on Arbitrary Shape via Shape-Aware Slicing and Optimization

    Authors: Dong-Yi Wu, Thi-Ngoc-Hanh Le, Sheng-Yi Yao, Yun-Chen Lin, Tong-Yee Lee

    Abstract: Image collage is a very useful tool for visualizing an image collection. Most of the existing methods and commercial applications for generating image collages are designed on simple shapes, such as rectangular and circular layouts. This greatly limits the use of image collages in some artistic and creative settings. Although there are some methods that can generate irregularly-shaped image collag… ▽ More

    Submitted 17 November, 2023; originally announced January 2024.

    Comments: This paper has been accepted for publication on IEEE Transactions on Visualization and Computer Graphics (TVCG), March 2023. Project website http://graphics.csie.ncku.edu.tw/shapedimagecollage

  48. arXiv:2311.16452  [pdf, other

    cs.CL

    Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

    Authors: Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz

    Abstract: Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM. We build… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 21 pages, 7 figures

    ACM Class: I.2.7

  49. arXiv:2311.16401  [pdf, ps, other

    quant-ph cs.DS

    On the quantum time complexity of divide and conquer

    Authors: Jonathan Allcock, Jinge Bao, Aleksandrs Belovs, Troy Lee, Miklos Santha

    Abstract: We initiate a systematic study of the time complexity of quantum divide and conquer algorithms for classical problems. We establish generic conditions under which search and minimization problems with classical divide and conquer algorithms are amenable to quantum speedup and apply these theorems to an array of problems involving strings, integers, and geometric objects. They include LONGEST DISTI… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 48 pages, accepted to QIP 2024

  50. arXiv:2311.14737  [pdf, other

    cs.CL cs.AI cs.LG

    Positional Description Matters for Transformers Arithmetic

    Authors: Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

    Abstract: Transformers, central to the successes in modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities --which paradoxically include remarkable coding abilities. We observe that a crucial challenge is their naive reliance on positional information to solve arithmetic problems with a small number of digits, leading to poor performance on larger numbers. Herei… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 18 pages