-
On Speculative Decoding for Multimodal Large Language Models
Authors:
Mukul Gagrani,
Raghavv Goel,
Wonseok Jeon,
Junyoung Park,
Mingu Lee,
Christopher Lott
Abstract:
Inference with Multimodal Large Language Models (MLLMs) is slow because their large-language-model backbone is limited by memory bandwidth and generates tokens auto-regressively. In this paper, we explore the application of speculative decoding to improve the inference efficiency of MLLMs, specifically the LLaVA 7B model. We show that a language-only model can serve as a good draft model for speculative decoding with LLaVA 7B, removing the need for image tokens and their associated processing components in the draft model. Our experiments across three different tasks show that speculative decoding can achieve a memory-bound speedup of up to 2.37$\times$ using a 115M-parameter language model that we trained from scratch. Additionally, we introduce a compact LLaVA draft model incorporating an image adapter, which shows marginal performance gains in image captioning while maintaining comparable results on the other tasks.
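The abstract does not spell out the verification step, so the sketch below shows the standard speculative-sampling accept/reject rule that speculative decoding generally relies on, plus the idealized expected number of tokens emitted per target-model call. The function names are hypothetical, and the expected-tokens formula assumes each drafted token is accepted independently with a fixed probability `alpha`, which is a simplification rather than the paper's measurement method.

```python
import numpy as np


def verify_draft_token(p_target, q_draft, token, rng):
    """Verify one drafted token with the standard speculative-sampling rule.

    p_target, q_draft: next-token probability vectors from the target model
    (e.g. LLaVA 7B) and the draft model over the same vocabulary.
    token: the token the draft model sampled from q_draft (so q_draft[token] > 0).
    Returns (emitted_token, accepted).
    """
    p, q = p_target[token], q_draft[token]
    # Accept with probability min(1, p/q); this keeps outputs distributed as p_target.
    if rng.random() < min(1.0, p / q):
        return token, True
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = np.maximum(p_target - q_draft, 0.0)
    residual = residual / residual.sum()
    return int(rng.choice(len(residual), p=residual)), False


def expected_tokens_per_target_call(alpha, gamma):
    """Idealized tokens emitted per target forward pass when gamma tokens are
    drafted and each is accepted i.i.d. with probability alpha (alpha < 1)."""
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)
```

Because the target model verifies all drafted tokens in one forward pass, a higher acceptance rate translates into fewer memory-bound target calls per generated token.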
Submitted 12 April, 2024;
originally announced April 2024.
-
Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation
Authors:
Jinkyung Park,
Pamela Wisniewski,
Vivek Singh
Abstract:
In this position paper, we discuss the potential for leveraging LLMs as interactive research tools to facilitate collaboration between human coders and AI to effectively annotate online risk data at scale. Collaborative human-AI labeling is a promising approach to annotating large-scale and complex data for various tasks. Yet, tools and methods to support effective human-AI collaboration for data annotation are under-studied. This gap is pertinent because co-labeling tasks need to support a two-way interactive discussion that can add nuance and context, particularly for online risk, which is highly subjective and contextualized. Therefore, we outline some early benefits and challenges of using LLM-based tools for risk annotation and suggest future directions for the HCI research community to leverage LLMs as research tools to facilitate human-AI collaboration in contextualized online data annotation. Our research interests align closely with the purpose of the LLMs as Research Tools workshop: to identify ongoing applications and challenges of using LLMs to work with data in HCI research. We anticipate gaining valuable insights from organizers and participants into how LLMs can help reshape the HCI community's methods for working with data.
Submitted 11 April, 2024;
originally announced April 2024.
-
CAT: Contrastive Adapter Training for Personalized Image Generation
Authors:
Jae Wan Park,
Sang Hyun Park,
Jun Young Koh,
Junha Lee,
Min Song
Abstract:
The emergence of various adapters, including Low-Rank Adaptation (LoRA) from the field of natural language processing, has allowed diffusion models to personalize image generation at a low cost. However, due to various challenges, including limited datasets and a shortage of regularization and computational resources, adapter training often yields unsatisfactory outcomes and corrupts the backbone model's prior knowledge. One well-known phenomenon is the loss of diversity in object generation, especially within the same class, which leads to generating almost identical objects with minor variations. This limits the model's generation capabilities. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training through the application of a CAT loss. Our approach helps preserve the base model's original knowledge when adapters are introduced. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to retain prior information. We compare CAT's improvements both qualitatively and quantitatively. Finally, we discuss possible extensions of CAT to multi-concept adapters and optimization.
Submitted 11 April, 2024;
originally announced April 2024.
-
A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution
Authors:
Jihee Kim,
Jia Park,
Jiwon Shin,
Hanseok Kim,
Kahyun Kim,
Haengbeom Shin,
Ha-Jung Park,
Woo-Seok Choi
Abstract:
This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput, low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but also compensates for frequency offset without any phase interpolators. To this end, a fractional divider controlled by the CDR is placed close to the global phase-locked loop. Moreover, to address the sub-optimal lock point of conventional baud-rate phase detectors, the proposed CDR employs a background eye-climbing algorithm, which optimizes the sampling phase and maximizes the vertical eye margin (VEM). Fabricated in a 28nm CMOS process, the proposed 4x32Gb/s RX shows a low integrated fractional spur of -40.4dBc at a 2500ppm frequency offset. Furthermore, it improves bit-error-rate performance by increasing the VEM by 17%. The entire RX achieves an energy efficiency of 1.8pJ/bit at an aggregate data rate of 128Gb/s.
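The abstract does not describe how the eye-climbing algorithm searches, so the following is only a generic greedy hill-climbing sketch of the idea of "climbing" toward the sampling phase with the largest vertical eye margin. `measure_vem` is a hypothetical callback standing in for an on-chip VEM measurement, and the phase is modeled as an integer code that wraps around; the real circuit runs continuously in the background rather than as a one-shot search.

```python
from typing import Callable


def eye_climb(measure_vem: Callable[[int], float], phase: int,
              num_phases: int, max_iters: int = 100) -> int:
    """Greedy hill-climbing on a sampling-phase code (illustrative only).

    measure_vem: hypothetical function returning the measured vertical eye
    margin (VEM) at a given phase code; codes wrap modulo num_phases.
    """
    best = measure_vem(phase)
    for _ in range(max_iters):
        # Probe one step early and one step late of the current phase.
        candidates = [(phase - 1) % num_phases, (phase + 1) % num_phases]
        scores = [measure_vem(c) for c in candidates]
        i = max(range(2), key=lambda k: scores[k])
        if scores[i] <= best:
            break  # neither neighbor improves VEM: local optimum reached
        phase, best = candidates[i], scores[i]
    return phase
```

A greedy step of one phase code per iteration keeps each adjustment small, which is the kind of behavior one would want from a background loop that must not disturb live data.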
Submitted 22 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
A Data Efficient Framework for Learning Local Heuristics
Authors:
Rishi Veerapaneni,
Jonathan Park,
Muhammad Suhail Saleem,
Maxim Likhachev
Abstract:
With the advent of machine learning, there have been several recent attempts to learn effective and generalizable heuristics. Local Heuristic A* (LoHA*) is one recent method that, instead of learning the entire heuristic estimate, learns a "local" residual heuristic that estimates the cost to escape a region (Veerapaneni et al. 2023). LoHA*, like other supervised learning methods, collects a dataset of target values by querying an oracle on many planning problems (in this case, local planning problems). This data collection process can become slow as the size of the local region increases or if the domain requires expensive collision checks. Our main insight is that when an A* search solves a start-goal planning problem, it inherently solves multiple local planning problems along the way. We exploit this observation to propose an efficient data collection framework that does less than one tenth of the work (measured by expansions) to collect the same amount of data compared with baselines. This idea also enables us to run LoHA* in an online manner, iteratively collecting data and improving our model while solving relevant start-goal tasks. We demonstrate the performance of our data collection and online framework on a 4D $(x, y, \theta, v)$ navigation domain.
Submitted 3 May, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Text-to-Image Synthesis for Any Artistic Styles: Advancements in Personalized Artistic Image Generation via Subdivision and Dual Binding
Authors:
Junseo Park,
Beomseok Ko,
Hyeryung Jang
Abstract:
Recent advancements in text-to-image models, such as Stable Diffusion, have demonstrated their ability to synthesize visual images from natural language prompts. One approach to personalizing text-to-image models, exemplified by DreamBooth, fine-tunes the pre-trained model by binding unique text identifiers to a few images of a specific subject. Although existing fine-tuning methods have demonstrated competence in rendering images in the styles of famous painters, it is still challenging to learn to produce images that capture distinct art styles, because stylistic attributes such as lines, shapes, textures, and colors are perceived in abstract and broad ways. In this paper, we introduce a new method, Single-StyleForge, for personalization. It fine-tunes pre-trained text-to-image diffusion models to generate diverse images in specified styles from text prompts. Using around 15-20 images of the target style, the approach establishes a foundational binding between a unique token identifier and a broad range of the target style. It also uses auxiliary images to strengthen this binding, offering specific guidance on representing elements such as persons in a manner consistent with the target style. In addition, we present ways to improve style quality and text-image alignment through a method called Multi-StyleForge, which inherits the strategy used in StyleForge and learns multiple tokens. Experimental evaluation on six distinct artistic styles demonstrates substantial improvements in both the quality of generated images and perceptual fidelity metrics such as FID, KID, and CLIP scores.
Submitted 8 April, 2024;
originally announced April 2024.
-
A 0.65-pJ/bit 3.6-TB/s/mm I/O Interface with XTalk Minimizing Affine Signaling for Next-Generation HBM with High Interconnect Density
Authors:
Hyunjun Park,
Jiwon Shin,
Hanseok Kim,
Jihee Kim,
Haengbeom Shin,
Taehoon Kim,
Jung-Hun Park,
Woo-Seok Choi
Abstract:
This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling schemes are validated through numerical experiments. XMAS not only demonstrates exceptional crosstalk-removal capability but also exhibits robustness against noise, especially simultaneous switching noise. Fabricated in a 28-nm CMOS process, the prototype XMAS transceiver achieves an edge density of 3.6TB/s/mm and an energy efficiency of 0.65pJ/b. Compared with single-ended signaling, XMAS reduces the crosstalk-induced peak-to-peak jitter of the received eye by 75% at a 10GS/s/pin data rate, and the horizontal eye opening extends to 0.2UI at a bit error rate < 10$^{-12}$.
Submitted 7 April, 2024;
originally announced April 2024.
-
Evaluation of the performance of the event reconstruction algorithms in the JSNS$^2$ experiment using a $^{252}$Cf calibration source
Authors:
D. H. Lee,
M. K. Cheoun,
J. H. Choi,
J. Y. Choi,
T. Dodo,
J. Goh,
K. Haga,
M. Harada,
S. Hasegawa,
W. Hwang,
T. Iida,
H. I. Jang,
J. S. Jang,
K. K. Joo,
D. E. Jung,
S. K. Kang,
Y. Kasugai,
T. Kawasaki,
E. J. Kim,
J. Y. Kim,
S. B Kim,
W. Kim,
H. Kinoshita,
T. Konno,
I. T. Lim
, et al. (28 additional authors not shown)
Abstract:
JSNS$^2$ searches for short-baseline neutrino oscillations with a baseline of 24 m and a target of 17 tonnes of Gd-loaded liquid scintillator. Accurate event reconstruction, which determines the position and energy of neutrino interactions in the detector, is essential for the physics analysis of the experiment's data. Therefore, the performance of the event reconstruction is carefully checked with calibrations using a $^{252}$Cf source. This manuscript describes the methodology and the performance of the event reconstruction.
Submitted 5 April, 2024;
originally announced April 2024.
-
Upgrade of NaI(Tl) crystal encapsulation for the NEON experiment
Authors:
J. J. Choi,
E. J. Jeon,
J. Y. Kim,
K. W. Kim,
S. H. Kim,
S. K. Kim,
Y. D. Kim,
Y. J. Ko,
B. C. Koh,
C. Ha,
B. J. Park,
S. H. Lee,
I. S. Lee,
H. Lee,
H. S. Lee,
J. Lee,
Y. M. Oh
Abstract:
The Neutrino Elastic-scattering Observation with NaI(Tl) experiment (NEON) aims to detect coherent elastic neutrino-nucleus scattering (CEνNS) in a NaI(Tl) crystal using reactor electron antineutrinos at the Hanbit nuclear power plant complex. A total of 13.3 kg of NaI(Tl) crystals was initially installed in December 2020 in the tendon gallery, 23.7$\pm$0.3 m from the reactor core, which operates at a thermal power of 2.8 GW. During initial engineering operation from May 2021 to March 2022, we observed unexpected photomultiplier-induced noise and a decreased light yield, both caused by leakage of liquid scintillator into the detector through weaknesses in the detector encapsulation. We upgraded the encapsulation design to prevent this leakage. Meanwhile, two small detectors were replaced with larger ones, bringing the total mass to 16.7 kg. With the new design, the detector system has operated stably since April 2022 for over a year without any drop in detector gain. In this paper, we present the improved crystal encapsulation design and the stability of the NEON experiment.
Submitted 28 June, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Pulse Shape Discrimination in JSNS$^2$
Authors:
T. Dodo,
M. K. Cheoun,
J. H. Choi,
J. Y. Choi,
J. Goh,
K. Haga,
M. Harada,
S. Hasegawa,
W. Hwang,
T. Iida,
H. I. Jang,
J. S. Jang,
K. K. Joo,
D. E. Jung,
S. K. Kang,
Y. Kasugai,
T. Kawasaki,
E. J. Kim,
J. Y. Kim,
S. B. Kim,
W. Kim,
H. Kinoshita,
T. Konno,
D. H. Lee,
I. T. Lim
, et al. (29 additional authors not shown)
Abstract:
JSNS$^2$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) is an experiment searching for sterile neutrinos via the observation of $\bar{\nu}_\mu \rightarrow \bar{\nu}_e$ appearance oscillations using neutrinos from muon decay-at-rest. For this search, rejecting cosmic-ray-induced neutron events by Pulse Shape Discrimination (PSD) is essential because the JSNS$^2$ detector is located above ground, on the third floor of the building. Using a data-driven likelihood method, we have achieved 95% rejection of neutron events while retaining 90% of signal (electron-like) events.
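The abstract does not define the discriminating variable, so the sketch below assumes a common liquid-scintillator choice: the tail-to-total charge ratio, which tends to be larger for neutron (proton-recoil) pulses than for electron-like pulses. It then forms a log-likelihood ratio from binned reference distributions (standing in for the "data-driven" PDFs) and picks the cut that keeps a chosen signal fraction. All function names, the binning scheme, and the tail window are illustrative assumptions, not the JSNS$^2$ implementation.

```python
import numpy as np


def tail_to_total(waveform, tail_start):
    """Fraction of pulse charge after sample index tail_start (assumed PSD variable)."""
    waveform = np.asarray(waveform, dtype=float)
    return waveform[tail_start:].sum() / waveform.sum()


def log_likelihood_ratio(x, bin_edges, pdf_signal, pdf_background, eps=1e-12):
    """log L_signal(x) - log L_background(x) from binned reference PDFs,
    e.g. histograms built from calibration or control-sample data."""
    pdf_signal = np.asarray(pdf_signal, dtype=float)
    pdf_background = np.asarray(pdf_background, dtype=float)
    i = np.searchsorted(bin_edges, x, side="right") - 1
    i = np.clip(i, 0, len(pdf_signal) - 1)  # values outside the range use edge bins
    return np.log(pdf_signal[i] + eps) - np.log(pdf_background[i] + eps)


def cut_for_signal_efficiency(signal_scores, efficiency=0.90):
    """Threshold keeping the requested fraction of signal events (keep score >= cut)."""
    return np.quantile(signal_scores, 1.0 - efficiency)
```

Fixing the cut on the signal sample first (here, 90% efficiency) and then measuring the neutron rejection on a background sample mirrors how the figures of merit are quoted in the abstract.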
Submitted 28 March, 2024;
originally announced April 2024.
-
Toward Safe Evolution of Artificial Intelligence (AI) based Conversational Agents to Support Adolescent Mental and Sexual Health Knowledge Discovery
Authors:
Jinkyung Park,
Vivek Singh,
Pamela Wisniewski
Abstract:
Following the recent release of various Artificial Intelligence (AI) based Conversational Agents (CAs), adolescents are increasingly using CAs for interactive knowledge discovery on sensitive topics, including mental and sexual health. Exploring such sensitive topics through online search has been an essential part of adolescent development, and CAs can support their knowledge discovery on such topics through human-like dialogues. Yet, unintended risks have been documented in adolescents' interactions with AI-based CAs, such as exposure to inappropriate content or false information, or advice that is detrimental to their mental and physical well-being (e.g., to self-harm). In this position paper, we discuss the current landscape and opportunities for CAs to support adolescents' mental and sexual health knowledge discovery. We also discuss some of the challenges related to ensuring the safety of adolescents when interacting with CAs regarding sexual and mental health topics. We call for a discourse on how to set guardrails for the safe evolution of AI-based CAs for adolescents.
Submitted 3 April, 2024;
originally announced April 2024.
-
3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization
Authors:
SeungJeh Chung,
JooHyun Park,
Hyewon Kan,
HyeongYeop Kang
Abstract:
3D stylization, which entails the application of specific styles to three-dimensional objects, holds significant commercial potential as it enables the creation of diverse 3D objects with distinct moods and styles, tailored to the specific demands of different scenes. With recent advancements in text-driven methods and artificial intelligence, the stylization process is increasingly intuitive and automated, thereby diminishing the reliance on manual labor and expertise. However, existing methods have predominantly focused on holistic stylization, leaving the application of styles to individual components of a 3D object unexplored. In response, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP leverages the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize the individual parts of the 3D mesh and modify their colors and local geometries to align them with the desired styles specified in the text prompt. 3DStyleGLIP is effectively trained for 3D stylization tasks through a part-level style loss working in GLIP's embedding space, supplemented by two complementary learning techniques. Extensive experimental validation confirms that our method achieves effective part-wise stylization, demonstrating promising potential for advancing the field of 3D stylization.
Submitted 3 April, 2024;
originally announced April 2024.
-
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
Authors:
Chanyeong Kim,
Jongwoong Park,
Hyunglip Bae,
Woo Chang Kim
Abstract:
Solving large-scale multistage stochastic programming (MSP) problems poses a significant challenge, as commonly used stagewise decomposition algorithms, including stochastic dual dynamic programming (SDDP), face growing time complexity as the subproblem size and problem count increase. Traditional approaches approximate the value functions as piecewise linear convex functions by incrementally accumulating subgradient cutting planes from the primal and dual solutions of stagewise subproblems. Recognizing these limitations, we introduce TranSDDP, a novel Transformer-based stagewise decomposition algorithm. This approach leverages the structural advantages of the Transformer model, implementing a sequential method for integrating subgradient cutting planes to approximate the value function. Through numerical experiments, we demonstrate TranSDDP's effectiveness in addressing MSP problems. It efficiently generates a piecewise linear approximation of the value function, significantly reducing computation time while preserving solution quality, thus marking a promising step in the treatment of large-scale multistage stochastic programming problems.
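To make "piecewise linear approximation from subgradient cutting planes" concrete, here is a minimal sketch of the classical cut representation used in SDDP-style methods (not TranSDDP's Transformer, which the abstract describes only as generating such cuts sequentially). Each cut is a supporting hyperplane $a_k + b_k^\top x$ built from a trial point, the value there, and a subgradient; for a convex value function, the maximum over cuts is a lower approximation. The function names are hypothetical.

```python
import numpy as np


def cut_value_function(x, intercepts, slopes):
    """Piecewise linear lower approximation V(x) ~= max_k (a_k + b_k . x)."""
    x = np.asarray(x, dtype=float)
    return float(np.max(np.asarray(intercepts) + np.asarray(slopes) @ x))


def add_cut(intercepts, slopes, x_trial, value, subgradient):
    """Append the supporting hyperplane value + g . (x - x_trial) as a new cut.

    In SDDP, `value` and `subgradient` come from the primal and dual solutions
    of the next-stage subproblem solved at the trial point x_trial.
    """
    g = np.asarray(subgradient, dtype=float)
    intercepts.append(value - g @ np.asarray(x_trial, dtype=float))
    slopes.append(g)
```

For example, three cuts of $f(x) = x^2$ taken at $x = -1, 0, 1$ give $\max(-1 - 2x,\ 0,\ -1 + 2x)$, which equals 3 at $x = 2$, a lower bound on $f(2) = 4$. Each added cut can only tighten this bound, which is why traditional SDDP keeps accumulating them.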
Submitted 3 April, 2024;
originally announced April 2024.
-
An inversion problem for optical spectrum data via physics-guided machine learning
Authors:
Hwiwoo Park,
Jun H. Park,
Jungseek Hwang
Abstract:
We propose the regularized recurrent inference machine (rRIM), a novel machine-learning approach to solve the challenging problem of deriving the pairing glue function from measured optical spectra. The rRIM incorporates physical principles into both training and inference and affords noise robustness, flexibility with out-of-distribution data, and reduced data requirements. It effectively obtains reliable pairing glue functions from experimental optical spectra and yields promising solutions for similar inverse problems of the Fredholm integral equation of the first kind.
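For reference, the generic Fredholm integral equation of the first kind has the form below. In this setting, $g$ would loosely be a quantity derived from the measured optical spectrum and $f$ the pairing glue function; the specific kernel $K$ used with rRIM is not given in the abstract, so it is left abstract here. Discretizing on grids $x_i$ and $t_j$ turns the problem into a linear system whose matrix is typically ill-conditioned, so small noise in $\mathbf{g}$ can cause large errors in a naive inverse, which is why noise robustness and regularization matter.

```latex
g(x) = \int_a^b K(x, t)\, f(t)\, dt
\quad\Longrightarrow\quad
g_i \approx \sum_{j=1}^{n} K(x_i, t_j)\, f(t_j)\, \Delta t,
\qquad \text{i.e.} \quad \mathbf{g} = \mathbf{K}\,\mathbf{f}.
```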
Submitted 2 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Search for a sub-eV sterile neutrino using Daya Bay's full dataset
Authors:
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding,
Y. Y. Ding
, et al. (176 additional authors not shown)
Abstract:
This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{\nu}_e$ candidates identified as inverse beta-decay interactions followed by neutron capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements in several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both the Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2\theta_{14} \gtrsim 0.01$ can be excluded at 95\% confidence level in the region $0.01$ eV$^2 \lesssim |\Delta m^{2}_{41}| \lesssim 0.1$ eV$^2$. This result represents the world-leading constraints in the region $2 \times 10^{-4}$ eV$^2 \lesssim |\Delta m^{2}_{41}| \lesssim 0.2$ eV$^2$.
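To illustrate what a limit on $\sin^2 2\theta_{14}$ at a given $|\Delta m^2_{41}|$ constrains, the sketch below evaluates the familiar two-flavor short-baseline approximation of the $\bar{\nu}_e$ survival probability due to sterile mixing alone. This is a simplification: it neglects the $\theta_{13}$ and $\theta_{12}$ terms, detector energy resolution, and the multi-baseline fit that an actual Daya Bay analysis uses. The function name is hypothetical.

```python
import math


def sterile_survival_probability(sin2_2theta14, dm2_41_ev2, baseline_m, energy_mev):
    """Two-flavor approximation of the electron-antineutrino survival probability
    from sterile mixing alone (theta13 and theta12 terms neglected):

        P = 1 - sin^2(2 theta14) * sin^2(1.267 * dm2_41[eV^2] * L[m] / E[MeV])
    """
    phase = 1.267 * dm2_41_ev2 * baseline_m / energy_mev
    return 1.0 - sin2_2theta14 * math.sin(phase) ** 2
```

With $\sin^2 2\theta_{14} = 0.01$, the largest possible deficit from this term is 1%, so excluding mixing at that level requires resolving percent-level, $L/E$-dependent distortions across detectors.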
Submitted 15 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Strong interactions and isospin symmetry breaking in a supermoiré lattice
Authors:
Yonglong Xie,
Andrew T. Pierce,
Jeong Min Park,
Daniel E. Parker,
Jie Wang,
Patrick Ledwith,
Zhuozhen Cai,
Kenji Watanabe,
Takashi Taniguchi,
Eslam Khalaf,
Ashvin Vishwanath,
Pablo Jarillo-Herrero,
Amir Yacoby
Abstract:
In multilayer moiré heterostructures, the interference of multiple twist angles ubiquitously leads to tunable ultra-long-wavelength patterns known as supermoiré lattices. However, their impact on the system's many-body electronic phase diagram remains largely unexplored. We present local compressibility measurements revealing numerous incompressible states resulting from supermoiré-lattice-scale isospin symmetry breaking driven by strong interactions. By using the supermoiré lattice occupancy as a probe of isospin symmetry, we observe an unexpected doubling of the miniband filling near $\nu=-2$, possibly indicating a hidden phase transition or normal-state pairing proximal to the superconducting phase. Our work establishes supermoiré lattices as a tunable parameter for designing novel quantum phases and an effective tool for unraveling correlated phenomena in moiré materials.
Submitted 1 April, 2024;
originally announced April 2024.
-
Holo-VQVAE: VQ-VAE for phase-only holograms
Authors:
Joohyun Park,
Hyeongyeop Kang
Abstract:
Holography stands at the forefront of visual technology innovation, offering immersive, three-dimensional visualizations through the manipulation of light wave amplitude and phase. Contemporary research in hologram generation has predominantly focused on image-to-hologram conversion, producing holograms from existing images. These approaches, while effective, inherently limit the scope of innovation and creativity in hologram generation. In response to this limitation, we present Holo-VQVAE, a novel generative framework tailored for phase-only holograms (POHs). Holo-VQVAE leverages the architecture of Vector Quantized Variational AutoEncoders, enabling it to learn the complex distributions of POHs. Furthermore, it integrates the Angular Spectrum Method into the training process, facilitating learning in the image domain. This framework allows for the generation of unseen, diverse holographic content directly from its intricately learned latent space without requiring pre-existing images. This pioneering work paves the way for groundbreaking applications and methodologies in holographic content creation, opening a new era in the exploration of holographic content.
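Since Holo-VQVAE is said to integrate the Angular Spectrum Method (ASM) into training, here is a minimal NumPy sketch of standard ASM free-space propagation applied to a phase-only hologram. It is not the authors' code: the function names are hypothetical, evanescent components are simply zeroed, and the band-limiting refinements often used in practice are omitted. The wavelength, pixel pitch, and distance used in the checks are assumed example values.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a complex field by `distance` (metres) with the angular spectrum method.

    field: 2D complex array sampled with pixel `pitch` (metres).
    Evanescent components (fx^2 + fy^2 > 1 / wavelength^2) are discarded.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)  # both of shape (ny, nx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Unit-modulus transfer function for propagating waves, zero for evanescent ones.
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def phase_only_field(phase):
    """Unit-amplitude field exp(i * phase) represented by a phase-only hologram."""
    return np.exp(1j * phase)
```

Because the transfer function has unit modulus for propagating waves, the operation is differentiable and energy-preserving, which makes it convenient to place inside a training loop that compares reconstructions in the image domain.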
Submitted 29 March, 2024;
originally announced April 2024.
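The Holo-VQVAE abstract says the Angular Spectrum Method (ASM) is integrated into training so that learning can take place in the image domain. To make that step concrete, here is a minimal NumPy sketch of generic ASM propagation, which carries a phase-only hologram to an image plane. It is not the authors' implementation. The function names are hypothetical, and the wavelength (520 nm), pixel pitch (8 µm), and distance (0.1 m) are assumed values chosen only for illustration.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a sampled complex field over `distance` with the angular spectrum method.

    field: 2D complex array sampled on a square grid with spacing `pitch` (metres).
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)           # spatial frequencies along x (1/m)
    fy = np.fft.fftfreq(ny, d=pitch)           # spatial frequencies along y (1/m)
    FX, FY = np.meshgrid(fx, fy)               # both of shape (ny, nx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0                      # evanescent components are dropped
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    # Multiply the field's spectrum by the free-space transfer function and invert.
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def reconstruct_intensity(phase, wavelength=520e-9, pitch=8e-6, distance=0.1):
    """Image-plane intensity of a phase-only hologram (unit amplitude, given phase)."""
    field = np.exp(1j * phase)
    return np.abs(angular_spectrum_propagate(field, wavelength, pitch, distance)) ** 2

rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 2.0 * np.pi, size=(64, 64))   # stand-in for a generated POH
intensity = reconstruct_intensity(phase)
```

With these assumed parameters, every sampled frequency propagates, so the transfer function has unit modulus and total energy is conserved. In a training setup like the one the abstract describes, a differentiable version of this propagation would let a loss be computed on reconstructed images instead of raw phase maps. That pairing is an assumption, not a detail taken from the paper.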
-
Long time stability and instability in the two-dimensional Boussinesq system with kinematic viscosity
Authors:
Jaemin Park
Abstract:
In this paper, we investigate the long-time behavior of the two-dimensional incompressible Boussinesq system with kinematic viscosity in a periodic channel, focusing on instability and asymptotic stability near hydrostatic equilibria. First, we prove that any hydrostatic equilibrium exhibits long-time instability when the initial data are perturbed in Sobolev spaces of low regularity. Second, we establish asymptotic stability of the stratified density, which is strictly decreasing in the vertical direction, under sufficiently regular perturbations, proving that the solution converges to the unique minimizer of the total energy.
Our analysis is based on the energy method. Although the total energy dissipates due to kinematic viscosity, this mechanism alone cannot capture the stratification of the density. We overcome this difficulty by discovering another Lyapunov functional that captures the density stratification quantitatively.
Submitted 1 April, 2024;
originally announced April 2024.
-
Prompt Learning via Meta-Regularization
Authors:
Jinyoung Park,
Juyeon Ko,
Hyunwoo J. Kim
Abstract:
Pre-trained vision-language models have shown impressive success on various computer vision tasks thanks to their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting, since the general knowledge of the pre-trained vision-language models is forgotten while the prompts are fine-tuned on a small dataset from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks, alleviating meta-overfitting. In addition, we analyze how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Extensive experiments demonstrate that ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.
Submitted 31 March, 2024;
originally announced April 2024.
-
Dynamic Transfer Policies for Parallel Queues
Authors:
Timothy C. Y. Chan,
Jangwon Park,
Vahid Sarhangian
Abstract:
We consider the problem of load balancing in parallel queues by transferring customers between them at discrete points in time. Holding costs accrue as customers wait in the queue, while transfer decisions incur both fixed (setup) and variable costs proportional to the number and direction of transfers. Our work is primarily motivated by inter-facility patient transfers between hospitals during a surge in demand for hospitalization (e.g., during a pandemic). By analyzing an associated fluid control problem, we show that under fairly general assumptions, including time-varying arrivals and convex increasing holding costs, the optimal policy in each period partitions the state space into a well-defined $\textit{no-transfer region}$ and its complement, such that transferring is optimal if and only if the system is sufficiently imbalanced. In the absence of fixed transfer costs, an optimal policy moves the state to the no-transfer region's boundary; in contrast, with fixed costs, the state is moved to the no-transfer region's relative interior. We further leverage the fluid control problem to provide insights into the trade-off between holding and transfer costs, emphasizing the importance of preventing excessive idleness when transfers are not feasible in continuous time. Using simulation experiments, we investigate the performance and robustness of the fluid policy for the stochastic system. In particular, our case study, calibrated using data from the pandemic in the Greater Toronto Area, demonstrates that transferring patients between hospitals could reduce total cost by up to 27.7% with relatively few transfers.
Submitted 30 March, 2024;
originally announced April 2024.
-
Angular analysis of $B \to K^* e^+ e^-$ in the low-$q^2$ region with new electron identification at Belle
Authors:
Belle Collaboration,
D. Ferlewicz,
P. Urquijo,
I. Adachi,
K. Adamczyk,
H. Aihara,
D. M. Asner,
H. Atmacan,
R. Ayad,
V. Babu,
Sw. Banerjee,
P. Behera,
K. Belous,
J. Bennett,
M. Bessner,
V. Bhardwaj,
B. Bhuyan,
T. Bilka,
D. Biswas,
D. Bodrov,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola, et al. (145 additional authors not shown)
Abstract:
We perform an angular analysis of the $B\to K^* e^+ e^-$ decay for the dielectron mass squared, $q^2$, in the range $0.0008$ to $1.1200~\text{GeV}^2/c^4$, using the full Belle data set in the $K^{*0} \to K^+ π^-$ and $K^{*+} \to K_S^0 π^+$ channels and incorporating new methods of electron identification to improve the statistical power of the data set. This analysis is sensitive to right-handed currents from physics beyond the Standard Model through its constraints on the Wilson coefficients $\mathcal{C}_7^{(\prime)}$. We perform a fit to the $B\to K^* e^+ e^-$ differential decay rate and measure the imaginary component of the transversality amplitude to be $A_T^{\rm Im} = -1.27 \pm 0.52 \pm 0.12$, and the $K^*$ transverse asymmetry to be $A_T^{(2)} = 0.52 \pm 0.53 \pm 0.11$. The resulting constraints on the value of $\mathcal{C}_7^{\prime}$ are consistent with the Standard Model within a $2σ$ confidence interval.
Submitted 2 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation
Authors:
Jinyeong Park,
Jaegyoon Ahn,
Jonghwan Choi,
Jibum Kim
Abstract:
Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach remains ineffective at exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework that uses adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random network distillation and count-based strategies. In benchmark tests, Mol-AIR outperforms existing approaches in generating molecules with desired properties, including penalized LogP, QED, and celecoxib similarity, without any prior knowledge. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
Submitted 29 March, 2024;
originally announced March 2024.
-
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Authors:
Seyeon Kim,
Siyoon Jin,
Jihye Park,
Kihong Kim,
Jiyoung Kim,
Jisu Nam,
Seungryong Kim
Abstract:
Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models have aimed to address these limitations and improve fidelity. However, they still face challenges, including long sampling times and difficulty maintaining temporal consistency due to the high stochasticity of diffusion models. To overcome these challenges, we propose MoDiTalker, a novel motion-disentangled diffusion model for high-quality talking head generation. We introduce two modules: audio-to-motion (AToM), designed to generate synchronized lip motion from audio, and motion-to-video (MToV), designed to produce high-quality head video following the generated motion. AToM excels at capturing subtle lip movements by leveraging an audio attention mechanism. In addition, MToV enhances temporal consistency by leveraging an efficient tri-plane representation. Our experiments on standard benchmarks demonstrate that our model outperforms existing models. We also provide comprehensive ablation studies and user study results.
Submitted 28 March, 2024;
originally announced March 2024.
-
Two-level overlapping Schwarz preconditioners with universal coarse spaces for $2m$th-order elliptic problems
Authors:
Jongho Park
Abstract:
We propose a novel universal construction of two-level overlapping Schwarz preconditioners for $2m$th-order elliptic boundary value problems, where $m$ is a positive integer. The word "universal" here signifies that the coarse space construction can be applied, for any $m$, to any finite element discretization that satisfies some common assumptions. We present numerical results for conforming, nonconforming, and discontinuous Galerkin-type finite element discretizations of high-order problems to demonstrate the scalability of the proposed two-level overlapping Schwarz preconditioners.
Submitted 27 March, 2024;
originally announced March 2024.
-
TC4D: Trajectory-Conditioned Text-to-4D Generation
Authors:
Sherwin Bahmani,
Xian Liu,
Yifan Wang,
Ivan Skorokhodov,
Victor Rong,
Ziwei Liu,
Xihui Liu,
Jeong Joon Park,
Sergey Tulyakov,
Gordon Wetzstein,
Andrea Tagliasacchi,
David B. Lindell
Abstract:
Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate: they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The lack of a more flexible motion model contributes to the gap in realism between 4D generation methods and recent, near-photorealistic video generation models. Here, we propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components. We represent the global motion of a scene's bounding box using a rigid transformation along a trajectory parameterized by a spline. We learn local deformations that conform to the global trajectory using supervision from a text-to-video model. Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion, which we evaluate qualitatively and through a user study. Video results can be viewed on our website: https://sherwinbahmani.github.io/tc4d.
Submitted 10 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Authors:
Jongha Kim,
Jihwan Park,
Jinyoung Park,
Jinyoung Kim,
Sehyung Kim,
Hyunwoo J. Kim
Abstract:
Visual Relationship Detection (VRD) has recently seen significant advances with Transformer-based architectures. However, we identify two key limitations in the conventional label assignment used to train Transformer-based VRD models, i.e., the process of mapping ground truths (GTs) to predictions. First, under the conventional assignment, queries remain unspecialized: each query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Second, queries are insufficiently trained, since each GT is assigned to only a single prediction; near-correct or even correct predictions are therefore suppressed by being assigned "no relation" as their GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains specialized queries by dividing queries and relations into disjoint groups and directing each query in a given query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates training by assigning a GT to multiple predictions that are significantly close to it in terms of the subject, the object, and the relation between them. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.
Submitted 26 March, 2024;
originally announced March 2024.
-
Argument Quality Assessment in the Age of Instruction-Following Large Language Models
Authors:
Henning Wachsmuth,
Gabriella Lapesa,
Elena Cabrio,
Anne Lauscher,
Joonsuk Park,
Eva Maria Vecchi,
Serena Villata,
Timon Ziegenbein
Abstract:
The computational treatment of arguments on controversial issues has been the subject of extensive NLP research, owing to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is assessing an argument's quality, a task that is also particularly challenging. In this position paper, we start with a brief survey of argument quality research, in which we identify the diversity of quality notions and the subjectiveness of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capability of instruction-following large language models (LLMs) to leverage knowledge across contexts enables much more reliable assessment. Rather than just being fine-tuned for leaderboard chasing on assessment tasks, LLMs need to be instructed systematically with argumentation theories and scenarios, as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues that emerge from this.
Submitted 24 March, 2024;
originally announced March 2024.
-
Incorporating Heterogeneous Interactions for Ecological Biodiversity
Authors:
Jong Il Park,
Deok-Sun Lee,
Sang Hoon Lee,
Hye Jin Park
Abstract:
Understanding the behavior of ecological systems is challenging given their multifaceted complexity. To make progress, researchers have used dynamical mean-field theory to investigate theoretical models such as Lotka-Volterra dynamics with random interactions, providing insights into underlying principles such as how biodiversity and stability depend on the randomness in interaction strength. Yet the fully connected structure assumed in these previous studies is unrealistic, as a vast amount of empirical data reveals. We derive a generic formula for the abundance distribution under an arbitrary distribution of degree (the number of interacting neighbors), which leads to degree-dependent abundance patterns of species. Notably, in contrast to the well-mixed system, the number of surviving species can be reduced as the community becomes cooperative in heterogeneous interaction structures. Our study therefore demonstrates that properly accounting for heterogeneity in the interspecific interaction structure is indispensable for understanding the diversity in large ecosystems, and our general theoretical framework can apply to a much wider range of interacting many-body systems.
Submitted 23 March, 2024;
originally announced March 2024.
-
Semantic Communication Challenges: Understanding Dos and Avoiding Don'ts
Authors:
Jinho Choi,
Jihong Park,
Eleonora Grassucci,
Danilo Comminiello
Abstract:
Semantic communication, an emerging and promising paradigm for data transmission, offers an innovative departure from the constraints of Shannon theory, heralding significant advancements in future communication technologies. Despite the proliferation of proposed approaches, numerous challenges remain. In this paper, we review current semantic communication methodologies, shed light on pivotal issues, and address certain discrepancies that exist within the field. By elucidating both dos and don'ts, we aim to provide valuable insights into the emerging landscape of semantic communication.
Submitted 22 March, 2024;
originally announced March 2024.
-
Control Designs for Critical-Contingency Responsible Grid-Following Inverters and Seamless Transitions To and From Grid-Forming Modes
Authors:
Jaesang Park,
Alireza Askarian,
Srinivasa Salapaka
Abstract:
This article introduces two control frameworks: one for Grid-Following (GFL) inverters aiding Grid-Forming (GFM) inverters in voltage regulation during large contingency events and optimizing power transactions under normal conditions; and another for seamless transitions between grid-tied and grid-isolated setups, managing voltage transient characteristics. In microgrids, GFM inverters regulate voltage, while GFL inverters handle power transactions. The proposed GFL control detects abrupt load/generation changes, adjusting power transactions using local storage to support GFM inverters during contingencies. Additionally, a transition control ensures smooth GFL-GFM shifts, reducing power and voltage fluctuations. Simulation results validate improved voltage regulation during contingencies and enhanced power tracking during slow changes, alongside minimized transient overshoot.
Submitted 22 March, 2024;
originally announced March 2024.
-
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Authors:
Jimyeong Kim,
Jungwon Park,
Wonjong Rhee
Abstract:
In text-to-image personalization, a timely and crucial challenge is the tendency of generated images to overfit to the biases present in the reference images. We begin our study with a comprehensive categorization of these biases into background, nearby-object, tied-object, substance (in style re-contextualization), and pose biases. These biases manifest in the generated images because they become entangled in the subject embedding. This undesired embedding entanglement not only carries biases from the reference images into the generated images but also notably diminishes the alignment of the generated images with the given generation prompt. To address this challenge, we propose SID (Selectively Informative Description), a text description strategy that deviates from the prevalent approach of characterizing only the subject's class identification. SID is generated using multimodal GPT-4 and can be seamlessly integrated into optimization-based models. We present comprehensive experimental results along with analyses of cross-attention maps, subject-alignment, non-subject-disentanglement, and text-alignment.
Submitted 22 March, 2024;
originally announced March 2024.
-
All van der Waals three-terminal SOT-MRAM realized by topological ferromagnet Fe3GeTe2
Authors:
Jingyuan Cui,
Kai-Xuan Zhang,
Je-Geun Park
Abstract:
Over the past few years, magnetic van der Waals (vdW) materials have attracted massive attention because of their academic interest and application potential. Their main advantage is their intrinsic two-dimensionality, which enables much smaller devices based on novel concepts. One particularly exciting direction lies in current-driven spin-orbit torque (SOT). Here, we realize, for the first time, an all-vdW three-terminal SOT memory, exploiting the gigantic intrinsic SOT of Fe3GeTe2 (FGT) and the well-known, industry-adopted tunnelling magnetoresistance (TMR) effect. We designed the device operation procedure and fabricated an FGT/h-BN/FGT vdW heterostructure as a proof of concept. This device exhibits a classical TMR effect and unambiguously demonstrates the concept, performing as expected: the magnetic information of the top FGT layer is written by current-driven SOT and read out separately by TMR. The writing and reading current paths are physically decoupled, which substantially enhances design and optimization flexibility and naturally improves the device's endurance. Our work could prompt more expansive use of vdW magnets in spintronic applications.
Submitted 22 March, 2024;
originally announced March 2024.
-
Analysis of Log Data from an International Online Educational Assessment System: A Multi-state Survival Modeling Approach to Reaction Time between and across Action Sequence
Authors:
Jina Park,
Ick Hoon Jin,
Minjeong Jeon
Abstract:
With increasingly available computer-based or online assessments, researchers have shown keen interest in analyzing log data to improve our understanding of test takers' problem-solving processes. In this paper, we propose applying a multi-state survival model (MSM) to action sequence data from log files, focusing on modeling test takers' reaction times between actions, in order to investigate which factors influence test takers' transition speed between actions and how. In particular, we focus on the effects of the occurrence and timing of key actions that differentiate correct answers from incorrect answers. We demonstrate our proposed approach with problem-solving test items from the Programme for the International Assessment of Adult Competencies (PIAAC).
Submitted 21 March, 2024;
originally announced March 2024.
-
Advanced Deep Operator Networks to Predict Multiphysics Solution Fields in Materials Processing and Additive Manufacturing
Authors:
Shashank Kushwaha,
Jaewan Park,
Seid Koric,
Junyan He,
Iwona Jasiuk,
Diab Abueidda
Abstract:
Unlike classical artificial neural networks, which require retraining for each new set of parametric inputs, the Deep Operator Network (DeepONet), a recently introduced deep learning framework, approximates linear and nonlinear solution operators by taking parametric functions (infinite-dimensional objects) as inputs and mapping them to complete solution fields. In this paper, two newly devised DeepONet formulations with sequential learning and Residual U-Net (ResUNet) architectures are trained for the first time to simultaneously predict complete thermal and mechanical solution fields under variable loading, loading histories, process parameters, and even variable geometries. Two real-world applications are demonstrated: (1) coupled thermo-mechanical analysis of steel continuous casting with multiple visco-plastic constitutive laws and (2) sequentially coupled direct energy deposition for additive manufacturing. Despite highly challenging, spatially variable target stress distributions, DeepONets can infer reasonably accurate full-field temperature and stress solutions several orders of magnitude faster than traditional and highly optimized finite-element analysis (FEA), even when the FEA simulations are run on the latest high-performance computing platforms. The proposed DeepONet model's ability to provide field predictions almost instantly for unseen input parameters opens the door to future preliminary evaluation and design optimization of these vital industrial processes.
Submitted 21 March, 2024;
originally announced March 2024.
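The DeepONet abstract describes mapping input functions to complete solution fields. In the original DeepONet formulation, an operator $G$ is approximated as $G(u)(y) \approx \sum_k b_k(u)\, t_k(y)$. Here a branch network encodes the input function $u$ sampled at fixed sensor points, and a trunk network encodes query coordinates $y$. The NumPy sketch below shows only this forward pass, using randomly initialized, untrained weights. The layer sizes, sensor count, and helper names are hypothetical. It does not reproduce the sequential-learning or ResUNet formulations proposed in the paper.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights for a fully connected network with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Apply the network: tanh on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def deeponet_forward(branch, trunk, u_sensors, y, bias=0.0):
    """G(u)(y) ~ sum_k b_k(u) * t_k(y) + bias.

    u_sensors: (batch, m) input functions sampled at m fixed sensor points
    y:         (n_points, d) query coordinates of the output field
    returns:   (batch, n_points) predicted field values
    """
    b = mlp_forward(branch, u_sensors)   # (batch, p) branch features
    t = mlp_forward(trunk, y)            # (n_points, p) trunk features
    return b @ t.T + bias                # dot product over the p latent features

rng = np.random.default_rng(0)
m, d, p = 50, 2, 32                      # hypothetical: sensors, coordinate dim, latent width
branch = init_mlp([m, 64, p], rng)
trunk = init_mlp([d, 64, p], rng)
u = rng.normal(size=(4, m))              # 4 sampled input functions (e.g., load histories)
y = rng.uniform(size=(100, d))           # 100 query points in a 2D domain
field = deeponet_forward(branch, trunk, u, y)
print(field.shape)                       # (4, 100)
```

Once such a model is trained, predicting a full field for a new input amounts to a single forward pass like this one. Because the trunk network takes coordinates directly, the same model can also be queried at points that were not part of any training grid.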
-
Language Repository for Long Video Understanding
Authors:
Kumara Kahatapitiya,
Kanchana Ranasinghe,
Jongwoo Park,
Michael S. Ryoo
Abstract:
Language has become a prominent modality in computer vision with the rise of multimodal LLMs. Despite supporting long context lengths, their effectiveness in handling long-term information gradually declines with input length. This becomes critical in applications such as long-form video understanding. In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation. Our repository is updated iteratively based on multi-scale video chunks. We introduce write and read operations that focus on pruning redundancies in text and extracting information at various temporal scales. The proposed framework is evaluated on zero-shot visual question-answering benchmarks, including EgoSchema, NExT-QA, IntentQA, and NExT-GQA, showing state-of-the-art performance at its scale. Our code is available at https://github.com/kkahatapitiya/LangRepo.
Submitted 21 March, 2024;
originally announced March 2024.
-
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Authors:
Soyeong Jeong,
Jinheon Baek,
Sukmin Cho,
Sung Ju Hwang,
Jong C. Park
Abstract:
Retrieval-Augmented Large Language Models (LLMs), which incorporate non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, although there are various approaches for handling queries of different complexities, they either handle simple queries with unnecessary computational overhead or fail to adequately address complex multi-step queries; yet not all user requests fall neatly into either the simple or the complex category. In this work, we propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs, from the simplest to the most sophisticated, based on the query complexity. This selection process is operationalized with a classifier, a smaller LM trained to predict the complexity level of incoming queries using automatically collected labels, obtained from the actual prediction outcomes of models and inherent inductive biases in datasets. This approach offers a balanced strategy, seamlessly adapting between iterative and single-step retrieval-augmented LLMs, as well as no-retrieval methods, in response to a range of query complexities. We validate our model on a set of open-domain QA datasets covering multiple query complexities, and show that it enhances the overall efficiency and accuracy of QA systems compared to relevant baselines, including adaptive retrieval approaches. Code is available at: https://github.com/starsuzi/Adaptive-RAG.
Submitted 28 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
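The routing step in Adaptive-RAG can be illustrated with a minimal sketch: a classifier assigns each query a complexity label, and the label selects between no-retrieval, single-step, and multi-step (iterative) strategies. The keyword heuristic `toy_complexity_classifier`, the label strings, and the placeholder strategy functions are hypothetical stand-ins; in the paper the classifier is a smaller trained LM.

```python
from typing import Callable

# Hypothetical complexity labels, one per answering strategy.
NO_RETRIEVAL, SINGLE_STEP, MULTI_STEP = "no_retrieval", "single_step", "multi_step"

def toy_complexity_classifier(query: str) -> str:
    """Keyword heuristic standing in for the trained classifier (assumption)."""
    q = query.lower()
    multi_hop_cues = ("and then", "compare", "both", "before", "after")
    if any(cue in q for cue in multi_hop_cues):
        return MULTI_STEP
    if q.startswith(("who", "when", "where", "which")):
        return SINGLE_STEP
    return NO_RETRIEVAL

def answer(query: str,
           classify: Callable[[str], str],
           strategies: dict[str, Callable[[str], str]]) -> str:
    # Route the query to the strategy matching its predicted complexity.
    return strategies[classify(query)](query)

# Placeholder strategies that only record which path was taken.
strategies = {
    NO_RETRIEVAL: lambda q: f"llm({q})",
    SINGLE_STEP: lambda q: f"llm(retrieve({q}))",
    MULTI_STEP: lambda q: f"iterative_rag({q})",
}
```

Keeping the classifier behind a plain callable makes it easy to swap the heuristic for a trained model without touching the dispatch logic.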
-
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
Authors:
Yoonsung Kim,
Changhun Oh,
Jinwoo Hwang,
Wonung Kim,
Seongryong Oh,
Yubin Lee,
Hardik Sharma,
Amir Yazdanbakhsh,
Jongse Park
Abstract:
Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to these systems' limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model to label sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations of state-of-the-art continuous learning systems: (1) they focus on computations for retraining while overlooking the compute needs of inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose DaCapo, a hardware-algorithm co-designed solution for continuous learning that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal resource allocation decisions to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.
Submitted 28 April, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Stability analysis of the incompressible porous media equation and the Stokes transport system via energy structure
Authors:
Jaemin Park
Abstract:
In this paper, we revisit asymptotic stability for the two-dimensional incompressible porous media equation and the Stokes transport system in a periodic channel. It is well-known that a stratified density, which strictly decreases in the vertical direction, is asymptotically stable under sufficiently small and smooth perturbations. We provide improvements in the regularity assumptions on the perturbation and in the convergence rate. Unlike the standard approach for stability analysis relying on linearized equations, we directly address the nonlinear problem by exploiting the energy structure of each system. While it is widely known that the potential energy is a Lyapunov functional in both systems, our key observation is that the second derivative of the potential energy reveals a (degenerate) coercive structure, which arises from the fact that the solution converges to the minimizer of the energy.
Submitted 1 April, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
A Design Space for Intelligent and Interactive Writing Assistants
Authors:
Mina Lee,
Katy Ilonka Gero,
John Joon Young Chung,
Simon Buckingham Shum,
Vipul Raheja,
Hua Shen,
Subhashini Venugopalan,
Thiemo Wambsganss,
David Zhou,
Emad A. Alghamdi,
Tal August,
Avinash Bhat,
Madiha Zahrah Choksi,
Senjuti Dutta,
Jin L. C. Guo,
Md Naimul Hoque,
Yewon Kim,
Simon Knight,
Seyed Parsa Neshaei,
Agnia Sergeyuk,
Antonette Shibani,
Disha Shrivastava,
Lila Shroff,
Jessi Stark,
Sarah Sterman
, et al. (11 additional authors not shown)
Abstract:
In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore five aspects of writing assistants: task, user, technology, interaction, and ecosystem. Within each aspect, we define dimensions (i.e., fundamental components of an aspect) and codes (i.e., potential options for each dimension) by systematically reviewing 115 papers. Our design space aims to offer researchers and designers a practical tool to navigate, comprehend, and compare the various possibilities of writing assistants, and aid in the envisioning and design of new writing assistants.
Submitted 26 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)
Authors:
Yimeng Fan,
Pedram Agand,
Mo Chen,
Edward J. Park,
Allison Kennedy,
Chanwoo Bae
Abstract:
The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper tackles this challenge through a machine learning approach, leveraging a real-world dataset covering two years of operation of a ferry on the west coast of Canada. Our focus is on building a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, and subsequently serves as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI
Submitted 20 March, 2024;
originally announced March 2024.
-
Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus
Authors:
Seungpil Lee,
Woochang Sim,
Donghyeon Shin,
Sanha Hwang,
Wongyu Seo,
Jiwon Park,
Seokki Lee,
Sejin Kim,
Sundong Kim
Abstract:
The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach using the Abstraction and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates comparing model inference abilities with those of humans. Experimental results confirm that while large language models possess some inference abilities, albeit weak ones, they still fall short in logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs, proposing development paths for achieving human-level reasoning.
Submitted 18 March, 2024;
originally announced March 2024.
-
Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training
Authors:
Minsu Kim,
Jinwoo Hwang,
Guseul Heo,
Seiyeon Cho,
Divya Mahajan,
Jongse Park
Abstract:
Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes, using these key-position mappings as training data. Learned indexes require frequent retraining of their models to incorporate the changes introduced by update queries. To retrain the models efficiently, existing learned index systems often harness a linear algebraic QR factorization technique that performs matrix decomposition. This factorization approach processes all key-position pairs during each retraining, resulting in compute operations that grow linearly with the total number of keys and their lengths. Consequently, retraining creates a severe performance bottleneck, especially for variable-length string keys, even though it is crucial for maintaining high prediction accuracy and, in turn, low query service latency.
To address this performance problem, we develop an algorithm-hardware co-designed string-key learned index system, dubbed SIA. In designing SIA, we leverage a unique algorithmic property of the matrix decomposition-based training method. Exploiting this property, we develop a memoization-based incremental training scheme that requires computation only over updated keys, while decomposition results of non-updated keys from previous computations are reused. We further enhance SIA to offload a portion of this training process to an FPGA accelerator, both to free CPU resources for serving index queries (i.e., inference) and to accelerate the training itself. Our evaluation shows that, compared to ALEX, LIPP, and SIndex, state-of-the-art learned index systems, SIA-accelerated learned indexes offer 2.6x and 3.4x higher throughput on the two real-world benchmark suites, YCSB and Twitter cache trace, respectively.
Submitted 18 March, 2024;
originally announced March 2024.
-
Deep Neural Network NMPC for Computationally Tractable Optimal Power Management of Hybrid Electric Vehicle
Authors:
Suyong Park,
Duc Giap Nguyen,
Jinrak Park,
Dohee Kim,
Jeong Soo Eo,
Kyoungseok Han
Abstract:
This study presents a deep neural network nonlinear model predictive control (DNN-MPC) method that reduces computational complexity, and demonstrates its practical utility by applying it to the energy management of hybrid electric vehicles (HEVs). For optimal power management of HEVs, we first design an online NMPC to collect the data set, and then train the deep neural network to approximate the NMPC solutions. We assess the effectiveness of our approach through comparative simulations against rule-based and online NMPC-based power management strategies for HEVs, evaluating both fuel consumption and computational complexity. Lastly, we verify the real-time feasibility of our approach through processor-in-the-loop (PIL) testing. The test results demonstrate that the proposed method closely approximates the NMPC performance while substantially reducing the computational burden.
Submitted 17 March, 2024;
originally announced March 2024.
-
Modeling and Coverage Analysis of K-Tier Integrated Satellite-Terrestrial Downlink Networks
Authors:
Jungbin Yim,
Jeonghun Park,
Namyoon Lee
Abstract:
Integrated satellite-terrestrial networks (ISTNs) can significantly expand network coverage while reducing reliance on terrestrial infrastructure. Despite the potential of ISTNs, there is no comprehensive mathematical framework for analyzing the performance of these emerging networks. In this paper, we introduce a tractable approach to analyze the downlink coverage performance of multi-tier ISTNs, where each network tier operates in an orthogonal frequency band. The proposed approach models the spatial distribution of cellular and satellite base stations using homogeneous Poisson point processes arranged on concentric spheres with varying radii. Central to our analysis is a displacement principle that transforms base station locations on different spheres into projected rings while preserving the distance distribution to the typical user. By incorporating the effects of Shadowed-Rician fading on satellite channels and employing orthogonal frequency bands, we derive analytical expressions for coverage in the integrated networks while keeping full generality. Our primary finding is that network performance is maximized by selecting the optimal density ratio of users associated with each network according to the density and channel parameters of each network. Through simulations, we validate the accuracy of our derived expressions.
Submitted 17 March, 2024;
originally announced March 2024.
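One building block of such an analysis is the distance from a typical user to the nearest base station of a tier. For a point uniform on a sphere of radius R, seen from a user on a concentric sphere of radius R_e, cos θ is uniform on [-1, 1], so the squared distance d² = R² + R_e² − 2RR_e cos θ is uniform on [(R − R_e)², (R + R_e)²]; with a homogeneous PPP of mean N points, the void probability then gives P(D_min > r) = exp(−N (r² − (R − R_e)²)/(4RR_e)) for R − R_e ≤ r ≤ R + R_e. The minimal NumPy sketch checks this against a Monte Carlo simulation. The 550 km orbit altitude and N = 100 are illustrative assumptions, and fading, horizon visibility, and the paper's displacement principle itself are not modeled.

```python
import numpy as np

def nearest_distance_ccdf(r, n_mean, r_earth, r_orbit):
    """Analytic P(D_min > r) for a PPP with mean n_mean points on a sphere of
    radius r_orbit, seen from a user on the concentric sphere r_earth."""
    h2 = (r_orbit - r_earth) ** 2
    # Probability that one uniform point lies within distance r of the user.
    frac = np.clip((r**2 - h2) / (4 * r_orbit * r_earth), 0.0, 1.0)
    return np.exp(-n_mean * frac)

def simulate_ccdf(r, n_mean, r_earth, r_orbit, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = np.random.default_rng(seed)
    user = np.array([0.0, 0.0, r_earth])
    exceed = 0
    for _ in range(trials):
        n = rng.poisson(n_mean)
        if n == 0:
            exceed += 1          # no base station at all: D_min is infinite
            continue
        v = rng.normal(size=(n, 3))
        pts = r_orbit * v / np.linalg.norm(v, axis=1, keepdims=True)  # uniform on sphere
        exceed += np.linalg.norm(pts - user, axis=1).min() > r
    return exceed / trials

# Distances in km: Earth radius 6371 km, hypothetical 550 km orbit.
analytic = nearest_distance_ccdf(1500.0, 100, 6371.0, 6921.0)
empirical = simulate_ccdf(1500.0, 100, 6371.0, 6921.0, trials=20000)
```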
-
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
Authors:
ChangSu Choi,
Yongbin Jeong,
Seoyoon Park,
InHo Won,
HyeonSeok Lim,
SangMin Kim,
Yejee Kang,
Chanhyuk Yoon,
Jaewan Park,
Yiseul Lee,
HyeJin Lee,
Younggyun Hahm,
Hansaem Kim,
KyungTae Lim
Abstract:
Large language models (LLMs) use pretraining to predict the subsequent word; however, expanding them requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on publicly available MLLMs. First, the MLLM vocabulary was expanded for LRLs to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model with Korean as the LRL, and the resulting model was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.
Submitted 21 March, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
A Comprehensive Review of Latent Space Dynamics Identification Algorithms for Intrusive and Non-Intrusive Reduced-Order-Modeling
Authors:
Christophe Bonneville,
Xiaolong He,
April Tran,
Jun Sur Park,
William Fries,
Daniel A. Messenger,
Siu Wun Cheung,
Yeonjong Shin,
David M. Bortz,
Debojyoti Ghosh,
Jiun-Shyan Chen,
Jonathan Belof,
Youngsoo Choi
Abstract:
Numerical solvers of partial differential equations (PDEs) have been widely employed for simulating physical systems. However, their computational cost remains a major bottleneck in various scientific and engineering applications, which has motivated the development of reduced-order models (ROMs). Recently, machine-learning-based ROMs have gained significant popularity and are promising for addressing some limitations of traditional ROM methods, especially for advection-dominated systems. In this chapter, we focus on a particular framework known as Latent Space Dynamics Identification (LaSDI), which transforms high-fidelity data, governed by a PDE, into simpler, low-dimensional latent-space data, governed by ordinary differential equations (ODEs). These ODEs can be learned and subsequently interpolated to make ROM predictions. Each building block of LaSDI can be easily adapted to the application, which makes the LaSDI framework highly flexible. In particular, we present strategies to enforce the laws of thermodynamics in LaSDI models (tLaSDI), enhance robustness to noise through the weak form (WLaSDI), select high-fidelity training data efficiently through active learning (gLaSDI, GPLaSDI), and quantify the ROM prediction uncertainty through Gaussian processes (GPLaSDI). We demonstrate the performance of different LaSDI approaches on the Burgers equation, a nonlinear heat conduction problem, and a plasma physics problem, showing that LaSDI algorithms can achieve relative errors of less than a few percent and speed-ups of up to thousands of times.
Submitted 15 March, 2024;
originally announced March 2024.
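The "learn, then interpolate" step can be illustrated with a minimal NumPy sketch under strong simplifying assumptions: the latent trajectories are given in closed form, z(t; μ) = exp(−μt), rather than produced by an autoencoder; the latent ODE is fitted by least squares over a two-term library [1, z], in the spirit of SINDy-style regression; and the coefficients are linearly interpolated across the parameter μ. The function name `fit_latent_ode` and the parameter values are hypothetical, and the chapter's variants use other interpolants (for example, Gaussian processes in GPLaSDI).

```python
import numpy as np

def fit_latent_ode(z, dt):
    """Least-squares fit of dz/dt = c0 + c1 * z from one latent trajectory.

    Uses a two-term candidate library [1, z]; returns array([c0, c1]).
    """
    dzdt = np.gradient(z, dt, edge_order=2)          # finite-difference derivative
    library = np.column_stack([np.ones_like(z), z])
    coeffs, *_ = np.linalg.lstsq(library, dzdt, rcond=None)
    return coeffs

t = np.linspace(0.0, 2.0, 401)
dt = t[1] - t[0]
mus = np.array([1.0, 2.0])                           # training parameters (hypothetical)
# Stand-in latent trajectories; a real LaSDI model would encode PDE snapshots.
coeff_table = np.array([fit_latent_ode(np.exp(-mu * t), dt) for mu in mus])

# Interpolate the learned ODE coefficients to an unseen parameter value.
mu_new = 1.5
c0, c1 = (np.interp(mu_new, mus, coeff_table[:, j]) for j in range(2))

# ROM prediction: closed-form solution of dz/dt = c0 + c1 z with z(0) = 1.
z_pred = (1.0 + c0 / c1) * np.exp(c1 * t) - c0 / c1
```

Linear interpolation is exact here only because the true coefficient, −μ, is linear in μ; in general the interpolant is a modeling choice.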
-
Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)
Authors:
Jeongeun Park,
Taemoon Jeong,
Hyeonseong Kim,
Taehyun Byun,
Seungyoon Shin,
Keunjun Choi,
Jaewoon Kwon,
Taeyoon Lee,
Matthew Pan,
Sungjoon Choi
Abstract:
This paper presents the design and development of an innovative interactive robotic system that enhances audience engagement using character-like personas. Building on persona-driven dialog agents, this work extends the agent application to the physical realm, employing robots to provide a more immersive and interactive experience. The proposed system, named Masquerading Animated Social Kinematics (MASK), leverages an anthropomorphic robot that interacts with guests through non-verbal cues, including facial expressions and gestures. A behavior generation system based on a finite-state machine structure effectively conditions robotic behavior to convey distinct personas. The MASK framework integrates a perception engine, a behavior selection engine, and a comprehensive action library to enable real-time, dynamic interactions with minimal human intervention in behavior design. Through user studies, we examined whether users could recognize the intended character under film-character-based persona conditions. We conclude by discussing the role of personas in interactive agents and the factors to consider for creating an engaging user experience.
Submitted 15 March, 2024;
originally announced March 2024.
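How a finite-state machine can condition behavior on a persona can be sketched minimally: the transition table driven by perception events is shared, while each persona maps states to its own action clips. The states, events, and action names are hypothetical and are not taken from the MASK system.

```python
from dataclasses import dataclass

# Hypothetical (state, perception event) -> next state table.
TRANSITIONS = {
    ("idle", "guest_detected"): "greet",
    ("greet", "guest_engaged"): "interact",
    ("greet", "guest_left"): "idle",
    ("interact", "guest_left"): "farewell",
    ("farewell", "timeout"): "idle",
}

@dataclass
class PersonaFSM:
    persona: dict[str, str]   # maps each state to a persona-specific action clip
    state: str = "idle"

    def step(self, event: str) -> str:
        # Unknown (state, event) pairs keep the current state.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.persona[self.state]

villain = PersonaFSM({"idle": "smirk", "greet": "slow_nod",
                      "interact": "sinister_laugh", "farewell": "cape_swirl"})
actions = [villain.step(e) for e in
           ["guest_detected", "guest_engaged", "guest_left", "timeout"]]
```

Swapping in a different persona dictionary changes the expressed character without touching the transition logic.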
-
Thermodynamic Integration for Dynamically Unstable Systems Using Interatomic Force Constants without Molecular Dynamics
Authors:
Junsoo Park,
Zhigang Wu,
John W. Lawson
Abstract:
We demonstrate an efficient, accurate, general-purpose first-principles blueprint for calculating anharmonic vibrational free energy and predicting structural phase transition temperatures of solids. Thermodynamic integration is performed without molecular dynamics, using only interatomic force constants to model analogues of the true potential and generate their thermal ensembles. By replacing ab initio molecular dynamics (AIMD) with statistical sampling of ensemble configurations, and trading density-functional theory (DFT) energy calculations on each configuration for a set of matrix operations, our approach speeds up thermodynamic integration by four orders of magnitude over the traditional AIMD route. Experimental phase transition temperatures of a variety of strongly anharmonic materials with dynamical instabilities, including shape-memory alloys, are largely recovered to within 25% error. This combination of speed and accuracy enables the method to be deployed at large scale for predictive mapping of phase transition temperatures.
Submitted 13 March, 2024;
originally announced March 2024.
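Thermodynamic integration rests on the identity F_1 − F_0 = ∫_0^1 ⟨U_1 − U_0⟩_λ dλ for the mixed potential U_λ = (1 − λ)U_0 + λU_1. The minimal NumPy sketch illustrates it on a toy pair of 1D harmonic potentials, where each λ-ensemble is Gaussian and can be sampled directly instead of running molecular dynamics, loosely mirroring the abstract's statistical sampling of force-constant ensembles. The function name `ti_free_energy`, the spring constants, and the Gauss-Legendre quadrature over λ are assumptions for illustration; the analytic answer (k_BT/2) ln(k_1/k_0) serves as a check.

```python
import numpy as np

def ti_free_energy(k0, k1, kT=1.0, n_lambda=8, n_samples=200_000, seed=0):
    """F1 - F0 for U_i(x) = k_i x^2 / 2 via thermodynamic integration along
    U_lam = (1 - lam) U0 + lam U1, sampling each ensemble directly."""
    rng = np.random.default_rng(seed)
    nodes, weights = np.polynomial.legendre.leggauss(n_lambda)
    lams, w = 0.5 * (nodes + 1.0), 0.5 * weights       # map [-1, 1] -> [0, 1]
    integrand = []
    for lam in lams:
        k_lam = (1.0 - lam) * k0 + lam * k1
        # The harmonic ensemble is Gaussian: draw configurations, no MD needed.
        x = rng.normal(0.0, np.sqrt(kT / k_lam), size=n_samples)
        integrand.append(np.mean(0.5 * (k1 - k0) * x**2))   # <U1 - U0>_lam
    return float(np.dot(w, integrand))

exact = 0.5 * np.log(4.0)      # (kT / 2) ln(k1 / k0) with k0 = 1, k1 = 4, kT = 1
estimate = ti_free_energy(1.0, 4.0)
```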
-
Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report
Authors:
Evi M. C. Huijben,
Maarten L. Terpstra,
Arthur Jr. Galapon,
Suraj Pai,
Adrian Thummerer,
Peter Koopmans,
Manya Afonso,
Maureen van Eijnatten,
Oliver Gurney-Champion,
Zeli Chen,
Yiwen Zhang,
Kaiyi Zheng,
Chuanpu Li,
Haowen Pang,
Chuyang Ye,
Runqi Wang,
Tao Song,
Fuxin Fan,
Jingna Qiu,
Yixing Huang,
Juhyung Ha,
Jong Sung Park,
Alexandra Alain-Beaudoin,
Silvain Bériault,
Pengxin Yu
, et al. (34 additional authors not shown)
Abstract:
Radiation therapy plays a crucial role in cancer treatment, necessitating precise delivery of radiation to tumors while sparing healthy tissues over multiple days. Computed tomography (CT) is integral for treatment planning, offering electron density data crucial for accurate dose calculations. However, accurately representing patient anatomy is challenging, especially in adaptive radiotherapy, where CT is not acquired daily. Magnetic resonance imaging (MRI) provides superior soft-tissue contrast but lacks electron density information, while cone beam CT (CBCT) lacks direct electron density calibration and is mainly used for patient positioning. Adopting MRI-only or CBCT-based adaptive radiotherapy eliminates the need for a planning CT but presents challenges. Synthetic CT (sCT) generation techniques aim to address these challenges by using image synthesis to bridge the gap between MRI, CBCT, and CT. The SynthRAD2023 challenge was organized to compare synthetic CT generation methods using multi-center ground truth data from 1080 patients, divided into two tasks: 1) MRI-to-CT and 2) CBCT-to-CT. The evaluation included image similarity and dose-based metrics from proton and photon plans. The challenge attracted significant participation, with 617 registrations and 22/17 valid submissions for tasks 1/2. Top-performing teams achieved high structural similarity indices (>0.87/0.90) and gamma pass rates for photon (>98.1%/99.0%) and proton (>97.3%/97.0%) plans. However, no significant correlation was found between image similarity metrics and dose accuracy, emphasizing the need for dose evaluation when assessing the clinical applicability of sCT. SynthRAD2023 facilitated the investigation and benchmarking of sCT generation techniques, providing insights for developing MRI-only and CBCT-based adaptive radiotherapy.
Submitted 11 June, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.