-
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Authors:
Hyun Joon Park,
Jin Sob Kim,
Wooseok Shin,
Sung Won Han
Abstract:
Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…
▽ More
Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a general diffusion TTS framework, DEX-TTS includes encoders and adapters to handle styles extracted from reference speech. Key innovations contain the differentiation of styles into time-invariant and time-variant categories for effective style extraction, as well as the design of encoders and adapters with high generalization ability. In addition, we introduce overlapping patchify and convolution-frequency patch embedding strategies to improve DiT-based diffusion networks for TTS. DEX-TTS yields outstanding performance in terms of objective and subjective evaluation in English multi-speaker and emotional multi-speaker datasets, without relying on pre-training strategies. Lastly, the comparison results for the general TTS on a single-speaker dataset verify the effectiveness of our enhanced diffusion backbone. Demos are available here.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing
Authors:
Nika Mansouri Ghiasi,
Mohammad Sadrosadati,
Harun Mustafa,
Arvid Gollwitzer,
Can Firtina,
Julien Eudine,
Haiyu Mao,
Joël Lindegger,
Meryem Banu Cavlak,
Mohammed Alser,
Jisung Park,
Onur Mutlu
Abstract:
Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag…
▽ More
Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storage processing can be a fundamental solution for reducing this overhead. However, designing an in-storage processing system for metagenomics is challenging because existing approaches to metagenomic analysis cannot be directly implemented in storage effectively due to the hardware limitations of modern SSDs. We propose MegIS, the first in-storage processing system designed to significantly reduce the data movement overhead of the end-to-end metagenomic analysis pipeline. MegIS is enabled by our lightweight design that effectively leverages and orchestrates processing inside and outside the storage system. We address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning, 2) data/computation flow coordination, 3) storage technology-aware algorithmic optimizations, 4) data mapping, and 5) lightweight in-storage accelerators. MegIS's design is flexible, capable of supporting different types of metagenomic input datasets, and can be integrated into various metagenomic analysis pipelines. Our evaluation shows that MegIS outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7$\times$-37.2$\times$ and 6.9$\times$-100.2$\times$, respectively, while matching the accuracy of the accuracy-optimized tool. MegIS achieves 1.5$\times$-5.1$\times$ speedup compared to the state-of-the-art metagenomic hardware-accelerated (using processing-in-memory) tool, while achieving significantly higher accuracy.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
360 in the Wild: Dataset for Depth Prediction and View Synthesis
Authors:
Kibaek Park,
Francois Rameau,
Jaesik Park,
In So Kweon
Abstract:
The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale…
▽ More
The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale 360$^{\circ}$ videos dataset in the wild. This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide. Hence, this dataset exhibits very diversified environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely, single image depth estimation and view synthesis.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Superconducting phase diagram in Bi$_x$Ni$_{1-x}$ thin films$\colon$ the effects of Bi stoichiometry on superconductivity
Authors:
Jihun Park,
Jarryd A. Horn,
Dylan J. Kirsch,
Rohit K. Pant,
Hyeok Yoon,
Sungha Baek,
Suchismita Sarker,
Apurva Mehta,
Xiaohang Zhang,
Seunghun Lee,
Richard Greene,
Johnpierre Paglione,
Ichiro Takeuchi
Abstract:
The Bi${-}$Ni binary system has been of interest due to possible unconventional superconductivity aroused therein, such as time-reversal symmetry breaking in Bi/Ni bilayers or the coexistence of superconductivity and ferromagnetism in Bi$_3$Ni crystals. While Ni acts as a ferromagnetic element in such systems, the role of strong spin-orbit-coupling element Bi in superconductivity has remained unex…
▽ More
The Bi${-}$Ni binary system has been of interest due to possible unconventional superconductivity aroused therein, such as time-reversal symmetry breaking in Bi/Ni bilayers or the coexistence of superconductivity and ferromagnetism in Bi$_3$Ni crystals. While Ni acts as a ferromagnetic element in such systems, the role of strong spin-orbit-coupling element Bi in superconductivity has remained unexplored. In this work, we systematically studied the effects of Bi stoichiometry on the superconductivity of Bi$_x$Ni$_{1-x}$ thin films (${x} \approx$ 0.5 to 0.9) fabricated via a composition-spread approach. The superconducting phase map of Bi$_x$Ni$_{1-x}$ thin films exhibited a superconducting composition region attributable to the intermetallic Bi$_3$Ni phase with different amount of excess Bi, revealed by synchrotron X-ray diffraction analysis. Interestingly, the mixed phase region with Bi$_3$Ni and Bi showed unusual increases in the superconducting transition temperature and residual resistance ratio as more Bi impurities were included, with the maximum ${T}_{c}$ ($=$ 4.2 K) observed at $x \approx$ 0.79. A correlation analysis of structural, electrical, and magneto-transport characteristics across the composition variation revealed that the unusual superconducting $"$dome$"$ is due to two competing roles of Bi$\colon$ impurity scattering and carrier doping. We found that the carrier doping effect is dominant in the mild doping regime (0.74 $\leq {x} \leq$ 0.79), while impurity scattering becomes more pronounced at larger Bi stoichiometry.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
SAM: Semi-Active Mechanism for Extensible Continuum Manipulator and Real-time Hysteresis Compensation Control Algorithm
Authors:
Junhyun Park,
Seonghyeok Jang,
Myeongbo Park,
Hyojae Park,
Jeonghyeon Yoon,
Minho Hwang
Abstract:
Cable-Driven Continuum Manipulators (CDCMs) enable scar-free procedures via natural orifices and improve target lesion accessibility through curved paths. However, CDCMs face limitations in workspace and control accuracy due to non-linear cable effects causing hysteresis. This paper introduces an extensible CDCM with a Semi-active Mechanism (SAM) to expand the workspace via translational motion wi…
▽ More
Cable-Driven Continuum Manipulators (CDCMs) enable scar-free procedures via natural orifices and improve target lesion accessibility through curved paths. However, CDCMs face limitations in workspace and control accuracy due to non-linear cable effects causing hysteresis. This paper introduces an extensible CDCM with a Semi-active Mechanism (SAM) to expand the workspace via translational motion without additional mechanical elements or actuation. We collect a hysteresis dataset using 8 fiducial markers and RGBD sensing. Based on this dataset, we develop a real-time hysteresis compensation control algorithm using the trained Temporal Convolutional Network (TCN) with a 1ms time latency, effectively estimating the manipulator's hysteresis behavior. Performance validation through random trajectory tracking tests and box pointing tasks shows the proposed controller significantly reduces hysteresis by up to 69.5% in joint space and approximately 26% in the box pointing task.
△ Less
Submitted 27 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Discrete-time thermodynamic speed limit
Authors:
Sangyun Lee,
Jae Sung Lee,
Jong-Min Park
Abstract:
As a fundamental thermodynamic principle, speed limits reveal the lower bound of entropy production (EP) required for a system to transition from a given initial state to a final state. While various speed limits have been developed for continuous-time Markov processes, their application to discrete-time Markov chains remains unexplored. In this study, we investigate the speed limits in discrete-t…
▽ More
As a fundamental thermodynamic principle, speed limits reveal the lower bound of entropy production (EP) required for a system to transition from a given initial state to a final state. While various speed limits have been developed for continuous-time Markov processes, their application to discrete-time Markov chains remains unexplored. In this study, we investigate the speed limits in discrete-time Markov chains, focusing on two types of EP commonly used to measure the irreversibility of a discrete-time process: time-reversed EP and time-backward EP. We find that time-reversed EP satisfies the speed limit for the continuous-time Markov processes, whereas time-backward EP does not. Additionally, for time-reversed EP, we derive practical speed limits applicable to systems driven by cyclic protocols or with unidirectional transitions, where conventional speed limits become meaningless or invalid. We show that these relations also hold for continuous-time Markov processes by taking the time-continuum limit of our results. Finally, we validate our findings through several examples.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
DiffusionPDE: Generative PDE-Solving Under Partial Observation
Authors:
Jiahe Huang,
Guandao Yang,
Zichen Wang,
Jeong Joon Park
Abstract:
We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which…
▽ More
We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which is a common assumption for real-world measurements. In this work, we propose DiffusionPDE that can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming the state-of-the-art methods for both forward and inverse directions.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Terahertz photocurrent probe of quantum geometry and interactions in magic-angle twisted bilayer graphene
Authors:
Roshan Krishna Kumar,
Geng Li,
Riccardo Bertini,
Swati Chaudhary,
Krystian Nowakowski,
Jeong Min Park,
Sebastian Castilla,
Zhen Zhan,
Pierre A. Pantaleón,
Hitesh Agarwal,
Sergi Battle-Porro,
Eike Icking,
Matteo Ceccanti,
Antoine Reserbat-Plantey,
Giulia Piccinini,
Julien Barrier,
Ekaterina Khestanova,
Takashi Taniguchi,
Kenji Watanabe,
Christoph Stampfer,
Gil Refael,
Francisco Guinea,
Pablo Jarillo-Herrero,
Justin C. W. Song,
Petr Stepanov
, et al. (2 additional authors not shown)
Abstract:
Moiré materials represent strongly interacting electron systems bridging topological and correlated physics. Despite significant advances, decoding wavefunction properties underlying the quantum geometry remains challenging. Here, we utilize polarization-resolved photocurrent measurements to probe magic-angle twisted bilayer graphene, leveraging its sensitivity to the Berry connection that encompa…
▽ More
Moiré materials represent strongly interacting electron systems bridging topological and correlated physics. Despite significant advances, decoding wavefunction properties underlying the quantum geometry remains challenging. Here, we utilize polarization-resolved photocurrent measurements to probe magic-angle twisted bilayer graphene, leveraging its sensitivity to the Berry connection that encompasses quantum "textures" of electron wavefunctions. Using terahertz light resonant with optical transitions of its flat bands, we observe bulk photocurrents driven by broken symmetries and reveal the interplay between electron interactions and quantum geometry. We observe inversion-breaking gapped states undetectable through quantum transport, sharp changes in the polarization axes caused by interaction-induced band renormalization, and recurring photocurrent patterns at integer fillings of the moiré unit cell that track the evolution of quantum geometry through the cascade of phase transitions. The large and tunable terahertz response intrinsic to flat-band systems offers direct insights into the quantum geometry of interacting electrons and paves the way for innovative terahertz quantum technologies.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Database-Augmented Query Representation for Information Retrieval
Authors:
Soyeong Jeong,
Jinheon Baek,
Sukmin Cho,
Sung Ju Hwang,
Jong C. Park
Abstract:
Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user…
▽ More
Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user-related) features related to the query. Yet, they may be suboptimal to effectively augment the query, though there is plenty of information available to augment it in a relational database. Motivated by this, we present a novel retrieval framework called Database-Augmented Query representation (DAQu), which augments the original query with various (query-related) metadata across multiple tables. In addition, as the number of features in the metadata can be very large and there is no order among them, we encode them with our graph-based set encoding strategy, which considers hierarchies of features in the database without order. We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database, demonstrating that ours significantly enhances overall retrieval performance, compared to existing query augmentation methods.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
Search for charmed baryons in the $Λ_c^+η$ system and measurement of the branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ and $pD^0$ relative to $Σ_c(2455)π$
Authors:
Belle Collaboration,
S. X. Li,
C. P. Shen,
I. Adachi,
J. K. Ahn,
H. Aihara,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
Sw. Banerjee,
K. Belous,
J. Bennett,
M. Bessner,
T. Bilka,
D. Biswas,
D. Bodrov,
A. Bozek,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola,
M. -C. Chang,
B. G. Cheon
, et al. (102 additional authors not shown)
Abstract:
We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and…
▽ More
We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and $Λ_c(2940)^+$ signals are observed in the $pD^0$ mass spectrum. We set upper limits at 90\% credibility level on ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ relative to $Σ_c(2455)π$ of $<0.13$ for the $Λ_c(2880)^+$ and $<1.11$ for the $Λ_c(2940)^+$. We measure ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $pD^0$ relative to $Σ_c(2455)π$ of $0.75 \pm 0.03(\text{stat.}) \pm 0.07(\text{syst.})$ for the $Λ_c(2880)^+$ and $3.59 \pm 0.21(\text{stat.}) \pm 0.56(\text{syst.})$ for the $Λ_c(2940)^+$.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Sub-millisecond Entanglement and iSWAP Gate between Molecular Qubits
Authors:
Lewis R. B. Picard,
Annie J. Park,
Gabriel E. Patenotte,
Samuel Gebretsadkan,
David Wellnitz,
Ana Maria Rey,
Kang-Kuen Ni
Abstract:
Quantum computation (QC) and simulation rely on long-lived qubits with controllable interactions. Early work in quantum computing made use of molecules because of their readily available intramolecular nuclear spin coupling and chemical shifts, along with mature nuclear magnetic resonance techniques. Subsequently, the pursuit of many physical platforms has flourished. Trapped polar molecules have…
▽ More
Quantum computation (QC) and simulation rely on long-lived qubits with controllable interactions. Early work in quantum computing made use of molecules because of their readily available intramolecular nuclear spin coupling and chemical shifts, along with mature nuclear magnetic resonance techniques. Subsequently, the pursuit of many physical platforms has flourished. Trapped polar molecules have been proposed as a promising quantum computing platform, offering scalability and single-particle addressability while still leveraging inherent complexity and strong couplings of molecules. Recent progress in the single quantum state preparation and coherence of the hyperfine-rotational states of individually trapped molecules allows them to serve as promising qubits, with intermolecular dipolar interactions creating entanglement. However, universal two-qubit gates have not been demonstrated with molecules. Here, we harness intrinsic molecular resources to implement a two-qubit iSWAP gate using individually trapped $X^{1}Σ^{+}$ NaCs molecules. We characterize the innate dipolar interaction between rotational states and control its strength by tuning the polarization of the traps. By allowing the molecules to interact for 664 $μ$s at a distance of 1.9 $μ$m, we create a maximally entangled Bell state with a fidelity of 94(3)\%, following postselection to remove trials with empty traps. Using motion-rotation coupling, we measure residual excitation of the lowest few motional states along the axial trapping direction and find them to be the primary source of decoherence. Finally, we identify two non-interacting hyperfine states within the ground rotational level in which we encode a qubit. The interaction is toggled by transferring between interacting and non-interacting states to realize an iSWAP gate. We verify the gate performance by measuring its logical truth table.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model
Authors:
Doyoung Kim,
Jongwon Lee,
Jinho Park,
Minjoon Seo
Abstract:
Language models have demonstrated impressive capabilities across various natural language processing tasks, yet they struggle with planning tasks requiring multi-step simulations. Inspired by human cognitive processes, this paper investigates the optimal planning power of language models that can construct a cognitive map of a given environment. Our experiments demonstrate that cognitive map signi…
▽ More
Language models have demonstrated impressive capabilities across various natural language processing tasks, yet they struggle with planning tasks requiring multi-step simulations. Inspired by human cognitive processes, this paper investigates the optimal planning power of language models that can construct a cognitive map of a given environment. Our experiments demonstrate that cognitive map significantly enhances the performance of both optimal and reachable planning generation ability in the Gridworld path planning task. We observe that our method showcases two key characteristics similar to human cognition: \textbf{generalization of its planning ability to extrapolated environments and rapid adaptation with limited training data.} We hope our findings in the Gridworld task provide insights into modeling human cognitive processes in language models, potentially leading to the development of more advanced and robust systems that better resemble human cognition.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
RouteFinder: Towards Foundation Models for Vehicle Routing Problems
Authors:
Federico Berto,
Chuanbo Hua,
Nayeli Gast Zepeda,
André Hottung,
Niels Wouda,
Leon Lan,
Kevin Tierney,
Jinkyoo Park
Abstract:
Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder,…
▽ More
Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder, a framework for developing foundation models for VRPs. Our key idea is that a foundation model for VRPs should be able to model variants by treating each variant as a subset of a larger VRP problem, equipped with different attributes. We introduce a parallelized environment that can handle any combination of attributes at the same time in a batched manner, and an efficient sampling procedure to train on a mix of problems at each optimization step that can greatly improve convergence robustness. We also introduce novel Global Feature Embeddings that project instance-wise attributes efficiently onto the latent space and help the model understand different VRP variants. Finally, we introduce Efficient Adapter Layers, a simple yet effective technique to finetune pre-trained RouteFinder models to solve novel variants with previously unseen attributes outside of the original feature space. We validate our approach through extensive experiments on 24 VRP variants, demonstrating competitive results over recent multi-task learning models. We make our code openly available at https://github.com/ai4co/routefinder.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
A Unified Framework for Synthesizing Multisequence Brain MRI via Hybrid Fusion
Authors:
Jihoon Cho,
Jonghye Woo,
Jinah Park
Abstract:
Multisequence Magnetic Resonance Imaging (MRI) provides a reliable diagnosis in clinical applications through complementary information within sequences. However, in practice, the absence of certain MR sequences is a common problem that can lead to inconsistent analysis results. In this work, we propose a novel unified framework for synthesizing multisequence MR images, called Hybrid Fusion GAN (H…
▽ More
Multisequence Magnetic Resonance Imaging (MRI) provides a reliable diagnosis in clinical applications through complementary information within sequences. However, in practice, the absence of certain MR sequences is a common problem that can lead to inconsistent analysis results. In this work, we propose a novel unified framework for synthesizing multisequence MR images, called Hybrid Fusion GAN (HF-GAN). We introduce a hybrid fusion encoder designed to ensure the disentangled extraction of complementary and modality-specific information, along with a channel attention-based feature fusion module that integrates the features into a common latent space handling the complexity from combinations of accessible MR sequences. Common feature representations are transformed into a target latent space via the modality infuser to synthesize missing MR sequences. We have performed experiments on multisequence brain MRI datasets from healthy individuals and patients diagnosed with brain tumors. Experimental results show that our method outperforms state-of-the-art methods in both quantitative and qualitative comparisons. In addition, a detailed analysis of our framework demonstrates the superiority of our designed modules and their effectiveness for use in data imputation tasks.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Superfluid stiffness of twisted multilayer graphene superconductors
Authors:
Abhishek Banerjee,
Zeyu Hao,
Mary Kreidel,
Patrick Ledwith,
Isabelle Phinney,
Jeong Min Park,
Andrew M. Zimmerman,
Kenji Watanabe,
Takashi Taniguchi,
Robert M Westervelt,
Pablo Jarillo-Herrero,
Pavel A. Volkov,
Ashvin Vishwanath,
Kin Chung Fong,
Philip Kim
Abstract:
The robustness of the macroscopic quantum nature of a superconductor can be characterized by the superfluid stiffness, $ρ_s$, a quantity that describes the energy required to vary the phase of the macroscopic quantum wave function. In unconventional superconductors, such as cuprates, the low-temperature behavior of $ρ_s$ drastically differs from that of conventional superconductors due to quasipar…
▽ More
The robustness of the macroscopic quantum nature of a superconductor can be characterized by the superfluid stiffness, $ρ_s$, a quantity that describes the energy required to vary the phase of the macroscopic quantum wave function. In unconventional superconductors, such as cuprates, the low-temperature behavior of $ρ_s$ drastically differs from that of conventional superconductors due to quasiparticle excitations from gapless points (nodes) in momentum space. Intensive research on the recently discovered magic-angle twisted graphene family has revealed, in addition to superconducting states, strongly correlated electronic states associated with spontaneously broken symmetries, inviting the study of $ρ_s$ to uncover the potentially unconventional nature of its superconductivity. Here we report the measurement of $ρ_s$ in magic-angle twisted trilayer graphene (TTG), revealing unconventional nodal-gap superconductivity. Utilizing radio-frequency reflectometry techniques to measure the kinetic inductive response of superconducting TTG coupled to a microwave resonator, we find a linear temperature dependence of $ρ_s$ at low temperatures and nonlinear Meissner effects in the current bias dependence, both indicating nodal structures in the superconducting order parameter. Furthermore, the doping dependence shows a linear correlation between the zero temperature $ρ_s$ and the superconducting transition temperature $T_c$, reminiscent of Uemura's relation in cuprates, suggesting phase-coherence-limited superconductivity. Our results provide strong evidence for nodal superconductivity in TTG and put strong constraints on the mechanisms of these graphene-based superconductors.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation
Authors:
Jaehyun Park,
Dabeen Lee
Abstract:
We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an…
▽ More
We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an $\tilde{\mathcal{O}}(dD\sqrt{T})$ regret, where $d$ is the dimension of feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm \texttt{OVIFH-MNL} is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{\mathcal{O}}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$ where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function.
We also prove a lower bound of $Ω(d\sqrt{DT})$ for learning communicating MDPs with MNL transitions of diameter at most $D$. Furthermore, we show a regret lower bound of $Ω(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Mode Coupling and Breathing Oscillation in Partially Magnetized Cross-Field Plasmas
Authors:
Jong Yoon Park,
June Young Kim
Abstract:
We report on investigations of mode coupling between rotating spokes during the onset of the breathing oscillation. Demonstrating the existence of nonlinear coupling between the sporadic spokes and the breathing oscillation, we suggest the oscillating azimuthal electric field as the energy source for additional ionization within the plasma. Our results indicate that intermittent three-wave couplin…
▽ More
We report on investigations of mode coupling between rotating spokes during the onset of the breathing oscillation. Demonstrating the existence of nonlinear coupling between the sporadic spokes and the breathing oscillation, we suggest the oscillating azimuthal electric field as the energy source for additional ionization within the plasma. Our results indicate that intermittent three-wave coupling is a possible mechanism for triggering low-frequency breathing oscillations in partially magnetized cross-field plasma.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Meent: Differentiable Electromagnetic Simulator for Machine Learning
Authors:
Yongha Kim,
Anthony W. Jung,
Sanmun Kim,
Kevin Octavian,
Doyoung Heo,
Chaejin Park,
Jeongmin Shin,
Sunghyun Nam,
Chanhyung Park,
Juho Park,
Sangjun Han,
Jinmyoung Lee,
Seolho Kim,
Min Seok Jang,
Chan Y. Park
Abstract:
Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…
▽ More
Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yield sub-optimal results due to the high computational cost of both the algorithms and EM simulations. Machine learning (ML) emerged as a promising candidate to mitigate these challenges, and optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have an EM simulation software that is user-friendly for both research communities. To this end, we present Meent, an EM simulation software that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa. To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training neural operator, 2) serving as an environment for the reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent with the MIT license to promote the cross-polinations of ideas among academic researchers and industry practitioners.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
neuSIM4: A comprehensive GEANT4 based neutron simulation code
Authors:
J. Park,
F. C. E. Teh,
M. B. Tsang,
K. W. Brown,
Z. Chajecki,
B. Hong,
T. Lokotko,
W. G. Lynch,
J. Wieske,
K. Zhu
Abstract:
A new neutron SIMulation program based on the versatile GEANT4 toolkit, neuSIM4, has been developed to describe interactions of neutrons in the NE213 liquid scintillator from 0.1 to 3000 MeV. neuSIM4 is designed to accommodate complicated modern detector geometry setups with multiple scintillator detectors, each of which can be outfitted with more than one photo-multiplier. To address a broad spec…
▽ More
A new neutron SIMulation program based on the versatile GEANT4 toolkit, neuSIM4, has been developed to describe interactions of neutrons in the NE213 liquid scintillator from 0.1 to 3000 MeV. neuSIM4 is designed to accommodate complicated modern detector geometry setups with multiple scintillator detectors, each of which can be outfitted with more than one photo-multiplier. To address a broad spectrum of neutron energies, two new neutron interaction physics models, KSCIN and NxQMD, have been implemented in GEANT4. For neutrons with energy below 110 MeV, we incorporate a total of eleven neutron induced reaction channels on hydrogen and carbon nuclei, including nine carbon inelastic reaction channels, into KSCIN. Beyond 110 MeV, we implement a neutron induced reaction model, NxQMD, in GEANT4. We use its results as reference to evaluate other neutron-interaction physics models in GEANT4. We find that results from an existing cascade physics model (INCL) in GEANT4 agree very well with the results from NxQMD, and results from both codes agree with new and existing light response data. To connect KSCIN to NxQMD or INCL, we introduce a transition region where the contribution of neuSIM4 linearly decreases with corresponding increased contributions from NxQMD or INCL. To demonstrate the application of the new code, we simulate the light response and performance of a 2 x 2 m2 neutron detector wall array consisting of 25 2m-long scintillation bars. We are able to compare the predicted light response functions to the shape of the experimental response functions and calculate the efficiency of the neutron detector array for neutron energies up to 200 MeV. These simulation results will be pivotal for understanding the performance of modern neutron arrays with intricate geometries, especially in the measurements of neutron energy spectra in heavy-ion reactions.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4
Authors:
Sang Won Son,
Jongyeon Park,
Hong Kook Kim,
Sulaiman Vesal,
Jeong Eun Lim
Abstract:
In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de…
▽ More
In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main decoder, enhancing performance of the convolutional block during the initial training stages by assigning a different weight strategy between main and auxiliary decoder losses. Next, to address the time interval issue between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model's output to be aligned with soft labels of 1 s in the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of logmel and MFCC features to generate time-frequency pattern. The experimental results demonstrate the efficacy of these proposed methods in a view of improving SED performance by achieving a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in developing more robust and flexible SED models
△ Less
Submitted 24 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Authors:
Young Jin Ahn,
Jungwoo Park,
Sangha Park,
Jonghyun Choi,
Kee-Eung Kim
Abstract:
Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel…
▽ More
Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities at the cost of a forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Authors:
Hoyeon Chang,
Jinho Park,
Seonghyeon Ye,
Sohee Yang,
Youngkyung Seo,
Du-Seong Chang,
Minjoon Seo
Abstract:
Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge ac…
▽ More
Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge acquisition during pretraining. First, counterintuitively, we observe that pretraining on more data shows no significant improvement in the model's capability to acquire and maintain factual knowledge. Next, there is a power-law relationship between training steps and forgetting of memorization and generalization of factual knowledge, and LLMs trained with duplicated training data exhibit faster forgetting. Third, training LLMs with larger batch sizes can enhance the models' robustness to forgetting. Overall, our observations suggest that factual knowledge acquisition in LLM pretraining occurs by progressively increasing the probability of factual knowledge presented in the pretraining data at each step. However, this increase is diluted by subsequent forgetting. Based on this interpretation, we demonstrate that we can provide plausible explanations for recently observed behaviors of LLMs, such as the poor performance of LLMs on long-tail knowledge and the benefits of deduplicating the pretraining corpus.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Authors:
Ashish Kumar,
Daneul Kim,
Jaesik Park,
Laxmidhar Behera
Abstract:
Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for d…
▽ More
Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for dynamic channel sampling, namely Pick-or-Mix (PiX), which does not require special implementation. PiX divides a set of channels into subsets and then picks from them, where the picking decision is dynamically made per each pixel based on the input activations. We plug PiX into prominent ConvNet architectures and verify its multi-purpose utilities. After replacing 1x1 channel squeezing layers in ResNet with PiX, the network becomes 25% faster without losing accuracy. We show that PiX allows ConvNets to learn better data representation than widely adopted approaches to enhance networks' representation power (e.g., SE, CBAM, AFF, SKNet, and DWP). We also show that PiX achieves state-of-the-art performance on network downscaling and dynamic channel pruning applications.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Federated Learning with Flexible Architectures
Authors:
Jong-Ik Park,
Carlee Joe-Wong
Abstract:
Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To addre…
▽ More
Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To address this need, this paper introduces Federated Learning with Flexible Architectures (FedFA), an FL training algorithm that allows clients to train models of different widths and depths. Each client can select a network architecture suitable for its resources, with shallower and thinner networks requiring fewer computing resources for training. Unlike prior work in this area, FedFA incorporates the layer grafting technique to align clients' local architectures with the largest network architecture in the FL system during model aggregation. Layer grafting ensures that all client contributions are uniformly integrated into the global model, thereby minimizing the risk of any individual client's data skewing the model's parameters disproportionately and introducing security benefits. Moreover, FedFA introduces the scalable aggregation method to manage scale variations in weights among different network architectures. Experimentally, FedFA outperforms previous width and depth flexible aggregation strategies. Furthermore, FedFA demonstrates increased robustness against performance degradation in backdoor attack scenarios compared to earlier strategies.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Self-Knowledge Distillation for Learning Ambiguity
Authors:
Hancheol Park,
Soyeong Jeong,
Sukmin Cho,
Jong C. Park
Abstract:
Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without consideration for its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models t…
▽ More
Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without consideration for its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately by leveraging knowledge distilled from their lower layers. This approach also includes a learning phase that re-calibrates the unnecessarily strengthened confidence for training samples judged as extremely ambiguous based on the distilled distribution knowledge. We validate our method on diverse NLU benchmark datasets and the experimental results demonstrate its effectiveness in producing better label distributions. Particularly, through the process of re-calibrating the confidence for highly ambiguous samples, the issue of over-confidence when predictions for unseen samples do not match with their ground-truth labels has been significantly alleviated. This has been shown to contribute to generating better distributions than the existing state-of-the-art method. Moreover, our method is more efficient in training the models compared to the existing method, as it does not involve additional training processes to refine label distributions.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA
Authors:
Jongwoo Park,
Kanchana Ranasinghe,
Kumara Kahatapitiya,
Wonjeong Ryoo,
Donghyun Kim,
Michael S. Ryoo
Abstract:
Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely-related. Therefore, when performing long-form video question answering (LVQA),all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explore the use of large lan…
▽ More
Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely-related. Therefore, when performing long-form video question answering (LVQA),all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explore the use of large language models (LLMs) in LVQA benchmarks, achieving exceptional performance, while relying on vision language models (VLMs) to convert all visual content within videos into natural language. Such VLMs often independently caption a large number of frames uniformly sampled from long videos, which is not efficient and can mostly be redundant. Questioning these decision choices, we explore optimal strategies for key-frame selection and sequence-aware captioning, that can significantly reduce these redundancies. We propose two novel approaches that improve each of aspects, namely Hierarchical Keyframe Selector and Sequential Visual LLM. Our resulting framework termed LVNet achieves state-of-the-art performance across three benchmark LVQA datasets. Our code will be released publicly.
△ Less
Submitted 17 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
DeepJEB: 3D Deep Learning-based Synthetic Jet Engine Bracket Dataset
Authors:
Seongjun Hong,
Yongmin Kwon,
Dongju Shin,
Jangseop Park,
Namwoo Kang
Abstract:
Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be…
▽ More
Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be improved for developing robust data-driven surrogate models. This study presents the DeepJEB dataset, which has been created using deep generative models and automated engineering simulation pipelines, to overcome these challenges. Moreover, this study provides comprehensive 3D geometries and their corresponding structural analysis data.
Key experiments validated the effectiveness of the DeepJEB dataset, demonstrating significant improvements in the prediction accuracy and reliability of surrogate models trained on this data. The enhanced dataset showed a broader design space and better generalization capabilities than traditional datasets. These findings highlight the potential of DeepJEB as a benchmark dataset for developing reliable surrogate models in structural engineering. The DeepJEB dataset supports advanced modeling techniques, such as graph neural networks (GNNs) and high-dimensional convolutional networks (CNNs), leveraging node-level field data for precise predictions. This dataset is set to drive innovation in engineering design applications, enabling more accurate and efficient structural performance predictions. The DeepJEB dataset is publicly accessible at: https://www.narnia.ai/dataset
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution
Authors:
Juhee Kim,
Jinbum Park,
Sihyeon Roh,
Jaeyoung Chung,
Youngjoo Lee,
Taesoo Kim,
Byoungyoung Lee
Abstract:
ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential secur…
▽ More
ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential security risks posed by speculative execution attacks against MTE. Specifically, this paper identifies new TikTag gadgets capable of leaking the MTE tags from arbitrary memory addresses through speculative execution. With TikTag gadgets, attackers can bypass the probabilistic defense of MTE, increasing the attack success rate by close to 100%. We demonstrate that TikTag gadgets can be used to bypass MTE-based mitigations in real-world systems, Google Chrome and the Linux kernel. Experimental results show that TikTag gadgets can successfully leak an MTE tag with a success rate higher than 95% in less than 4 seconds. We further propose new defense mechanisms to mitigate the security risks posed by TikTag gadgets.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Heterogeneous Beliefs Model of Stock Market Predictability
Authors:
Jiho Park
Abstract:
This paper proposes a theory of stock market predictability patterns based on a model of heterogeneous beliefs. In a discrete finite time framework, some agents receive news about an asset's fundamental value through a noisy signal. The investors are heterogeneous in that they have different beliefs about the stochastic supply. A momentum in the stock price arises from those agents who incorrectly…
▽ More
This paper proposes a theory of stock market predictability patterns based on a model of heterogeneous beliefs. In a discrete finite time framework, some agents receive news about an asset's fundamental value through a noisy signal. The investors are heterogeneous in that they have different beliefs about the stochastic supply. A momentum in the stock price arises from those agents who incorrectly underestimate the signal accuracy, dampening the initial price impact of the signal. A reversal in price occurs because the price reverts to the fundamental value in the long run. An extension of the model to multiple assets case predicts co-movement and lead-lag effect, in addition to cross-sectional momentum and reversal. The heterogeneous beliefs of investors about news demonstrate how the main predictability anomalies arise endogenously in a model of bounded rationality.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is obs…
▽ More
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
ONNXim: A Fast, Cycle-level Multi-core NPU Simulator
Authors:
Hyungkyu Ham,
Wonhyuk Yang,
Yunseon Shin,
Okkyun Woo,
Guseul Heo,
Sangyeop Lee,
Jongse Park,
Gwangsun Kim
Abstract:
As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, and/or different de…
▽ More
As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, and/or different deep learning frameworks. To address these limitations, this work proposes ONNXim, a fast cycle-level simulator for multi-core NPUs in DNN serving systems. It takes DNN models represented in the ONNX graph format generated from various deep learning frameworks for ease of simulation. In addition, based on the observation that typical NPU cores process tensor tiles from on-chip scratchpad memory with deterministic compute latency, we forgo a detailed modeling for the computation while still preserving simulation accuracy. ONNXim also preserves dependencies between compute and tile DMAs. Meanwhile, the DRAM and NoC are modeled in cycle-level to properly model contention among multiple cores that can execute different DNN models for multi-tenancy. Consequently, ONNXim is significantly faster than existing simulators (e.g., by up to 384x over Accel-sim) and enables various case studies, such as multi-tenant NPUs, that were previously impractical due to slow speed and/or lack of functionalities. ONNXim is publicly available at https://github.com/PSAL-POSTECH/ONNXim.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Authors:
Se Jin Park,
Chae Won Kim,
Hyeongseop Rha,
Minsu Kim,
Joanna Hong,
Jeong Hun Yeo,
Yong Man Ro
Abstract:
In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp…
▽ More
In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corpus containing 340 hours of approximately 9,000 dialogues, recorded based on the open domain dialogue dataset, TopicalChat. The MultiDialog contains parallel audio-visual recordings of conversation partners acting according to the given script with emotion annotations, which we expect to open up research opportunities in multimodal synthesis. Our Face-to-Face spoken dialogue model incorporates a textually pretrained large language model and adapts it into the audio-visual spoken dialogue domain by incorporating speech-text joint pretraining. Through extensive experiments, we validate the effectiveness of our model in facilitating a face-to-face conversation. Demo and data are available at https://multidialog.github.io and https://huggingface.co/datasets/IVLLab/MultiDialog, respectively.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Frustrated phonon with charge density wave in vanadium Kagome metal
Authors:
Seung-Phil Heo,
Choongjae Won,
Heemin Lee,
Hanbyul Kim,
Eunyoung Park,
Sung Yun Lee,
Junha Hwang,
Hyeongi Choi,
Sang-Youn Park,
Byungjune Lee,
Woo-Suk Noh,
Hoyoung Jang,
Jae-Hoon Park,
Dongbin Shin,
Changyong Song
Abstract:
Crystals with unique ionic arrangements and strong electronic correlations serve as a fertile ground for the emergence of exotic phases, as evidenced by the coexistence of charge density wave (CDW) and superconductivity in vanadium Kagome metals, specifically AV3Sb5 (where A represents K, Rb, or Cs). The formation of a star of David CDW superstructure, resulting from the coordinated displacements…
▽ More
Crystals with unique ionic arrangements and strong electronic correlations serve as a fertile ground for the emergence of exotic phases, as evidenced by the coexistence of charge density wave (CDW) and superconductivity in vanadium Kagome metals, specifically AV3Sb5 (where A represents K, Rb, or Cs). The formation of a star of David CDW superstructure, resulting from the coordinated displacements of vanadium ions on a corner sharing triangular lattice, has garnered significant attention in efforts to comprehend the influence of electron phonon interaction within this geometrically intricate lattice. However, understanding of the underlying mechanism behind CDW formation, coupled with symmetry protected lattice vibrations, remains elusive. In this study, we employed time resolved X ray scattering experiments utilising an X ray free electron laser. Our findings reveal that the phonon mode associated with the out of plane motion of Cs ions becomes frustrated in the CDW phase. Furthermore, we observed the photoinduced emergence of a metastable CDW phase, facilitated by the alleviation of frustration through nonadiabatic changes in free energy. By elucidating the longstanding puzzle surrounding the intervention of phonons in CDW ordering, this research offers fresh insights into the competition between phonons and periodic lattice distortions, a phenomenon widespread in other correlated quantum materials including layered high Tc superconductors.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
How is the Pilot Doing: VTOL Pilot Workload Estimation by Multimodal Machine Learning on Psycho-physiological Signals
Authors:
Jong Hoon Park,
Lawrence Chen,
Ian Higgins,
Zhaobo Zheng,
Shashank Mehrotra,
Kevin Salubre,
Mohammadreza Mousaei,
Steven Willits,
Blain Levedahl,
Timothy Buker,
Eliot Xing,
Teruhisa Misu,
Sebastian Scherer,
Jean Oh
Abstract:
Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a cr…
▽ More
Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a critical factor for safe and efficient operation of VTOLs. In this work, we conduct a user study to collect multimodal data from 28 pilots while they perform a variety of VTOL flight tasks. We analyze and interpolate behavioral patterns related to their performance and perceived workload. Finally, we build machine learning models to estimate their workload from the collected data. Our results are promising, suggesting that quantitative and accurate VTOL pilot workload monitoring is viable. Such assistive tools would help the research field understand VTOL operations and serve as a stepping stone for the industry to ensure VTOL safe operations and further remote operations.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the branching fractions of $\bar{B}\to D^{(*)} K^- K^{(*)0}_{(S)}$ and $\bar{B}\to D^{(*)}D_s^{-}$ decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer
, et al. (382 additional authors not shown)
Abstract:
We present measurements of the branching fractions of eight $\overline B{}^0\to D^{(*)+} K^- K^{(*)0}_{(S)}$, $B^{-}\to D^{(*)0} K^- K^{(*)0}_{(S)}$ decay channels. The results are based on data from SuperKEKB electron-positron collisions at the $Υ(4S)$ resonance collected with the Belle II detector, corresponding to an integrated luminosity of $362~\text{fb}^{-1}$. The event yields are extracted…
▽ More
We present measurements of the branching fractions of eight $\overline B{}^0\to D^{(*)+} K^- K^{(*)0}_{(S)}$, $B^{-}\to D^{(*)0} K^- K^{(*)0}_{(S)}$ decay channels. The results are based on data from SuperKEKB electron-positron collisions at the $Υ(4S)$ resonance collected with the Belle II detector, corresponding to an integrated luminosity of $362~\text{fb}^{-1}$. The event yields are extracted from fits to the distributions of the difference between expected and observed $B$ meson energy, and are efficiency-corrected as a function of $m(K^-K^{(*)0}_{(S)})$ and $m(D^{(*)}K^{(*)0}_{(S)})$ in order to avoid dependence on the decay model. These results include the first observation of $\overline B{}^0\to D^+K^-K_S^0$, $B^-\to D^{*0}K^-K_S^0$, and $\overline B{}^0\to D^{*+}K^-K_S^0$ decays and a significant improvement in the precision of the other channels compared to previous measurements. The helicity-angle distributions and the invariant mass distributions of the $K^- K^{(*)0}_{(S)}$ systems are compatible with quasi-two-body decays via a resonant transition with spin-parity $J^P=1^-$ for the $K^-K_S^0$ systems and $J^P= 1^+$ for the $K^-K^{*0}$ systems. We also present measurements of the branching fractions of four $\overline B{}^0\to D^{(*)+} D_s^-$, $B^{-}\to D^{(*)0} D_s^- $ decay channels with a precision compatible to the current world averages.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Exclusion of the Cosmological Triangle in Reactor-Based Search for Axion-Like Particles
Authors:
Byung Ju Park,
Jae Jin Choi,
Eunju Jeon,
Jinyu Kim,
Kyungwon Kim,
Sung Hyun Kim,
Sun Kee Kim,
Yeongduk Kim,
Young Ju Ko,
Byoung-Cheol Koh,
Chang Hyon Ha,
Seo Hyun Lee,
In Soo Lee,
Hyunseok Lee,
Hyun Su Lee,
Jaison Lee,
Yoomin Oh,
Doojin Kim
Abstract:
We report new constraints on axion-like particle (ALP) using data corresponding to a sodium iodine target exposure of 3063 kg$\cdot$days from the neutrino elastic scattering observation with NaI (NEON) experiment. A 16.7 kg of thallium-doped sodium iodide target was located 23.7 meters from a 2.8 GW thermal power nuclear reactor. We searched for ALPs produced by high-flux photons by comparing the…
▽ More
We report new constraints on axion-like particle (ALP) using data corresponding to a sodium iodine target exposure of 3063 kg$\cdot$days from the neutrino elastic scattering observation with NaI (NEON) experiment. A 16.7 kg of thallium-doped sodium iodide target was located 23.7 meters from a 2.8 GW thermal power nuclear reactor. We searched for ALPs produced by high-flux photons by comparing the energy spectra of data collected during reactor-on (1596 kg$\cdot$days exposure) and reactor-off (1467 kg$\cdot$days exposure) periods. No signal consistent with ALP interaction was identified, allowing us to set exclusion limits at the 95% confidence level. Our limits cover previously unexplored regions for both photon couplings (${g_{aγ}}$) and electron couplings (${g_{ae}}$) for axion masses around 1 MeV/c$^2$. Notably, the NEON data excludes the unconstrained region identified by laboratory-based searches for photon couplings within the "cosmological triangle" for the first time. The observed 95\% confidence level limits reach as low as ${g_{aγ}}$ of 4.33$\times$ 10$^{-8}$ GeV$^{-1}$ and ${g_{ae}}$ of 1.10$\times$ 10$^{-9}$ for axion masses of 1.7 MeV/c$^2$ and 1.0 MeV/c$^2$, respectively.
△ Less
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024
Authors:
Jinwoo Ahn,
Junhyeok Park,
Min-Jun Kim,
Kang-Hyeon Kim,
So-Yeong Sohn,
Yun-Ji Lee,
Du-Seong Chang,
Yu-Jung Heo,
Eun-Sol Kim
Abstract:
In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two m…
▽ More
In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two main ideas. First, to utilize the reasoning ability of a large-scale language model (LLM), the given visual cues (images) are grounded in the text modality. For this purpose, we generate highly detailed text captions that describe the context of the image and use these captions as input for the LLM. Second, due to the nature of puzzle images, which often contain various geometric visual patterns, we utilize an object detection algorithm to ensure these patterns are not overlooked in the captioning process. We employed the SAM algorithm, which can detect various-size objects, to capture the visual features of these geometric patterns and used this information as input for the LLM. Under the puzzle split configuration, we achieved an option selection accuracy Oacc of 29.5 on the test set and a weighted option selection accuracy (WOSA) of 27.1 on the challenge set.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of $Ξ_{c}^{0}\toΞ^{0}π^{0}$, $Ξ_{c}^{0}\toΞ^{0}η$, and $Ξ_{c}^{0}\toΞ^{0}η^{\prime}$ and asymmetry parameter of $Ξ_{c}^{0}\toΞ^{0}π^{0}$
Authors:
Belle,
Belle II Collaborations,
:,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (360 additional authors not shown)
Abstract:
We present a study of $Ξ_{c}^{0}\toΞ^{0}π^{0}$, $Ξ_{c}^{0}\toΞ^{0}η$, and $Ξ_{c}^{0}\toΞ^{0}η^{\prime}$ decays using the Belle and Belle~II data samples, which have integrated luminosities of 980~$\mathrm{fb}^{-1}$ and 426~$\mathrm{fb}^{-1}$, respectively. We measure the following relative branching fractions…
▽ More
We present a study of $Ξ_{c}^{0}\toΞ^{0}π^{0}$, $Ξ_{c}^{0}\toΞ^{0}η$, and $Ξ_{c}^{0}\toΞ^{0}η^{\prime}$ decays using the Belle and Belle~II data samples, which have integrated luminosities of 980~$\mathrm{fb}^{-1}$ and 426~$\mathrm{fb}^{-1}$, respectively. We measure the following relative branching fractions $${\cal B}(Ξ_{c}^{0}\toΞ^{0}π^{0})/{\cal B}(Ξ_{c}^{0}\toΞ^{-}π^{+}) = 0.48 \pm 0.02 ({\rm stat}) \pm 0.03 ({\rm syst}) ,$$ $${\cal B}(Ξ_{c}^{0}\toΞ^{0}η)/{\cal B}(Ξ_{c}^{0}\toΞ^{-}π^{+}) = 0.11 \pm 0.01 ({\rm stat}) \pm 0.01 ({\rm syst}) ,$$ $${\cal B}(Ξ_{c}^{0}\toΞ^{0}η^{\prime})/{\cal B}(Ξ_{c}^{0}\toΞ^{-}π^{+}) = 0.08 \pm 0.02 ({\rm stat}) \pm 0.01 ({\rm syst}) $$ for the first time, where the uncertainties are statistical ($\rm stat$) and systematic ($\rm syst$). By multiplying by the branching fraction of the normalization mode, ${\mathcal B}(Ξ_{c}^{0}\toΞ^{-}π^{+})$, we obtain the following absolute branching fraction results $(6.9 \pm 0.3 ({\rm stat}) \pm 0.5 ({\rm syst}) \pm 1.3 ({\rm norm})) \times 10^{-3}$, $(1.6 \pm 0.2 ({\rm stat}) \pm 0.2 ({\rm syst}) \pm 0.3 ({\rm norm})) \times 10^{-3}$, and $(1.2 \pm 0.3 ({\rm stat}) \pm 0.1 ({\rm syst}) \pm 0.2 ({\rm norm})) \times 10^{-3}$, for $Ξ_{c}^{0}$ decays to $Ξ^{0}π^{0}$, $Ξ^{0}η$, and $Ξ^{0}η^{\prime}$ final states, respectively. The third errors are from the uncertainty on ${\mathcal B}(Ξ_{c}^{0}\toΞ^{-}π^{+})$. The asymmetry parameter for $Ξ_{c}^{0}\toΞ^{0}π^{0}$ is measured to be $α(Ξ_{c}^{0}\toΞ^{0}π^{0}) = -0.90\pm0.15({\rm stat})\pm0.23({\rm syst})$.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
Authors:
Jisu Shin,
Hoyun Song,
Huije Lee,
Soyeong Jeong,
Jong C. Park
Abstract:
Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities. To fully understand such social bias in large language models (LLMs), it is essential to consider the composite of social perceptions from diverse perspectives among identities. Previous studies have either evaluated biases in LLMs by indirectly assessing the presence of sentiment…
▽ More
Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities. To fully understand such social bias in large language models (LLMs), it is essential to consider the composite of social perceptions from diverse perspectives among identities. Previous studies have either evaluated biases in LLMs by indirectly assessing the presence of sentiments towards demographic identities in the generated text or measuring the degree of alignment with given stereotypes. These methods have limitations in directly quantifying social biases at the level of distinct perspectives among identities. In this paper, we aim to investigate how social perceptions from various viewpoints contribute to the development of social bias in LLMs. To this end, we propose a novel strategy to intuitively quantify these social perceptions and suggest metrics that can evaluate the social biases within LLMs by aggregating diverse social perceptions. The experimental results show the quantitative demonstration of the social attitude in LLMs by examining social perception. The analysis we conducted shows that our proposed metrics capture the multi-dimensional aspects of social bias, enabling a fine-grained and comprehensive investigation of bias in LLMs.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach
Authors:
Saehyung Lee,
Sangwon Yu,
Junsung Park,
Jihun Yi,
Sungroh Yoon
Abstract:
In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling…
▽ More
In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling the use of any arbitrary black-box model. Second, we construct the LLM questioner to generate non-redundant questions about the attributes of the target image, based on the information of retrieval candidate images in the current context. This approach mitigates the issues of noisiness and redundancy in the generated questions. Beyond our methodology, we propose a novel evaluation metric, Best log Rank Integral (BRI), for a comprehensive assessment of the interactive retrieval system. PlugIR demonstrates superior performance compared to both zero-shot and fine-tuned baselines in various benchmarks. Additionally, the two methodologies comprising PlugIR can be flexibly applied together or separately in various situations. Our codes are available at https://github.com/Saehyung-Lee/PlugIR.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction
Authors:
Jeiyoon Park,
Chanjun Park,
Heuiseok Lim
Abstract:
We explore and improve the capabilities of LLMs to generate data for grammatical error correction (GEC). When merely producing parallel sentences, their patterns are too simplistic to be valuable as a corpus. To address this issue, we propose an automated framework that includes a Subject Selector, Grammar Selector, Prompt Manager, and Evaluator. Additionally, we introduce a new dataset for GEC ta…
▽ More
We explore and improve the capabilities of LLMs to generate data for grammatical error correction (GEC). When merely producing parallel sentences, their patterns are too simplistic to be valuable as a corpus. To address this issue, we propose an automated framework that includes a Subject Selector, Grammar Selector, Prompt Manager, and Evaluator. Additionally, we introduce a new dataset for GEC tasks, named ChatLang-8, which encompasses eight types of subject nouns and 23 types of grammar. It consists of 1 million pairs featuring human-like grammatical errors. Our experiments reveal that ChatLang-8 exhibits a more uniform pattern composition compared to existing GEC datasets. Furthermore, we observe improved model performance when using ChatLang-8 instead of existing GEC datasets. The experimental results suggest that our framework and ChatLang-8 are valuable resources for enhancing ChatGPT's data generation capabilities.
△ Less
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy
Authors:
Yunho Kim,
Jeong Hyun Lee,
Choongin Lee,
Juhyeok Mun,
Donghoon Youm,
Jeongsoo Park,
Jemin Hwangbo
Abstract:
For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves m…
▽ More
For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable. The summary video is available at https://youtu.be/EUVoH-wA-lA.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task
Authors:
Unggi Lee,
Jiyeong Bae,
Dohee Kim,
Sookbun Lee,
Jaekwon Park,
Taekyung Ahn,
Gunho Lee,
Damji Stratton,
Hyeoncheol Kim
Abstract:
Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that…
▽ More
Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with KT methods. By leveraging the power of language models to capture semantic representations, LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets. Moreover, we demonstrate that LKT can effectively address the cold-start problem in KT by leveraging the semantic knowledge captured by PLMs. Interpretability of LKT is enhanced compared to traditional KT models due to its use of text-rich data. We conducted the local interpretable model-agnostic explanation technique and analysis of attention scores to interpret the model performance further. Our work highlights the potential of integrating PLMs with KT and paves the way for future research in KT domain.
△ Less
Submitted 9 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Oscillations enhance time-series prediction in reservoir computing with feedback
Authors:
Yuji Kawai,
Takashi Morita,
Jihoon Park,
Minoru Asada
Abstract:
Reservoir computing, a machine learning framework used for modeling the brain, can predict temporal data with little observations and minimal computational resources. However, it is difficult to accurately reproduce the long-term target time series because the reservoir system becomes unstable. This predictive capability is required for a wide variety of time-series processing, including predictio…
▽ More
Reservoir computing, a machine learning framework used for modeling the brain, can predict temporal data with little observations and minimal computational resources. However, it is difficult to accurately reproduce the long-term target time series because the reservoir system becomes unstable. This predictive capability is required for a wide variety of time-series processing, including predictions of motor timing and chaotic dynamical systems. This study proposes oscillation-driven reservoir computing (ODRC) with feedback, where oscillatory signals are fed into a reservoir network to stabilize the network activity and induce complex reservoir dynamics. The ODRC can reproduce long-term target time series more accurately than conventional reservoir computing methods in a motor timing and chaotic time-series prediction tasks. Furthermore, it generates a time series similar to the target in the unexperienced period, that is, it can learn the abstract generative rules from limited observations. Given these significant improvements made by the simple and computationally inexpensive implementation, the ODRC would serve as a practical model of various time series data. Moreover, we will discuss biological implications of the ODRC, considering it as a model of neural oscillations and their cerebellar processors.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion
Authors:
Colin Hansen,
Simas Glinskis,
Ashwin Raju,
Micha Kornreich,
JinHyeong Park,
Jayashri Pawar,
Richard Herzog,
Li Zhang,
Benjamin Odry
Abstract:
Data driven models for automated diagnosis in radiology suffer from insufficient and imbalanced datasets due to low representation of pathology in a population and the cost of expert annotations. Datasets can be bolstered through data augmentation. However, even when utilizing a full suite of transformations during model training, typical data augmentations do not address variations in human anato…
▽ More
Data driven models for automated diagnosis in radiology suffer from insufficient and imbalanced datasets due to low representation of pathology in a population and the cost of expert annotations. Datasets can be bolstered through data augmentation. However, even when utilizing a full suite of transformations during model training, typical data augmentations do not address variations in human anatomy. An alternative direction is to synthesize data using generative models, which can potentially craft datasets with specific attributes. While this holds promise, commonly used generative models such as Generative Adversarial Networks may inadvertently produce anatomically inaccurate features. On the other hand, diffusion models, which offer greater stability, tend to memorize training data, raising concerns about privacy and generative diversity. Alternatively, inpainting has the potential to augment data through directly inserting pathology in medical images. However, this approach introduces a new challenge: accurately merging the generated pathological features with the surrounding anatomical context. While inpainting is a well established method for addressing simple lesions, its application to pathologies that involve complex structural changes remains relatively unexplored. We propose an efficient method for inpainting pathological features onto healthy anatomy in MRI through voxelwise noise scheduling in a latent diffusion model. We evaluate the method's ability to insert disc herniation and central canal stenosis in lumbar spine sagittal T2 MRI, and it achieves superior Frechet Inception Distance compared to state-of-the-art methods.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
Authors:
ChaeHun Park,
Koanho Lee,
Hyesu Lim,
Jaeseok Kim,
Junmo Park,
Yu-Jung Heo,
Du-Seong Chang,
Jaegul Choo
Abstract:
Building a reliable visual question answering~(VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual…
▽ More
Building a reliable visual question answering~(VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual models (i.e., translate-test). However, our analysis reveals that translated texts contain unique characteristics distinct from human-written ones, referred to as translation artifacts. We find that these artifacts can significantly affect the models, confirmed by extensive experiments across diverse models, languages, and translation processes. In light of this, we present a simple data augmentation strategy that can alleviate the adverse impacts of translation artifacts.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Bayesian Mesh Optimization for Graph Neural Networks to Enhance Engineering Performance Prediction
Authors:
Jangseop Park,
Namwoo Kang
Abstract:
In engineering design, surrogate models are widely employed to replace computationally expensive simulations by leveraging design variables and geometric parameters from computer-aided design (CAD) models. However, these models often lose critical information when simplified to lower dimensions and face challenges in parameter definition, especially with the complex 3D shapes commonly found in ind…
▽ More
In engineering design, surrogate models are widely employed to replace computationally expensive simulations by leveraging design variables and geometric parameters from computer-aided design (CAD) models. However, these models often lose critical information when simplified to lower dimensions and face challenges in parameter definition, especially with the complex 3D shapes commonly found in industrial datasets. To address these limitations, we propose a Bayesian graph neural network (GNN) framework for a 3D deep-learning-based surrogate model that predicts engineering performance by directly learning geometric features from CAD using mesh representation. Our framework determines the optimal size of mesh elements through Bayesian optimization, resulting in a high-accuracy surrogate model. Additionally, it effectively handles the irregular and complex structures of 3D CADs, which differ significantly from the regular and uniform pixel structures of 2D images typically used in deep learning. Experimental results demonstrate that the quality of the mesh significantly impacts the prediction accuracy of the surrogate model, with an optimally sized mesh achieving superior performance. We compare the performance of models based on various 3D representations such as voxel, point cloud, and graph, and evaluate the computational costs of Monte Carlo simulation and Bayesian optimization methods to find the optimal mesh size. We anticipate that our proposed framework has the potential to be applied to mesh-based simulations across various engineering fields, leveraging physics-based information commonly used in computer-aided engineering.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Error Field Predictability and Consequences for ITER
Authors:
Matthew Pharr,
Nikolas Logan,
Carlos Paz-Soldan,
Jong-Kyu Park,
Chistopher Hansen
Abstract:
ITER coil tolerances are re-evaluated using the modern understanding of coupling to least-stable plasma modes and an updated center-line-traced model of ITER's coil windings. This reassessment finds the tolerances to be conservative through a statistical, linear study of $n=1$ error fields (EFs) due to tilted, shifted misplacements and nominal windings of central solenoid and poloidal field coils…
▽ More
ITER coil tolerances are re-evaluated using the modern understanding of coupling to least-stable plasma modes and an updated center-line-traced model of ITER's coil windings. This reassessment finds the tolerances to be conservative through a statistical, linear study of $n=1$ error fields (EFs) due to tilted, shifted misplacements and nominal windings of central solenoid and poloidal field coils within tolerance. We also show that a model-based correction scheme remains effective even when metrology quality is sub-optimal, and compare this to projected empirical correction schemes. We begin with an analysis of the necessity of error field correction (EFC) for daily operation in ITER using scaling laws for the EF penetration threshold. We then consider the predictability of EF dominant mode overlap across early planned ITER scenarios and, as measuring EFs in high power scenarios can pose risks to the device, the potential for extrapolation to the ITER Baseline Scenario (IBS). We find that carefully designing a scenario matching currents proportionally to those of the IBS is far more important than plasma shape or profiles in accurately measuring an optimal correction current set.
△ Less
Submitted 9 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…
▽ More
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency
Authors:
Yeonsung Jung,
Heecheol Yun,
Joonhyung Park,
Jin-Hwa Kim,
Eunho Yang
Abstract:
Neural Radiance Fields (NeRF) have shown remarkable performance in learning 3D scenes. However, NeRF exhibits vulnerability when confronted with distractors in the training images -- unexpected objects are present only within specific views, such as moving entities like pedestrians or birds. Excluding distractors during dataset construction is a straightforward solution, but without prior knowledg…
▽ More
Neural Radiance Fields (NeRF) have shown remarkable performance in learning 3D scenes. However, NeRF exhibits vulnerability when confronted with distractors in the training images -- unexpected objects are present only within specific views, such as moving entities like pedestrians or birds. Excluding distractors during dataset construction is a straightforward solution, but without prior knowledge of their types and quantities, it becomes prohibitively expensive. In this paper, we propose PruNeRF, a segment-centric dataset pruning framework via 3D spatial consistency, that effectively identifies and prunes the distractors. We first examine existing metrics for measuring pixel-wise distraction and introduce Influence Functions for more accurate measurements. Then, we assess 3D spatial consistency using a depth-based reprojection technique to obtain 3D-aware distraction. Furthermore, we incorporate segmentation for pixel-to-segment refinement, enabling more precise identification. Our experiments on benchmark datasets demonstrate that PruNeRF consistently outperforms state-of-the-art methods in robustness against distractors.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.