Search | arXiv e-print repository

arXiv:2406.19135 [pdf, other]

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a… ▽ More Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a general diffusion TTS framework, DEX-TTS includes encoders and adapters to handle styles extracted from reference speech. Key innovations contain the differentiation of styles into time-invariant and time-variant categories for effective style extraction, as well as the design of encoders and adapters with high generalization ability. In addition, we introduce overlapping patchify and convolution-frequency patch embedding strategies to improve DiT-based diffusion networks for TTS. DEX-TTS yields outstanding performance in terms of objective and subjective evaluation in English multi-speaker and emotional multi-speaker datasets, without relying on pre-training strategies. Lastly, the comparison results for the general TTS on a single-speaker dataset verify the effectiveness of our enhanced diffusion backbone. Demos are available here. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.19113 [pdf, other]

MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag… ▽ More Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storage processing can be a fundamental solution for reducing this overhead. However, designing an in-storage processing system for metagenomics is challenging because existing approaches to metagenomic analysis cannot be directly implemented in storage effectively due to the hardware limitations of modern SSDs. We propose MegIS, the first in-storage processing system designed to significantly reduce the data movement overhead of the end-to-end metagenomic analysis pipeline. MegIS is enabled by our lightweight design that effectively leverages and orchestrates processing inside and outside the storage system. We address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning, 2) data/computation flow coordination, 3) storage technology-aware algorithmic optimizations, 4) data mapping, and 5) lightweight in-storage accelerators. MegIS's design is flexible, capable of supporting different types of metagenomic input datasets, and can be integrated into various metagenomic analysis pipelines. Our evaluation shows that MegIS outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7$\times$-37.2$\times$ and 6.9$\times$-100.2$\times$, respectively, while matching the accuracy of the accuracy-optimized tool. MegIS achieves 1.5$\times$-5.1$\times$ speedup compared to the state-of-the-art metagenomic hardware-accelerated (using processing-in-memory) tool, while achieving significantly higher accuracy. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

arXiv:2406.18898 [pdf, other]

360 in the Wild: Dataset for Depth Prediction and View Synthesis

Authors: Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

Abstract: The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale… ▽ More The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale 360$^{\circ}$ videos dataset in the wild. This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide. Hence, this dataset exhibits very diversified environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely, single image depth estimation and view synthesis. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18704 [pdf]

Superconducting phase diagram in Bi$_x$Ni$_{1-x}$ thin films$\colon$ the effects of Bi stoichiometry on superconductivity

Authors: Jihun Park, Jarryd A. Horn, Dylan J. Kirsch, Rohit K. Pant, Hyeok Yoon, Sungha Baek, Suchismita Sarker, Apurva Mehta, Xiaohang Zhang, Seunghun Lee, Richard Greene, Johnpierre Paglione, Ichiro Takeuchi

Abstract: The Bi${-}$Ni binary system has been of interest due to possible unconventional superconductivity aroused therein, such as time-reversal symmetry breaking in Bi/Ni bilayers or the coexistence of superconductivity and ferromagnetism in Bi$_3$Ni crystals. While Ni acts as a ferromagnetic element in such systems, the role of strong spin-orbit-coupling element Bi in superconductivity has remained unex… ▽ More The Bi${-}$Ni binary system has been of interest due to possible unconventional superconductivity aroused therein, such as time-reversal symmetry breaking in Bi/Ni bilayers or the coexistence of superconductivity and ferromagnetism in Bi$_3$Ni crystals. While Ni acts as a ferromagnetic element in such systems, the role of strong spin-orbit-coupling element Bi in superconductivity has remained unexplored. In this work, we systematically studied the effects of Bi stoichiometry on the superconductivity of Bi$_x$Ni$_{1-x}$ thin films (${x} \approx$ 0.5 to 0.9) fabricated via a composition-spread approach. The superconducting phase map of Bi$_x$Ni$_{1-x}$ thin films exhibited a superconducting composition region attributable to the intermetallic Bi$_3$Ni phase with different amount of excess Bi, revealed by synchrotron X-ray diffraction analysis. Interestingly, the mixed phase region with Bi$_3$Ni and Bi showed unusual increases in the superconducting transition temperature and residual resistance ratio as more Bi impurities were included, with the maximum ${T}_{c}$ ($=$ 4.2 K) observed at $x \approx$ 0.79. A correlation analysis of structural, electrical, and magneto-transport characteristics across the composition variation revealed that the unusual superconducting $"$dome$"$ is due to two competing roles of Bi$\colon$ impurity scattering and carrier doping. We found that the carrier doping effect is dominant in the mild doping regime (0.74 $\leq {x} \leq$ 0.79), while impurity scattering becomes more pronounced at larger Bi stoichiometry. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18388 [pdf, other]

SAM: Semi-Active Mechanism for Extensible Continuum Manipulator and Real-time Hysteresis Compensation Control Algorithm

Authors: Junhyun Park, Seonghyeok Jang, Myeongbo Park, Hyojae Park, Jeonghyeon Yoon, Minho Hwang

Abstract: Cable-Driven Continuum Manipulators (CDCMs) enable scar-free procedures via natural orifices and improve target lesion accessibility through curved paths. However, CDCMs face limitations in workspace and control accuracy due to non-linear cable effects causing hysteresis. This paper introduces an extensible CDCM with a Semi-active Mechanism (SAM) to expand the workspace via translational motion wi… ▽ More Cable-Driven Continuum Manipulators (CDCMs) enable scar-free procedures via natural orifices and improve target lesion accessibility through curved paths. However, CDCMs face limitations in workspace and control accuracy due to non-linear cable effects causing hysteresis. This paper introduces an extensible CDCM with a Semi-active Mechanism (SAM) to expand the workspace via translational motion without additional mechanical elements or actuation. We collect a hysteresis dataset using 8 fiducial markers and RGBD sensing. Based on this dataset, we develop a real-time hysteresis compensation control algorithm using the trained Temporal Convolutional Network (TCN) with a 1ms time latency, effectively estimating the manipulator's hysteresis behavior. Performance validation through random trajectory tracking tests and box pointing tasks shows the proposed controller significantly reduces hysteresis by up to 69.5% in joint space and approximately 26% in the box pointing task. △ Less

Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 12 pages, 14 figures, 6 tables

arXiv:2406.17966 [pdf, other]

Discrete-time thermodynamic speed limit

Authors: Sangyun Lee, Jae Sung Lee, Jong-Min Park

Abstract: As a fundamental thermodynamic principle, speed limits reveal the lower bound of entropy production (EP) required for a system to transition from a given initial state to a final state. While various speed limits have been developed for continuous-time Markov processes, their application to discrete-time Markov chains remains unexplored. In this study, we investigate the speed limits in discrete-t… ▽ More As a fundamental thermodynamic principle, speed limits reveal the lower bound of entropy production (EP) required for a system to transition from a given initial state to a final state. While various speed limits have been developed for continuous-time Markov processes, their application to discrete-time Markov chains remains unexplored. In this study, we investigate the speed limits in discrete-time Markov chains, focusing on two types of EP commonly used to measure the irreversibility of a discrete-time process: time-reversed EP and time-backward EP. We find that time-reversed EP satisfies the speed limit for the continuous-time Markov processes, whereas time-backward EP does not. Additionally, for time-reversed EP, we derive practical speed limits applicable to systems driven by cyclic protocols or with unidirectional transitions, where conventional speed limits become meaningless or invalid. We show that these relations also hold for continuous-time Markov processes by taking the time-continuum limit of our results. Finally, we validate our findings through several examples. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures

arXiv:2406.17763 [pdf, other]

DiffusionPDE: Generative PDE-Solving Under Partial Observation

Authors: Jiahe Huang, Guandao Yang, Zichen Wang, Jeong Joon Park

Abstract: We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which… ▽ More We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which is a common assumption for real-world measurements. In this work, we propose DiffusionPDE that can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming the state-of-the-art methods for both forward and inverse directions. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Project page: https://jhhuangchloe.github.io/Diffusion-PDE/

arXiv:2406.16532 [pdf]

Terahertz photocurrent probe of quantum geometry and interactions in magic-angle twisted bilayer graphene

Authors: Roshan Krishna Kumar, Geng Li, Riccardo Bertini, Swati Chaudhary, Krystian Nowakowski, Jeong Min Park, Sebastian Castilla, Zhen Zhan, Pierre A. Pantaleón, Hitesh Agarwal, Sergi Battle-Porro, Eike Icking, Matteo Ceccanti, Antoine Reserbat-Plantey, Giulia Piccinini, Julien Barrier, Ekaterina Khestanova, Takashi Taniguchi, Kenji Watanabe, Christoph Stampfer, Gil Refael, Francisco Guinea, Pablo Jarillo-Herrero, Justin C. W. Song, Petr Stepanov , et al. (2 additional authors not shown)

Abstract: Moiré materials represent strongly interacting electron systems bridging topological and correlated physics. Despite significant advances, decoding wavefunction properties underlying the quantum geometry remains challenging. Here, we utilize polarization-resolved photocurrent measurements to probe magic-angle twisted bilayer graphene, leveraging its sensitivity to the Berry connection that encompa… ▽ More Moiré materials represent strongly interacting electron systems bridging topological and correlated physics. Despite significant advances, decoding wavefunction properties underlying the quantum geometry remains challenging. Here, we utilize polarization-resolved photocurrent measurements to probe magic-angle twisted bilayer graphene, leveraging its sensitivity to the Berry connection that encompasses quantum "textures" of electron wavefunctions. Using terahertz light resonant with optical transitions of its flat bands, we observe bulk photocurrents driven by broken symmetries and reveal the interplay between electron interactions and quantum geometry. We observe inversion-breaking gapped states undetectable through quantum transport, sharp changes in the polarization axes caused by interaction-induced band renormalization, and recurring photocurrent patterns at integer fillings of the moiré unit cell that track the evolution of quantum geometry through the cascade of phase transitions. The large and tunable terahertz response intrinsic to flat-band systems offers direct insights into the quantum geometry of interacting electrons and paves the way for innovative terahertz quantum technologies. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16013 [pdf, other]

Database-Augmented Query Representation for Information Retrieval

Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

Abstract: Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user… ▽ More Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user-related) features related to the query. Yet, they may be suboptimal to effectively augment the query, though there is plenty of information available to augment it in a relational database. Motivated by this, we present a novel retrieval framework called Database-Augmented Query representation (DAQu), which augments the original query with various (query-related) metadata across multiple tables. In addition, as the number of features in the metadata can be very large and there is no order among them, we encode them with our graph-based set encoding strategy, which considers hierarchies of features in the database without order. We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database, demonstrating that ours significantly enhances overall retrieval performance, compared to existing query augmentation methods. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15965 [pdf, other]

Search for charmed baryons in the $Λ_c^+η$ system and measurement of the branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ and $pD^0$ relative to $Σ_c(2455)π$

Authors: Belle Collaboration, S. X. Li, C. P. Shen, I. Adachi, J. K. Ahn, H. Aihara, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, Sw. Banerjee, K. Belous, J. Bennett, M. Bessner, T. Bilka, D. Biswas, D. Bodrov, A. Bozek, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola, M. -C. Chang, B. G. Cheon , et al. (102 additional authors not shown)

Abstract: We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and… ▽ More We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and $Λ_c(2940)^+$ signals are observed in the $pD^0$ mass spectrum. We set upper limits at 90\% credibility level on ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ relative to $Σ_c(2455)π$ of $<0.13$ for the $Λ_c(2880)^+$ and $<1.11$ for the $Λ_c(2940)^+$. We measure ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $pD^0$ relative to $Σ_c(2455)π$ of $0.75 \pm 0.03(\text{stat.}) \pm 0.07(\text{syst.})$ for the $Λ_c(2880)^+$ and $3.59 \pm 0.21(\text{stat.}) \pm 0.56(\text{syst.})$ for the $Λ_c(2940)^+$. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures

Report number: Belle Preprint: 2024-06;KEK Preprint: 2024-15

arXiv:2406.15345 [pdf, other]

Sub-millisecond Entanglement and iSWAP Gate between Molecular Qubits

Authors: Lewis R. B. Picard, Annie J. Park, Gabriel E. Patenotte, Samuel Gebretsadkan, David Wellnitz, Ana Maria Rey, Kang-Kuen Ni

Abstract: Quantum computation (QC) and simulation rely on long-lived qubits with controllable interactions. Early work in quantum computing made use of molecules because of their readily available intramolecular nuclear spin coupling and chemical shifts, along with mature nuclear magnetic resonance techniques. Subsequently, the pursuit of many physical platforms has flourished. Trapped polar molecules have… ▽ More Quantum computation (QC) and simulation rely on long-lived qubits with controllable interactions. Early work in quantum computing made use of molecules because of their readily available intramolecular nuclear spin coupling and chemical shifts, along with mature nuclear magnetic resonance techniques. Subsequently, the pursuit of many physical platforms has flourished. Trapped polar molecules have been proposed as a promising quantum computing platform, offering scalability and single-particle addressability while still leveraging inherent complexity and strong couplings of molecules. Recent progress in the single quantum state preparation and coherence of the hyperfine-rotational states of individually trapped molecules allows them to serve as promising qubits, with intermolecular dipolar interactions creating entanglement. However, universal two-qubit gates have not been demonstrated with molecules. Here, we harness intrinsic molecular resources to implement a two-qubit iSWAP gate using individually trapped $X^{1}Σ^{+}$ NaCs molecules. We characterize the innate dipolar interaction between rotational states and control its strength by tuning the polarization of the traps. By allowing the molecules to interact for 664 $μ$s at a distance of 1.9 $μ$m, we create a maximally entangled Bell state with a fidelity of 94(3)\%, following postselection to remove trials with empty traps. Using motion-rotation coupling, we measure residual excitation of the lowest few motional states along the axial trapping direction and find them to be the primary source of decoherence. Finally, we identify two non-interacting hyperfine states within the ground rotational level in which we encode a qubit. The interaction is toggled by transferring between interacting and non-interacting states to realize an iSWAP gate. We verify the gate performance by measuring its logical truth table. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15275 [pdf, other]

Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model

Authors: Doyoung Kim, Jongwon Lee, Jinho Park, Minjoon Seo

Abstract: Language models have demonstrated impressive capabilities across various natural language processing tasks, yet they struggle with planning tasks requiring multi-step simulations. Inspired by human cognitive processes, this paper investigates the optimal planning power of language models that can construct a cognitive map of a given environment. Our experiments demonstrate that cognitive map signi… ▽ More Language models have demonstrated impressive capabilities across various natural language processing tasks, yet they struggle with planning tasks requiring multi-step simulations. Inspired by human cognitive processes, this paper investigates the optimal planning power of language models that can construct a cognitive map of a given environment. Our experiments demonstrate that cognitive map significantly enhances the performance of both optimal and reachable planning generation ability in the Gridworld path planning task. We observe that our method showcases two key characteristics similar to human cognition: \textbf{generalization of its planning ability to extrapolated environments and rapid adaptation with limited training data.} We hope our findings in the Gridworld task provide insights into modeling human cognitive processes in language models, potentially leading to the development of more advanced and robust systems that better resemble human cognition. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15007 [pdf, other]

RouteFinder: Towards Foundation Models for Vehicle Routing Problems

Authors: Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels Wouda, Leon Lan, Kevin Tierney, Jinkyoo Park

Abstract: Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder,… ▽ More Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder, a framework for developing foundation models for VRPs. Our key idea is that a foundation model for VRPs should be able to model variants by treating each variant as a subset of a larger VRP problem, equipped with different attributes. We introduce a parallelized environment that can handle any combination of attributes at the same time in a batched manner, and an efficient sampling procedure to train on a mix of problems at each optimization step that can greatly improve convergence robustness. We also introduce novel Global Feature Embeddings that project instance-wise attributes efficiently onto the latent space and help the model understand different VRP variants. Finally, we introduce Efficient Adapter Layers, a simple yet effective technique to finetune pre-trained RouteFinder models to solve novel variants with previously unseen attributes outside of the original feature space. We validate our approach through extensive experiments on 24 VRP variants, demonstrating competitive results over recent multi-task learning models. We make our code openly available at https://github.com/ai4co/routefinder. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14954 [pdf, other]

A Unified Framework for Synthesizing Multisequence Brain MRI via Hybrid Fusion

Authors: Jihoon Cho, Jonghye Woo, Jinah Park

Abstract: Multisequence Magnetic Resonance Imaging (MRI) provides a reliable diagnosis in clinical applications through complementary information within sequences. However, in practice, the absence of certain MR sequences is a common problem that can lead to inconsistent analysis results. In this work, we propose a novel unified framework for synthesizing multisequence MR images, called Hybrid Fusion GAN (H… ▽ More Multisequence Magnetic Resonance Imaging (MRI) provides a reliable diagnosis in clinical applications through complementary information within sequences. However, in practice, the absence of certain MR sequences is a common problem that can lead to inconsistent analysis results. In this work, we propose a novel unified framework for synthesizing multisequence MR images, called Hybrid Fusion GAN (HF-GAN). We introduce a hybrid fusion encoder designed to ensure the disentangled extraction of complementary and modality-specific information, along with a channel attention-based feature fusion module that integrates the features into a common latent space handling the complexity from combinations of accessible MR sequences. Common feature representations are transformed into a target latent space via the modality infuser to synthesize missing MR sequences. We have performed experiments on multisequence brain MRI datasets from healthy individuals and patients diagnosed with brain tumors. Experimental results show that our method outperforms state-of-the-art methods in both quantitative and qualitative comparisons. In addition, a detailed analysis of our framework demonstrates the superiority of our designed modules and their effectiveness for use in data imputation tasks. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures

arXiv:2406.13742 [pdf, other]

Superfluid stiffness of twisted multilayer graphene superconductors

Authors: Abhishek Banerjee, Zeyu Hao, Mary Kreidel, Patrick Ledwith, Isabelle Phinney, Jeong Min Park, Andrew M. Zimmerman, Kenji Watanabe, Takashi Taniguchi, Robert M Westervelt, Pablo Jarillo-Herrero, Pavel A. Volkov, Ashvin Vishwanath, Kin Chung Fong, Philip Kim

Abstract: The robustness of the macroscopic quantum nature of a superconductor can be characterized by the superfluid stiffness, $ρ_s$, a quantity that describes the energy required to vary the phase of the macroscopic quantum wave function. In unconventional superconductors, such as cuprates, the low-temperature behavior of $ρ_s$ drastically differs from that of conventional superconductors due to quasipar… ▽ More The robustness of the macroscopic quantum nature of a superconductor can be characterized by the superfluid stiffness, $ρ_s$, a quantity that describes the energy required to vary the phase of the macroscopic quantum wave function. In unconventional superconductors, such as cuprates, the low-temperature behavior of $ρ_s$ drastically differs from that of conventional superconductors due to quasiparticle excitations from gapless points (nodes) in momentum space. Intensive research on the recently discovered magic-angle twisted graphene family has revealed, in addition to superconducting states, strongly correlated electronic states associated with spontaneously broken symmetries, inviting the study of $ρ_s$ to uncover the potentially unconventional nature of its superconductivity. Here we report the measurement of $ρ_s$ in magic-angle twisted trilayer graphene (TTG), revealing unconventional nodal-gap superconductivity. Utilizing radio-frequency reflectometry techniques to measure the kinetic inductive response of superconducting TTG coupled to a microwave resonator, we find a linear temperature dependence of $ρ_s$ at low temperatures and nonlinear Meissner effects in the current bias dependence, both indicating nodal structures in the superconducting order parameter. Furthermore, the doping dependence shows a linear correlation between the zero temperature $ρ_s$ and the superconducting transition temperature $T_c$, reminiscent of Uemura's relation in cuprates, suggesting phase-coherence-limited superconductivity. Our results provide strong evidence for nodal superconductivity in TTG and put strong constraints on the mechanisms of these graphene-based superconductors. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13633 [pdf, ps, other]

Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

Authors: Jaehyun Park, Dabeen Lee

Abstract: We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an… ▽ More We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an $\tilde{\mathcal{O}}(dD\sqrt{T})$ regret, where $d$ is the dimension of feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm \texttt{OVIFH-MNL} is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{\mathcal{O}}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$ where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $Ω(d\sqrt{DT})$ for learning communicating MDPs with MNL transitions of diameter at most $D$. Furthermore, we show a regret lower bound of $Ω(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13134 [pdf]

Mode Coupling and Breathing Oscillation in Partially Magnetized Cross-Field Plasmas

Authors: Jong Yoon Park, June Young Kim

Abstract: We report on investigations of mode coupling between rotating spokes during the onset of the breathing oscillation. Demonstrating the existence of nonlinear coupling between the sporadic spokes and the breathing oscillation, we suggest the oscillating azimuthal electric field as the energy source for additional ionization within the plasma. Our results indicate that intermittent three-wave couplin… ▽ More We report on investigations of mode coupling between rotating spokes during the onset of the breathing oscillation. Demonstrating the existence of nonlinear coupling between the sporadic spokes and the breathing oscillation, we suggest the oscillating azimuthal electric field as the energy source for additional ionization within the plasma. Our results indicate that intermittent three-wave coupling is a possible mechanism for triggering low-frequency breathing oscillations in partially magnetized cross-field plasma. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12904 [pdf, other]

Meent: Differentiable Electromagnetic Simulator for Machine Learning

Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin… ▽ More Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yield sub-optimal results due to the high computational cost of both the algorithms and EM simulations. Machine learning (ML) emerged as a promising candidate to mitigate these challenges, and optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have an EM simulation software that is user-friendly for both research communities. To this end, we present Meent, an EM simulation software that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa. To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training neural operator, 2) serving as an environment for the reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent with the MIT license to promote the cross-polinations of ideas among academic researchers and industry practitioners. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: under review

arXiv:2406.12885 [pdf, other]

neuSIM4: A comprehensive GEANT4 based neutron simulation code

Authors: J. Park, F. C. E. Teh, M. B. Tsang, K. W. Brown, Z. Chajecki, B. Hong, T. Lokotko, W. G. Lynch, J. Wieske, K. Zhu

Abstract: A new neutron SIMulation program based on the versatile GEANT4 toolkit, neuSIM4, has been developed to describe interactions of neutrons in the NE213 liquid scintillator from 0.1 to 3000 MeV. neuSIM4 is designed to accommodate complicated modern detector geometry setups with multiple scintillator detectors, each of which can be outfitted with more than one photo-multiplier. To address a broad spec… ▽ More A new neutron SIMulation program based on the versatile GEANT4 toolkit, neuSIM4, has been developed to describe interactions of neutrons in the NE213 liquid scintillator from 0.1 to 3000 MeV. neuSIM4 is designed to accommodate complicated modern detector geometry setups with multiple scintillator detectors, each of which can be outfitted with more than one photo-multiplier. To address a broad spectrum of neutron energies, two new neutron interaction physics models, KSCIN and NxQMD, have been implemented in GEANT4. For neutrons with energy below 110 MeV, we incorporate a total of eleven neutron induced reaction channels on hydrogen and carbon nuclei, including nine carbon inelastic reaction channels, into KSCIN. Beyond 110 MeV, we implement a neutron induced reaction model, NxQMD, in GEANT4. We use its results as reference to evaluate other neutron-interaction physics models in GEANT4. We find that results from an existing cascade physics model (INCL) in GEANT4 agree very well with the results from NxQMD, and results from both codes agree with new and existing light response data. To connect KSCIN to NxQMD or INCL, we introduce a transition region where the contribution of neuSIM4 linearly decreases with corresponding increased contributions from NxQMD or INCL. To demonstrate the application of the new code, we simulate the light response and performance of a 2 x 2 m2 neutron detector wall array consisting of 25 2m-long scintillation bars. We are able to compare the predicted light response functions to the shape of the experimental response functions and calculate the efficiency of the neutron detector array for neutron energies up to 200 MeV. These simulation results will be pivotal for understanding the performance of modern neutron arrays with intricate geometries, especially in the measurements of neutron energy spectra in heavy-ion reactions. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.12721 [pdf]

Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de… ▽ More In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main decoder, enhancing performance of the convolutional block during the initial training stages by assigning a different weight strategy between main and auxiliary decoder losses. Next, to address the time interval issue between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model's output to be aligned with soft labels of 1 s in the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of logmel and MFCC features to generate time-frequency pattern. The experimental results demonstrate the efficacy of these proposed methods in a view of improving SED performance by achieving a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in developing more robust and flexible SED models △ Less

Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: DCASE 2024 challenge Task4, 4 pages

arXiv:2406.12233 [pdf, other]

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel… ▽ More Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities at the cost of a forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11813 [pdf, other]

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Authors: Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo

Abstract: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge ac… ▽ More Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge acquisition during pretraining. First, counterintuitively, we observe that pretraining on more data shows no significant improvement in the model's capability to acquire and maintain factual knowledge. Next, there is a power-law relationship between training steps and forgetting of memorization and generalization of factual knowledge, and LLMs trained with duplicated training data exhibit faster forgetting. Third, training LLMs with larger batch sizes can enhance the models' robustness to forgetting. Overall, our observations suggest that factual knowledge acquisition in LLM pretraining occurs by progressively increasing the probability of factual knowledge presented in the pretraining data at each step. However, this increase is diluted by subsequent forgetting. Based on this interpretation, we demonstrate that we can provide plausible explanations for recently observed behaviors of LLMs, such as the poor performance of LLMs on long-tail knowledge and the benefits of deduplicating the pretraining corpus. △ Less

Submitted 17 June, 2024; originally announced June 2024.

ACM Class: I.2.7

arXiv:2406.10935 [pdf, other]

Pick-or-Mix: Dynamic Channel Sampling for ConvNets

Authors: Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

Abstract: Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for d… ▽ More Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for dynamic channel sampling, namely Pick-or-Mix (PiX), which does not require special implementation. PiX divides a set of channels into subsets and then picks from them, where the picking decision is dynamically made per each pixel based on the input activations. We plug PiX into prominent ConvNet architectures and verify its multi-purpose utilities. After replacing 1x1 channel squeezing layers in ResNet with PiX, the network becomes 25% faster without losing accuracy. We show that PiX allows ConvNets to learn better data representation than widely adopted approaches to enhance networks' representation power (e.g., SE, CBAM, AFF, SKNet, and DWP). We also show that PiX achieves state-of-the-art performance on network downscaling and dynamic channel pruning applications. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: Published in Computer Vision and Pattern Recognition (CVPR 2024)

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2406.09877 [pdf, other]

Federated Learning with Flexible Architectures

Authors: Jong-Ik Park, Carlee Joe-Wong

Abstract: Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To addre… ▽ More Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To address this need, this paper introduces Federated Learning with Flexible Architectures (FedFA), an FL training algorithm that allows clients to train models of different widths and depths. Each client can select a network architecture suitable for its resources, with shallower and thinner networks requiring fewer computing resources for training. Unlike prior work in this area, FedFA incorporates the layer grafting technique to align clients' local architectures with the largest network architecture in the FL system during model aggregation. Layer grafting ensures that all client contributions are uniformly integrated into the global model, thereby minimizing the risk of any individual client's data skewing the model's parameters disproportionately and introducing security benefits. Moreover, FedFA introduces the scalable aggregation method to manage scale variations in weights among different network architectures. Experimentally, FedFA outperforms previous width and depth flexible aggregation strategies. Furthermore, FedFA demonstrates increased robustness against performance degradation in backdoor attack scenarios compared to earlier strategies. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09719 [pdf, other]

Self-Knowledge Distillation for Learning Ambiguity

Authors: Hancheol Park, Soyeong Jeong, Sukmin Cho, Jong C. Park

Abstract: Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without consideration for its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models t… ▽ More Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without consideration for its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately by leveraging knowledge distilled from their lower layers. This approach also includes a learning phase that re-calibrates the unnecessarily strengthened confidence for training samples judged as extremely ambiguous based on the distilled distribution knowledge. We validate our method on diverse NLU benchmark datasets and the experimental results demonstrate its effectiveness in producing better label distributions. Particularly, through the process of re-calibrating the confidence for highly ambiguous samples, the issue of over-confidence when predictions for unseen samples do not match with their ground-truth labels has been significantly alleviated. This has been shown to contribute to generating better distributions than the existing state-of-the-art method. Moreover, our method is more efficient in training the models compared to the existing method, as it does not involve additional training processes to refine label distributions. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures

arXiv:2406.09396 [pdf, other]

Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA

Authors: Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo

Abstract: Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely-related. Therefore, when performing long-form video question answering (LVQA),all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explore the use of large lan… ▽ More Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely-related. Therefore, when performing long-form video question answering (LVQA),all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explore the use of large language models (LLMs) in LVQA benchmarks, achieving exceptional performance, while relying on vision language models (VLMs) to convert all visual content within videos into natural language. Such VLMs often independently caption a large number of frames uniformly sampled from long videos, which is not efficient and can mostly be redundant. Questioning these decision choices, we explore optimal strategies for key-frame selection and sequence-aware captioning, that can significantly reduce these redundancies. We propose two novel approaches that improve each of aspects, namely Hierarchical Keyframe Selector and Sequential Visual LLM. Our resulting framework termed LVNet achieves state-of-the-art performance across three benchmark LVQA datasets. Our code will be released publicly. △ Less

Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09047 [pdf, other]

DeepJEB: 3D Deep Learning-based Synthetic Jet Engine Bracket Dataset

Authors: Seongjun Hong, Yongmin Kwon, Dongju Shin, Jangseop Park, Namwoo Kang

Abstract: Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be… ▽ More Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be improved for developing robust data-driven surrogate models. This study presents the DeepJEB dataset, which has been created using deep generative models and automated engineering simulation pipelines, to overcome these challenges. Moreover, this study provides comprehensive 3D geometries and their corresponding structural analysis data. Key experiments validated the effectiveness of the DeepJEB dataset, demonstrating significant improvements in the prediction accuracy and reliability of surrogate models trained on this data. The enhanced dataset showed a broader design space and better generalization capabilities than traditional datasets. These findings highlight the potential of DeepJEB as a benchmark dataset for developing reliable surrogate models in structural engineering. The DeepJEB dataset supports advanced modeling techniques, such as graph neural networks (GNNs) and high-dimensional convolutional networks (CNNs), leveraging node-level field data for precise predictions. This dataset is set to drive innovation in engineering design applications, enabling more accurate and efficient structural performance predictions. The DeepJEB dataset is publicly accessible at: https://www.narnia.ai/dataset △ Less