Search | arXiv e-print repository

Coherent Zero-Shot Visual Instruction Generation

Authors: Quynh Phung, Songwei Ge, Jia-Bin Huang

Abstract: Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle the issues, capitalizing on the advancements in diffusion models and large language models (LLMs). Our approach systematically integrates text comprehension and image generation to ensure visual instructions are visually appealing and maintain consistency and accuracy throughout the instruction sequence. We validate the effectiveness by testing multi-step instructions and comparing the text alignment and consistency with several baselines. Our experiments show that our approach can visualize coherent and visually pleasing instructions.

Submitted 8 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: https://instruct-vis-zero.github.io/

arXiv:2406.04160 [pdf, other]

Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): PDS 111, an old T Tauri star with a young-looking disk

Authors: Annelotte Derkink, Christian Ginski, Paola Pinilla, Nicolas Kurtovic, Lex Kaper, Alex de Koter, Per-Gunnar Valegård, Eric Mamajek, Frank Backs, Myriam Benisty, Til Birnstiel, Gabriele Columba, Carsten Dominik, Antonio Garufi, Michiel Hogerheijde, Rob van Holstein, Jane Huang, François Ménard, Christian Rab, María Claudia Ramírez-Tannus, Álvaro Ribas, Jonathan P. Williams, Alice Zurlo

Abstract: The interplay between T Tauri stars and their circumstellar disks, and how this impacts the onset of planet formation, has yet to be established. We studied a seemingly old T Tauri star, PDS 111, and its disk. We analyzed optical, infrared, and sub-millimeter observations obtained with VLT/X-shooter, Mercator/HERMES, TESS, VLT/SPHERE, and ALMA, providing a new view on PDS 111 and its protoplanetary disk. The multi-epoch spectroscopy yields photospheric lines to classify the star, and emission lines to study variability in the hot inner disk and to determine the mass-accretion rate. The SPHERE and ALMA observations are used to characterize the dust distribution of the small and large grains, respectively. PDS 111 is a weak-line T Tauri star with spectral type G2 that exhibits strong H$\alpha$ variability and has a low mass-accretion rate of $1$-$5\times10^{-10}\,M_{\odot}\,\mathrm{yr}^{-1}$. We measured an age of the system of $15.9^{+1.7}_{-3.7}$ Myr using pre-main sequence tracks. The SPHERE observations show a strongly flaring disk with an asymmetric substructure. The ALMA observations reveal a 30 au cavity in the dust continuum emission with a low contrast asymmetry in the South-West of the disk and a dust disk mass of $45.8\,M_\oplus$. The $^{12}$CO radial extension is at least three times larger than that of the dust emission. Although the measured age is younger than suggested in the literature, PDS 111 still seems relatively old; this provides insight into disk properties at an advanced stage of pre-main sequence evolution.
The characteristics of this disk are very similar to its younger counterparts: strongly flaring, an average disk mass, a typical radial extent of the disk gas and dust, and the presence of common substructures. This suggests that disk evolution has not significantly changed the disk properties. These results show similarities with the "Peter Pan disks" around M-dwarfs.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 23 pages, 23 figures, accepted by A&A (abstract shortened)

arXiv:2406.03711 [pdf, other]

Pi-fusion: Physics-informed diffusion model for learning fluid dynamics

Authors: Jing Qiu, Jiancheng Huang, Xiangdong Zhang, Zeng Lin, Minglei Pan, Zengding Liu, Fen Miao

Abstract: Physics-informed deep learning has recently been developed as a novel paradigm for learning physical dynamics. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize to arbitrary time instants in real-world scenarios, where the fluid motion can be considered a time-variant trajectory involving large-scale particles. Inspired by the advantage of diffusion models in learning the distribution of data, we first propose Pi-fusion, a physics-informed diffusion model for predicting the temporal evolution of velocity and pressure fields in fluid dynamics. Physics-informed guidance sampling is proposed in the inference procedure of Pi-fusion to improve the accuracy and interpretability of learning fluid dynamics. Furthermore, we introduce a training strategy based on reciprocal learning to learn the quasiperiodical pattern of fluid motion and thus improve the generalizability of the model. The proposed approach is then evaluated on both synthetic and real-world datasets by comparing it with state-of-the-art physics-informed deep learning methods. Experimental results show that the proposed approach significantly outperforms existing methods for predicting the temporal evolution of velocity and pressure fields, confirming its strong generalization by drawing probabilistic inference of forward process and physics-informed guidance sampling. The proposed Pi-fusion can also be generalized to learning other physical dynamics governed by partial differential equations.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.03683 [pdf, other]

Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Authors: Ding Huang, Ting Li, Jian Huang

Abstract: We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a "large probability space" to a "small probability space" and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowledge from a pre-trained model's learned prior distribution. It efficiently leverages large diffusion models, differentially intervening in different hidden features with a head-heavy and foot-light configuration. Experiments highlight the superiority of BPS over contemporary methods across a range of tasks, even with a limited amount of data. Notably, BPS attains an FID score of 10.49 under the sketch condition on the COCO17 dataset.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 25 pages, 26 figures, and 4 tables

MSC Class: 62G05; 68T07

arXiv:2406.02987 [pdf, other]

Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

Authors: Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Shioulin Sam, Karim Bouyarmane, Ismail Tutar, Junzhou Huang

Abstract: Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing visual representations with LLMs through visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former are a simplified Multi-instance Learning method that does not consider instance heterogeneity/correlation. We then propose a general component termed Multi-instance Visual Prompt Generator (MIVPG) to incorporate enriched visual representations into LLMs by taking advantage of instance correlation between images or patches for the same sample. Quantitative evaluation on three public vision-language (VL) datasets from different scenarios shows that the proposed MIVPG improves on Q-former in main VL tasks.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02885 [pdf, other]

Homotopic Path Set Planning for Robot Manipulation and Navigation

Authors: Jing Huang, Yunxi Tang, Kwok Wai Samuel Au

Abstract: This paper addresses path set planning that yields important applications in robot manipulation and navigation such as path generation for deformable object keypoints and swarms. A path set refers to the collection of finite agent paths to represent the overall spatial path of a group of keypoints or a swarm, whose collective properties meet spatial and topological constraints. As opposed to planning a single path, simultaneously planning multiple paths with constraints poses nontrivial challenges in complex environments. This paper presents a systematic planning pipeline for homotopic path sets, a widely applicable path set class in robotics. An extended visibility check condition is first proposed to attain a sparse passage distribution amidst dense obstacles. Passage-aware optimal path planning compatible with sampling-based planners is then designed for single path planning with adjustable costs. Large accessible free space for path set accommodation can be achieved by the planned path while having a sufficiently short path length. After specifying the homotopic properties of path sets, path set generation based on deformable path transfer is proposed in an efficient centralized manner. The effectiveness of these methods is validated by extensive simulated and experimental results.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 19 figures, 2 tables, conference

arXiv:2406.02546 [pdf, other]

Dark photon limits from patchy dark screening of the cosmic microwave background

Authors: Fiona McCarthy, Dalila Pirvu, J. Colin Hill, Junwu Huang, Matthew C. Johnson, Keir K. Rogers

Abstract: Dark photons that kinetically mix with the Standard Model photon give rise to new spectral anisotropies (patchy dark screening) in the cosmic microwave background (CMB) due to conversion of photons to dark photons within large-scale structure. We utilize predictions for this patchy dark screening signal to provide the tightest constraints to date on the dark photon kinetic mixing parameter ($\varepsilon \lesssim 4\times 10^{-8}$ at 95% confidence level) over the mass range $10^{-13}\,\mathrm{eV} \lesssim m_{A^\prime} \lesssim 10^{-11}\,\mathrm{eV}$, almost an order of magnitude stronger than previous limits, by applying state-of-the-art component separation techniques to the cross-correlation of Planck CMB and unWISE galaxy survey data.

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 7+12 pages, 3+13 figures. Data products available at https://users.flatironinstitute.org/~fmccarthy/dark_photon_screening_maps/ V2 only has minor changes to these comments

arXiv:2406.02252 [pdf, other]

Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale

Authors: Jinghan Sun, Zibo Gong, Anup Agarwal, Shadi Noghabi, Ranveer Chandra, Marc Snir, Jian Huang

Abstract: Modular data centers (MDCs) that can be placed right at the energy farms and powered mostly by renewable energy are proven to be a flexible and effective approach to lowering the carbon footprint of data centers. However, the main challenge of using renewable energy is the high variability of power produced, which implies large volatility in powering computing resources at MDCs, and degraded application performance due to the task evictions and migrations. This makes it challenging for platform operators to decide on MDC deployment. To this end, we present SkyBox, a framework that employs a holistic and learning-based approach for platform operators to explore the efficient use of renewable energy with MDC deployment across geographical regions. SkyBox is driven by the insights based on our study of real-world power traces from a variety of renewable energy farms: the predictable production of renewable energy and the complementary nature of energy production patterns across different renewable energy sources and locations. With these insights, SkyBox first uses the coefficient of variation metric to select the qualified renewable farms, and proposes a subgraph identification algorithm to identify a set of farms with complementary energy production patterns. After that, SkyBox enables smart workload placement and migrations to further tolerate the power variability. Our experiments with real power traces and datacenter workloads show that SkyBox has the lowest carbon emissions in comparison with current MDC deployment approaches.
SkyBox also minimizes the impact of the power variability on cloud virtual machines, making rMDCs a practical solution for efficiently using renewable energy.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01894 [pdf, other]

SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks

Authors: Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang

Abstract: Robust and imperceptible adversarial video attack is challenging due to the spatial and temporal characteristics of videos. The existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) to generate adversarial videos through spatio-temporal feature space information exchanging. It consists of a Guided Target Video Learning (GTVL) module to balance the perturbation budget and optimization speed and a Spatio-Temporal Invertible Neural Network (STIN) module to perform spatio-temporal feature space information exchanging between a source video and the target feature tensor learned by the GTVL module. Extensive experiments on UCF-101 and Kinetics-400 demonstrate that our proposed SVASTIN can generate adversarial examples with higher imperceptibility than the state-of-the-art methods and with a higher fooling rate. Code is available at https://github.com/Brittany-Chen/SVASTIN.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01791 [pdf, other]

Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels

Authors: Weitong Cai, Jiabo Huang, Shaogang Gong

Abstract: Video moment retrieval (VMR) is to search for a visual temporal moment in an untrimmed raw video by a given text query description (sentence). Existing studies either start from collecting exhaustive frame-wise annotations on the temporal boundary of target moments (fully-supervised), or learn with only the video-level video-text pairing labels (weakly-supervised). The former is poor in generalisation to unknown concepts and/or novel scenes due to restricted dataset scale and diversity under expensive annotation costs; the latter is subject to visual-textual mis-correlations from incomplete labels. In this work, we introduce a new approach called hybrid-learning video moment retrieval to solve the problem by knowledge transfer through adapting the video-text matching relationships learned from a fully-supervised source domain to a weakly-labelled target domain when they do not share a common label space. Our aim is to explore shared universal knowledge between the two domains in order to improve model learning in the weakly-labelled target domain. Specifically, we introduce a multiplE branch Video-text Alignment model (EVA) that performs cross-modal (visual-textual) matching information sharing and multi-modal feature alignment to optimise domain-invariant visual and textual features as well as per-task discriminative joint video-text representations. Experiments show EVA's effectiveness in exploring temporal segment annotations in a source domain to help learn video moment retrieval without temporal labels in a target domain.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted by BMVC2022

arXiv:2406.01602 [pdf, other]

Effectiveness of denoising diffusion probabilistic models for fast and high-fidelity whole-event simulation in high-energy heavy-ion experiments

Authors: Yeonju Go, Dmitrii Torbunov, Timothy Rinn, Yi Huang, Haiwang Yu, Brett Viren, Meifeng Lin, Yihui Ren, Jin Huang

Abstract: Artificial intelligence (AI) generative models, such as generative adversarial networks (GANs), variational auto-encoders, and normalizing flows, have been widely used and studied as efficient alternatives for traditional scientific simulations. However, they have several drawbacks, including training instability and inability to cover the entire data distribution, especially for regions where data are rare. This is particularly challenging for whole-event, full-detector simulations in high-energy heavy-ion experiments, such as sPHENIX at the Relativistic Heavy Ion Collider and Large Hadron Collider experiments, where thousands of particles are produced per event and interact with the detector. This work investigates the effectiveness of Denoising Diffusion Probabilistic Models (DDPMs) as an AI-based generative surrogate model for the sPHENIX experiment that includes the heavy-ion event generation and response of the entire calorimeter stack. DDPM performance in sPHENIX simulation data is compared with a popular rival, GANs. Results show that both DDPMs and GANs can reproduce the data distribution where the examples are abundant (low-to-medium calorimeter energies). Nonetheless, DDPMs significantly outperform GANs, especially in high-energy regions where data are rare. Additionally, DDPMs exhibit superior stability compared to GANs. The results are consistent between both central and peripheral centrality heavy-ion collision events.
Moreover, DDPMs offer a substantial speedup of approximately a factor of 100 compared to the traditional Geant4 simulation method.

Submitted 23 May, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures

arXiv:2406.01502 [pdf]

Spatiotemporal evolution of PM2.5 diffusion in Cheng-Yu urban agglomeration in response to COVID-19 lockdown using complex network

Authors: Jiaxian Huang, Yi Huang, Yong Zhang, Jiao Zhang

Abstract: As the decrease in human activities resulting from the COVID-19 control measures had a significant impact on air quality, the epidemic provided an opportunity to investigate the extent to which air pollution is influenced by human activities and to review existing measures. However, the corresponding diffusion pattern on a city scale is seldom mentioned at the present stage. Therefore, we chose the Cheng-Yu urban agglomeration, which is the largest city cluster in Southwest China, as our study area during the COVID-19 period, and attempted to investigate the process of PM2.5 diffusion using a complex network method. The results showed that there was an evident external spillover effect of PM2.5 across all regions, and the PM2.5 spillovers were concentrated in several cities in the Cheng-Yu urban agglomeration during the lockdown period, whereas they were more dispersed during the recovery period. The impact of PM2.5 pollution source areas on receptor areas declined overall from a normal year to the pandemic year, and the intensity of PM2.5 spillover decreased gradually as the distance from the center increased. The implementation of the lockdown measures had an impact on both the input and output patterns of PM2.5 pollution in the region: the input pattern of PM2.5 pollution exhibited higher vulnerability, while the output pattern showed higher resilience.
Additionally, the spillover relationship of PM2.5 pollution varies between different blocks, with relatively simple spillover relationships observed during the lockdown period and more complex dynamics during the recovery period. These findings highlight the importance of joint controls in combating regional air pollution.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01159 [pdf, other]

Dimba: Transformer-Mamba Diffusion Models

Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang

Abstract: This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba's sequentially stacked blocks alternate between Transformer and Mamba layers and integrate conditional information through the cross-attention layer, thus capitalizing on the advantages of both architectural paradigms. We investigate several optimization strategies, including quality tuning and resolution adaption, and identify critical configurations necessary for large-scale image generation. The model's flexible design supports scenarios that cater to specific resource constraints and objectives. When scaled appropriately, Dimba offers substantial throughput and a reduced memory footprint relative to conventional pure Transformer-based benchmarks. Extensive experiments indicate that Dimba achieves performance comparable to the benchmarks in terms of image quality, artistic rendering, and semantic control. We also report several intriguing properties of the architecture discovered during evaluation and release checkpoints from the experiments. Our findings emphasize the promise of large-scale hybrid Transformer-Mamba architectures in the foundational stage of diffusion models, suggesting a bright future for text-to-image generation.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{\nu}_{e}$ rates and energy spectra variation among the near and far detectors give $\sin^2 2\theta_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $\Delta m^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $\Delta m^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2\theta_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2\theta_{13} = 0.0833\pm0.0022$, which represents an 8% relative improvement in precision with respect to the Daya Bay full 3158-day capture-on-gadolinium result.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00993 [pdf]

Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology

Authors: Jiaming Wei, Tong Liu, Jipeng Huang, Xiaowei Li, Yurui Qi, Gangyin Luo

Abstract: With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis. This provides a more readily accepted method for early diabetes prevention and monitoring. Addressing issues such as the invasive nature, disease transmission risks, and complexity of diabetes testing, this study aims to design a diabetes gas biomarker acetone detection system centered around a sensor array using gas sensors and pattern recognition algorithms. The research covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment to accurately identify acetone. Titanium dioxide was chosen as the nano gas-sensitive material to prepare the acetone gas sensor, with data collection conducted using STM32. Filtering was applied to process the raw sensor data, followed by feature extraction using principal component analysis. A recognition model based on support vector machine algorithm was used for qualitative identification of gas samples, while a recognition model based on backpropagation neural network was employed for quantitative detection of gas sample concentrations.
Experimental results demonstrated recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol mixed gases, and 90% for ternary acetone, ethanol, and methanol mixed gases.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 9 pages, 14 figures

arXiv:2406.00857 [pdf, other]

Modeling the refractive index profile n(z) of polar ice for ultra-high energy neutrino experiments

Authors: S. Ali, P. Allison, S. Archambault, J. J. Beatty, D. Z. Besson, A. Bishop, P. Chen, Y. C. Chen, B. A. Clark, W. Clay, A. Connolly, K. Couberly, L. Cremonesi, A. Cummings, P. Dasgupta, R. Debolt, S. de Kockere, K. D. de Vries, C. Deaconu, M. A. DuVernois, J. Flaherty, E. Friedman, R. Gaior, P. Giri, J. Hanson, et al. (45 additional authors not shown)

Abstract: We develop an in-situ index of refraction profile using the transit time of radio signals broadcast from an englacial transmitter to 2-5 km distant radio-frequency receivers, deployed at depths up to 200 m. Maxwell's equations generally admit two ray propagation solutions from a given transmitter, corresponding to a direct path (D) and a refracted path (R); the measured D vs. R (dt(D,R)) timing differences provide constraints on the index of refraction profile near the South Pole, where the Askaryan Radio Array (ARA) neutrino observatory is located. We constrain the refractive index profile by simulating D and R ray paths via ray tracing and comparing those to measured dt(D,R) signals. Using previous ice density data as a proxy for n(z), we demonstrate that our data strongly favor a glaciologically-motivated three-phase densification model rather than a single exponential scale height model. Simulations show that the single exponential model overestimates ARA neutrino sensitivity compared to the three-phase model.

Submitted 11 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00320 [pdf, other]

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Authors: Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao

Abstract: Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and conducts sampling by solving ODE, outperforming autoregressive and score-based models in terms of audio quality. By employing a non-autoregressive vector field estimator based on a feed-forward transformer and channel-level cross-modal feature fusion with strong temporal alignment, our model generates audio that is highly synchronized with the input video. Furthermore, through reflow and one-step distillation with guided vector field, our model can generate decent audio in a few, or even only one sampling step. Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment on VGGSound, with alignment accuracy reaching 97.22%, and 6.2% improvement in inception score over the strong diffusion-based baseline. Audio samples are available at http://frieren-v2a.github.io .

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.21050 [pdf, other]

Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas

Abstract: Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and their basis vectors of pretrained weights. Using the Kronecker product and efficient Stiefel optimizers, we achieve parameter-efficient adaptation of orthogonal matrices. We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity. Extensive evaluations on text-to-image diffusion models demonstrate SODA's effectiveness, offering a spectrum-aware alternative to existing fine-tuning methods.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20871 [pdf, other]

A Nearest-neighbor Expansion of Lepton Flavor Mixing in Powers of the $μ$-$τ$ Permutation Symmetry Breaking Effect

Authors: Jihong Huang

Abstract: We point out that the observed pattern of lepton flavor mixing can be well described by a proper nearest-neighbor expansion of a constant $3\times 3$ unitary matrix in powers of a small parameter characterizing the fine effect of $μ$-$τ$ permutation symmetry breaking. We take an example of this kind for illustration, and provide complete discussions on the usefulness in the study of leptonic CP violation and unitarity triangles in matter.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 16 pages, 3 figures, 2 tables

arXiv:2405.20618 [pdf, other]

CPAFT: A Consistent Parallel Advancing Front Technique for Unstructured Triangular/Tetrahedral Mesh Generation

Authors: Chengdi Ma, Jizu Huang, Hao Luo, Chao Yang

Abstract: Compared with the remarkable progress made in parallel numerical solvers of partial differential equations, the development of algorithms for generating unstructured triangular/tetrahedral meshes has been relatively sluggish. In this paper, we propose a novel, consistent parallel advancing front technique (CPAFT) by combining the advancing front technique, the domain decomposition method based on space-filling curves, the distributed forest-of-overlapping-trees approach, and the consistent parallel maximal independent set algorithm. The newly proposed CPAFT algorithm can mathematically ensure that the generated unstructured triangular/tetrahedral meshes are independent of the number of processors and the implementation of domain decomposition. Several numerical tests are conducted to validate the parallel consistency and outstanding parallel efficiency of the proposed algorithm, which scales effectively up to two thousand processors. This is, as far as we know, the first parallel unstructured triangular/tetrahedral mesh generator with scalability to O(1,000) CPU processors.

Submitted 31 May, 2024; originally announced May 2024.

MSC Class: 65M50; 65M55; 68W10

arXiv:2405.20588 [pdf, other]

DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models

Authors: Taolin Zhang, Qizhou Chen, Dongyang Li, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

Abstract: Recently, while large language models (LLMs) have demonstrated impressive results, they still suffer from hallucination, i.e., the generation of false information. Model editing is the task of fixing factual mistakes in LLMs; yet, most previous works treat it as a one-time task, paying little attention to ever-emerging mistakes generated by LLMs. We address the task of sequential model editing (SME) that aims to rectify mistakes continuously. A Dynamic Auxiliary Fusion Network (DAFNet) is designed to enhance the semantic interaction among the factual knowledge within the entire sequence, preventing catastrophic forgetting during the editing process of multiple knowledge triples. Specifically, (1) for semantic fusion within a relation triple, we aggregate the intra-editing attention flow into auto-regressive self-attention with token-level granularity in LLMs. We further leverage multi-layer diagonal inter-editing attention flow to update the weighted representations of the entire sequence-level granularity. (2) Considering that auxiliary parameters are required to store the knowledge for sequential editing, we construct a new dataset named DAFSet, fulfilling recent, popular, long-tail and robust properties to enhance the generality of sequential editing. Experiments show DAFNet significantly outperforms strong baselines in single-turn and sequential editing. The usage of DAFSet also consistently improves the performance of other auxiliary network-based methods in various scenarios.

Submitted 30 May, 2024; originally announced May 2024.

Comments: ACL2024 findings

arXiv:2405.20380 [pdf, other]

Gradient Inversion of Federated Diffusion Models

Authors: Jiyue Huang, Chi Hong, Lydia Y. Chen, Stefanie Roos

Abstract: Diffusion models are becoming de facto generative models, which generate exceptionally high-resolution image data. Training effective diffusion models requires massive real data, which is privately owned by distributed parties. Each data party can collaboratively train diffusion models in a federated learning manner by sharing gradients instead of the raw data. In this paper, we study the privacy leakage risk of gradient inversion attacks. First, we design a two-phase fusion optimization, GIDM, to leverage the well-trained generative model itself as prior knowledge to constrain the inversion search (latent) space, followed by pixel-wise fine-tuning. GIDM is shown to be able to reconstruct images almost identical to the original ones. Considering a more privacy-preserving training scenario, we then argue that locally initialized private training noise $ε$ and sampling step $t$ may raise additional challenges for the inversion attack. To solve this, we propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data, $ε$ and $t$. Our extensive evaluation results demonstrate the vulnerability of sharing gradients for data protection of diffusion models: even high-resolution images can be reconstructed with high quality.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20334 [pdf, other]

VividDream: Generating 3D Scene with Ambient Dynamics

Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Project page: https://vivid-dream-4d.github.io

arXiv:2405.19879 [pdf, other]

Refractive index in the JUNO liquid scintillator

Authors: H. S. Zhang, M. Beretta, S. Cialdi, C. X. Yang, J. H. Huang, F. Ferraro, G. F. Cao, G. Reina, Z. Y. Deng, E. Suerra, S. Altilia, V. Antonelli, D. Basilico, A. Brigatti, B. Caccianiga, M. G. Giammarchi, C. Landini, P. Lombardi, L. Miramonti, E. Percalli, G. Ranucci, A. C. Re, P. Saggese, M. D. C. Torri, S. Aiello, et al. (51 additional authors not shown)

Abstract: In the field of rare event physics, it is common to have huge masses of organic liquid scintillator as the detection medium. In particular, they are widely used to study neutrino properties or astrophysical neutrinos. Thanks to its safety properties (such as low toxicity and high flash point) and easy scalability, linear alkyl benzene is the most common solvent used to produce liquid scintillators for large mass experiments. Knowledge of the refractive index is pivotal to understanding the detector response, as this quantity (and its wavelength dependence) affects the Cherenkov radiation and photon propagation in the medium. In this paper, we report the measurement of the refractive index of the JUNO liquid scintillator between 260 and 1064 nm performed with two different methods (an ellipsometer and a refractometer), with sub-percent-level precision. In addition, we used an interferometer to measure the group velocity in the JUNO liquid scintillator and verify the expected value derived from the refractive index measurements.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 6 pages, 9 figures

arXiv:2405.19665 [pdf]

A novel fault localization with data refinement for hydroelectric units

Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

Abstract: Due to the scarcity of fault samples and the complexity of data with non-linear and non-smooth characteristics in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learning (SG-WMBDL) based fault localization method for hydroelectric units is proposed. To overcome the data scarcity, an SAE is embedded into the GAN to generate more high-quality samples in the data generation module. Considering that the signals involve non-linear and non-smooth characteristics, the improved WNR, which combines both soft and hard thresholding, and local linear embedding (LLE) are utilized in the data preprocessing module in order to reduce the noise and effectively capture the local features. In addition, to seek higher performance, the novel Adaptive Boost (AdaBoost) combined with multi deep learning is proposed to achieve accurate fault localization. The experimental results show that the SG-WMBDL can locate faults for hydroelectric units under a small number of fault samples with non-linear and non-smooth characteristics with higher precision and accuracy compared to other frontier methods, which verifies the effectiveness and practicality of the proposed method.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

arXiv:2405.19642 [pdf]

Few-shot fault diagnosis based on multi-scale graph convolution filtering for industry

Authors: Mengjie Gan, Penglong Lian, Zhiheng Su, Jiyang Zhang, Jialong Huang, Benhao Wang, Jianxiao Zou, Shicai Fan

Abstract: Industrial equipment fault diagnosis often encounters challenges such as the scarcity of fault data, complex operating conditions, and varied types of failures. Signal analysis, data statistical learning, and conventional deep learning techniques face constraints under these conditions due to their substantial data requirements and the necessity for transfer learning to accommodate new failure modes. To effectively leverage information and extract the intrinsic characteristics of faults across different domains under limited sample conditions, this paper introduces a fault diagnosis approach employing Multi-Scale Graph Convolution Filtering (MSGCF). MSGCF enhances the traditional Graph Neural Network (GNN) framework by integrating both local and global information fusion modules within the graph convolution filter block. This advancement effectively mitigates the over-smoothing issue associated with excessive layering of graph convolutional layers while preserving a broad receptive field. It also reduces the risk of overfitting in few-shot diagnosis, thereby augmenting the model's representational capacity. Experiments on the University of Paderborn bearing dataset (PU) demonstrate that the MSGCF method proposed herein surpasses alternative approaches in accuracy, thereby offering valuable insights for industrial fault diagnosis in few-shot learning scenarios.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 2 figures, 2 tables, 63rd IEEE Conference on Decision and Control

arXiv:2405.19574 [pdf, other]

On Kinematic Measurements of Self-Gravity in Protoplanetary Disks

Authors: Sean M. Andrews, Richard Teague, Christopher P. Wirth, Jane Huang, Zhaohuan Zhu

Abstract: Using controlled injection and recovery experiments, we devised an analysis prescription to assess the quality of dynamical measurements of protoplanetary disk gas masses based on resolved (CO) spectral line data, given observational limitations (resolution, sampling, noise), measurement bias, and ambiguities in the geometry and physical conditions. With sufficient data quality, this approach performed well for massive disks ($M_{\rm d}/M_\ast=0.1$): we inferred $M_{\rm d}$ posteriors that recovered the true values with little bias ($\lesssim$ 20%) and uncertainties within a factor of two (2$σ$). The gas surface density profiles for such cases are recovered with remarkable fidelity. Some experimentation indicates that this approach becomes insensitive when $M_{\rm d}/M_\ast\lesssim5$%, due primarily to degeneracies in the surface density profile parameters. Including multiple lines that probe different vertical layers, along with some improvements in the associated tools, might push that practical boundary down by another factor of $\sim$two in ideal scenarios. We also demonstrated this analysis approach using archival ALMA observations of the MWC 480 disk (Öberg et al. 2021): we measured $M_{\rm d}=0.13^{\: +0.04}_{\: -0.01} \: M_\odot$ (corresponding to $M_{\rm d}/M_\ast=7\pm1$%) and identified kinematic substructures consistent with surface density gaps around 65 and 135 au. Overall, this (and similar work) suggests that these dynamical measurements offer powerful new constraints with sufficient accuracy and precision to quantify gas masses and surface densities at the high end of the $M_{\rm d}/M_\ast$ distribution, and therefore can serve as key benchmarks for detailed thermo-chemical modeling. We address some prospects for improvements, and discuss various caveats and limitations to guide future work.

Submitted 29 May, 2024; originally announced May 2024.

Comments: ApJ, in press; 32 pages, 29 figures

arXiv:2405.19465 [pdf, other]

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter

Authors: Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li

Abstract: Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning based on large-scale pre-trained vision-language models (e.g., CLIP). However, fully fine-tuning these pre-trained models for TVR incurs prohibitively expensive computation costs. To this end, we propose to conduct efficient text-video Retrieval with a sparse-and-correlated AdaPter (RAP), i.e., fine-tuning the pre-trained model with a few parameterized layers. To accommodate the text-video scenario, we equip our RAP with two indispensable characteristics: temporal sparsity and correlation. Specifically, we propose a low-rank modulation module to refine the per-image features from the frozen CLIP backbone, which accentuates salient frames within the video features while alleviating temporal redundancy. Besides, we introduce an asynchronous self-attention mechanism that first selects the top responsive visual patches and augments the correlation modeling between them with learnable temporal and patch offsets. Extensive experiments on four TVR datasets demonstrate that RAP achieves superior or comparable performance compared to the fully fine-tuned counterpart and other parameter-efficient fine-tuning methods.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted by ACL 2024 Findings

arXiv:2405.18991 [pdf, other]

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the production of consistent frames and seamless motion transitions. The motion module can be adapted to various DiT baseline methods to generate video with different styles. It can also generate videos with different frame rates and resolutions during both training and inference phases, suitable for both images and videos. Moreover, we introduce slice VAE, a novel approach to condense the temporal axis, facilitating the generation of long duration videos. Currently, EasyAnimate exhibits the proficiency to generate videos with 144 frames. We provide a holistic ecosystem for video production based on DiT, encompassing aspects such as data pre-processing, VAE training, DiT models training (both the baseline model and LoRA model), and end-to-end video inference. Code is available at: https://github.com/aigc-apps/EasyAnimate. We are continuously working to enhance the performance of our method.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 5 figures

arXiv:2405.18440 [pdf, other]

Spatiotemporal Diffusion Metamaterials: Theories and Applications

Authors: Jinrong Liu, Liujun Xu, Jiping Huang

Abstract: Diffusion metamaterials with artificial spatial structures have significant potential in controlling energy and mass transfer. Those static structures may lead to functionality and tunability constraints, impeding the application scope of diffusion metamaterials. Dynamic structures, adding the temporal dimension, have recently provided a new possibility for electric charge and heat diffusion regulation. This perspective introduces the fundamental theories and practical constructions of spatiotemporal diffusion metamaterials for achieving nonreciprocal, topological, or tunable properties. Compared with traditional static design, spatiotemporal modulation is promising to manipulate diffusion processes dynamically, with applications of real-time thermal coding and programming. Existing spatiotemporal diffusion explorations are primarily at macroscopic systems, and we may envision extending these results to microscale and other physical domains like thermal radiation and mass diffusion shortly.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures. A perspective accepted by APL in press

arXiv:2405.18284 [pdf, other]

Adaptive debiased SGD in high-dimensional GLMs with streaming data

Authors: Ruijian Han, Lan Luo, Yuanhang Luo, Yuanyuan Lin, Jian Huang

Abstract: Online statistical inference facilitates real-time analysis of sequentially collected data, making it different from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, where we update regression coefficient estimates and their standard errors upon each new data arrival. In contrast to existing methods that either require full dataset access or large-dimensional summary statistics storage, our method operates in a single-pass mode, significantly reducing both time and space complexity. The core of our methodological innovation lies in an adaptive stochastic gradient descent algorithm tailored for dynamic objective functions, coupled with a novel online debiasing procedure. This allows us to maintain low-dimensional summary statistics while effectively controlling optimization errors introduced by the dynamically changing loss functions. We demonstrate that our method, termed the Approximated Debiased Lasso (ADL), not only mitigates the need for the bounded individual probability condition but also significantly improves numerical performance. Numerical experiments demonstrate that the proposed ADL method consistently exhibits robust performance across various covariance matrix structures.

Submitted 1 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 37 pages, 4 figures

arXiv:2405.18258 [pdf, other]

Text-only Synthesis for Image Captioning

Authors: Qing Zhou, Junlin Huang, Qiang Li, Junyu Gao, Qi Wang

Abstract: From paired image-text training to text-only training for image captioning, the pursuit of relaxing the requirements for high-cost and large-scale annotation of good quality data remains consistent. In this paper, we propose Text-only Synthesis for Image Captioning (ToCa), which further advances this relaxation with less human labor and less computing time. Specifically, we deconstruct caption text into structures and lexical words, which serve as the fundamental components of the caption. By combining different structures and lexical words as inputs to the large language model, massive captions that contain various patterns of lexical words are generated. This method not only approaches the target domain but also surpasses it by generating new captions, thereby enhancing the zero-shot generalization ability of the model. Considering the different levels of data access in the real world, we define three synthesis scenarios: cross-domain synthesis, in-domain synthesis, and data-efficient synthesis. Experiments in these scenarios demonstrate the generalizability, transferability and practicability of ToCa with a nearly 5 CIDEr improvement for zero-shot cross-domain captioning and a maximum increase of over 20 CIDEr for data-efficient captioning.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.17659 [pdf, other]

Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba

Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yang Nan, Weiwen Wu, Chengyan Wang, Kuangyu Shi, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

Abstract: Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has shown superiority in learning visual representation, which combines the advantages of linear scalability and global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information to introduce randomness for further uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For further texture preservation and better perceptual quality, we incorporate the wavelet transformation into MambaMIR and explore its variant based on the Generative Adversarial Network, namely MambaMIR-GAN.
Comprehensive experiments have been conducted for multiple representative medical image reconstruction tasks, demonstrating that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods in different reconstruction tasks, where MambaMIR achieves the best reconstruction fidelity and MambaMIR-GAN has the best perceptual quality. In addition, our MC-ASM provides uncertainty maps as an additional tool for clinicians, while mitigating the typical performance drop caused by the commonly used dropout.

Submitted 25 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17532 [pdf, other]

ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Authors: Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Yunchao Wei

Abstract: Recent text-to-image customization works have proven successful in generating images of given concepts by fine-tuning diffusion models on a few examples. However, these methods tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g., the headphone is missing when generating "a <sks> dog wearing a headphone"). Interestingly, we notice that the base model before fine-tuning exhibits the capability to compose the base concept with other elements (e.g., a dog wearing a headphone), implying that the compositional ability only disappears after personalization tuning. Inspired by this observation, we present ClassDiffusion, a simple technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept. Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts. Extensive qualitative and quantitative experiments demonstrate that the use of semantic preservation loss effectively improves the compositional abilities of the fine-tuned models. In response to the ineffective evaluation of CLIP-T metrics, we introduce the BLIP2-T metric, a more equitable and effective evaluation metric for this particular domain. We also provide an in-depth empirical study and theoretical analysis to better understand the role of the proposed loss. Lastly, we extend ClassDiffusion to personalized video generation, demonstrating its flexibility.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17509 [pdf, other]

Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

Authors: Ze Cheng, Zhongkai Hao, Xiaoqiang Wang, Jianing Huang, Youjia Wu, Xudan Liu, Yiru Zhao, Songming Liu, Hang Su

Abstract: For partial differential equations on domains of arbitrary shapes, existing work on neural operators attempts to learn a mapping from geometries to solutions. This often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy this requirement, since even a single simulation may take hours or days of computation. To address this issue, we propose reference neural operators (RNO), a novel way of implementing neural operators, i.e., to learn the smooth dependence of solutions on geometric deformations. Specifically, given a reference solution, RNO can predict solutions corresponding to arbitrary deformations of the referred geometry. This approach turns out to be much more data efficient. Through extensive experiments, we show that RNO can learn the dependence across various types and different numbers of geometry objects with relatively small datasets. RNO outperforms baseline models in accuracy by a large margin and achieves up to 80% error reduction.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16464 [pdf, other]

Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge

Authors: Tianchen Deng, Yi Zhou, Wenhua Wu, Mingrui Li, Jingwei Huang, Shuhong Liu, Yanzeng Song, Hao Zuo, Yanbo Wang, Yutao Yue, Hesheng Wang, Weidong Chen

Abstract: This technical report presents the first-place winning model for UG2+, a task in the CVPR 2024 UAV Tracking and Pose-Estimation Challenge. The challenge involves drone detection, UAV-type classification and 2D/3D trajectory estimation in extreme weather conditions with multi-modal sensor information, including stereo vision, various lidars, radars, and audio arrays. Leveraging this information, we propose a multi-modal UAV detection, classification, and 3D tracking method for accurate UAV classification and tracking. A novel classification pipeline which incorporates sequence fusion, region of interest (ROI) cropping, and keyframe selection is proposed. Our system integrates cutting-edge classification techniques and sophisticated post-processing steps to boost accuracy and robustness. The designed pose estimation pipeline incorporates three modules: dynamic points analysis, a multi-object tracker, and trajectory completion techniques. Extensive experiments have validated the effectiveness and precision of our approach. In addition, we propose a novel dataset pre-processing method and conduct a comprehensive ablation study for our design. We finally achieved the best performance in the classification and tracking of the MMUAD dataset. The code and configuration of our method are available at https://github.com/dtc111111/Multi-Modal-UAV.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR 2024 workshop. The 1st winning model in CVPR 2024 UG2+ challenge. The code and configuration of our method are available at https://github.com/dtc111111/Multi-Modal-UAV

arXiv:2405.15967 [pdf, other]

3-Minute Oscillations in the Upper Corona: Evidence from Parker Solar Probe

Authors: Zesen Huang, Marco Velli, Chen Shi, Yingjie Zhu, B. D. G. Chandran, Victor Réville, Trevor Bowen, Nikos Sioulas, Marc Pulupa, Jia Huang, Sheng Huang

Abstract: Recent observations by Parker Solar Probe (PSP) from around the Alfvén surface have shown that the trace magnetic power spectral density (PSD) is often characterized by a shallow-inertial double power law, where in the low frequency energy injection range, the power spectrum is shallow (flatter than $1/f$), and in the inertial range the spectrum is steep, with a scaling index of [1.5, 1.67]. Consequently, close to the Sun, the majority of the fluctuation energy concentrates in a small frequency range around the low frequency power spectral break. In this work, we conduct a systematic survey of PSP observations for the first 17 encounters to statistically study the energy behaviors of the magnetic fluctuations. Our results show that the center frequency of fluctuation energy systematically drifts to around 3 minutes for the most pristine solar wind (smallest solar wind advection time). Moreover, the center frequency rapidly drifts to lower frequency as solar wind advection time increases, as expected for active turbulence. The concentration of fluctuation energy around 3 minutes suggests that Alfvénic fluctuations in solar wind might mostly be coming from resonant p-mode oscillations in the photosphere, though other potential sources are discussed.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15954 [pdf, other]

Searches for new physics below twice the electron mass with GERDA

Authors: GERDA Collaboration, M. Agostini, A. Alexander, G. R. Araujo, A. M. Bakalyarov, M. Balata, I. Barabanov, L. Baudis, C. Bauer, S. Belogurov, A. Bettini, L. Bezrukov, V. Biancacci, E. Bossio, V. Bothe, R. Brugnera, A. Caldwell, S. Calgaro, C. Cattadori, A. Chernogorov, P. -J. Chiu, T. Comellato, V. D'Andrea, E. V. Demidova, N. Di Marco, et al. (86 additional authors not shown)

Abstract: A search for full energy depositions from bosonic keV-scale dark matter candidates of masses between 65 keV and 1021 keV has been performed with data collected during Phase II of the GERmanium Detector Array (GERDA) experiment. Our analysis includes direct dark matter absorption as well as dark Compton scattering. With a total exposure of 105.5 kg yr, no evidence for a signal above the background has been observed. The resulting exclusion limits deduced with either Bayesian or Frequentist statistics are the most stringent direct constraints in the major part of the 140-1021 keV mass range. As an example, at a mass of 150 keV the dimensionless coupling of dark photons and axion-like particles to electrons has been constrained to $\alpha'/\alpha < 8.7 \times 10^{-24}$ and $g_{ae} < 3.3 \times 10^{-12}$ at 90% credible interval (CI), respectively. Additionally, a search for peak-like signals from beyond the Standard Model decays of nucleons and electrons is performed. We find for the inclusive decay of a single neutron in $^{76}$Ge a lower lifetime limit of $\tau_n > 1.5 \times 10^{24}$ yr and for a proton $\tau_p > 1.3 \times 10^{24}$ yr at 90% CI. For the electron decay $e^- \rightarrow \nu_e \gamma$ a lower limit of $\tau_e > 5.4 \times 10^{25}$ yr at 90% CI has been determined.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 20 pages, 12 figures, 7 tables

arXiv:2405.15895 [pdf, other]

Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

Authors: Pranshu Malviya, Jerry Huang, Quentin Fournier, Sarath Chandar

Abstract: The optimal model for a given task is often challenging to determine, requiring training multiple models from scratch, which becomes prohibitive as dataset and model sizes grow. A more efficient alternative is to reuse smaller pre-trained models by expanding them; however, this is not widely adopted, as how this impacts training dynamics remains poorly understood. While prior works have introduced statistics to measure these effects, they remain flawed. To rectify this, we offer a new approach for understanding and quantifying the impact of expansion through the lens of the loss landscape, which has been shown to contain a manifold of linearly connected minima. Building on this new perspective, we propose a metric to study the impact of expansion by estimating the size of the manifold. Experimental results show a clear relationship between gains in performance and manifold size, enabling the comparison of candidate models and presenting a first step towards expanding models more reliably based on geometric properties of the loss landscape.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15844 [pdf, other]

Near subsonic solar wind outflow from an active region

Authors: Tamar Ervin, Stuart D. Bale, Samuel T. Badman, Trevor A. Bowen, Pete Riley, Kristoff Paulson, Yeimy J. Rivera, Orlando Romeo, Nikos Sioulas, Davin E. Larson, Jaye L. Verniero, Ryan M. Dewey, Jia Huang

Abstract: During Parker Solar Probe (Parker) Encounter 15 (E15), we observe an 18-hour period of near subsonic ($\mathrm{M_S} \sim 1$) and sub-Alfvénic (SA), $\mathrm{M_A} \lll 1$, slow-speed solar wind from 22 to 15.6 R$_\odot$. As the most extreme SA interval measured to date and skirting the solar wind sonic point, it is the deepest Parker has probed into the formation and acceleration region of the solar wind in the corona. The stream is also measured by Wind and MMS near 1 AU at times consistent with ballistic propagation of this slow stream. We investigate the stream source, properties and potential coronal heating consequences by combining these observations with coronal modeling and turbulence analysis. Through source mapping, in situ evidence and multi-point arrival time considerations of a candidate CME, we determine that the stream is steady (non-transient), long-lived and approximately Parker spiral aligned, and that it arises from overexpanded field lines mapping back to an active region. Turbulence analysis of the Elsässer variables shows the inertial range scaling of the $\mathbf{z}^{+}$ mode ($\sim f^{-3/2}$) to be dominated by the slab component. We discuss the spectral flattening and difficulties associated with measuring the $\mathbf{z}^{-}$ spectra, cautioning against making definitive conclusions from the $\mathbf{z}^{-}$ mode. Despite being more extreme than prior sub-Alfvénic intervals, its turbulent nature does not appear to be qualitatively different from previously observed streams.
We conclude that this extreme low dynamic pressure solar wind interval (which has the potential for extreme space weather conditions) is a large, steady structure spanning at least to 1 AU.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 14 figures

arXiv:2405.15491 [pdf, other]

GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting

Authors: Jiajun Huang, Hongchuan Yu

Abstract: We present GSDeformer, a method that achieves free-form deformation on 3D Gaussian Splatting (3DGS) without requiring any architectural changes. Our method extends cage-based deformation, a traditional mesh deformation method, to 3DGS. This is done by converting 3DGS into a novel proxy point cloud representation, where its deformation can be used to infer the transformations to apply on the 3D Gaussians making up 3DGS. We also propose an automatic cage construction algorithm for 3DGS to minimize manual work. Our method does not modify the underlying architecture of 3DGS. Therefore, any existing trained vanilla 3DGS can be easily edited by our method. We compare the deformation capability of our method against other existing methods, demonstrating the ease of use and comparable quality of our method, despite being more direct and thus easier to integrate with other concurrent developments on 3DGS.

Submitted 24 May, 2024; originally announced May 2024.

Comments: For project page, see https://jhuangbu.github.io/gsdeformer

arXiv:2405.15484 [pdf, other]

Spin, inclination, and magnetic field evolution of magnetar population in vacuum and plasma-filled magnetospheres

Authors: Jun-Xiang Huang, Hou-Jun Lü, Jared Rice, En-Wei Liang

Abstract: Magnetars are potential energy sources or central engines for numerous transient phenomena in the Universe. How newborn magnetars evolve in different environments remains an open question. Based on both observed and candidate magnetars, it is found that the periods of all magnetars or candidates appear as a bimodal distribution, and are defined as the ``long-P'' and ``short-P'' magnetar subclasses, respectively. We find that for the ``short-P'' subclass of magnetars, the $\dot{P}$ values also appear as a bimodal distribution, and therefore can be classified as ``high-$\dot{P}$ short-P'' and ``low-$\dot{P}$ short-P'' magnetar subclasses. In this paper, we use Monte Carlo simulations to generate synthetic magnetar populations and investigate the evolution of the ``high-$\dot{P}$ short-P'' and ``low-$\dot{P}$ short-P'' magnetar subclasses by considering both the magnetar spin and inclination, as well as the decay of their magnetic field within their evolution in both vacuum and plasma-filled magnetospheres. We find that the magnetar evolution is dependent on both spin and magnetic field, but seems to be insensitive to inclination evolution and magnetospheric environment for the ``high-$\dot{P}$ short-P'' subclass. In comparison, for the case of ``high-$\dot{P}$ short-P'', the magnetar evolution is dependent on spin, magnetic field, and inclination evolution, as well as the magnetospheric environment. The best evolution model should be the case of inclination evolution in vacuum with a small value of $\overline{\mathrm{FOM}}$.
The differences in the best-fit parameters also suggest that the ``high-$\dot{P}$ short-P'' and ``low-$\dot{P}$ short-P'' magnetar subclasses may be following different evolution channels.

Submitted 18 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 18 pages, 2 tables, and 9 figures. PRD in press; matches the published version

arXiv:2405.15347 [pdf, ps, other]

Normalized ground states for the mass supercritical Schrödinger-Bopp-Podolsky system: existence, limit behavior, strong instability

Authors: Juan Huang, Sheng Wang

Abstract: This paper concerns the normalized ground states for the nonlinear Schrödinger equation in the Bopp-Podolsky electrodynamics. This equation has a nonlocal nonlinearity and a mass supercritical power nonlinearity, both of which have a deep impact on the geometry of the corresponding functional, and thus on the existence, limit behavior and stability of the normalized ground states. In the present study, the existence of critical points is obtained by a mountain-pass argument developed on the $L^2$-spheres. To be specific, we show that normalized ground states exist, provided that the radius of the $L^2$-spheres is sufficiently small. Then, by discussing the relation between the normalized ground states of the Schrödinger-Bopp-Podolsky system and the classical Schrödinger equation, we give a precise description of the asymptotic behavior of the normalized ground states as the mass vanishes or tends to infinity. Finally, the strong instability of standing waves at the mountain-pass energy level is studied by constructing an equivalent minimizing problem. Also, as a byproduct, we prove that the mountain-pass energy level gives a threshold for global existence based on this equivalent minimizing problem.

Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14961 [pdf, other]

SFDDM: Single-fold Distillation for Diffusion models

Authors: Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, Lydia Y. Chen

Abstract: While diffusion models effectively generate remarkable synthetic images, a key limitation is the inference inefficiency, requiring numerous sampling steps. To accelerate inference and maintain high-quality synthesis, teacher-student distillation is applied to compress the diffusion models in a progressive and binary manner by retraining, e.g., reducing the 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model of any desired step, based on reparameterization of the intermediate inputs from the teacher model. To train the student diffusion, we minimize not only the output distance but also the distribution of the hidden variables between the teacher and student model. Extensive experiments on four datasets demonstrate that our student model trained by the proposed SFDDM is able to sample high-quality data with steps reduced to as few as approximately 1%, thus trading off inference time. Our remarkable performance highlights that SFDDM effectively transfers knowledge in single-fold distillation, achieving semantic consistency and meaningful image interpolation.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14572 [pdf, other]

Multicontinuum Homogenization for Coupled Flow and Transport Equations

Authors: Dmitry Ammosov, W. T. Leung, Buzheng Shan, Jian Huang

Abstract: In this paper, we present the derivation of a multicontinuum model for the coupled flow and transport equations by applying multicontinuum homogenization. We perform the multicontinuum expansion for both flow and transport solutions and formulate novel coupled constraint cell problems to capture the multiscale property, where oversampled regions are utilized to avoid boundary effects. Assuming the smoothness of macroscopic variables, we obtain a multicontinuum system composed of macroscopic elliptic equations and convection-diffusion-reaction equations with homogenized effective properties. Finally, we present numerical results for various coefficient fields and boundary conditions to validate our proposed algorithm.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14303 [pdf, other]

Similarity-Navigated Conformal Prediction for Graph Neural Networks

Authors: Jianqing Song, Jianguo Huang, Wenyu Jiang, Baoming Zhang, Shuangjie Li, Chongjun Wang

Abstract: Graph Neural Networks have achieved remarkable accuracy in semi-supervised node classification tasks. However, these results lack reliable uncertainty estimates. Conformal prediction methods provide a theoretical guarantee for node classification tasks, ensuring that the conformal prediction set contains the ground-truth label with a desired probability (e.g., 95%). In this paper, we empirically show that for each node, aggregating the non-conformity scores of nodes with the same label can improve the efficiency of conformal prediction sets. This observation motivates us to propose a novel algorithm named Similarity-Navigated Adaptive Prediction Sets (SNAPS), which aggregates the non-conformity scores based on feature similarity and structural neighborhood. The key idea behind SNAPS is that nodes with high feature similarity or direct connections tend to have the same label. By adaptively incorporating information from similar nodes, SNAPS can generate compact prediction sets and increase the singleton hit ratio (correct prediction sets of size one). Moreover, we theoretically provide a finite-sample coverage guarantee of SNAPS. Extensive experiments demonstrate the superiority of SNAPS, improving the efficiency of prediction sets and singleton hit ratio while maintaining valid coverage.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14135 [pdf, other]

Learning Geospatial Region Embedding with Heterogeneous Graph

Authors: Xingchen Zou, Jiani Huang, Xixuan Hao, Yuhao Yang, Haomin Wen, Yibo Yan, Chao Huang, Yuxuan Liang

Abstract: Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we present GeoHG, an effective heterogeneous graph structure for learning comprehensive region embeddings for various downstream tasks. Specifically, we tailor satellite image representation learning through geo-entity segmentation and point-of-interest (POI) integration for expressive intra-regional features. Furthermore, GeoHG unifies informative spatial interdependencies and socio-environmental attributes into a powerful heterogeneous graph to encourage explicit modeling of higher-order inter-regional relationships. The intra-regional features and inter-regional correlations are seamlessly integrated by a model-agnostic graph learning framework for diverse downstream tasks. Extensive experiments demonstrate the effectiveness of GeoHG in geo-prediction tasks compared to existing methods, even under extreme data scarcity (with just 5% of training data). With interpretable region representations, GeoHG exhibits strong generalization capabilities across regions. We will release code and data upon paper notification.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13448 [pdf, other]

Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning

Authors: Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang

Abstract: The process of instruction tuning aligns pre-trained large language models (LLMs) with open-domain instructions and human-preferred responses. While several studies have explored autonomous approaches to distilling and annotating instructions from more powerful proprietary LLMs, such as ChatGPT, they often neglect the impact of task distributions and the varying difficulty of instructions in the training sets. This oversight can lead to imbalanced knowledge capabilities and poor generalization powers of small student LLMs. To address this challenge, we introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR), a multi-round distillation framework with balanced task distributions and dynamic difficulty adjustment. This approach utilizes an oracle LLM to select instructions that are difficult for a student LLM to follow and distill instructions with balanced task distributions. By incorporating curriculum planning, our approach systematically escalates the difficulty levels, progressively enhancing the student LLM's capabilities. We rigorously evaluate TAPIR using two widely recognized benchmarks, AlpacaEval 2.0 and MT-Bench. The empirical results demonstrate that the student LLMs, trained with our method and less training data, outperform larger instruction-tuned models and strong distillation baselines. The improvement is particularly notable in complex tasks, such as logical reasoning and code generation.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13372 [pdf, other]

Ada-HGNN: Adaptive Sampling for Scalable Hypergraph Neural Networks

Authors: Shuai Wang, David W. Zhang, Jia-Hong Huang, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract: Hypergraphs serve as an effective model for depicting complex connections in various real-world scenarios, from social to biological networks. The development of Hypergraph Neural Networks (HGNNs) has emerged as a valuable method to manage the intricate associations in data, though scalability is a notable challenge due to memory limitations. In this study, we introduce a new adaptive sampling strategy specifically designed for hypergraphs, which tackles their unique complexities in an efficient manner. We also present a Random Hyperedge Augmentation (RHA) technique and an additional Multilayer Perceptron (MLP) module to improve the robustness and generalization capabilities of our approach. Thorough experiments with real-world datasets have proven the effectiveness of our method, markedly reducing computational and memory demands while maintaining performance levels akin to conventional HGNNs and other baseline models. This research paves the way for improving both the scalability and efficacy of HGNNs in extensive applications. We will also make our codebase publicly accessible.

Submitted 14 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.
