Search | arXiv e-print repository

arXiv:2409.03728 [pdf, other]

Multiplicity dependent $J/ψ$ and $ψ(2S)$ production at forward and backward rapidity in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, C. Aidala, Y. Akiba, M. Alfred, V. Andrieux, S. Antsupov, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, N. S. Bandara, E. Bannikov, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau, J. S. Bok , et al. (276 additional authors not shown)

Abstract: The $J/ψ$ and $ψ(2S)$ charmonium states, composed of $c\bar{c}$ quark pairs and known since the 1970s, are widely believed to serve as ideal probes to test quantum chromodynamics in high-energy hadronic interactions. However, there is not yet a complete understanding of the charmonium-production mechanism. Recent measurements of $J/ψ$ production as a function of event charged-particle multiplicity… ▽ More The $J/ψ$ and $ψ(2S)$ charmonium states, composed of $c\bar{c}$ quark pairs and known since the 1970s, are widely believed to serve as ideal probes to test quantum chromodynamics in high-energy hadronic interactions. However, there is not yet a complete understanding of the charmonium-production mechanism. Recent measurements of $J/ψ$ production as a function of event charged-particle multiplicity at the collision energies of both the Large Hadron Collider (LHC) and the Relativistic Heavy Ion Collider (RHIC) show enhanced $J/ψ$ production yields with increasing multiplicity. One potential explanation for this type of dependence is multiparton interactions (MPI). We carry out the first measurements of self-normalized $J/ψ$ yields and the $ψ(2S)$ to $J/ψ$ ratio at both forward and backward rapidities as a function of self-normalized charged-particle multiplicity in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV. In addition, detailed {\sc pythia} studies tuned to RHIC energies were performed to investigate the MPI impacts. We find that the PHENIX data at RHIC are consistent with recent LHC measurements and can only be described by {\sc pythia} calculations that include MPI effects. The forward and backward $ψ(2S)$ to $J/ψ$ ratio, which serves as a unique and powerful approach to study final-state effects on charmonium production, is found to be less dependent on the charged-particle multiplicity. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: 301 authors from 69 institutions, 8 pages, 3 figures. v1 is version submitted to Physical Review D Letters. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2408.16211 [pdf, ps, other]

Hecke growth diagrams, and maximal increasing and decreasing sequences in fillings of stack polyominoes

Authors: Ting Guo, Gaofan Li

Abstract: We establish a bijection between $01$-fillings of stack polyominoes with at most one $1$ per column and labelings of the corners along the top-right border of stack polyominoes. These labellings indicate the lengths of the longest increasing and decreasing chains of the largest rectangular region below and to the left of the corners. Our results provide an alternative proof of Guo and Poznanović's… ▽ More We establish a bijection between $01$-fillings of stack polyominoes with at most one $1$ per column and labelings of the corners along the top-right border of stack polyominoes. These labellings indicate the lengths of the longest increasing and decreasing chains of the largest rectangular region below and to the left of the corners. Our results provide an alternative proof of Guo and Poznanović's theorem on the lengths of the longest increasing and decreasing chains have a symmetric joint distribution over $01$-fillings of stack polyomino. Moreover, our results offer new perspective to Chen, Guo and Pang's result on the crossing number and the nesting number have a symmetric joint distribution over linked partitions. In particular, our construction generalizes the growth diagram techniques of Rubey for the $01$-fillings of stack polyominoes with at most one $1$ per column and row. △ Less

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.12830 [pdf, other]

SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning

Authors: Wang Luo, Haoran Li, Zicheng Zhang, Congying Han, Jiayu Lv, Tiande Guo

Abstract: Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics, without direct real-world environment interactions. However, this method is inherently challenged by distribution shift. Previous approaches have primarily focused on tackling this issue directly leveraging off-policy mechanisms and heuristic uncertainty in model dynamics, but they resulted in… ▽ More Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics, without direct real-world environment interactions. However, this method is inherently challenged by distribution shift. Previous approaches have primarily focused on tackling this issue directly leveraging off-policy mechanisms and heuristic uncertainty in model dynamics, but they resulted in inconsistent objectives and lacked a unified theoretical foundation. This paper offers a comprehensive analysis that disentangles the problem into two key components: model bias and policy shift. We provide both theoretical insights and empirical evidence to demonstrate how these factors lead to inaccuracies in value function estimation and impose implicit restrictions on policy learning. To address these challenges, we derive adjustment terms for model bias and policy shift within a unified probabilistic inference framework. These adjustments are seamlessly integrated into the vanilla reward function to create a novel Shifts-aware Reward (SAR), aiming at refining value learning and facilitating policy training. Furthermore, we introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL), a practical framework that efficiently trains classifiers to approximate the SAR for policy optimization. Empirically, we show that SAR effectively mitigates distribution shift, and SAMBO-RL demonstrates superior performance across various benchmarks, underscoring its practical effectiveness and validating our theoretical analysis. △ Less

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.11694 [pdf, ps, other]

Common fixed point theorems for a commutative family of nonexpansive mappings in complete random normed modules

Authors: Xiaohuan Mu, Qiang Tu, Tiexin Guo, Hong-Kun Xu

Abstract: In this paper, we first introduce and study the notion of random Chebyshev centers. Further, based on the recently developed theory of stable sets, we introduce the notion of random complete normal structure so that we can prove the two deeper theorems: one of which states that random complete normal structure is equivalent to random normal structure for an $L^0$-convexly compact set in a complete… ▽ More In this paper, we first introduce and study the notion of random Chebyshev centers. Further, based on the recently developed theory of stable sets, we introduce the notion of random complete normal structure so that we can prove the two deeper theorems: one of which states that random complete normal structure is equivalent to random normal structure for an $L^0$-convexly compact set in a complete random normed module; the other of which states that if $G$ is an $L^0$-convexly compact subset with random normal structure of a complete random normed module, then every commutative family of nonexpansive mappings from $G$ to $G$ has a common fixed point. We also consider the fixed point problems for isometric mappings in complete random normed modules. Finally, as applications of the fixed point theorems established in random normed modules, when the measurable selection theorems fail to work, we can still prove that a family of strong random nonexpansive operators from $(Ω,\mathcal{F},P)\times C$ to $C$ has a common random fixed point, where $(Ω,\mathcal{F},P)$ is a probability space and $C$ is a weakly compact convex subset with normal structure of a Banach space. △ Less

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11144 [pdf, other]

Measurement of inclusive jet cross section and substructure in $p$$+$$p$ collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, J. Alexander, M. Alfred, V. Andrieux, S. Antsupov, K. Aoki, N. Apadula, H. Asano, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, X. Bai, N. S. Bandara, B. Bannier, E. Bannikov, K. N. Barish, S. Bathe , et al. (422 additional authors not shown)

Abstract: The jet cross-section and jet-substructure observables in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV were measured by the PHENIX Collaboration at the Relativistic Heavy Ion Collider (RHIC). Jets are reconstructed from charged-particle tracks and electromagnetic-calorimeter clusters using the anti-$k_{t}$ algorithm with a jet radius $R=0.3$ for jets with transverse momentum within $8.0<p_T<40.0$ Ge… ▽ More The jet cross-section and jet-substructure observables in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV were measured by the PHENIX Collaboration at the Relativistic Heavy Ion Collider (RHIC). Jets are reconstructed from charged-particle tracks and electromagnetic-calorimeter clusters using the anti-$k_{t}$ algorithm with a jet radius $R=0.3$ for jets with transverse momentum within $8.0<p_T<40.0$ GeV/$c$ and pseudorapidity $|η|<0.15$. Measurements include the jet cross section, as well as distributions of SoftDrop-groomed momentum fraction ($z_g$), charged-particle transverse momentum with respect to jet axis ($j_T$), and radial distributions of charged particles within jets ($r$). Also meaureed was the distribution of $ξ=-ln(z)$, where $z$ is the fraction of the jet momentum carried by the charged particle. The measurements are compared to theoretical next-to and next-to-next-to-leading-order calculatios, PYTHIA event generator, and to other existing experimental results. Indicated from these meaurements is a lower particle multiplicity in jets at RHIC energies when compared to models. Also noted are implications for future jet measurements with sPHENIX at RHIC as well as at the future Election-Ion Collider. △ Less

Submitted 20 August, 2024; originally announced August 2024.

Comments: 446 authors from 77 institutions, 11 pages, 8 figures. v1 is version submitted to Physical Review D. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2408.10486 [pdf, ps, other]

Revisiting Evolutionary Program Repair via Code Language Model

Authors: Yunan Wang, Tingyu Guo, Zilong Huang, Yuan Yuan

Abstract: Software defects are an inherent part of software development and maintenance. To address these defects, Automated Program Repair (APR) has been developed to fix bugs automatically. With the advent of Large Language Models, Code Language Models (CLMs) trained on code corpora excels in code generation, making them suitable for APR applications. Despite this progress, a significant limitation remain… ▽ More Software defects are an inherent part of software development and maintenance. To address these defects, Automated Program Repair (APR) has been developed to fix bugs automatically. With the advent of Large Language Models, Code Language Models (CLMs) trained on code corpora excels in code generation, making them suitable for APR applications. Despite this progress, a significant limitation remains: many bugs necessitate multi-point edits for repair, yet current CLM-based APRs are restricted to single-point bug fixes, which severely narrows the scope of repairable bugs. Moreover, these tools typically only consider the direct context of the buggy line when building prompts for the CLM, leading to suboptimal repair outcomes due to the limited information provided. This paper introduces a novel approach, ARJA-CLM, which integrates the multiobjective evolutionary algorithm with CLM to fix multilocation bugs in Java projects. We also propose a context-aware prompt construction stratege, which enriches the prompt with additional information about accessible fields and methods for the CLM generating candidate statements. Our experiments on the Defects4J and APR-2024 competition benchmark demonstrate that ARJA-CLM surpasses many state-of-the-art repair systems, and performs well on multi-point bugs. The results also reveal that CLMs effectively utilize the provided field and method information within context-aware prompts to produce candidate statements. △ Less

Submitted 26 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.09987 [pdf]

doi 10.1364/OE.534828

Ultrabroadband Coherent Perfect Absorption with Composite Graphene Metasurfaces

Authors: Wei Zou, Tianjing Guo, Christos Argyropoulos

Abstract: We investigate the design and performance of a new multilayer graphene metasurface for achieving ultrabroadband coherent perfect absorption (CPA) in the THz regime. The proposed structure comprises of three graphene patterned metasurfaces separated by thin dielectric spacer layers. The top and bottom metasurfaces have cross shape unit cells with varying sizes, while the middle graphene metasurface… ▽ More We investigate the design and performance of a new multilayer graphene metasurface for achieving ultrabroadband coherent perfect absorption (CPA) in the THz regime. The proposed structure comprises of three graphene patterned metasurfaces separated by thin dielectric spacer layers. The top and bottom metasurfaces have cross shape unit cells with varying sizes, while the middle graphene metasurface is square-shaped. This distinctive geometrical asymmetry and the presence of multiple layers within the structure facilitate the achievement of wideband asymmetric reflection under incoherent illumination. This interesting property serves as a crucial step towards achieving near-total absorption under coherent illumination across a broad frequency range. Numerical simulations demonstrate that the absorption efficiency surpasses 90% across an ultrabroadband frequency range from 2.8 to 5.7 THz, i.e., bandwidth of 2.9 THz. The CPA effect can be selectively tuned by manipulating the phase difference between the two incident coherent beams. Moreover, the absorption response can be dynamically adjusted by altering the Fermi level of graphene. The study also examines the influence of geometric parameters on the absorption characteristics. The results of this research work offer valuable insights into the design of broadband graphene metasurfaces for coherent absorption applications, and they contribute to the advancement of sophisticated optical devices operating in the THz frequency range. △ Less

Submitted 19 August, 2024; originally announced August 2024.

Journal ref: Opt. Express, vol. 32, No. 19, pp. 32667-32679, 2024

arXiv:2408.05385 [pdf, other]

Expected $1.x$-Makespan-Optimal MAPF on Grids in Low-Poly Time

Authors: Teng Guo, Jingjin Yu

Abstract: Multi-Agent Path Finding (MAPF) is NP-hard to solve optimally, even on graphs, suggesting no polynomial-time algorithms can compute exact optimal solutions for them. This raises a natural question: How optimal can polynomial-time algorithms reach? Whereas algorithms for computing constant-factor optimal solutions have been developed, the constant factor is generally very large, limiting their appl… ▽ More Multi-Agent Path Finding (MAPF) is NP-hard to solve optimally, even on graphs, suggesting no polynomial-time algorithms can compute exact optimal solutions for them. This raises a natural question: How optimal can polynomial-time algorithms reach? Whereas algorithms for computing constant-factor optimal solutions have been developed, the constant factor is generally very large, limiting their application potential. In this work, among other breakthroughs, we propose the first low-polynomial-time MAPF algorithms delivering $1$-$1.5$ (resp., $1$-$1.67$) asymptotic makespan optimality guarantees for 2D (resp., 3D) grids for random instances at a very high $1/3$ agent density, with high probability. Moreover, when regularly distributed obstacles are introduced, our methods experience no performance degradation. These methods generalize to support $100\%$ agent density. Regardless of the dimensionality and density, our high-quality methods are enabled by a unique hierarchical integration of two key building blocks. At the higher level, we apply the labeled Grid Rearrangement Algorithm (RTA), capable of performing efficient reconfiguration on grids through row/column shuffles. At the lower level, we devise novel methods that efficiently simulate row/column shuffles returned by RTA. Our implementations of RTA-based algorithms are highly effective in extensive numerical evaluations, demonstrating excellent scalability compared to other SOTA methods. For example, in 3D settings, \rta-based algorithms readily scale to grids with over $370,000$ vertices and over $120,000$ agents and consistently achieve conservative makespan optimality approaching $1.5$, as predicted by our theoretical analysis. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.08976

arXiv:2408.02907 [pdf, other]

Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

Authors: Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen

Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we pr… ▽ More Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we propose a new retrieval framework IIER, that leverages Inter-chunk Interactions to Enhance Retrieval. This framework captures the internal connections between document chunks by considering three types of interactions: structural, keyword, and semantic. We then construct a unified Chunk-Interaction Graph to represent all external documents comprehensively. Additionally, we design a graph-based evidence chain retriever that utilizes previous paths and chunk interactions to guide the retrieval process. It identifies multiple seed nodes based on the target question and iteratively searches for relevant chunks to gather supporting evidence. This retrieval process refines the context and reasoning chain, aiding the large language model in reasoning and answer generation. Extensive experiments demonstrate that IIER outperforms strong baselines across four datasets, highlighting its effectiveness in improving retrieval and reasoning capabilities. △ Less

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2408.01572 [pdf]

Electrical excitation of color centers in phosphorus-doped diamond Schottky diodes

Authors: Florian Sledz, Igor A. Khramtsov, Assegid M. Flatae, Stefano Lagomarsino, Silvio Sciortino, Shannon S. Nicley, Rozita Rouzbahani, Paulius Pobedinskas, Tianxiao Guo, Xin Jiang, Paul Kienitz, Peter Haring Bolivar, Ken Haenen, Dmitry Yu. Fedyanin, Mario Agio

Abstract: A robust quantum light source operating upon electrical injection at ambient conditions is desirable for practical implementation of quantum technologies, such as quantum key distribution or metrology. Color centers in diamond are promising candidates as they are photostable emitters at room and elevated temperatures. The possibility of their electrical excitation has already been demonstrated wit… ▽ More A robust quantum light source operating upon electrical injection at ambient conditions is desirable for practical implementation of quantum technologies, such as quantum key distribution or metrology. Color centers in diamond are promising candidates as they are photostable emitters at room and elevated temperatures. The possibility of their electrical excitation has already been demonstrated within p-i-n diodes. However, this requires the growth of complex diamond structures. In contrast to these conventional approaches, we demonstrate the emission of color centers under electrical pumping in a novel Schottky diode configuration based on hydrogen passivated n-type diamond, which holds promise for integrated single-photon emitting devices based on color centers in diamond. △ Less

Submitted 2 August, 2024; originally announced August 2024.

Comments: 14 pages, 4 figures

arXiv:2407.18443 [pdf, other]

HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors

Authors: Ashkan Ganj, Hang Su, Tian Guo

Abstract: We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses the unique challenges of depth estimation for mobile AR, such as scale ambiguity, hardware heterogeneity, and generalizability. HYBRIDDEPTH leverages the camera features available on mobile devices. It effectively combines the scale accuracy inherent in Depth from Focus (DFF) methods with the generalization capabilities enab… ▽ More We propose HYBRIDDEPTH, a robust depth estimation pipeline that addresses the unique challenges of depth estimation for mobile AR, such as scale ambiguity, hardware heterogeneity, and generalizability. HYBRIDDEPTH leverages the camera features available on mobile devices. It effectively combines the scale accuracy inherent in Depth from Focus (DFF) methods with the generalization capabilities enabled by strong single-image depth priors. By utilizing the focal planes of a mobile camera, our approach accurately captures depth values from focused pixels and applies these values to compute scale and shift parameters for transforming relative depths into metric depths. We test our pipeline as an end-to-end system, with a newly developed mobile client to capture focal stacks, which are then sent to a GPU-powered server for depth estimation. Through comprehensive quantitative and qualitative analyses, we demonstrate that HYBRIDDEPTH not only outperforms state-of-the-art (SOTA) models in common datasets (DDFF12, NYU Depth v2) and a real-world AR dataset ARKitScenes but also demonstrates strong zero-shot generalization. For example, HYBRIDDEPTH trained on NYU Depth v2 achieves comparable performance on the DDFF12 to existing models trained on DDFF12. it also outperforms all the SOTA models in zero-shot performance on the ARKitScenes dataset. Additionally, we conduct a qualitative comparison between our model and the ARCore framework, demonstrating that our models output depth maps are significantly more accurate in terms of structural details and metric accuracy. The source code of this project is available at github. △ Less

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.18103 [pdf, other]

Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow

Authors: Tian Guo, Emmanuel Hauptmann

Abstract: Large language models (LLMs) and their fine-tuning techniques have demonstrated superior performance in various language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return forecasting with financial newsflow. In quantitative investing, return forecasting is fundamental for subsequent tasks like stock picking, portfolio optimization, etc. We formulate the mode… ▽ More Large language models (LLMs) and their fine-tuning techniques have demonstrated superior performance in various language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return forecasting with financial newsflow. In quantitative investing, return forecasting is fundamental for subsequent tasks like stock picking, portfolio optimization, etc. We formulate the model to include text representation and forecasting modules. We propose to compare the encoder-only and decoder-only LLMs, considering they generate text representations in distinct ways. The impact of these different representations on forecasting performance remains an open question. Meanwhile, we compare two simple methods of integrating LLMs' token-level representations into the forecasting module. The experiments on real news and investment universes reveal that: (1) aggregated representations from LLMs' token-level embeddings generally produce return predictions that enhance the performance of long-only and long-short portfolios; (2) in the relatively large investment universe, the decoder LLMs-based prediction model leads to stronger portfolios, whereas in the small universes, there are no consistent winners. Among the three LLMs studied (DeBERTa, Mistral, Llama), Mistral performs more robustly across different universes; (3) return predictions derived from LLMs' text representations are a strong signal for portfolio construction, outperforming conventional sentiment scores. △ Less

Submitted 5 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.16931 [pdf, other]

ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering

Authors: Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Mingchen Zhuge, Jürgen Schmidhuber, Xin Gao, Xiangliang Zhang

Abstract: Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. Addressing this gap, we introduce S… ▽ More Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. We first address the issue of imbalanced label distribution by re-weighting the instance-wise loss based on the inverse frequency of each class, ensuring minority classes are not dominated by majority ones during optimization. Next, we utilize the unlabeled data to enrich the learning process, generating a variety of augmentations based on a SoftMix operation and ensuring their predictions align with the same target, i.e., pseudo-labels. To ensure the quality of the pseudo-labels, we propose a calibration procedure aimed at closely aligning the pseudo-label estimates of individual samples with a desired ground truth distribution. Experiments show that our QAMatch significantly outperforms the recent similar-scale baselines and Large Language Models (LLMs) not only on our ScholarChemQA dataset but also on four benchmark datasets. We hope our benchmark and model can facilitate and promote more research on chemical QA. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 14 pages

arXiv:2407.16569 [pdf]

Regulated magnetic anisotropy and charge density wave in uniformly fabricated Janus CrTeSe monolayer

Authors: Jin-Hua Nie, Cong Wang, Mao-Peng Miao, Kang-Di Niu, Tao Xie, Ting-Fei Guo, Wen-Hao Zhang, Chao-Fei Liu, Rui-Jing Sun, Jian-Wang Zhou, Jun-Hao Lin, Wei Ji, Ying-Shuang Fu

Abstract: Two-dimensional materials with Janus structure host novel physical properties due to their inversional symmetry breaking. However, it remains elusive to synthesize Janus monolayer crystals with tailored long-range magnetic orders. Here, we have developed a general method to fabricate uniform Janus CrTeSe monolayers by selective selenization of preformed CrTe2 monolayers with molecular beam epitaxy… ▽ More Two-dimensional materials with Janus structure host novel physical properties due to their inversional symmetry breaking. However, it remains elusive to synthesize Janus monolayer crystals with tailored long-range magnetic orders. Here, we have developed a general method to fabricate uniform Janus CrTeSe monolayers by selective selenization of preformed CrTe2 monolayers with molecular beam epitaxy. The uniform Janus structure of CrTeSe with high crystal quality is confirmed by high-resolution scanning transmission electron microscopy. Spin-polarized scanning tunneling microscopy/spectroscopy measurements unveil that the Janus CrTeSe undergoes a charge density wave (CDW) transition and a robust antiferromagnetic order. The magnetic anisotropy of CrTeSe is drastically altered compared to monolayer CrTe2 by the breaking symmetries induced from the Janus structure and the CDW transition, as is substantiated with first principles calculations. Our research achieves the construction of large-area Janus structures, and artificially tailors the electronic and magnetic properties of Janus systems at the two-dimensional limit. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 19 pages, 4 figures

arXiv:2407.15185 [pdf, other]

A Spatio-Temporal Approach with Self-Corrective Causal Inference for Flight Delay Prediction

Authors: Qihui Zhu, Shenwen Chen, Tong Guo, Yisheng Lv, Wenbo Du

Abstract: Accurate flight delay prediction is crucial for the secure and effective operation of the air traffic system. Recent advances in modeling inter-airport relationships present a promising approach for investigating flight delay prediction from the multi-airport scenario. However, the previous prediction works only accounted for the simplistic relationships such as traffic flow or geographical distan… ▽ More Accurate flight delay prediction is crucial for the secure and effective operation of the air traffic system. Recent advances in modeling inter-airport relationships present a promising approach for investigating flight delay prediction from the multi-airport scenario. However, the previous prediction works only accounted for the simplistic relationships such as traffic flow or geographical distance, overlooking the intricate interactions among airports and thus proving inadequate. In this paper, we leverage causal inference to precisely model inter-airport relationships and propose a self-corrective spatio-temporal graph neural network (named CausalNet) for flight delay prediction. Specifically, Granger causality inference coupled with a self-correction module is designed to construct causality graphs among airports and dynamically modify them based on the current airport's delays. Additionally, the features of the causality graphs are adaptively extracted and utilized to address the heterogeneity of airports. Extensive experiments are conducted on the real data of top-74 busiest airports in China. The results show that CausalNet is superior to baselines. Ablation studies emphasize the power of the proposed self-correction causality graph and the graph feature extraction module. All of these prove the effectiveness of the proposed methodology. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.13221 [pdf, other]

Multimodal Label Relevance Ranking via Reinforcement Learning

Authors: Taian Guo, Taolin Zhang, Haoqian Wu, Hanjun Li, Ruizhi Qiao, Xing Sun

Abstract: Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve these issues, we introduce a novel method for multimodal label relevance ranking, named Label Relevance Ranking with Proximal Policy Optimization (LR\textsuperscript{2}PPO), which effectively discerns partial o… ▽ More Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve these issues, we introduce a novel method for multimodal label relevance ranking, named Label Relevance Ranking with Proximal Policy Optimization (LR\textsuperscript{2}PPO), which effectively discerns partial order relations among labels. LR\textsuperscript{2}PPO first utilizes partial order pairs in the target domain to train a reward model, which aims to capture human preference intrinsic to the specific scenario. Furthermore, we meticulously design state representation and a policy loss tailored for ranking tasks, enabling LR\textsuperscript{2}PPO to boost the performance of label relevance ranking model and largely reduce the requirement of partial order annotation for transferring to new scenes. To assist in the evaluation of our approach and similar methods, we further propose a novel benchmark dataset, LRMovieNet, featuring multimodal labels and their corresponding partial order data. Extensive experiments demonstrate that our LR\textsuperscript{2}PPO algorithm achieves state-of-the-art performance, proving its effectiveness in addressing the multimodal label relevance ranking problem. Codes and the proposed LRMovieNet dataset are publicly available at \url{https://github.com/ChazzyGordon/LR2PPO}. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV2024

arXiv:2407.09057 [pdf, other]

PersonificationNet: Making customized subject act like a person

Authors: Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua

Abstract: Recently customized generation has significant potential, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over the given subject acting like the person's pose is still lack of study. In this paper, we propose a Personifi… ▽ More Recently customized generation has significant potential, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over the given subject acting like the person's pose is still lack of study. In this paper, we propose a PersonificationNet, which can control the specified subject such as a cartoon character or plush toy to act the same pose as a given referenced person's image. It contains a customized branch, a pose condition branch and a structure alignment module. Specifically, first, the customized branch mimics specified subject appearance. Second, the pose condition branch transfers the body structure information from the human to variant instances. Last, the structure alignment module bridges the structure gap between human and specified subject in the inference stage. Experimental results show our proposed PersonificationNet outperforms the state-of-the-art methods. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08586 [pdf, other]

Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, A. Angerami, K. Aoki, N. Apadula, Y. Aramaki, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, B. Bannier, K. N. Barish, B. Bassalleck, S. Bathe , et al. (377 additional authors not shown)

Abstract: The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability… ▽ More The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: 401 authors from 75 institutions, 20 pages, 15 figures, 2 tables. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2407.02301 [pdf, other]

CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models

Authors: Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao

Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific task, such as finance, has not been fully explored. In this paper, we present CFinBench: a meticulously crafted, the most comprehensive evaluation benchmark to date, for assessing the financial knowledge of LLMs under Chinese context. In practice, to b… ▽ More Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific task, such as finance, has not been fully explored. In this paper, we present CFinBench: a meticulously crafted, the most comprehensive evaluation benchmark to date, for assessing the financial knowledge of LLMs under Chinese context. In practice, to better align with the career trajectory of Chinese financial practitioners, we build a systematic evaluation from 4 first-level categories: (1) Financial Subject: whether LLMs can memorize the necessary basic knowledge of financial subjects, such as economics, statistics and auditing. (2) Financial Qualification: whether LLMs can obtain the needed financial qualified certifications, such as certified public accountant, securities qualification and banking qualification. (3) Financial Practice: whether LLMs can fulfill the practical financial jobs, such as tax consultant, junior accountant and securities analyst. (4) Financial Law: whether LLMs can meet the requirement of financial laws and regulations, such as tax law, insurance law and economic law. CFinBench comprises 99,100 questions spanning 43 second-level categories with 3 question types: single-choice, multiple-choice and judgment. We conduct extensive experiments of 50 representative LLMs with various model size on CFinBench. The results show that GPT4 and some Chinese-oriented models lead the benchmark, with the highest average accuracy being 60.16%, highlighting the challenge presented by CFinBench. The dataset and evaluation code are available at https://cfinbench.github.io/. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.00468 [pdf, other]

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p… ▽ More Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial performance, undermining the credibility of these evaluations. To address this issue while maintaining the efficiency of MCQ evaluations, we propose MMEvalPro, a benchmark designed to avoid Type-I errors through a trilogy evaluation pipeline and more rigorous metrics. For each original question from existing benchmarks, human annotators augment it by creating one perception question and one knowledge anchor question through a meticulous annotation process. MMEvalPro comprises $2,138$ question triplets, totaling $6,414$ distinct questions. Two-thirds of these questions are manually labeled by human experts, while the rest are sourced from existing benchmarks (MMMU, ScienceQA, and MathVista). Compared with the existing benchmarks, our experiments with the latest LLMs and LMMs demonstrate that MMEvalPro is more challenging (the best LMM lags behind human performance by $31.73\%$, compared to an average gap of $8.03\%$ in previous benchmarks) and more trustworthy (the best LLM trails the best LMM by $23.09\%$, whereas the gap for previous benchmarks is just $14.64\%$). Our in-depth analysis explains the reason for the large performance gap and justifies the trustworthiness of evaluation, underscoring its significant potential for advancing future research. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

arXiv:2407.00466 [pdf, other]

BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an… ▽ More Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an AI Scientist perspective remains largely unexplored. To this end, we draw inspiration from one most important abilities of scientists, understanding the literature, and introduce BioKGBench. In contrast to traditional evaluation benchmark that only focuses on factual QA, where the LLMs are known to have hallucination issues, we first disentangle "Understanding Literature" into two atomic abilities, i) "Understanding" the unstructured text from research papers by performing scientific claim verification, and ii) Ability to interact with structured Knowledge-Graph Question-Answering (KGQA) as a form of "Literature" grounding. We then formulate a novel agent task, dubbed KGCheck, using KGQA and domain-based Retrieval-Augmented Generation (RAG) to identify the factual errors of existing large-scale knowledge graph databases. We collect over two thousand data for two atomic tasks and 225 high-quality annotated data for the agent task. Surprisingly, we discover that state-of-the-art agents, both daily scenarios and biomedical ones, have either failed or inferior performance on our benchmark. We then introduce a simple yet effective baseline, dubbed BKGAgent. On the widely used popular knowledge graph, we discover over 90 factual errors which provide scenarios for agents to make discoveries and demonstrate the effectiveness of our approach. The code and data are available at https://github.com/westlake-autolab/BioKGBench. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17431 [pdf, other]

A Large-scale Investigation of Semantically Incompatible APIs behind Compatibility Issues in Android Apps

Authors: Shidong Pan, Tianchen Guo, Lihong Zhang, Pei Liu, Zhenchang Xing, Xiaoyu Sun

Abstract: Application Programming Interface (API) incompatibility is a long-standing issue in Android application development. The rapid evolution of Android APIs results in a significant number of API additions, removals, and changes between adjacent versions. Unfortunately, this high frequency of alterations may lead to compatibility issues, often without adequate notification to developers regarding thes… ▽ More Application Programming Interface (API) incompatibility is a long-standing issue in Android application development. The rapid evolution of Android APIs results in a significant number of API additions, removals, and changes between adjacent versions. Unfortunately, this high frequency of alterations may lead to compatibility issues, often without adequate notification to developers regarding these changes. Although researchers have proposed some work on detecting compatibility issues caused by changes in API signatures, they often overlook compatibility issues stemming from sophisticated semantic changes. In response to this challenge, we conducted a large-scale discovery of incompatible APIs in the Android Open Source Project (AOSP) by leveraging static analysis and pre-trained Large Language Models (LLMs) across adjacent versions. We systematically formulate the problem and propose a unified framework to detect incompatible APIs, especially for semantic changes. It's worth highlighting that our approach achieves a 0.83 F1-score in identifying semantically incompatible APIs in the Android framework. Ultimately, our approach detects 5,481 incompatible APIs spanning from version 4 to version 33. We further demonstrate its effectiveness in supplementing the state-of-the-art methods in detecting a broader spectrum of compatibility issues (+92.3%) that have been previously overlooked. △ Less

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is obs… ▽ More High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.05355 [pdf, other]

Revisit to the WGVC schemes: a nonlinear order-preserving and spectral-property-optimized methodology and its enhancement

Authors: Kang He, Hongwei Liu, Tongbiao Guo, Xinliang Li, Zhiwei He

Abstract: The numerical simulation of supersonic complex flow problems demands capabilities in identifying multiscale structures and capturing shocks, imposing stringent requirements on the numerical scheme. The capability to identify multiscale structures is closely related to the spectral properties of the numerical scheme. Currently, existing methods to improve the spectral properties of finite differenc… ▽ More The numerical simulation of supersonic complex flow problems demands capabilities in identifying multiscale structures and capturing shocks, imposing stringent requirements on the numerical scheme. The capability to identify multiscale structures is closely related to the spectral properties of the numerical scheme. Currently, existing methods to improve the spectral properties of finite difference schemes face shortcomings such as parallel difficulties (compact methods) or introducing unnecessary dispersion errors at low wavenumbers due to accuracy loss (spectral-like optimization methods). In this paper, we proposed an order-preserving spectral properties optimization method based on the group velocity control theory: the weighted group velocity control (WGVC) scheme. This method, centered around the concept of group velocity, achieves low-wavenumber accuracy control and mid-wavenumber group velocity control by designing smoothness indicators and nonlinear weighting approach for wave packets. Furthermore, by embedding the WGVC scheme into shock-capturing schemes such as WENO/TENO scheme, we not only preserve the spectral properties of the WGVC scheme at medium to low wavenumbers but also enhance the shock-capturing capability of the scheme. Theoretical and numerical experiments verify that the new method has advantages such as order-preserving, small dispersion and dissipation errors, and is very suitable for numerical simulation of complex flow problems such as turbulence-shock boundary layer interactions. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.01451 [pdf, other]

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

Abstract: In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarca… ▽ More In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarcation, to improve the accuracy of these pseudo-labels. Within SemiRES, we offer two alternative matching strategies: IoU-based Optimal Matching (IOM) and Composite Parts Integration (CPI). These strategies are designed to extract the most accurate masks from SAM's output, thus guiding the training of the student model with enhanced precision. In instances where a precise mask cannot be matched from the available candidates, we develop the Pixel-Wise Adjustment (PWA) strategy, guiding the student model's training directly by the pseudo-labels. Extensive experiments on three RES benchmarks--RefCOCO, RefCOCO+, and G-Ref reveal its superior performance compared to fully supervised methods. Remarkably, with only 1% labeled data, our SemiRES outperforms the supervised baseline by a large margin, e.g. +18.64% gains on RefCOCO val set. The project code is available at \url{https://github.com/nini0919/SemiRES}. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted by ICML2024

arXiv:2406.01414 [pdf, other]

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

Authors: Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

Abstract: This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-le… ▽ More This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results for both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining a search efficiency comparable to vanilla NAS. For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO2. On ImageNet, our searched model achieves 80.6% top-1 accuracy with a 0.78 ms TensorRT latency using FP16 on NVIDIA V100, consuming only 909.86 lbs of CO2, making it comparable to other one-shot-based NAS baselines. △ Less

Submitted 17 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.04131

arXiv:2406.00291 [pdf, other]

Multi-Objective Neural Architecture Search by Learning Search Space Partitions

Authors: Yiyang Zhao, Linnan Wang, Tian Guo

Abstract: Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and #FLOPs, aside from inference accuracy. This results in deep learning model designers leveraging multi-objective optimization to design effective deep neural networks in multiple criteria. However, applying multi-objective optimizations to neural architecture search (N… ▽ More Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and #FLOPs, aside from inference accuracy. This results in deep learning model designers leveraging multi-objective optimization to design effective deep neural networks in multiple criteria. However, applying multi-objective optimizations to neural architecture search (NAS) is nontrivial because NAS tasks usually have a huge search space, along with a non-negligible searching cost. This requires effective multi-objective search algorithms to alleviate the GPU costs. In this work, we implement a novel multi-objectives optimizer based on a recently proposed meta-algorithm called LaMOO on NAS tasks. In a nutshell, LaMOO speedups the search process by learning a model from observed samples to partition the search space and then focusing on promising regions likely to contain a subset of the Pareto frontier. Using LaMOO, we observe an improvement of more than 200% sample efficiency compared to Bayesian optimization and evolutionary-based multi-objective optimizers on different NAS datasets. For example, when combined with LaMOO, qEHVI achieves a 225% improvement in sample efficiency compared to using qEHVI alone in NasBench201. For real-world tasks, LaMOO achieves 97.36% accuracy with only 1.62M #Params on CIFAR10 in only 600 search samples. On ImageNet, our large model reaches 80.4% top-1 accuracy with only 522M #FLOPs. △ Less

Submitted 17 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

Journal ref: Journal of Machine Learning Research 25 (2024) 1-41

arXiv:2405.09180 [pdf]

doi 10.1038/s41467-024-48224-1

Integrated and DC-powered superconducting microcomb

Authors: Chen-Guang Wang, Wuyue Xu, Chong Li, Lili Shi, Junliang Jiang, Tingting Guo, Wen-Cheng Yue, Tianyu Li, Ping Zhang, Yang-Yang Lyu, Jiazheng Pan, Xiuhao Deng, Ying Dong, Xuecou Tu, Sining Dong, Chunhai Cao, Labao Zhang, Xiaoqing Jia, Guozhu Sun, Lin Kang, Jian Chen, Yong-Lei Wang, Huabing Wang, Peiheng Wu

Abstract: Frequency combs, specialized laser sources emitting multiple equidistant frequency lines, have revolutionized science and technology with unprecedented precision and versatility. Recently, integrated frequency combs are emerging as scalable solutions for on-chip photonics. Here, we demonstrate a fully integrated superconducting microcomb that is easy to manufacture, simple to operate, and consumes… ▽ More Frequency combs, specialized laser sources emitting multiple equidistant frequency lines, have revolutionized science and technology with unprecedented precision and versatility. Recently, integrated frequency combs are emerging as scalable solutions for on-chip photonics. Here, we demonstrate a fully integrated superconducting microcomb that is easy to manufacture, simple to operate, and consumes ultra-low power. Our turnkey apparatus comprises a basic nonlinear superconducting device, a Josephson junction, directly coupled to a superconducting microstrip resonator. We showcase coherent comb generation through self-started mode-locking. Therefore, comb emission is initiated solely by activating a DC bias source, with power consumption as low as tens of picowatts. The resulting comb spectrum resides in the microwave domain and spans multiple octaves. The linewidths of all comb lines can be narrowed down to 1 Hz through a unique coherent injection-locking technique. Our work represents a critical step towards fully integrated microwave photonics and offers the potential for integrated quantum processors. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Journal ref: Nature Communications 15, 4009 (2024)

arXiv:2405.09170 [pdf]

doi 10.1088/1674-1056/ad2f21

Tunable superconducting resonators via on-chip control of local magnetic field

Authors: Chen-Guang Wang, Wen-Cheng Yue, Xuecou Tu, Tianyuan Chi, Tingting Guo, Yang-Yang Lyu, Sining Dong, Chunhai Cao, Labao Zhang, Xiaoqing Jia, Guozhu Sun, Lin Kang, Jian Chen, Yong-Lei Wang, Huabing Wang, Peiheng Wu

Abstract: Superconducting microwave resonators play a pivotal role in superconducting quantum circuits. The ability to fine-tune their resonant frequencies provides enhanced control and flexibility. Here, we introduce a frequency-tunable superconducting coplanar waveguide resonator. By applying electrical currents through specifically designed ground wires, we achieve the generation and control of a localiz… ▽ More Superconducting microwave resonators play a pivotal role in superconducting quantum circuits. The ability to fine-tune their resonant frequencies provides enhanced control and flexibility. Here, we introduce a frequency-tunable superconducting coplanar waveguide resonator. By applying electrical currents through specifically designed ground wires, we achieve the generation and control of a localized magnetic field on the central line of the resonator, enabling continuous tuning of its resonant frequency. We demonstrate a frequency tuning range of 54.85 MHz in a 6.21 GHz resonator. This integrated and tunable resonator holds great potential as a dynamically tunable filter and as a key component of communication buses and memory elements in superconducting quantum computing. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Journal ref: Chin. Phys. B 33, 058402 (2024)

arXiv:2405.07303 [pdf, other]

Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment

Authors: L. T. Yang, S. K. Liu, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first limit on $g_{Aγ}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axio… ▽ More We present the first limit on $g_{Aγ}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95\% C.L. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.00437 [pdf, other]

Reduced-order modeling for second-order computational homogenization with applications to geometrically parameterized elastomeric metamaterials

Authors: T. Guo, V. G. Kouznetsova, M. G. D. Geers, K. Veroy, O. Rokoš

Abstract: The structural properties of mechanical metamaterials are typically studied with two-scale methods based on computational homogenization. Because such materials have a complex microstructure, enriched schemes such as second-order computational homogenization are required to fully capture their non-linear behavior, which arises from non-local interactions due to the buckling or patterning of the mi… ▽ More The structural properties of mechanical metamaterials are typically studied with two-scale methods based on computational homogenization. Because such materials have a complex microstructure, enriched schemes such as second-order computational homogenization are required to fully capture their non-linear behavior, which arises from non-local interactions due to the buckling or patterning of the microstructure. In the two-scale formulation, the effective behavior of the microstructure is captured with a representative volume element (RVE), and a homogenized effective continuum is considered on the macroscale. Although an effective continuum formulation is introduced, solving such two-scale models concurrently is still computationally demanding due to the many repeated solutions for each RVE at the microscale level. In this work, we propose a reduced-order model for the microscopic problem arising in second-order computational homogenization, using proper orthogonal decomposition and a novel hyperreduction method that is specifically tailored for this problem and inspired by the empirical cubature method. Two numerical examples are considered, in which the performance of the reduced-order model is carefully assessed by comparing its solutions with direct numerical simulations (entirely resolving the underlying microstructure) and the full second-order computational homogenization model. The reduced-order model is able to approximate the result of the full computational homogenization well, provided that the training data is representative for the problem at hand. Any remaining errors, when compared with the direct numerical simulation, can be attributed to the inherent approximation errors in the computational homogenization scheme. Regarding run times for one thread, speed-ups on the order of 100 are achieved with the reduced-order model as compared to direct numerical simulations. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.19383 [pdf, other]

Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition

Authors: Zhendong Liu, Haifeng Xia, Tong Guo, Libo Sun, Ming Shao, Siyu Xia

Abstract: Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purpos… ▽ More Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purposes. Although the global features learned through the procedure are suitable for the general classification, they have difficulty capturing fine-grained action change across adjacent frames -- decisive factors in sports actions. In this paper, we propose a novel ``Cross-block Fine-grained Semantic Cascade (CFSC)'' module to overcome this challenge. In summary, the proposed CFSC progressively integrates shallow visual knowledge into high-level blocks to allow networks to focus on action details. In particular, the CFSC module utilizes the GCN feature maps produced at different levels, as well as aggregated features from proceeding levels to consolidate fine-grained features. In addition, a dedicated temporal convolution is applied at each level to learn short-term temporal features, which will be carried over from shallow to deep layers to maximize the leverage of low-level details. This cross-block feature aggregation methodology, capable of mitigating the loss of fine-grained information, has resulted in improved performance. Last, FD-7, a new action recognition dataset for fencing sports, was collected and will be made publicly available. Experimental results and empirical analysis on public benchmarks (FSD-10) and self-collected (FD-7) demonstrate the advantage of our CFSC module on learning discriminative patterns for action classification over others. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.15746 [pdf, other]

Collaborative Heterogeneous Causal Inference Beyond Meta-analysis

Authors: Tianyu Guo, Sai Praneeth Karimireddy, Michael I. Jordan

Abstract: Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method could easily fail when a certain site couldn't cover the entire population. Moreover, it still relies on th… ▽ More Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method could easily fail when a certain site couldn't cover the entire population. Moreover, it still relies on the concept of traditional meta-analysis after adjusting for the distribution shift. In this work, we propose a collaborative inverse propensity score weighting estimator for causal inference with heterogeneous data. Instead of adjusting the distribution shift separately, we use weighted propensity score models to collaboratively adjust for the distribution shift. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases. To account for the vulnerable density estimation, we further discuss the double machine method and show the possibility of using nonparametric density estimation with d<8 and a flexible machine learning method to guarantee asymptotic normality. We propose a federated learning algorithm to collaboratively train the outcome model while preserving privacy. Using synthetic and real datasets, we demonstrate the advantages of our method. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: submitted to ICML

arXiv:2404.09793 [pdf, other]

First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment

Authors: J. X. Liu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present ne… ▽ More We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.06901 [pdf]

Multi-interface engineering to realize all-solution processed highly efficient Kesterite solar cells

Authors: Licheng Lou, Kang Yin, Jinlin Wang, Yuan Li, Xiao Xu, Bowen Zhang, Menghan Jiao, Shudan Chen, Tan Guo, Jiangjian Shi, Huijue Wu, Yanhong Luo, Dongmei Li, Qingbo Meng

Abstract: With the rapid development of Kesterite Cu2ZnSn(S, Se)4 solar cells in the past few years, how to achieve higher cost-performance ratio has become an important topic in the future development and industrialization of this technology. Herein, we demonstrate an all-solution route for the cell fabrication, in particular targeting at the solution processed window layer comprised of ZnO nanoparticles/A… ▽ More With the rapid development of Kesterite Cu2ZnSn(S, Se)4 solar cells in the past few years, how to achieve higher cost-performance ratio has become an important topic in the future development and industrialization of this technology. Herein, we demonstrate an all-solution route for the cell fabrication, in particular targeting at the solution processed window layer comprised of ZnO nanoparticles/Ag nanowires. A multi-interface engineering strategy assisted by organic polymers and molecules is explored to synergistically improve the film deposition, passivate the surface defects and facilitate the charge transfer. These efforts help us achieve high-performance and robust Kesterite solar cells at extremely low time and energy costs, with efficiency records of 14.37% and 13.12% being realized in rigid and flexible Kesterite solar cells, respectively. Our strategy here is also promising to be transplanted into other solar cells with similar geometric and energy band structures, helping reduce production costs and shorten the production cycle (i.e. increasing production capacity) of these photovoltaic industries. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05105 [pdf, other]

VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module

Authors: Ziyang Wang, Jian-Qing Zheng, Chao Ma, Tao Guo

Abstract: Image registration, a critical process in medical imaging, involves aligning different sets of medical imaging data into a single unified coordinate system. Deep learning networks, such as the Convolutional Neural Network (CNN)-based VoxelMorph, Vision Transformer (ViT)-based TransMorph, and State Space Model (SSM)-based MambaMorph, have demonstrated effective performance in this domain. The recen… ▽ More Image registration, a critical process in medical imaging, involves aligning different sets of medical imaging data into a single unified coordinate system. Deep learning networks, such as the Convolutional Neural Network (CNN)-based VoxelMorph, Vision Transformer (ViT)-based TransMorph, and State Space Model (SSM)-based MambaMorph, have demonstrated effective performance in this domain. The recent Visual State Space Model (VMamba), which incorporates a cross-scan module with SSM, has exhibited promising improvements in modeling global-range dependencies with efficient computational cost in computer vision tasks. This paper hereby introduces an exploration of VMamba with image registration, named VMambaMorph. This novel hybrid VMamba-CNN network is designed specifically for 3D image registration. Utilizing a U-shaped network architecture, VMambaMorph computes the deformation field based on target and source volumes. The VMamba-based block with 2D cross-scan module is redesigned for 3D volumetric feature processing. To overcome the complex motion and structure on multi-modality images, we further propose a fine-tune recursive registration framework. We validate VMambaMorph using a public benchmark brain MR-CT registration dataset, comparing its performance against current state-of-the-art methods. The results indicate that VMambaMorph achieves competitive registration quality. The code for VMambaMorph with all baseline methods is available on GitHub. △ Less

Submitted 14 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.00924 [pdf, other]

BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

Authors: Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

Abstract: Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models are not sufficiently studied, especially in th… ▽ More Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models are not sufficiently studied, especially in the black-box scenario. In this work, we introduce the first unified black-box adversarial patch attack framework against pixel-wise regression tasks, aiming to identify the vulnerabilities of these models under query-based black-box attacks. We propose a novel square-based adversarial patch optimization framework and employ probabilistic square sampling and score-based gradient estimation techniques to generate the patch effectively and efficiently, overcoming the scalability problem of previous black-box patch attacks. Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models. BadPart surpasses 3 baseline methods in terms of both attack performance and efficiency. We also apply BadPart on the Google online service for portrait depth estimation, causing 43.5% relative distance error with 50K queries. State-of-the-art (SOTA) countermeasures cannot defend our attack effectively. △ Less

Submitted 24 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: Paper accepted at ICML 2024

arXiv:2403.20276 [pdf, other]

Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment

Authors: R. Xu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to… ▽ More We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures

arXiv:2403.20263 [pdf, other]

Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment

Authors: Z. H. Zhang, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as ``$χ$") has been found in DM direct detection (DD) experiments to date. There is a novel concept that detecting $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses ran… ▽ More Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as ``$χ$") has been found in DM direct detection (DD) experiments to date. There is a novel concept that detecting $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits of PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude the $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV from our results. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: 8 pages, 6 figures

arXiv:2403.10825 [pdf, other]

Affective Behaviour Analysis via Integrating Multi-Modal Knowledge

Authors: Wei Zhang, Feng Qiu, Chen Liu, Lincheng Li, Heming Du, Tiancheng Guo, Xin Yu

Abstract: Affective Behavior Analysis aims to facilitate technology emotionally smart, creating a world where devices can understand and react to our emotions as humans do. To comprehensively evaluate the authenticity and applicability of emotional behavior analysis techniques in natural environments, the 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic… ▽ More Affective Behavior Analysis aims to facilitate technology emotionally smart, creating a world where devices can understand and react to our emotions as humans do. To comprehensively evaluate the authenticity and applicability of emotional behavior analysis techniques in natural environments, the 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets to set up five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation. In this paper, we present our method designs for the five tasks. Specifically, our design mainly includes three aspects: 1) Utilizing a transformer-based feature fusion module to fully integrate emotional information provided by audio signals, visual images, and transcripts, offering high-quality expression features for the downstream tasks. 2) To achieve high-quality facial feature representations, we employ Masked-Auto Encoder as the visual features extraction model and fine-tune it with our facial dataset. 3) Considering the complexity of the video collection scenes, we conduct a more detailed dataset division based on scene characteristics and train the classifier for each scene. Extensive experiments demonstrate the superiority of our designs. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 11 pages, 1 figure

arXiv:2403.07608 [pdf, other]

Couler: Unified Machine Learning Workflow Optimization in Cloud

Authors: Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang

Abstract: Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, num… ▽ More Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, numerous workflow engines are available (with over ten being widely recognized). This variety poses a challenge for end-users in terms of mastering different engine APIs. While efforts have primarily focused on optimizing ML Operations (MLOps) for a specific workflow engine, current methods largely overlook workflow optimization across different engines. In this work, we design and implement Couler, a system designed for unified ML workflow optimization in the cloud. Our main insight lies in the ability to generate an ML workflow using natural language (NL) descriptions. We integrate Large Language Models (LLMs) into workflow generation, and provide a unified programming interface for various workflow engines. This approach alleviates the need to understand various workflow engines' APIs. Moreover, Couler enhances workflow computation efficiency by introducing automated caching at multiple stages, enabling large workflow auto-parallelization and automatic hyperparameters tuning. These enhancements minimize redundant computational costs and improve fault tolerance during deep learning workflow training. Couler is extensively deployed in real-world production scenarios at Ant Group, handling approximately 22k workflows daily, and has successfully improved the CPU/Memory utilization by more than 15% and the workflow completion rate by around 17%. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06243 [pdf, other]

BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering

Authors: Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie

Abstract: Developing blind video deflickering (BVD) algorithms to enhance video temporal consistency, is gaining importance amid the flourish of image processing and video generation. However, the intricate nature of video data complicates the training of deep learning methods, leading to high resource consumption and instability, notably under severe lighting flicker. This underscores the critical need for… ▽ More Developing blind video deflickering (BVD) algorithms to enhance video temporal consistency, is gaining importance amid the flourish of image processing and video generation. However, the intricate nature of video data complicates the training of deep learning methods, leading to high resource consumption and instability, notably under severe lighting flicker. This underscores the critical need for a compact representation beyond pixel values to advance BVD research and applications. Inspired by the classic scale-time equalization (STE), our work introduces the histogram-assisted solution, called BlazeBVD, for high-fidelity and rapid BVD. Compared with STE, which directly corrects pixel values by temporally smoothing color histograms, BlazeBVD leverages smoothed illumination histograms within STE filtering to ease the challenge of learning temporal data using neural networks. In technique, BlazeBVD begins by condensing pixel values into illumination histograms that precisely capture flickering and local exposure variations. These histograms are then smoothed to produce singular frames set, filtered illumination maps, and exposure maps. Resorting to these deflickering priors, BlazeBVD utilizes a 2D network to restore faithful and consistent texture impacted by lighting changes or localized exposure issues. BlazeBVD also incorporates a lightweight 3D network to amend slight temporal inconsistencies, avoiding the resource consumption issue. Comprehensive experiments on synthetic, real-world and generated videos, showcase the superior qualitative and quantitative results of BlazeBVD, achieving inference speeds up to 10x faster than state-of-the-arts. △ Less

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.01960 [pdf, other]

A robust audio deepfake detection system via multi-view feature

Authors: Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Abstract: With the advancement of generative modeling techniques, synthetic human speech becomes increasingly indistinguishable from real, and tricky challenges are elicited for the audio deepfake detection (ADD) system. In this paper, we exploit audio features to improve the generalizability of ADD systems. Investigation of the ADD task performance is conducted over a broad range of audio features, includi… ▽ More With the advancement of generative modeling techniques, synthetic human speech becomes increasingly indistinguishable from real, and tricky challenges are elicited for the audio deepfake detection (ADD) system. In this paper, we exploit audio features to improve the generalizability of ADD systems. Investigation of the ADD task performance is conducted over a broad range of audio features, including various handcrafted features and learning-based features. Experiments show that learning-based audio features pretrained on a large amount of data generalize better than hand-crafted features on out-of-domain scenarios. Subsequently, we further improve the generalizability of the ADD system using proposed multi-feature approaches to incorporate complimentary information from features of different views. The model trained on ASV2019 data achieves an equal error rate of 24.27\% on the In-the-Wild dataset. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 5 pages, 2 figures

arXiv:2403.00818 [pdf, other]

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Authors: Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang

Abstract: Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space model (SSM) is a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to en… ▽ More Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space model (SSM) is a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. By selectively integrating shallowlayer hidden states into deeper layers, DenseSSM retains fine-grained information crucial for the final output. Dense connections enhanced DenseSSM still maintains the training parallelizability and inference efficiency. The proposed method can be widely applicable to various SSM types like RetNet and Mamba. With similar model size, DenseSSM achieves significant improvements, exemplified by DenseRetNet outperforming the original RetNet with up to 5% accuracy improvement on public benchmarks. code is avalaible at https://github.com/WailordHe/DenseSSM △ Less

Submitted 5 March, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

arXiv:2402.18679 [pdf, other]

Data Interpreter: An LLM Agent For Data Science

Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu

Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution de… ▽ More Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability;2) tool integration dynamically to enhance code proficiency during execution, enriching the requisite expertise;3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase in the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at https://github.com/geekan/MetaGPT. △ Less

Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17487 [pdf, other]

Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

Authors: Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup

Abstract: The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties… ▽ More The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties evoked the attention of both scientific and industrial communities, resulting in the standardization activity JPEG-AI. The verification model for the standardization process of JPEG-AI is already in development and has surpassed the advanced VVC intra codec. To generate reconstructed images with the desired bits per pixel and assess the BD-rate performance of both the JPEG-AI verification model and VVC intra, bit rate matching is employed. However, the current state of the JPEG-AI verification model experiences significant slowdowns during bit rate matching, resulting in suboptimal performance due to an unsuitable model. The proposed methodology offers a gradual algorithmic optimization for matching bit rates, resulting in a fourfold acceleration and over 1% improvement in BD-rate at the base operation point. At the high operation point, the acceleration increases up to sixfold. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted at (IEEE) PCS 2024; 6 pages

arXiv:2402.17364 [pdf, other]

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

Authors: Zicheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang

Abstract: Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences. These implicit methods are still confronted by visual artifacts and jitters, since the lack of explicit geometric constraints poses a fundamental challenge in accurately modeling complex facial deformations. In this paper, we i… ▽ More Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences. These implicit methods are still confronted by visual artifacts and jitters, since the lack of explicit geometric constraints poses a fundamental challenge in accurately modeling complex facial deformations. In this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes by neural networks to ensure geometric consistency across various motions and viewpoints. DynTet is parameterized by the coordinate-based networks which learn signed distance, deformation, and material texture, anchoring the training data into a predefined tetrahedra grid. Leveraging Marching Tetrahedra, DynTet efficiently decodes textured meshes with a consistent topology, enabling fast rendering through a differentiable rasterizer and supervision via a pixel loss. To enhance training efficiency, we incorporate classical 3D Morphable Models to facilitate geometry learning and define a canonical space for simplifying texture learning. These advantages are readily achievable owing to the effective geometric representation employed in DynTet. Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance according to various metrics. Beyond producing stable and visually appealing synthesis videos, our method also outputs the dynamic meshes which is promising to enable many emerging applications. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: CVPR 2024

arXiv:2402.15326 [pdf, other]

Understanding Oversmoothing in Diffusion-Based GNNs From the Perspective of Operator Semigroup Theory

Authors: Weichen Zhao, Chenguang Wang, Xinyan Wang, Congying Han, Tiande Guo, Tianshu Yu

Abstract: This paper presents a novel study of the oversmoothing issue in diffusion-based Graph Neural Networks (GNNs). Diverging from extant approaches grounded in random walk analysis or particle systems, we approach this problem through operator semigroup theory. This theoretical framework allows us to rigorously prove that oversmoothing is intrinsically linked to the ergodicity of the diffusion operator… ▽ More This paper presents a novel study of the oversmoothing issue in diffusion-based Graph Neural Networks (GNNs). Diverging from extant approaches grounded in random walk analysis or particle systems, we approach this problem through operator semigroup theory. This theoretical framework allows us to rigorously prove that oversmoothing is intrinsically linked to the ergodicity of the diffusion operator. This finding further poses a general and mild ergodicity-breaking condition, encompassing the various specific solutions previously offered, thereby presenting a more universal and theoretically grounded approach to mitigating oversmoothing in diffusion-based GNNs. Additionally, we offer a probabilistic interpretation of our theory, forging a link with prior works and broadening the theoretical horizon. Our experimental results reveal that this ergodicity-breaking term effectively mitigates oversmoothing measured by Dirichlet energy, and simultaneously enhances performance in node classification tasks. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.14359 [pdf, other]

Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark

Authors: Xiuying Chen, Tairan Wang, Qingqing Zhu, Taicheng Guo, Shen Gao, Zhiyong Lu, Xin Gao, Xiangliang Zhang

Abstract: The summarization capabilities of pretrained and large language models (LLMs) have been widely validated in general areas, but their use in scientific corpus, which involves complex sentences and specialized knowledge, has been less assessed. This paper presents conceptual and experimental analyses of scientific summarization, highlighting the inadequacies of traditional evaluation methods, such a… ▽ More The summarization capabilities of pretrained and large language models (LLMs) have been widely validated in general areas, but their use in scientific corpus, which involves complex sentences and specialized knowledge, has been less assessed. This paper presents conceptual and experimental analyses of scientific summarization, highlighting the inadequacies of traditional evaluation methods, such as $n$-gram, embedding comparison, and QA, particularly in providing explanations, grasping scientific concepts, or identifying key content. Subsequently, we introduce the Facet-aware Metric (FM), employing LLMs for advanced semantic matching to evaluate summaries based on different aspects. This facet-aware approach offers a thorough evaluation of abstracts by decomposing the evaluation task into simpler subtasks.Recognizing the absence of an evaluation benchmark in this domain, we curate a Facet-based scientific summarization Dataset (FD) with facet-level annotations. Our findings confirm that FM offers a more logical approach to evaluating scientific summaries. In addition, fine-tuned smaller models can compete with LLMs in scientific contexts, while LLMs have limitations in learning from in-context information in scientific domains. This suggests an area for future enhancement of LLMs. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: 14pages

arXiv:2402.11768 [pdf, other]

Targeted Parallelization of Conflict-Based Search for Multi-Robot Path Planning

Authors: Teng Guo, Jingjin Yu

Abstract: Multi-Robot Path Planning (MRPP) on graphs, equivalently known as Multi-Agent Path Finding (MAPF), is a well-established NP-hard problem with critically important applications. As serial computation in (near)-optimally solving MRPP approaches the computation efficiency limit, parallelization offers a promising route to push the limit further, especially in handling hard or large MRPP instances. In… ▽ More Multi-Robot Path Planning (MRPP) on graphs, equivalently known as Multi-Agent Path Finding (MAPF), is a well-established NP-hard problem with critically important applications. As serial computation in (near)-optimally solving MRPP approaches the computation efficiency limit, parallelization offers a promising route to push the limit further, especially in handling hard or large MRPP instances. In this study, we initiated a \emph{targeted} parallelization effort to boost the performance of conflict-based search for MRPP. Specifically, when instances are relatively small but robots are densely packed with strong interactions, we apply a decentralized parallel algorithm that concurrently explores multiple branches that leads to markedly enhanced solution discovery. On the other hand, when instances are large with sparse robot-robot interactions, we prioritize node expansion and conflict resolution. Our innovative multi-threaded approach to parallelizing bounded-suboptimal conflict search-based algorithms demonstrates significant improvements over baseline serial methods in success rate or runtime. Our contribution further pushes the understanding of MRPP and charts a promising path for elevating solution quality and computational efficiency through parallel algorithmic strategies. △ Less

Submitted 15 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Submitted to IROS

Showing 1–50 of 383 results for author: Guo, T