
arXiv:2406.19907

Global well-posedness of inhomogeneous Navier-Stokes equations with bounded density

Authors: Tiantian Hao, Feng Shao, Dongyi Wei, Zhifei Zhang

Abstract: In this paper, we solve Lions' open problem: {\it the uniqueness of weak solutions for the 2-D inhomogeneous Navier-Stokes equations (INS)}. We first prove the global existence of weak solutions to 2-D (INS) with bounded initial density and initial velocity in $L^2(\mathbb R^2)$. Moreover, if the initial density is bounded away from zero, then our weak solution coincides with Lions' weak solution, which in particular implies the uniqueness of Lions' weak solution. We also extend a celebrated result by Fujita and Kato on the 3-D incompressible Navier-Stokes equations to 3-D (INS): {\it the global well-posedness of 3-D (INS) with bounded initial density and small initial velocity in $\dot H^{1/2}(\mathbb R^3)$}. The proof of uniqueness is based on the surprising finding that the estimate $t^{1/2}\nabla u\in L^2(0,T; L^\infty(\mathbb R^d))$, rather than $\nabla u\in L^1(0, T; L^\infty(\mathbb R^d))$, is enough to ensure uniqueness of the solution.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 23 pages
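To see why the weighted estimate in the entry above is a weaker requirement than the classical one, one can try to recover the $L^1$-in-time bound from it by Cauchy-Schwarz. This is a standard one-line check, not an argument taken from the paper:

```latex
\int_\varepsilon^T \|\nabla u(t)\|_{L^\infty}\,dt
  = \int_\varepsilon^T t^{-1/2}\,\bigl(t^{1/2}\|\nabla u(t)\|_{L^\infty}\bigr)\,dt
  \le \Bigl(\int_\varepsilon^T \frac{dt}{t}\Bigr)^{1/2}
      \bigl\|t^{1/2}\nabla u\bigr\|_{L^2(0,T;L^\infty)}
  = \bigl(\ln(T/\varepsilon)\bigr)^{1/2}
      \bigl\|t^{1/2}\nabla u\bigr\|_{L^2(0,T;L^\infty)}
```

As $\varepsilon\to 0$ the factor $\ln(T/\varepsilon)$ diverges, so the bound degenerates. Indeed, a rate such as $\|\nabla u(t)\|_{L^\infty}\sim (t|\ln t|)^{-1}$ near $t=0$ satisfies $t^{1/2}\nabla u\in L^2(0,T;L^\infty)$, since $\int_0 \frac{dt}{t\ln^2 t}<\infty$, while $\int_0 \frac{dt}{t|\ln t|}=\infty$.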

arXiv:2406.16269

Displaced Heavy Neutral Lepton from New Higgs Doublet

Authors: Fa-Xin Yang, Feng-Lan Shao, Zhi-Long Han, Yi Jin, Honglei Li

Abstract: Heavy neutral leptons $N$ are introduced to explain the tiny neutrino masses via the seesaw mechanism. For a suitably small mixing parameter $V_{\ell N}$, the heavy neutral leptons $N$ become long-lived, which leads to a displaced vertex (DV) signature at colliders. In this paper, we consider the displaced heavy neutral lepton from the decay of the neutrinophilic Higgs doublet $Φ_ν$. The new Higgs doublet, with an MeV-scale VEV, can naturally explain the tiny neutrino masses with TeV-scale $N$. Unlike current experimental searches via the decay $W^\pm\to \ell^\pm N$, the new decays $H^\pm\to \ell^\pm N$ are not suppressed by the small mixing parameter $V_{\ell N}$. Therefore, a larger region of parameter space is expected to be accessible at colliders. We then investigate the promising region at the 14 TeV HL-LHC and the 3 TeV CLIC. According to our simulation, the DV signature could probe $|V_{\ell N}|^2\gtrsim10^{-19}$ with $m_N<m_{H^+}$, which covers the seesaw-predicted value $|V_{\ell N}|^2\sim m_ν/m_N$. We could probe $m_{H^+}\lesssim1200$ GeV at the 14 TeV HL-LHC and $m_{H^+}\lesssim1490$ GeV at the 3 TeV CLIC.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 24 pages, 11 figures
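To put the quoted DV reach in perspective, the seesaw relation $|V_{\ell N}|^2\sim m_ν/m_N$ from the entry above can be evaluated directly. The sketch below assumes an illustrative light-neutrino mass of 0.05 eV and a few heavy-lepton masses up to 1 TeV; these inputs are hypothetical and not taken from the paper:

```python
def seesaw_mixing_sq(m_nu_ev: float, m_n_gev: float) -> float:
    """Order-of-magnitude seesaw estimate |V_lN|^2 ~ m_nu / m_N (dimensionless)."""
    m_n_ev = m_n_gev * 1e9  # convert GeV to eV so both masses share units
    return m_nu_ev / m_n_ev


DV_REACH = 1e-19  # lower edge of the |V_lN|^2 reach quoted in the abstract

# Hypothetical illustrative inputs: m_nu = 0.05 eV, m_N from 100 GeV to 1 TeV.
for m_n in (100.0, 500.0, 1000.0):
    v2 = seesaw_mixing_sq(0.05, m_n)
    print(f"m_N = {m_n:6.0f} GeV -> |V|^2 ~ {v2:.1e}, above DV reach: {v2 >= DV_REACH}")
```

Each estimate is of order $10^{-14}$ to $10^{-13}$, well above $10^{-19}$, which is consistent with the statement that the DV reach covers the seesaw-predicted value.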

arXiv:2406.07984

On the density patch problem for the 2-D inhomogeneous Navier-Stokes equations

Authors: Tiantian Hao, Feng Shao, Dongyi Wei, Zhifei Zhang

Abstract: In this paper, we first construct a class of global strong solutions for the 2-D inhomogeneous Navier-Stokes equations under the very general assumption that the initial density is only bounded and the initial velocity is in $H^1(\mathbb{R}^2)$. With suitable assumptions on the initial density, which include the cases of density patches and vacuum bubbles, we prove that Lions's weak solution is the same as the strong solution with the same initial data. In particular, this gives a complete resolution of the density patch problem proposed by Lions: {\it for the density patch data $ρ_0=1_{D}$ with a smooth bounded domain $D\subset\mathbb{R}^2$, the regularity of $D$ is preserved by the time evolution of Lions's weak solution.}

Submitted 12 June, 2024; originally announced June 2024.

Comments: 23 pages

arXiv:2405.19674

On blow-up for the supercritical defocusing nonlinear wave equation

Authors: Feng Shao, Dongyi Wei, Zhifei Zhang

Abstract: In this paper, we consider the defocusing nonlinear wave equation $-\partial_t^2u+Δu=|u|^{p-1}u$ in $\mathbb R\times \mathbb R^d$. Building on our companion work ({\it Self-similar imploding solutions of the relativistic Euler equations}), we prove that for $d=4, p\geq 29$ and $d\geq 5, p\geq 17$, there exists a smooth complex-valued solution that blows up in finite time.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 56 pages

arXiv:2404.08216

Role of nonlocal heat transport on the laser ablative Rayleigh-Taylor instability

Authors: Z. H. Chen, X. H. Yang, G. B. Zhang, Y. Y. Ma, R. Yan, H. Xu, Z. M. Sheng, F. Q. Shao, J. Zhang

Abstract: Ablative Rayleigh-Taylor instability (ARTI) and nonlocal heat transport are critical problems in laser-driven inertial confinement fusion, but their coupling is not yet completely understood. Here the ARTI in the presence of nonlocal heat transport is studied self-consistently for the first time, both theoretically and with radiation hydrodynamic simulations. It is found that the nonlocal heat flux generated by hot-electron transport tends to attenuate the growth of the instability, especially for short-wavelength perturbations. A linear theory of the ARTI coupled with the nonlocal heat flux is developed, and a prominent stabilization of the ablation front via the nonlocal heat flux is found, in good agreement with numerical simulations. This effect becomes more significant as the laser intensity increases. Our results should provide an important reference for target design in inertial confinement fusion.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures

arXiv:2403.11471

Self-similar imploding solutions of the relativistic Euler equations

Authors: Feng Shao, Dongyi Wei, Zhifei Zhang

Abstract: Motivated by the recent breakthrough on smooth imploding solutions of the compressible Euler equations, we construct self-similar smooth imploding solutions of the isentropic relativistic Euler equations with the isothermal equation of state $p=\frac1\ell\varrho$ for \textit{all} $\ell>1$ in physical space dimensions $d=2,3$, and for $\ell>1$ close to 1 in higher dimensions. This work is a crucial step toward solving the long-standing problem of finite time blow-up for the supercritical defocusing nonlinear wave equation.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 68 pages, 2 figures
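Assuming, as is usual for the relativistic Euler equations, that $\varrho$ in the entry above denotes the energy density and that units are chosen with $c=1$, the restriction $\ell>1$ can be read off from the sound speed. This is a standard observation, not a statement from the abstract:

```latex
c_s^2 = \frac{dp}{d\varrho} = \frac{1}{\ell} < 1 \iff \ell > 1
```

So $\ell>1$ is exactly the condition that sound propagates slower than light; $\ell=3$ is the familiar radiation-fluid case $p=\varrho/3$.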

arXiv:2402.04467

DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

Authors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories' length increases. We use our framework to propose a tractable and sample efficient objective that can be used with any existing learning objectives. Our Dynamics Stable Learning by Invariant Measure (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models.

Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/ergodic

arXiv:2401.14687

Heavy Neutral Leptons in Gauged $U(1)_{L_μ-L_τ}$ at Muon Collider

Authors: Ru-Yi He, Jia-Qi Huang, Jin-Yuan Xu, Fa-Xin Yang, Zhi-Long Han, Feng-Lan Shao

Abstract: Heavy neutral leptons $N$ are the most appealing candidates to generate the tiny neutrino masses. In this paper, we study the signature of heavy neutral leptons in gauged $U(1)_{L_μ-L_τ}$ at a muon collider. Charged under the $U(1)_{L_μ-L_τ}$ symmetry, the heavy neutral leptons can be pair produced via the new gauge boson $Z'$ at a muon collider as $μ^+μ^-\to Z^{\prime *}\to NN$ and $μ^+μ^-\to Z^{\prime (*)} γ\to NNγ$. We then perform a detailed analysis of the lepton number violation signatures $μ^+μ^-\to NN\to μ^\pmμ^\pm W^\mp W^\mp$ and $μ^+μ^-\to NN γ\to μ^\pmμ^\pm W^\mp W^\mp γ$ at the 3 TeV muon collider, where the hadronic decays of the $W$ boson are treated as fat jets $J$. These lepton number violation signatures have quite clean backgrounds at the muon collider. Our simulation shows that a wide range of viable parameter space is within the reach of the 3 TeV muon collider. For instance, with new gauge coupling $g'=0.6$ and an integrated luminosity of 1000 fb$^{-1}$, the $μ^\pmμ^\pm JJ$ signal could probe $m_{Z'}\lesssim 12.5$ TeV. Meanwhile, if the gauge boson mass satisfies $2 m_N<m_{Z'}<\sqrt{s}$, the $μ^\pmμ^\pm JJγ$ signature would be more promising than the $μ^\pmμ^\pm JJ$ signature.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 20 pages, 8 figures, 2 tables

arXiv:2311.15559

Bigness of tangent bundles and dynamical rigidity of Fano manifolds of Picard number 1 (with an appendix by Jie Liu)

Authors: Feng Shao, Guolei Zhong

Abstract: Let $f\colon X\to Y$ be a surjective morphism of Fano manifolds of Picard number 1 whose VMRTs at a general point are not dual defective. Suppose that the tangent bundle $T_X$ is big. We show that $f$ is an isomorphism unless $Y$ is a projective space. As applications, we study the bigness of the tangent bundles of complete intersections, del Pezzo manifolds, and Mukai manifolds, as well as their dynamical rigidity.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 21 pages with an appendix by Jie Liu, comments are welcome!

MSC Class: 14J40; 14J45

arXiv:2311.14784

Characterization and Correction of the Scattering Background Produced by Dust on the Objective Lens of the Lijiang 10-cm Coronagraph

Authors: Feiyang Sha, Yu Liu, Xuefei Zhang, Tengfei Song

Abstract: Scattered light from the objective lens, which is directly exposed to intense sunlight, is a dominant source of stray light in internally occulted coronagraphs. Variable stray light, such as scattering from dust on the objective lens, can produce varying scattering backgrounds in coronal images, significantly impacting image quality and data analysis. Using data acquired by the Lijiang 10-cm Coronagraph, the quantitative relationship between the distribution of dust on the objective lens and the resulting scattering background is analyzed. Two empirical models for the scattering background are derived and used to correct the raw coronal data. The second model, which depends on three parameters and performs better, shows that the scattering-background distribution varies with angle, weakens with increasing height, and strengthens with increasing dust level on the objective lens. Moreover, we find that dust on the center of the objective lens can contribute more significantly to the scattering background than dust on the edge. This study not only quantitatively confirms the significant impact of the stray light produced by dust on the objective lens of the coronagraph, but also, for the first time, corrects the coronal data for this stray light. Correcting for dust-scattered light is crucial for the high-precision calibration of ground-based coronagraph data, enabling a more accurate analysis of coronal structures.
Furthermore, our model is envisioned to support the provision of reliable observational data for future routine coronal magnetic-field measurements using ground-based coronagraphs.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 18 pages, 14 figures

arXiv:2311.07085

Engineering 2D material exciton lineshape with graphene/h-BN encapsulation

Authors: Steffi Y. Woo, Fuhui Shao, Ashish Arora, Robert Schneider, Nianjheng Wu, Andrew J. Mayne, Ching-Hwa Ho, Mauro Och, Cecilia Mattevi, Antoine Reserbat-Plantey, Alvaro Moreno, Hanan Herzig Sheinfux, Kenji Watanabe, Takashi Taniguchi, Steffen Michaelis de Vasconcellos, Frank H. L. Koppens, Zhichuan Niu, Odile Stéphan, Mathieu Kociak, F. Javier García de Abajo, Rudolf Bratschitsch, Andrea Konečná, Luiz H. G. Tizei

Abstract: Control over the optical properties of atomically thin two-dimensional (2D) layers, including those of transition metal dichalcogenides (TMDs), is needed for future optoelectronic applications. Remarkable advances have been achieved through alloying, chemical and electrical doping, and applied strain. However, the integration of TMDs with other 2D materials in van der Waals heterostructures (vdWHs) to tailor novel functionalities remains largely unexplored. Here, the near-field coupling between TMDs and graphene/graphite is used to engineer the exciton lineshape and charge state. Fano-like asymmetric spectral features are produced in WS$_{2}$, MoSe$_{2}$ and WSe$_{2}$ vdWHs combined with graphene, graphite, or jointly with hexagonal boron nitride (h-BN) as supporting or encapsulating layers. Furthermore, trion emission is suppressed in h-BN encapsulated WSe$_{2}$/graphene with a neutral exciton redshift (44 meV) and binding energy reduction (30 meV). The response of these systems to electron-beam and light probes is well-described in terms of 2D optical conductivities of the involved materials. Beyond fundamental insights into the interaction of TMD excitons with structured environments, this study opens an unexplored avenue toward shaping the spectral profile of narrow optical modes for application in nanophotonic devices.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.02331

doi 10.14722/ndss.2024.23204

NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation

Authors: Shaofei Li, Feng Dong, Xusheng Xiao, Haoyu Wang, Fei Shao, Jiedong Chen, Yao Guo, Xiangqun Chen, Ding Li

Abstract: Advanced Persistent Threat (APT) attacks have plagued modern enterprises, causing significant financial losses. To counter these attacks, researchers have proposed techniques that capture the complex and stealthy scenarios of APT attacks by using provenance graphs to model system entities and their dependencies. In particular, to accelerate attack detection and reduce financial losses, online provenance-based detection systems that detect and investigate APT attacks under the constraints of timeliness and limited resources are urgently needed. Unfortunately, existing online systems usually sacrifice detection granularity to reduce computational complexity and produce provenance graphs with more than 100,000 nodes, making the detection results hard for security admins to interpret. In this paper, we design and implement NodLink, the first online detection system that maintains high detection accuracy without sacrificing detection granularity. Our insight is that the APT attack detection process in online provenance-based detection systems can be modeled as a Steiner Tree Problem (STP), which has efficient online approximation algorithms that recover concise attack-related provenance graphs with a theoretically bounded error. To utilize STP approximation algorithm frameworks for APT attack detection, we propose a novel design of in-memory cache, an efficient attack screening method, and a new STP approximation algorithm that is more efficient than the conventional one in APT attack detection while maintaining the same complexity.
We evaluate NodLink in a production environment. The open-world experiment shows that NodLink outperforms two state-of-the-art (SOTA) online provenance analysis systems, achieving orders of magnitude higher detection and investigation accuracy while having the same or higher throughput.

Submitted 4 November, 2023; originally announced November 2023.

Comments: The final version of this paper will appear in the Network and Distributed System Security Symposium (NDSS'24), 26 Feb - 1 Mar 2024, San Diego, California
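The NodLink entry above models detection as a Steiner Tree Problem: connect a set of "terminal" nodes with a cheap tree that may pass through extra (Steiner) nodes. For readers new to STP approximation, below is a sketch of the classic metric-closure 2-approximation in the style of Kou, Markowsky and Berman. It is not NodLink's own algorithm, which the abstract describes as a new, more efficient variant; the graph representation and all function names are hypothetical, and all terminals are assumed to be mutually reachable.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Set, Tuple

# Undirected weighted graph: node -> {neighbor: edge weight}.
Graph = Dict[Hashable, Dict[Hashable, float]]


def build_graph(edges: List[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def dijkstra(g: Graph, src: Hashable):
    """Shortest-path distances and predecessor links from src."""
    dist, prev = {src: 0.0}, {}
    tie = itertools.count()  # tie-breaker so nodes never need to be comparable
    heap = [(0.0, next(tie), src)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, next(tie), v))
    return dist, prev


def prim(g: Graph) -> List[Tuple[Hashable, Hashable]]:
    """Minimum spanning tree edges of a connected graph (Prim's algorithm)."""
    if not g:
        return []
    start = next(iter(g))
    seen, tree, tie = {start}, [], itertools.count()
    heap = [(w, next(tie), start, v) for v, w in g[start].items()]
    heapq.heapify(heap)
    while heap and len(seen) < len(g):
        _, _, u, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        tree.append((u, v))
        for x, w in g[v].items():
            if x not in seen:
                heapq.heappush(heap, (w, next(tie), v, x))
    return tree


def steiner_tree_approx(g: Graph, terminals: List[Hashable]) -> Set[frozenset]:
    """Metric-closure Steiner tree approximation (cost at most 2x optimal)."""
    sp = {t: dijkstra(g, t) for t in terminals}
    # 1. Metric closure: complete graph on terminals weighted by shortest distance.
    closure: Graph = {t: {s: sp[t][0][s] for s in terminals if s != t} for t in terminals}
    # 2. MST of the closure; expand each closure edge into its shortest path in g.
    sub: Graph = {}
    for a, b in prim(closure):
        prev, node = sp[a][1], b
        while node != a:
            p = prev[node]
            sub.setdefault(p, {})[node] = g[p][node]
            sub.setdefault(node, {})[p] = g[p][node]
            node = p
    # 3. MST of the expanded subgraph removes any cycles the paths introduced.
    tree = {frozenset(e) for e in prim(sub)}
    # 4. Repeatedly prune leaves that are not terminals.
    term, pruned = set(terminals), True
    while pruned:
        pruned = False
        degree: Dict[Hashable, int] = {}
        for e in tree:
            for n in e:
                degree[n] = degree.get(n, 0) + 1
        for e in list(tree):
            if any(degree[n] == 1 and n not in term for n in e):
                tree.discard(e)
                pruned = True
    return tree


# Usage: X is a non-terminal hub worth routing through; D is irrelevant.
g = build_graph([("A", "X", 1), ("X", "B", 1), ("X", "C", 1), ("A", "D", 5)])
print(steiner_tree_approx(g, ["A", "B", "C"]))
```

Here the returned tree uses X as a Steiner node and leaves D out, which is the kind of concise, terminal-connecting subgraph the abstract refers to.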

arXiv:2311.00445

A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

Authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen

Abstract: A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate such human biases, or are they able to overcome them? Focusing on the case of syllogisms -- inferences from two simple premises -- we show that, within the PaLM2 family of transformer language models, larger models are more logical than smaller ones, and also more logical than humans. At the same time, even the largest models make systematic errors, some of which mirror human reasoning biases: they show sensitivity to the (irrelevant) ordering of the variables in the syllogism, and draw confident but incorrect inferences from particular syllogisms (syllogistic fallacies). Overall, we find that language models often mimic the human biases included in their training data, but are able to overcome them in some cases.

Submitted 11 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: NAACL 2024

arXiv:2310.19956

The Impact of Depth on Compositional Generalization in Transformer Language Models

Authors: Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

Abstract: To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by theoretical and empirical work, that deeper transformers generalize more compositionally. Simply adding layers increases the total number of parameters; to address this confound between depth and size, we construct three classes of models which trade off depth for width such that the total number of parameters is kept constant (41M, 134M and 374M parameters). We pretrain all models as LMs and fine-tune them on tasks that test for compositional generalization. We report three main conclusions: (1) after fine-tuning, deeper models generalize more compositionally than shallower models do, but the benefit of additional layers diminishes rapidly; (2) within each family, deeper models show better language modeling performance, but returns are similarly diminishing; (3) the benefits of depth for compositional generalization cannot be attributed solely to better performance on language modeling. Because model latency is approximately linear in the number of layers, these results lead us to the recommendation that, with a given total parameter budget, transformers can be made shallower than is typical without sacrificing performance.

Submitted 10 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to NAACL 2024
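The entry above trades depth for width at a fixed parameter count. A rough way to see the trade-off is the common approximation of about $12d^2$ parameters per transformer block of width $d$ (attention projections plus a 4x MLP), so $d\approx\sqrt{P/(12L)}$ for $L$ layers and budget $P$. This sizing rule and the helper name are assumptions for illustration; the paper's actual model configurations are not reproduced here.

```python
import math


def width_for_budget(total_params: float, n_layers: int, mlp_ratio: int = 4) -> int:
    """Hidden width d giving roughly `total_params` for `n_layers` transformer blocks.

    Rough per-block count: 4*d^2 for the Q, K, V and output projections plus
    2*mlp_ratio*d^2 for the two MLP matrices. Embeddings, biases and layer norms
    are ignored, so this is only an order-of-magnitude sizing rule.
    """
    per_block = 4 + 2 * mlp_ratio  # = 12 for the common mlp_ratio of 4
    return round(math.sqrt(total_params / (per_block * n_layers)))


# Trade depth for width at a fixed (hypothetical) budget of 134M block parameters.
for layers in (2, 4, 8, 16, 32):
    print(layers, width_for_budget(134e6, layers))
```

Because width enters quadratically, doubling the depth at a fixed budget shrinks the width only by a factor of about $\sqrt{2}$.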

arXiv:2309.16296

Production properties of deuterons, helions and tritons via an analytical nucleon coalescence method in Pb-Pb collisions at $\sqrt{s_{NN}}=2.76$ TeV

Authors: Rui-Qin Wang, Yan-Hao Li, Jun Song, Feng-Lan Shao

Abstract: We improve a nucleon coalescence model to include the coordinate-momentum correlation in nucleon joint distributions, and apply it to Pb-Pb collisions at $\sqrt{s_{NN}}=2.76$ TeV to study production properties of deuterons ($d$), helions ($^3$He) and tritons ($t$). We give formulas for the coalescence factors $B_2$ and $B_3$, and naturally explain their behavior as functions of the collision centrality and the transverse momentum per nucleon $p_T/A$. We reproduce the transverse momentum spectra, averaged transverse momenta and yield rapidity densities of $d$, $^3$He and $t$, and find that the effective system radius obtained in the coalescence production of light nuclei behaves similarly to the Hanbury Brown-Twiss interferometry radius. In particular, we give expressions for the yield ratios $d/p$, $^3$He$/d$, $t/p$, $^3$He$/p$, $d/p^{2}$, $^3$He$/p^3$, $t/^3$He and argue that their nontrivial behaviors can be used to distinguish production mechanisms of light nuclei.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 12 pages, 8 figures, 1 table
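For context on the coalescence factors in the entry above: experimentally, $B_A$ is conventionally defined from invariant transverse-momentum spectra, with the neutron spectrum approximated by the proton spectrum. The paper's own analytic formulas are not reproduced here; this is only the standard definition:

```latex
B_A(p_T^p) = \frac{\dfrac{1}{2\pi p_T^A}\dfrac{d^2N_A}{dp_T^A\,dy}}
                  {\left(\dfrac{1}{2\pi p_T^p}\dfrac{d^2N_p}{dp_T^p\,dy}\right)^{A}},
\qquad p_T^p = \frac{p_T^A}{A}
```

with $A=2$ for deuterons ($B_2$) and $A=3$ for $^3$He and tritons ($B_3$). Evaluating the proton spectrum at $p_T^A/A$ is why $B_A$ is naturally plotted against the transverse momentum per nucleon $p_T/A$.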

arXiv:2308.15560

WeatherBench 2: A benchmark for the next generation of data-driven global weather models

Authors: Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, Fei Sha

Abstract: WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework, publicly available training, ground truth and baseline data, as well as a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. In addition, we discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.

Submitted 26 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.
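Global gridded scores like those in the entry above are usually area-weighted, because grid cells on a regular latitude-longitude grid shrink toward the poles. A minimal sketch of a latitude-weighted RMSE with weights proportional to $\cos(\text{latitude})$, normalized to mean 1, follows; the function name is hypothetical, and whether WeatherBench 2 normalizes or aggregates over time exactly this way is an assumption.

```python
import math
from typing import Sequence


def lat_weighted_rmse(forecast: Sequence[Sequence[float]],
                      truth: Sequence[Sequence[float]],
                      lats_deg: Sequence[float]) -> float:
    """RMSE over a regular (lat, lon) grid with cos(latitude) area weights.

    Rows of `forecast`/`truth` correspond to the latitudes in `lats_deg`.
    Weights are normalized to mean 1, so a uniform error e gives RMSE e.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_w = sum(weights) / len(weights)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += (w / mean_w) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Usage: an error confined to the pole row barely registers, as intended.
lats = [-90.0, 0.0, 90.0]
truth = [[0.0] * 4 for _ in lats]
forecast = [[0.0] * 4, [0.0] * 4, [10.0] * 4]
print(lat_weighted_rmse(forecast, truth, lats))
```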

arXiv:2308.12588

Sterile Neutrino Portal Dark Matter from Semi-Production

Authors: Ang Liu, Feng-Lan Shao, Zhi-Long Han, Yi Jin, Honglei Li

Abstract: In this paper, we study feeble sterile neutrino portal dark matter under a $Z_3$ symmetry. The dark sector consists of one fermion singlet $χ$ and one scalar singlet $φ$, which transform as $χ\to e^{i2π/3}χ, φ\to e^{i2π/3}φ$ under the $Z_3$ symmetry. Taking the fermion singlet $χ$ as the dark matter candidate, the new interaction terms $y_χφ\bar{χ^c}χ$ and $μφ^3/2$ could induce various new production channels. For instance, when $m_φ>2m_χ$, the pair decay $φ\toχχ$ could be the dominant channel, rather than the delayed decay $φ\toχν$. Another appealing scenario is when the dark sector is initially produced through the scattering processes $NN\toχχ, NN\toφφ,hν\toχφ$; the semi-production processes $N χ\toφφ, Nφ\toφχ, Nχ\toχχ$ could then lead to exponential growth of the dark sector abundances. The phenomenology of the sterile neutrino and the cosmological impact of the dark scalar are also considered in the $Z_3$ symmetric model.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 24 pages, 9 figures

arXiv:2307.09972

Fine-grained Text-Video Retrieval with Frozen Image Encoders

Authors: Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu

Abstract: State-of-the-art text-video retrieval (TVR) methods typically utilize CLIP and cosine similarity for efficient retrieval. Meanwhile, cross attention methods, which employ a transformer decoder to compute attention between each text query and all frames in a video, offer a more comprehensive interaction between text and videos. However, these methods lack important fine-grained spatial information, as they directly compute attention between text and video-level tokens. To address this issue, we propose CrossTVR, a two-stage text-video retrieval architecture. In the first stage, we leverage existing TVR methods with a cosine similarity network for efficient text/video candidate selection. In the second stage, we propose a novel decoupled video text cross attention module to capture fine-grained multimodal information in spatial and temporal dimensions. Additionally, we employ the frozen CLIP model strategy in fine-grained retrieval, enabling scalability to larger pre-trained vision models like ViT-G, resulting in improved retrieval performance. Experiments on text video retrieval datasets demonstrate the effectiveness and scalability of our proposed CrossTVR compared to state-of-the-art approaches.

Submitted 13 July, 2023; originally announced July 2023.
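The first-stage candidate selection described above can be illustrated with a generic cosine-similarity shortlist followed by a rerank. This is a sketch under stated assumptions, not CrossTVR itself: the embeddings are toy vectors, and `rerank_fn` is a hypothetical stand-in for the paper's decoupled video-text cross attention module.

```python
import numpy as np


def cosine_scores(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of gallery."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q


def two_stage_retrieve(query, gallery, rerank_fn, k=10):
    """Stage 1: cheap cosine shortlist of k items. Stage 2: rescore only those k."""
    coarse = cosine_scores(query, gallery)
    candidates = np.argsort(-coarse)[:k]
    fine = np.array([rerank_fn(query, gallery[i]) for i in candidates])
    return candidates[np.argsort(-fine)]


# Toy example: the lambda below is a hypothetical placeholder scorer.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
print(two_stage_retrieve(query, gallery, rerank_fn=lambda q, v: v[1], k=2))  # [2 0]
```

Only the k shortlisted items reach the expensive scorer, which is the efficiency argument for a two-stage design.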

arXiv:2306.14066 [pdf, other]

SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models

Authors: Lizao Li, Rob Carver, Ignacio Lopez-Gomez, Fei Sha, John Anderson

Abstract: Uncertainty quantification is crucial to decision-making. A prominent example is probabilistic forecasting in numerical weather prediction. The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts. This is done by running many physics-based simulations under different conditions, which is a computationally costly process. We propose to amortize the computational cost by emulating these forecasts with deep generative diffusion models learned from historical data. The learned models are highly scalable with respect to high-performance computing accelerators and can sample hundreds to tens of thousands of realistic weather forecasts at low cost. When designed to emulate operational ensemble forecasts, the generated ones are similar to physics-based ensembles in important statistical properties and predictive skill. When designed to correct biases present in the operational forecasting system, the generated ensembles show improved probabilistic forecast metrics. They are more reliable and forecast probabilities of extreme weather events more accurately. While this work demonstrates the utility of the methodology by focusing on weather forecasting, the generative artificial intelligence methodology can be extended for uncertainty quantification in climate modeling, where we believe the generation of very large ensembles of climate projections will play an increasingly important role in climate risk assessment.

Submitted 8 October, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: fixed a mistake in the previous version; the paper has not been submitted to NeurIPS 2023
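Given a generated ensemble, the probability of an extreme event is commonly estimated as the fraction of members exceeding a threshold and then scored with a proper score such as the Brier score. The sketch below shows only that generic computation; the abstract does not say which metrics the paper uses, and the toy ensemble and threshold are made up for illustration.

```python
import numpy as np


def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members above threshold; axis 0 indexes members."""
    return (np.asarray(ensemble) > threshold).mean(axis=0)


def brier_score(prob, observed):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    prob = np.asarray(prob, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((prob - observed) ** 2))


# 4 hypothetical members forecasting 2 grid points.
ensemble = np.array([[1.0, 5.0], [3.0, 6.0], [4.0, 2.0], [7.0, 8.0]])
p = exceedance_probability(ensemble, threshold=3.5)  # 0.5 and 0.75
print(brier_score(p, [1, 0]))  # 0.40625
```

The estimate moves in steps of 1/n_members, so larger generated ensembles resolve small event probabilities more finely.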

arXiv:2306.09224 [pdf, other]

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa .

Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV'23

arXiv:2306.07526 [pdf, other]

User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems

Authors: Marc Finzi, Anudhyan Boral, Andrew Gordon Wilson, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text-conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about outliers and extreme events; however, querying that knowledge through conditional sampling or measuring probabilities is surprisingly difficult. Existing methods for conditional sampling at inference time seek mainly to enforce the constraints, which is insufficient to match the statistics of the distribution or compute the probability of the chosen events. To achieve these ends, optimally one would use the conditional score function, but its computation is typically intractable. In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. With this scheme we are able to sample conditionally on nonlinear user-defined events at inference time, and to match data statistics even when sampling from the tails of the distribution.

Submitted 12 June, 2023; originally announced June 2023.

Comments: ICML 2023 Conference

arXiv:2306.01174 [pdf, other]

Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations

Authors: Anudhyan Boral, Zhong Yi Wan, Leonardo Zepeda-Núñez, James Lottes, Qing Wang, Yi-fan Chen, John Roberts Anderson, Fei Sha

Abstract: We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics; as such, the effect of small scales is marginalized to obtain the deterministic evolution of the LES state. However, ideal LES is analytically intractable. In our work, we use a latent neural SDE to model the evolution of the stochastic process and an encoder-decoder pair for transforming between the latent space and the desired ideal flow field. This stands in sharp contrast to other types of neural parameterization of closure models where each trajectory is treated as a deterministic realization of the dynamics. We show the effectiveness of our approach (niLES - neural ideal LES) on a challenging chaotic dynamical system: Kolmogorov flow at a Reynolds number of 20,000. Compared to competing methods, our method can handle non-uniform geometries using unstructured meshes seamlessly. In particular, niLES yields trajectories with more accurate statistics and improved stability, especially for long-horizon rollouts.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 18 pages

arXiv:2305.15618 [pdf, other]

Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models

Authors: Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, Leonardo Zepeda-Núñez

Abstract: We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs, by upsampling resolutions of 8x and 16x. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs do not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives. Code for this work is available at: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion.

Submitted 30 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023 (spotlight)
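In one dimension, the optimal transport map between two distributions under a convex cost such as squared distance is the monotone rearrangement, so for two equally sized samples it simply pairs sorted values. The sketch below uses that textbook special case to illustrate the debiasing idea on a toy shifted sample. It is not the paper's map, which acts on full fluid fields, and `fit_monotone_map` is a hypothetical helper name.

```python
import numpy as np


def fit_monotone_map(source_samples, target_samples):
    """1-D optimal transport map between equally sized empirical samples.

    Sorted source values are matched to sorted target values; new inputs are
    mapped by piecewise-linear interpolation (held constant beyond the data range).
    """
    src = np.sort(np.asarray(source_samples, dtype=float))
    tgt = np.sort(np.asarray(target_samples, dtype=float))
    if src.shape != tgt.shape:
        raise ValueError("this sketch assumes equally sized 1-D samples")

    def transport(x):
        return np.interp(x, src, tgt)

    return transport


# Toy bias: the hypothetical coarse model runs one unit too high.
reference = np.array([0.0, 1.0, 2.0, 3.0])
biased = reference + 1.0
debias = fit_monotone_map(biased, reference)
print(debias([1.0, 2.5, 4.0]))  # maps 1.0 -> 0.0, 2.5 -> 1.5, 4.0 -> 3.0
```

Because the map is monotone, it corrects the marginal distribution while preserving the ordering of values; in a two-stage pipeline, an upsampling model would then act on the debiased output.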

arXiv:2305.15434 [pdf]

Hybrid Optimization of Laser-Driven Fusion Targets and Laser Profiles

Authors: Z. Li, Z. Q. Zhao, X. H. Yang, G. B. Zhang, Y. Y. Ma, H. Xu, F. Y. Wu, F. Q. Shao, J. Zhang

Abstract: Quasi-isentropic compression is an effective method to achieve high-density and high-temperature implosion in laser-driven inertial confinement fusion (ICF). However, it requires precise matching between the laser profile and the target structure. Designing the optimal laser profile and the corresponding target for ICF is a challenge due to the large number of parameters involved. In this paper, we present a novel method that combines random walk and Bayesian optimization. The basic sampling data for Bayesian optimization are a series of laser pulse profiles and target structures, obtained by the random walk method, that can produce relatively high areal densities. This approach reduces the number of samples required for Bayesian optimization and mitigates low efficiency in the latter stages of the random walk method. The method also reduces the randomness in the optimization process and enhances the optimization efficiency. It should have important applications in ICF research.

Submitted 22 May, 2023; originally announced May 2023.
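The hybrid scheme described above uses a random walk to collect promising samples that then seed Bayesian optimization. A minimal sketch of that first stage follows, under stated assumptions: the greedy acceptance rule, the Gaussian step, and the toy quadratic objective (a hypothetical stand-in for a simulated areal density) are illustrative choices rather than the paper's, and the Bayesian optimization stage itself is omitted.

```python
import numpy as np


def random_walk_seeds(objective, x0, n_steps, step_size, n_keep, rng):
    """Greedy random walk; returns the n_keep best (score, params) pairs visited.

    These pairs would serve as the initial design for a Bayesian optimizer.
    """
    x = np.asarray(x0, dtype=float)
    fx = objective(x)
    visited = [(fx, x.copy())]
    for _ in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.shape)
        fp = objective(proposal)
        visited.append((fp, proposal.copy()))
        if fp >= fx:  # accept only non-worsening moves
            x, fx = proposal, fp
    visited.sort(key=lambda pair: pair[0], reverse=True)
    return visited[:n_keep]


# Hypothetical objective with its peak at params = (1, -2, 0.5).
def toy_objective(params):
    return -float(np.sum((params - np.array([1.0, -2.0, 0.5])) ** 2))


rng = np.random.default_rng(0)
seeds = random_walk_seeds(toy_objective, np.zeros(3), n_steps=200,
                          step_size=0.3, n_keep=5, rng=rng)
for score, params in seeds:
    print(round(score, 3), np.round(params, 2))
```

Keeping only the best visited points gives the optimizer a small, already-promising initial design, which is the efficiency argument made in the abstract.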

arXiv:2305.15354 [pdf, other]

Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization

Authors: Feifei Shao, Yawei Luo, Lei Chen, Ping Liu, Wei Yang, Yi Yang, Jun Xiao

Abstract: Contemporary weakly-supervised object localization (WSOL) methods have primarily focused on addressing the challenge of localizing the most discriminative region while largely overlooking the relatively less explored issue of biased activation -- incorrectly spotlighting co-occurring background with the foreground feature. In this paper, we conduct a thorough causal analysis to investigate the origins of biased activation. Based on our analysis, we attribute this phenomenon to the presence of co-occurring background confounders. Building upon this profound insight, we introduce a pioneering paradigm known as Counterfactual Co-occurring Learning (CCL), meticulously engendering counterfactual representations by adeptly disentangling the foreground from the co-occurring background elements. Furthermore, we propose an innovative network architecture known as Counterfactual-CAM. This architecture seamlessly incorporates a perturbation mechanism for counterfactual representations into the vanilla CAM-based model. By training the WSOL model with these perturbed representations, we guide the model to prioritize the consistent foreground content while concurrently reducing the influence of distracting co-occurring backgrounds. To the best of our knowledge, this study represents the initial exploration of this research direction. Our extensive experiments conducted across multiple benchmarks validate the effectiveness of the proposed Counterfactual-CAM in mitigating biased activation.

Submitted 9 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 10 pages, 6 figures, 8 tables

arXiv:2305.06594 [pdf, other]

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Authors: Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

Abstract: Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally aligned signatures between video and music directly from paired music and videos, without explicitly modeling domain-specific rhythmic or semantic relationships. We propose V2Meow, a video-to-music generation system capable of producing high-quality music audio for a diverse range of video input types using a multi-stage autoregressive model. Trained on 5k hours of music audio clips paired with video frames mined from in-the-wild music videos, V2Meow is competitive with previous domain-specific models when evaluated in a zero-shot manner. It synthesizes high-fidelity music audio waveforms solely by conditioning on pre-trained general-purpose visual features extracted from video frames, with optional style control via text prompts. Through both qualitative and quantitative evaluations, we demonstrate that our model outperforms various existing music generation systems in terms of visual-audio correspondence and audio quality. Music samples are available at tinyurl.com/v2meow.

Submitted 22 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: accepted at AAAI 2024, music samples available at https://tinyurl.com/v2meow

arXiv:2305.05182 [pdf, other]

Self-similar algebraic spiral solution of 2-D incompressible Euler equations

Authors: Feng Shao, Dongyi Wei, Zhifei Zhang

Abstract: In this paper, we prove the existence of self-similar algebraic spiral solutions for the 2-D incompressible Euler equations with initial vorticity of the form $|y|^{-\frac1μ}\ \mathringω(θ)$ with $μ>\frac12$ and $\mathringω\in L^1(\mathbb T)$ satisfying $m$-fold symmetry ($m\geq 2$) and a dominant condition. As an important application, we prove the existence of a weak solution when $\mathringω$ is a Radon measure on $\mathbb T$ with $m$-fold symmetry, which is related to the vortex sheet solution.

Submitted 1 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 60 pages, 1 figure

arXiv:2304.00434 [pdf, ps, other]

Transverse momentum and multiplicity dependence of $Λ_{c}^{+}/D^{0}$ ratio in $pp$ collisions at $\sqrt{s}=13$ TeV

Authors: Jun Song, Hai-hong Li, Feng-lan Shao

Abstract: We apply an equal-velocity quark combination model to study the $Λ_{c}^{+}/D^{0}$ ratio in the range $p_{T}\lesssim10$ GeV/c in $pp$ collisions at $\sqrt{s}=13$ TeV. We decompose the ratio into four parts, related respectively to quark numbers, the light-flavor quark $p_{T}$ spectrum, the charm quark $p_{T}$ spectrum, and the momentum correlation between light and charm quarks. Their influences on the $Λ_{c}^{+}/D^{0}$ ratio are studied individually. The curvature property of the light-flavor quark $p_{T}$ spectrum is found to be the main reason for the non-monotonic $p_{T}$ dependence of the $Λ_{c}^{+}/D^{0}$ ratio exhibited in high-multiplicity events. Moreover, the multiplicity dependence of the $Λ_{c}^{+}/D^{0}$ ratio as a function of $p_{T}$ is mainly due to the multiplicity dependence of the light-flavor quark $p_{T}$ spectrum. Using the light-flavor quark $p_{T}$ spectrum obtained from experimental data on light-flavor hadrons and the charm quark $p_{T}$ spectrum obtained from FONLL and/or PYTHIA calculations, the $p_{T}$ dependence of the experimental $Λ_{c}^{+}/D^{0}$ ratio in both high- and low-multiplicity events in $pp$ collisions at $\sqrt{s}=13$ TeV is reasonably understood.

Submitted 1 April, 2023; originally announced April 2023.

Comments: 13 pages, 5 figures

arXiv:2302.07546 [pdf, ps, other]

doi 10.3390/sym15020400

Production of Strange and Charm Hadrons in Pb+Pb Collisions at $\sqrt{s_{NN}}=$ 5.02 TeV

Authors: Wen-bin Chang, Rui-qin Wang, Jun Song, Feng-lan Shao, Qun Wang, Zuo-tang Liang

Abstract: Using a quark combination model with the equal-velocity combination approximation, we study the production of hadrons with strangeness and charm flavor quantum numbers in Pb+Pb collisions at $\sqrt{s_{NN}}=$5.02 TeV. We present analytical expressions and numerical results for these hadrons' transverse momentum spectra and yield ratios. Our numerical results agree well with the experimental data available. The features of strange and charm hadron production in the quark--gluon plasma at the early stage of heavy ion collisions are also discussed.

Submitted 15 February, 2023; originally announced February 2023.

Comments: 12 pages, 9 figures

Journal ref: Symmetry 2023, 15(2), 400

arXiv:2302.06009 [pdf, other]

Policy-Induced Self-Supervision Improves Representation Finetuning in Visual RL

Authors: Sébastien M. R. Arnold, Fei Sha

Abstract: We study how to transfer representations pretrained on source tasks to target tasks in visual percept based RL. We analyze two popular approaches: freezing or finetuning the pretrained representations. Empirical studies on a set of popular tasks reveal several properties of pretrained representations. First, finetuning is required even when pretrained representations perfectly capture the information required to solve the target task. Second, finetuned representations improve learnability and are more robust to noise. Third, pretrained bottom layers are task-agnostic and readily transferable to new tasks, while top layers encode task-specific information and require adaptation. Building on these insights, we propose a self-supervised objective that clusters representations according to the policy they induce, as opposed to traditional representation similarity measures which are policy-agnostic (e.g. Euclidean norm, cosine similarity). Together with freezing the bottom layers, this objective results in significantly better representation than frozen, finetuned, and self-supervised alternatives on a wide range of benchmarks.

Submitted 12 February, 2023; originally announced February 2023.

arXiv:2301.10448 [pdf, other]

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William Cohen

Abstract: Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size.

Submitted 2 June, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2301.10391 [pdf, other]

Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems

Authors: Zhong Yi Wan, Leonardo Zepeda-Núñez, Anudhyan Boral, Fei Sha

Abstract: We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems described by advection-dominated partial differential equations. Those systems have slow-decaying Kolmogorov n-width that hinders standard methods, including reduced order modeling, from producing high-fidelity simulations at low cost. In this work, we construct hypernetwork-based latent dynamical models directly on the parameter space of a compact representation network. We leverage the expressive power of the network and a specially designed consistency-inducing regularization to obtain latent trajectories that are both low-dimensional and smooth. These properties render our surrogate models highly efficient at inference time. We show the efficacy of our framework by learning models that generate accurate multi-step rollout predictions at much faster inference speed compared to competitors, for several challenging examples.

Submitted 6 February, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: 25 pages, 9 figures

arXiv:2301.09416 [pdf, other]

Towards Robust Video Instance Segmentation with Temporal-Aware Transformer

Authors: Zhenghao Zhang, Fangtao Shao, Zuozhuo Dai, Siyu Zhu

Abstract: Most existing transformer-based video instance segmentation methods extract per-frame features independently, which makes the appearance deformation problem challenging to solve. In this paper, we observe that temporal information is also important, and we propose TAFormer to aggregate spatio-temporal features in both the transformer encoder and decoder. Specifically, in the transformer encoder, we propose a novel spatio-temporal joint multi-scale deformable attention module which dynamically integrates the spatial and temporal information to obtain enriched spatio-temporal features. In the transformer decoder, we introduce a temporal self-attention module to enhance the frame-level box queries with the temporal relation. Moreover, TAFormer adopts an instance-level contrastive loss to increase the discriminability of instance query embeddings, which reduces tracking errors caused by visually similar instances. Experimental results show that TAFormer effectively leverages the spatial and temporal information to obtain context-aware feature representation and outperforms state-of-the-art methods.

Submitted 20 January, 2023; originally announced January 2023.
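An instance-level contrastive loss of the kind mentioned above is often written in the InfoNCE form, where each query embedding should score highest against the embedding of the same instance. The sketch below is that generic form, assuming paired rows are positives and a temperature of 0.1; TAFormer's exact formulation may differ.

```python
import numpy as np


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss: row i of queries and row i of keys form the positive pair."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


emb = np.eye(3)  # three toy instance embeddings
print(info_nce(emb, emb))             # near 0: each query matches its own instance
print(info_nce(emb, emb[[1, 2, 0]]))  # about 10: every positive is mismatched
```

Lowering the temperature sharpens the softmax, so visually similar but wrong instances are penalized more strongly.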

arXiv:2301.01060 [pdf, other]

Knowledge-guided Causal Intervention for Weakly-supervised Object Localization

Authors: Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao

Abstract: Previous weakly-supervised object localization (WSOL) methods aim to expand the discriminative areas of the activation map to cover whole objects, yet neglect two inherent challenges when relying solely on image-level labels. First, the "entangled context" issue arises from object-context co-occurrence (e.g., fish and water), making it hard for the model to distinguish object boundaries clearly. Second, the "C-L dilemma" issue results from the information decay caused by the pooling layers, which struggle to retain both the semantic information needed for precise classification and the details essential for accurate localization, leading to a trade-off in performance. In this paper, we propose a knowledge-guided causal intervention method, dubbed KG-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention, which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the disentangled object feature, we introduce a multi-source knowledge guidance framework to strike a balance between absorbing classification knowledge and localization knowledge during model training. Extensive experiments conducted on several benchmark datasets demonstrate the effectiveness of KG-CI-CAM in learning distinct object boundaries amidst confounding contexts and mitigating the dilemma between classification and localization performance.

Submitted 12 March, 2024; v1 submitted 3 January, 2023; originally announced January 2023.

Comments: 13 pages, 7 figures, 7 tables

arXiv:2212.10043 [pdf, other]

Shining Light on Sterile Neutrino Portal Dark Matter from Cosmology and Collider

Authors: Ang Liu, Feng-Lan Shao, Zhi-Long Han, Yi Jin, Honglei Li

Abstract: Provided the dark sector consisted of a dark scalar $φ$ and a dark fermion $χ$ under an exact $Z_2$ symmetry, the sterile neutrino $N$ can act as the messenger between the dark sector and standard model via the Yukawa coupling $λ_{ds} \barχφN$. In this paper, we focus on the specific scenario $m_N>m_φ+m_χ$ with $χ$ being a FIMP dark matter. The decay width of dark scalar $φ$ is doubly suppressed by the smallness of Yukawa coupling $λ_{ds}$ and mixing angle $θ$. The delayed decay $φ\toχν$ will have a great impact on cosmological observables such as the Big Bang Nucleosynthesis, the Cosmic Microwave Background anisotropy power spectra, the effective number of relativistic neutrino species $N_{\rm eff}$ and the energetic neutrino flux. Meanwhile, the sterile neutrino can generate displaced vertex signature at colliders when $m_N<m_W$. The dark scalar $φ$ will also induce measurable Higgs invisible decay for relatively large quartic coupling. A comprehensive analysis of constraints from cosmology and collider is performed in this paper. We find that almost the whole parameter space with $m_N<m_W$ is under the reach of future experiments.

Submitted 20 December, 2022; originally announced December 2022.

Comments: 18 pages, 8 figures

arXiv:2212.08153 [pdf, other]

FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

Authors: Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen

Abstract: Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state of the art on many knowledge-intensive NLP tasks. However, the FiD architecture was obtained by making minimal modifications to a standard T5 model, a choice our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture that alleviate these memory bandwidth constraints and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We call FiD with these modifications FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL runs inference faster than FiD-Base and achieves better performance than FiD-Large.

Submitted 2 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: ACL Findings 2023
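Why decoding can dominate latency while using few FLOPs is easiest to see through arithmetic intensity (FLOPs per byte moved from memory). The back-of-the-envelope sketch below compares one incremental decoding step of cross-attention over all retrieved-passage tokens with one encoder projection applied to all those tokens at once. The sizes (100 passages of 250 tokens, width 1024, 2-byte values) and the simplified cost model (ignoring softmax, caches, and kernel fusion) are assumptions for illustration, not figures or analysis from the paper.

```python
from typing import Tuple


def cross_attention_step(n_ctx: int, d_model: int, bytes_per_elem: int = 2) -> Tuple[int, int]:
    """FLOPs and bytes for one decoder token attending over n_ctx encoder tokens.

    Scores q.K^T and the weighted sum over V each cost about 2 * n_ctx * d_model
    FLOPs, and every key and value vector must be read from memory once.
    """
    flops = 2 * (2 * n_ctx * d_model)
    bytes_moved = 2 * n_ctx * d_model * bytes_per_elem  # keys + values
    return flops, bytes_moved


def encoder_projection(n_tokens: int, d_in: int, d_out: int,
                       bytes_per_elem: int = 2) -> Tuple[int, int]:
    """FLOPs and bytes for one dense layer applied to all n_tokens in parallel."""
    flops = 2 * n_tokens * d_in * d_out
    bytes_moved = (d_in * d_out + n_tokens * d_in + n_tokens * d_out) * bytes_per_elem
    return flops, bytes_moved


def intensity(cost: Tuple[int, int]) -> float:
    """Arithmetic intensity in FLOPs per byte."""
    flops, bytes_moved = cost
    return flops / bytes_moved


N_CTX = 100 * 250  # hypothetical: 100 passages x 250 tokens each
D_MODEL = 1024     # hypothetical model width

dec = intensity(cross_attention_step(N_CTX, D_MODEL))
enc = intensity(encoder_projection(N_CTX, D_MODEL, 4 * D_MODEL))
print(f"decoder cross-attention step: {dec:.1f} FLOPs/byte")
print(f"encoder projection:           {enc:.0f} FLOPs/byte")
```

At about 1 FLOP per byte, the decoder step sits far below the compute-to-bandwidth ratio of current accelerators (on the order of 100 FLOPs per byte or more), so its latency is set by memory traffic, whereas the encoder's large batched matrix multiplications are compute-bound.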

arXiv:2212.05141 [pdf]

A High-resolution Large-eddy Simulation Framework for Wildland Fire Predictions using TensorFlow

Authors: Qing Wang, Matthias Ihme, Rod R. Linn, Yi-Fan Chen, Vivian Yang, Fei Sha, Craig Clements, Jenna S. McDanold, John Anderson

Abstract: As wildfires have become increasingly severe over recent decades, there is continued pressure to improve our ability to predict wildland fire behavior over a wide range of conditions. One approach toward this goal is coupled fire/atmosphere modeling. While significant progress has been made in advancing the physical fidelity of these tools, existing ones have not taken full advantage of emerging programming paradigms and computing architectures to enable high-resolution wildfire simulations. To address this gap, this work presents a new wildfire simulation framework that enables landscape-scale wildfire simulations with a physical representation of combustion at affordable computational cost. This is achieved by developing a coupled fire/atmosphere model in the TensorFlow programming paradigm, which enables highly efficient and scalable computations on Tensor Processing Unit (TPU) hardware. To validate the framework and demonstrate its efficiency, we simulate the prescribed fire experiment FireFlux II (Clements et al., 2019). Through a parametric study of mesh resolution, we show that global quantities such as volumetric heat release and fire-spread rate are insensitive to the horizontal mesh resolution between 0.5 m and 2 m, a range sufficient for predicting fire intermittency and the dynamic fire properties associated with fine-scale turbulent structures in the atmospheric boundary layer.

Submitted 12 July, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 10 figures, 3 tables, 4844 words

arXiv:2211.16380 [pdf, ps, other]

doi 10.1016/j.jalgebra.2023.10.030

Boundedness of finite morphisms onto Fano manifolds with large Fano index

Authors: Feng Shao, Guolei Zhong

Abstract: Let $f:Y\to X$ be a finite morphism between Fano manifolds $Y$ and $X$ such that the Fano index of $X$ is greater than 1. On the one hand, when both $X$ and $Y$ are fourfolds of Picard number 1, we show that the degree of $f$ is bounded in terms of $X$ and $Y$ unless $X\cong\mathbb{P}^4$; hence, such $X$ does not admit any non-isomorphic surjective endomorphism. On the other hand, when $X=Y$ is either a fourfold or a del Pezzo manifold, we prove that, if $f$ is an int-amplified endomorphism, then $X$ is toric. Moreover, we classify all the singular quadrics admitting non-isomorphic endomorphisms.

Submitted 27 November, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: 30 pages, minor revisions, Journal of Algebra (to appear), comments are welcome!

MSC Class: 08A35; 14E30; 14J35; 14J40; 14M25

Journal ref: Journal of Algebra, Volume 639, 1 February 2024, Pages 678-707

arXiv:2210.10271 [pdf, other]

Different Coalescence Sources of Light Nuclei Production in Au-Au Collisions at $\sqrt{s_{NN}}=3$ GeV

Authors: Rui-Qin Wang, Ji-Peng Lv, Yan-Hao Li, Jun Song, Feng-Lan Shao

Abstract: We study the production of light nuclei via the coalescence mechanism at midrapidity in Au-Au collisions at $\sqrt{s_{NN}}=3$ GeV. We derive analytic formulas for the momentum distributions of light nuclei formed by the coalescence of two bodies, three bodies, and four nucleons, respectively. We naturally explain the transverse momentum spectra of the deuteron ($d$), triton ($t$), helium-3 ($^3$He), and helium-4 ($^4$He), and we reproduce the data on yield rapidity densities and averaged transverse momenta of $d$, $t$, $^3$He and $^4$He. We give the proportions of the contributions from different coalescence sources to the production of $t$, $^3$He and $^4$He. We find that besides nucleon coalescence, nucleon$+$nucleus coalescence and nucleus$+$nucleus coalescence may play essential roles in light nuclei production in Au-Au collisions at $\sqrt{s_{NN}}=3$ GeV.

Submitted 15 October, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: 5 figures, 6 tables

arXiv:2209.14899 [pdf, other]

Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing

Authors: Yury Zemlyanskiy, Michiel de Jong, Joshua Ainslie, Panupong Pasupat, Peter Shaw, Linlu Qiu, Sumit Sanghai, Fei Sha

Abstract: A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on the similarity between query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars whose outputs are also similar. GandR first generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction and uses them to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks.

Submitted 29 September, 2022; originally announced September 2022.

Comments: To appear in the proceedings of COLING 2022
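A minimal sketch of the two-pass generate-then-retrieve loop described in the GandR abstract. Everything concrete here is an assumption: token-set Jaccard similarity stands in for whatever retriever the paper uses, the weight `alpha` that mixes input and output similarity is hypothetical, and `copy_top` is a stub standing in for the sequence-to-sequence model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Exemplar = Tuple[str, str]  # (input utterance, output parse)
Generator = Callable[[str, List[Exemplar]], str]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simple stand-in for a real retriever's similarity."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(query: str, pool: Sequence[Exemplar], k: int,
             preliminary: Optional[str] = None, alpha: float = 0.5) -> List[Exemplar]:
    """Rank exemplars by input similarity, optionally mixed with output similarity."""
    def score(ex: Exemplar) -> float:
        s = jaccard(query, ex[0])
        if preliminary is not None:
            s = (1 - alpha) * s + alpha * jaccard(preliminary, ex[1])
        return s
    return sorted(pool, key=score, reverse=True)[:k]


def gandr(query: str, pool: Sequence[Exemplar], generate: Generator,
          k: int = 2, alpha: float = 0.5) -> str:
    first = retrieve(query, pool, k)                       # pass 1: input-based retrieval
    preliminary = generate(query, first)                   # preliminary prediction
    second = retrieve(query, pool, k, preliminary, alpha)  # pass 2: output-aware retrieval
    return generate(query, second)                         # final prediction


# Hypothetical toy exemplar pool.
POOL: List[Exemplar] = [
    ("show weather in denver", "get_weather city denver"),
    ("list trips from boston to denver", "find_flight dst denver src boston"),
    ("show flights to boston", "find_flight dst boston"),
]


def copy_top(query: str, exemplars: List[Exemplar]) -> str:
    """Stub model: echo the top exemplar's parse (a real model conditions on all of them)."""
    return exemplars[0][1]


query = "show flights to denver"
pass1 = retrieve(query, POOL, k=2)
pass2 = retrieve(query, POOL, k=2, preliminary=copy_top(query, pass1))
print([x for x, _ in pass1])  # ['show flights to boston', 'show weather in denver']
print([x for x, _ in pass2])  # ['show flights to boston', 'list trips from boston to denver']
```

With these toy data the input-only pass returns the weather exemplar because it shares surface words with the query; the output-aware pass swaps it for the Boston-to-Denver trip, whose parse shares the `find_flight` structure of the preliminary prediction.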

arXiv:2208.02021 [pdf, ps, other]

Collision centrality and energy dependence of strange hadron production in Au + Au collisions at $\sqrt{s_{NN}}=$ 7.7-54.4 GeV

Authors: Yanting Feng, Ziyao Song, Fenglan Shao, Jun Song

Abstract: We apply an equal-velocity quark combination model to systematically study the transverse momentum ($p_{T}$) spectra of the strange hadrons $K_{S}^{0}$, $φ$, $Λ$, $Ξ^{-}$, $Ω^{-}$, $\bar{Λ}$, $\bar{Ξ}^{+}$ and $\bar{Ω}^{+}$ at mid-rapidity in Au+Au collisions at $\sqrt{s_{NN}}=$ 7.7, 11.5, 19.6, 27, 39, 54.4 GeV. The relative deviation between the model calculation and the experimental data for these eight hadrons is generally about 2-3% at $\sqrt{s_{NN}}=$ 27, 39, 54.4 GeV and in central collisions at 7.7, 11.5, 19.6 GeV. The deviation increases slightly, up to about 4%, in semi-central and peripheral collisions at $\sqrt{s_{NN}}=$ 7.7, 11.5, 19.6 GeV. We systematically explain the dependence of two baryon-to-meson ratios, $\bar{Λ}/K_{S}^{0}$ and $Ω/φ$, on $p_{T}$, collision centrality, and collision energy in terms of the properties of quark $p_{T}$ spectra at hadronization. We derive analytic relations between the $R_{CP}$ of hadrons and that of quarks, and use them to naturally explain the species and $p_{T}$ dependence of the $R_{CP}$ of these strange hadrons.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2208.00623 [pdf, other]

doi 10.1109/TCSVT.2022.3231041

Quality Evaluation of Arbitrary Style Transfer: Subjective Study and Objective Metric

Authors: Hangwei Chen, Feng Shao, Xiongli Chai, Yuese Gu, Qiuping Jiang, Xiangchao Meng, Yo-Sung Ho

Abstract: Arbitrary neural style transfer, which renders the structure of one image in the style of another, is a topic of great research value with wide industrial application. Recent research has devoted great effort to improving the stylization quality of arbitrary style transfer (AST). However, the quality evaluation of AST images has received very little attention, even though it could guide the design of different algorithms. In this paper, we first construct a new AST image quality assessment database (AST-IQAD), which consists of 150 content-style image pairs and the corresponding 1200 stylized images produced by eight typical AST algorithms. Then, we conduct a subjective study on AST-IQAD, obtaining subjective rating scores for all stylized images on three criteria, i.e., content preservation (CP), style resemblance (SR), and overall vision (OV). To quantitatively measure the quality of AST images, we propose a new sparse representation-based method that computes quality from sparse feature similarity. Experimental results on AST-IQAD demonstrate the superiority of the proposed method. The dataset and source code will be released at https://github.com/Hangwei-Chen/AST-IQAD-SRQE

Submitted 29 January, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2022, Code and Dataset: https://github.com/Hangwei-Chen/AST-IQAD-SRQE

arXiv:2207.05595 [pdf, other]

doi 10.1103/PhysRevC.107.044906

Energy Dependence of the Breit-Wheeler process in Heavy-Ion Collisions and its Application to Nuclear Charge Radius Measurements

Authors: Xiaofeng Wang, James Daniel Brandenburg, Lijuan Ruan, Fenglan Shao, Zhangbu Xu, Chi Yang, Wangmei Zha

Abstract: The collision energy dependence of the cross section and the transverse momentum distribution of dielectrons from the Breit-Wheeler process in heavy-ion collisions are computed in lowest-order QED and found to be sensitive to the nuclear charge distribution and to the infrared divergence of the ultra-Lorentz-boosted Coulomb field. Within a given experimental kinematic acceptance, the cross section is found to increase while the pair transverse momentum ($\sqrt{\langle p_{T}^{2} \rangle}$) decreases with increasing beam energy. We demonstrate that the transverse-momentum component of Weizsäcker-Williams photons is due to the finite extent of the charge source and of the electric field component in the longitudinal direction. We further clarify the connection between the nuclear charge distribution and the kinematics of the $e^+e^-$ pairs produced by the Breit-Wheeler process, and propose a criterion for the validity of the Breit-Wheeler process in relativistic heavy-ion collisions. Following this approach, we demonstrate that experimental measurements of the Breit-Wheeler process in ultra-relativistic heavy-ion collisions can be used to quantitatively constrain the nuclear charge radius. The extracted parameters are sensitive to the impact-parameter dependence and can be used to study initial-state and final-state effects in hadronic interactions.

Submitted 4 April, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: 3 figures

arXiv:2205.14205 [pdf, other]

ALMA: Hierarchical Learning for Composite Multi-Agent Tasks

Authors: Shariq Iqbal, Robby Costales, Fei Sha

Abstract: Despite significant progress on multi-agent reinforcement learning (MARL) in recent years, coordination in complex domains remains a challenge. Work in MARL often focuses on solving tasks where agents interact with all other agents and entities in the environment; however, we observe that real-world tasks are often composed of several isolated instances of local agent interactions (subtasks), and each agent can meaningfully focus on one subtask to the exclusion of all else in the environment. In these composite tasks, successful policies can often be decomposed into two levels of decision-making: agents are allocated to specific subtasks, and each agent acts productively toward its assigned subtask alone. This decomposed decision-making provides a strong structural inductive bias, significantly reduces agent observation spaces, and encourages subtask-specific policies to be reused and composed during training, as opposed to treating each new composition of subtasks as unique. We introduce ALMA, a general learning method for taking advantage of these structured tasks. ALMA simultaneously learns a high-level subtask allocation policy and low-level agent policies. We demonstrate that ALMA learns sophisticated coordination behavior in a number of challenging environments, outperforming strong baselines. ALMA's modularity also enables it to generalize better to new environment configurations. Finally, we find that while ALMA can integrate separately trained allocation and action policies, the best performance is obtained only by training all components jointly. Our code is available at https://github.com/shariqiqbal2810/ALMA

Submitted 25 September, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022 Camera Ready

arXiv:2205.12253 [pdf, other]

Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing

Authors: Linlu Qiu, Peter Shaw, Panupong Pasupat, Tianze Shi, Jonathan Herzig, Emily Pitler, Fei Sha, Kristina Toutanova

Abstract: Despite their strong performance on many tasks, pre-trained language models have been shown to struggle with out-of-distribution compositional generalization. Meanwhile, recent work has shown considerable improvements on many NLP tasks from model scaling. Can scaling up model size also improve compositional generalization in semantic parsing? We evaluate encoder-decoder models up to 11B parameters and decoder-only models up to 540B parameters, and compare model scaling curves for three different methods of applying a pre-trained language model to a new task: fine-tuning all parameters, prompt tuning, and in-context learning. We observe that fine-tuning generally has flat or negative scaling curves on out-of-distribution compositional generalization in semantic parsing evaluations. In-context learning has positive scaling curves, but is generally outperformed by much smaller fine-tuned models. Prompt tuning can outperform fine-tuning, suggesting further potential improvements from scaling, as it exhibits a more positive scaling curve. Additionally, we identify several error trends that vary with model scale. For example, larger models are generally better at modeling the syntax of the output space, but are also more prone to certain types of overfitting. Overall, our study highlights limitations of current techniques for effectively leveraging model scale for compositional generalization, while our analysis also suggests promising directions for future work.

Submitted 24 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: EMNLP 2022

arXiv:2205.11846 [pdf, other]

doi 10.1140/epjc/s10052-023-11609-5

Sterile Neutrino Portal Dark Matter in $ν$THDM

Authors: Ang Liu, Feng-Lan Shao, Zhi-Long Han, Yi Jin, Honglei Li

Abstract: In this paper, we propose sterile neutrino portal dark matter in the $ν$THDM. This model can naturally generate tiny neutrino masses with the neutrinophilic scalar doublet $Φ_ν$ and sterile neutrinos $N$ around the TeV scale. A Dirac fermion singlet $χ$ and a scalar singlet $φ$, both charged under a $Z_2$ symmetry, are further introduced in the dark sector. The sterile neutrinos $N$ are the mediators between the dark matter (DM) and the Standard Model (SM). Depending on the coupling strength, the DM can be either a WIMP or a FIMP. In the WIMP scenario, pair annihilation of DM into $NN$ is the key channel for satisfying various bounds, and it could be tested at indirect detection experiments. In the FIMP scenario, besides the direct production of DM via the freeze-in mechanism, the contribution from the late decay of the NLSP is also important. When the sterile neutrinos are heavier than the dark sector particles, the NLSP is long-lived due to the tiny mixing angle between sterile and light neutrinos. Constraints from the free-streaming length, CMB, BBN, and neutrino experiments are considered.

Submitted 24 May, 2022; originally announced May 2022.

Comments: 27 pages, 14 figures

arXiv:2204.08641 [pdf, ps, other]

doi 10.1103/PhysRevC.106.034910

Averaged transverse momentum correlations of hadrons in relativistic heavy-ion collisions

Authors: Yan-ting Feng, Feng-lan Shao, Jun Song

Abstract: We compile experimental data for the averaged transverse momentum ($\langle p_{T}\rangle$) of the proton, $Λ$, $Ξ^{-}$, $Ω^{-}$ and $φ$ at mid-rapidity in Au+Au collisions at $\sqrt{s_{NN}}=$ 200, 39, 27, 19.6, 11.5, 7.7 GeV and in Pb+Pb collisions at $\sqrt{s_{NN}}=$ 2.76 TeV, and find that the experimental data for these hadrons exhibit systematic correlations. We apply a quark combination model with the equal-velocity combination approximation to derive analytic formulas for hadronic $\langle p_{T}\rangle$, assuming exponential quark $p_{T}$ spectra at hadronization. We use them to successfully explain the systematic correlations in the $\langle p_{T}\rangle$ data of the $pΛ$, $ΛΞ^{-}$, $Ξ^{-}Ω^{-}$ and $Ξ^{-}φ$ pairs. We also use them to explain the regularity observed in the $\langle p_{T}\rangle$ of these hadrons as a function of $(dN_{ch}/dy)/(N_{part}/2)$ at mid-rapidity in central heavy-ion collisions at both RHIC and LHC energies. Our results suggest that constituent quark degrees of freedom and the equal-velocity combination of these constituent quarks at hadronization play an important role in understanding the production of baryons and the $φ$ meson at RHIC and LHC energies.

Submitted 18 April, 2022; originally announced April 2022.

Comments: 10 pages, 7 figures

Journal ref: Phys. Rev. C 106, 034910, 2022

arXiv:2203.12686 [pdf, other]

Possibility Before Utility: Learning And Using Hierarchical Affordances

Authors: Robby Costales, Shariq Iqbal, Fei Sha

Abstract: Reinforcement learning algorithms struggle with tasks that have complex hierarchical dependency structures. Humans and other intelligent agents do not waste time assessing the utility of every high-level action in existence; instead, they consider only those they deem possible in the first place. By focusing only on what is feasible, or "afforded", at the present moment, an agent can spend more time both evaluating the utility of what matters and acting on it. To this end, we present Hierarchical Affordance Learning (HAL), a method that learns a model of hierarchical affordances in order to prune impossible subtasks for more effective learning. Existing work in hierarchical reinforcement learning provides agents with structural representations of subtasks but is not affordance-aware. By grounding our definition of hierarchical affordances in the present state, our approach is more flexible than the many approaches that ground their subtask dependencies in a symbolic history. While these logic-based methods often require complete knowledge of the subtask hierarchy, our approach can utilize incomplete and varying symbolic specifications. Furthermore, we demonstrate that, relative to non-affordance-aware methods, HAL agents are better able to efficiently learn complex tasks, navigate environment stochasticity, and acquire diverse skills in the absence of extrinsic supervision, all of which are hallmarks of human learning.

Submitted 23 March, 2022; originally announced March 2022.

Comments: ICLR 2022 camera-ready
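One simple way to picture "pruning impossible subtasks" is to mask a high-level policy's value estimates with an affordance classifier before choosing. The sketch below assumes a hypothetical per-subtask affordance probability and a fixed threshold; the subtask names and numbers are invented, and this is not HAL's actual architecture or training procedure.

```python
from typing import Dict


def select_subtask(q_values: Dict[str, float],
                   affordance: Dict[str, float],
                   threshold: float = 0.5) -> str:
    """Greedy high-level choice restricted to subtasks judged currently possible.

    q_values:   estimated utility of each subtask (hypothetical high-level critic)
    affordance: estimated probability that each subtask is achievable from the
                present state (hypothetical affordance classifier)
    """
    afforded = {s: q for s, q in q_values.items() if affordance.get(s, 0.0) >= threshold}
    if not afforded:
        # Nothing looks feasible: fall back to the unmasked choice rather than failing.
        afforded = q_values
    return max(afforded, key=afforded.__getitem__)


q = {"collect_wood": 1.0, "craft_pickaxe": 5.0, "mine_iron": 8.0}
aff = {"collect_wood": 0.9, "craft_pickaxe": 0.2, "mine_iron": 0.1}
print(select_subtask(q, aff))                  # collect_wood
print(select_subtask(q, {s: 1.0 for s in q}))  # mine_iron
```

Without the mask, the greedy choice is `mine_iron`, whose prerequisites (in this invented example) are not yet met; the mask redirects effort to the highest-valued subtask that is actually possible now.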

arXiv:2202.12588 [pdf, other]

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

Authors: Feifei Shao, Yawei Luo, Ping Liu, Jie Chen, Yi Yang, Yulei Lu, Jun Xiao

Abstract: High annotation cost is well known to be the main constraint on the development of point cloud semantic segmentation. Active learning methods endeavor to reduce this cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, which leads the model to select clustered candidates with similar shapes in a local area while missing other representative ones in the global environment. In this paper, we propose a new 3D region-based active learning method to tackle this problem. Dubbed SSDR-AL, our method groups the original point clouds into superpoints and incrementally selects the most informative and representative ones for label acquisition. We implement the selection mechanism via a graph reasoning network that considers both the spatial and structural diversity of superpoints. To deploy SSDR-AL in more practical scenarios, we design a noise-aware iterative labeling strategy to address the "noisy annotation" problem introduced by the previous "dominant labeling" strategy for superpoints. Extensive experiments on two point cloud benchmarks demonstrate the effectiveness of SSDR-AL in the semantic segmentation task. In particular, SSDR-AL significantly outperforms the baseline method and, when achieving 90% of the performance of fully supervised learning, reduces the annotation cost by up to 63.0% and 24.0% on the two benchmarks, respectively.

Submitted 18 April, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: 9 pages, 6 figures, 2 tables
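The abstract's point about clustered candidates can be illustrated with a much simpler heuristic than SSDR-AL's graph reasoning network: greedily pick superpoints by uncertainty, discounted by proximity to superpoints already chosen. The scoring rule below (uncertainty times a distance factor capped at 1, with a hypothetical `radius`) is a swapped-in illustration of why diversity changes the selection, not how the paper scores candidates.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def select_diverse(centroids: Sequence[Point3], uncertainty: Sequence[float],
                   budget: int, radius: float) -> List[int]:
    """Greedy batch selection: uncertainty discounted near already-selected superpoints."""
    selected: List[int] = []
    for _ in range(min(budget, len(centroids))):
        best, best_score = -1, -math.inf
        for i, (c, u) in enumerate(zip(centroids, uncertainty)):
            if i in selected:
                continue
            # Distance to the nearest already-selected centroid (inf if none yet).
            d = min((math.dist(c, centroids[j]) for j in selected), default=math.inf)
            score = u * min(1.0, d / radius)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


def select_top_uncertain(uncertainty: Sequence[float], budget: int) -> List[int]:
    """Baseline that ignores spatial structure entirely."""
    return sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)[:budget]


# Hypothetical data: two near-duplicate superpoints close together, one far away.
centroids = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
uncertainty = [0.9, 0.85, 0.5]
print(select_top_uncertain(uncertainty, 2))                   # [0, 1]
print(select_diverse(centroids, uncertainty, 2, radius=1.0))  # [0, 2]
```

The uncertainty-only baseline spends its budget on two almost identical neighbors; the diversity-aware rule keeps the most uncertain one and then reaches for the distant superpoint.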

arXiv:2202.07808 [pdf, other]

Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

Authors: Sebastien M. R. Arnold, Pierre L'Ecuyer, Liyu Chen, Yi-fan Chen, Fei Sha

Abstract: Reinforcement learning constantly deals with hard integrals, for example when computing expectations in policy evaluation and policy iteration. These integrals are rarely analytically solvable and are typically estimated with the Monte Carlo method, which induces high variance in policy values and gradients. In this work, we propose to replace Monte Carlo samples with low-discrepancy point sets. We combine policy gradient methods with Randomized Quasi-Monte Carlo, yielding variance-reduced formulations of policy gradient and actor-critic algorithms. These formulations are effective for both policy evaluation and policy improvement: they outperform state-of-the-art algorithms on standardized continuous control benchmarks. Our empirical analyses validate the intuition that replacing Monte Carlo with Quasi-Monte Carlo yields significantly more accurate gradient estimates.

Submitted 21 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: AISTATS 2022 camera ready; more info at: http://seba1511.net/projects/qrl/
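The variance reduction that motivates this paper can be seen on a toy integral, away from any policy. The sketch below estimates $\int_{[0,1]^2} e^{x+y}\,dx\,dy = (e-1)^2$ with plain Monte Carlo and with a randomly shifted rank-1 lattice (a Fibonacci lattice with $n=89$ points and generator $g=55$, shifted modulo 1 in the Cranley-Patterson style), which is one standard form of Randomized Quasi-Monte Carlo. The integrand, point count, lattice, and number of repetitions are illustrative assumptions; the paper's policy-gradient estimators are not reproduced here.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def mc_points(n: int, rng: random.Random) -> List[Point]:
    """Independent uniform samples on the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def shifted_lattice(n: int, g: int, rng: random.Random) -> List[Point]:
    """Rank-1 lattice {(i/n, i*g/n) mod 1} with one random shift (Cranley-Patterson).

    The shift keeps the low-discrepancy structure while making each estimate
    unbiased, so variances can be compared across independent repetitions.
    """
    u1, u2 = rng.random(), rng.random()
    return [((i / n + u1) % 1.0, (((i * g) % n) / n + u2) % 1.0) for i in range(n)]


def estimate(f: Callable[[float, float], float], pts: List[Point]) -> float:
    return sum(f(x, y) for x, y in pts) / len(pts)


def integrand(x: float, y: float) -> float:
    return math.exp(x + y)


TRUE_VALUE = (math.e - 1) ** 2


def compare(reps: int = 200, n: int = 89, g: int = 55, seed: int = 0):
    """Return (MC mean, MC variance, RQMC mean, RQMC variance) over `reps` runs."""
    rng = random.Random(seed)
    mc = [estimate(integrand, mc_points(n, rng)) for _ in range(reps)]
    rqmc = [estimate(integrand, shifted_lattice(n, g, rng)) for _ in range(reps)]
    return (statistics.mean(mc), statistics.variance(mc),
            statistics.mean(rqmc), statistics.variance(rqmc))


mc_mean, mc_var, rqmc_mean, rqmc_var = compare()
print(f"true value {TRUE_VALUE:.4f}")
print(f"MC   mean {mc_mean:.4f}  variance {mc_var:.2e}")
print(f"RQMC mean {rqmc_mean:.4f}  variance {rqmc_var:.2e}")
```

Both estimators are unbiased, but with the same 89 function evaluations the shifted lattice's spread across repetitions is far smaller than plain Monte Carlo's. Per the abstract, the paper applies this kind of substitution to the samples used in policy evaluation and policy gradient estimation.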

Showing 1–50 of 192 results for author: Sha, F