-
Search for Neutrinoless Double-Beta Decay of $^{76}$Ge with a Natural Broad Energy Germanium Detector
Authors:
CDEX collaboration,
W. H. Dai,
H. Ma,
Q. Yue,
Z. She,
K. J. Kang,
Y. J. Li,
M. Agartioglu,
H. P. An,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang
, et al. (61 additional authors not shown)
Abstract:
A natural broad energy germanium (BEGe) detector is operated in the China Jinping Underground Laboratory (CJPL) for a feasibility study of building the next generation experiment of the neutrinoless double-beta (0$νββ$) decay of $^{76}$Ge. The setup of the prototype facility, characteristics of the BEGe detector, background reduction methods, and data analysis are described in this paper. A background index of 6.4$\times$10$^{-3}$ counts/(keV$\cdot$kg$\cdot$day) is achieved, 1.86 times lower than our previous result of the CDEX-1 detector. No signal is observed with an exposure of 186.4 kg$\cdot$day, thus a limit on the half life of $^{76}$Ge 0$νββ$ decay is set at T$_{1/2}^{0ν}$ $>$ 5.62$\times$10$^{22}$ yr at 90% C.L. The limit corresponds to an effective Majorana neutrino mass in the range of 4.6 $\sim$ 10.3 eV, dependent on the nuclear matrix elements.
Submitted 5 August, 2022; v1 submitted 21 May, 2022;
originally announced May 2022.
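The conversion from a half-life limit to an effective Majorana mass, mentioned in the abstract of the entry above, is conventionally done with the light-neutrino-exchange relation sketched here. The symbols $G^{0ν}$ (phase-space factor), $M^{0ν}$ (nuclear matrix element), and $m_e$ (electron mass) are not defined in the abstract and are introduced only for illustration; conventions differ on whether the axial coupling $g_A^4$ is absorbed into $G^{0ν}$, and the abstract does not say which convention the paper uses.

$$\left(T_{1/2}^{0ν}\right)^{-1} = G^{0ν}\,\left|M^{0ν}\right|^{2}\left(\frac{\langle m_{ββ}\rangle}{m_e}\right)^{2} \quad\Longrightarrow\quad \langle m_{ββ}\rangle < \frac{m_e}{\sqrt{G^{0ν}\,\left|M^{0ν}\right|^{2}\,T_{1/2}^{0ν}}}$$

Under this relation a lower limit on the half life maps to an upper limit on $\langle m_{ββ}\rangle$, and a spread in the calculated values of $M^{0ν}$ produces a range of mass limits such as the one quoted in the abstract.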
-
Time dependent field correlators from holographic EPR pairs
Authors:
Shoichi Kawamoto,
Da-Shin Lee,
Chen-Pin Yeh
Abstract:
We study the correlators of the fields that couple to the quark and anti-quark EPR pair in the super Yang-Mills theory using the holographic description, which is a string in AdS space with its two ends anchoring on the boundaries. We consider the cases that the endpoints of the string are static and that the endpoints are uniformly accelerated in opposite directions where the exact solutions for the string's profiles are available. In both cases, the two-point correlators of the boundary field, described by the linearized perturbations in the worldsheet, can also be derived exactly where we obtain the all-time evolution of the correlators. In the case of the accelerating string, the induced geometry on the string worldsheet has the causal structure of a two-sided AdS black hole with a wormhole connecting two causally disconnected boundaries, which can be a realization of the ER=EPR conjecture. We find that causality plays a crucial role in determining the nature of the dispersion relation of the particle and the feature of the induced mutual interaction between two particles from the field. In the case that two boundaries of the worldsheet are causally disconnected, the induced effect from the field gives the dissipative dynamics of each particle with no dependence on the distance between two particles, and the induced mutual coupling between them vanishes in the late times, following a power law. When two ends are causally connected, the induced dispersion relation becomes non-dissipative in the late times. Here, we will also comment on the implications of our findings to the entangled particle dynamics and the ER=EPR conjecture.
Submitted 11 August, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Jets and Jet Substructure at Future Colliders
Authors:
Ben Nachman,
Salvatore Rappoccio,
Nhan Tran,
Johan Bonilla,
Grigorios Chachamis,
Barry M. Dillon,
Sergei V. Chekanov,
Robin Erbacher,
Loukas Gouskos,
Andreas Hinzmann,
Stefan Höche,
B. Todd Huffman,
Ashutosh. V. Kotwal,
Deepak Kar,
Roman Kogler,
Clemens Lange,
Matt LeBlanc,
Roy Lemmon,
Christine McLean,
Mark S. Neubauer,
Tilman Plehn,
Debarati Roy,
Giordan Stark,
Jennifer Roloff,
Marcel Vos
, et al. (2 additional authors not shown)
Abstract:
Even though jet substructure was not an original design consideration for the Large Hadron Collider (LHC) experiments, it has emerged as an essential tool for the current physics program. We examine the role of jet substructure on the motivation for and design of future energy frontier colliders. In particular, we discuss the need for a vibrant theory and experimental research and development program to extend jet substructure physics into the new regimes probed by future colliders. Jet substructure has organically evolved with a close connection between theorists and experimentalists and has catalyzed exciting innovations in both communities. We expect such developments will play an important role in the future energy frontier physics program.
Submitted 14 March, 2022;
originally announced March 2022.
-
Precision timing for collider-experiment-based calorimetry
Authors:
S. V. Chekanov,
F. Simon,
V. Boudry,
W. Chung,
P. W. Gorham,
M. Nguyen,
C. G. Tully,
S. C. Eno,
Y. Lai,
A. V. Kotwal,
S. Ko,
I. Laktineh,
S. Lee,
J. S. H. Lee,
M. T. Lucchini,
R. Prechelt,
H. Yoo,
C. -H Yeh,
S. -S. Yu,
G. S. Varner,
R. Zhu
Abstract:
In this White Paper for the 2021 Snowmass process, we discuss aspects of precision timing within electromagnetic and hadronic calorimeter systems for high-energy physics collider experiments. Areas of applications include particle identification, event and object reconstruction, and pileup mitigation. Two different system options are considered, namely cell-level timing capabilities covering the full detector volume, and dedicated timing layers integrated in calorimeter systems. A selection of technologies for the different approaches is also discussed.
Submitted 14 March, 2022;
originally announced March 2022.
-
Faith-Shap: The Faithful Shapley Interaction Index
Authors:
Che-Ping Tsai,
Chih-Kuan Yeh,
Pradeep Ravikumar
Abstract:
Shapley values, which were originally designed to assign attributions to individual players in coalition games, have become a commonly used approach in explainable machine learning to provide attributions to input features for black-box machine learning models. A key attraction of Shapley values is that they uniquely satisfy a very natural set of axiomatic properties. However, extending the Shapley value to assigning attributions to interactions rather than individual players, an interaction index, is non-trivial, as the natural set of axioms for the original Shapley values, extended to the context of interactions, no longer specifies a unique interaction index. Many proposals thus introduce additional less "natural" axioms, while sacrificing the key axiom of efficiency, in order to obtain unique interaction indices. In this work, rather than introduce additional conflicting axioms, we adopt the viewpoint of Shapley values as coefficients of the most faithful linear approximation to the pseudo-Boolean coalition game value function. By extending linear to $\ell$-order polynomial approximations, we can then define the general family of faithful interaction indices. We show that by additionally requiring the faithful interaction indices to satisfy interaction-extensions of the standard individual Shapley axioms (dummy, symmetry, linearity, and efficiency), we obtain a unique Faithful Shapley Interaction index, which we denote Faith-Shap, as a natural generalization of the Shapley value to interactions. We then provide some illustrative contrasts of Faith-Shap with previously proposed interaction indices, and further investigate some of its interesting algebraic properties. We further show the computational efficiency of computing Faith-Shap, together with some additional qualitative insights, via some illustrative experiments.
Submitted 22 March, 2023; v1 submitted 1 March, 2022;
originally announced March 2022.
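As background for the Faith-Shap entry above, the following is a minimal sketch of the classical (individual-player) Shapley value computed by exact enumeration. It is not the Faith-Shap interaction index from the paper. The function name `shapley_values` and the three-player `toy_game` are hypothetical, chosen only to show the efficiency axiom and how a pairwise bonus gets split between the two players who create it.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for an n-player game.

    `value` maps a frozenset of player indices to a real number.
    Cost grows as 2**n, so this is only practical for small n.
    """
    players = range(n)
    phi = [0.0] * n
    for i in players:
        others = [p for p in players if p != i]
        for size in range(n):
            # Probability weight of a coalition of this size preceding player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_game(coalition):
    # Hypothetical game: every player adds 1, and players 0 and 1
    # earn an extra bonus of 2 when both are present.
    bonus = 2.0 if {0, 1} <= coalition else 0.0
    return len(coalition) + bonus


phi = shapley_values(3, toy_game)
print(phi)
```

In this toy game the individual Shapley values fold the pairwise bonus into the scores of players 0 and 1, which is the kind of information an interaction index aims to report separately.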
-
Human-Centered Concept Explanations for Neural Networks
Authors:
Chih-Kuan Yeh,
Been Kim,
Pradeep Ravikumar
Abstract:
Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights to individual pixels in an image can only express which parts of that particular image are important to the model, but humans may prefer an explanation which explains the prediction by concept-based thinking. In this work, we review the emerging area of concept based explanations. We start by introducing concept explanations including the class of Concept Activation Vectors (CAV) which characterize concepts using vectors in appropriate spaces of neural activations, and discuss different properties of useful concepts, and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
Submitted 24 February, 2022;
originally announced February 2022.
-
Threading the Needle of On and Off-Manifold Value Functions for Shapley Explanations
Authors:
Chih-Kuan Yeh,
Kuan-Yun Lee,
Frederick Liu,
Pradeep Ravikumar
Abstract:
A popular explainable AI (XAI) approach to quantify feature importance of a given model is via Shapley values. These Shapley values arose in cooperative games, and hence a critical ingredient to compute these in an XAI context is a so-called value function, that computes the "value" of a subset of features, and which connects machine learning models to cooperative games. There are many possible choices for such value functions, which broadly fall into two categories: on-manifold and off-manifold value functions, which take an observational and an interventional viewpoint respectively. Both these classes however have their respective flaws, where on-manifold value functions violate key axiomatic properties and are computationally expensive, while off-manifold value functions pay less heed to the data manifold and evaluate the model on regions for which it wasn't trained. Thus, there is no consensus on which class of value functions to use. In this paper, we show that in addition to these existing issues, both classes of value functions are prone to adversarial manipulations on low density regions. We formalize the desiderata of value functions that respect both the model and the data manifold in a set of axioms and are robust to perturbation on off-manifold regions, and show that there exists a unique value function that satisfies these axioms, which we term the Joint Baseline value function, and the resulting Shapley value the Joint Baseline Shapley (JBshap), and validate the effectiveness of JBshap in experiments.
Submitted 24 February, 2022;
originally announced February 2022.
-
First is Better Than Last for Language Data Influence
Authors:
Chih-Kuan Yeh,
Ankur Taly,
Mukund Sundararajan,
Frederick Liu,
Pradeep Ravikumar
Abstract:
The ability to identify influential training examples enables us to debug training data and explain model behavior. Existing techniques to do so are based on the flow of training data influence through the model parameters. For large models in NLP applications, it is often computationally infeasible to study this flow through all model parameters; therefore, techniques usually pick the last layer of weights. However, we observe that since the activation connected to the last layer of weights contains "shared logic", the data influence calculated via the last layer weights is prone to a "cancellation effect", where the data influences of different examples have large magnitudes that contradict each other. The cancellation effect lowers the discriminative power of the influence score, and deleting influential examples according to this measure often does not change the model's behavior by much. To mitigate this, we propose a technique called TracIn-WE that modifies a method called TracIn to operate on the word embedding layer instead of the last layer, where the cancellation effect is less severe. One potential concern is that influence based on the word embedding layer may not encode sufficient high level information. However, we find that gradients (unlike embeddings) do not suffer from this, possibly because they chain through higher layers. We show that TracIn-WE significantly outperforms other data influence methods applied on the last layer on the case deletion evaluation on three language classification tasks for different models. In addition, TracIn-WE can produce scores not just at the level of the overall training input, but also at the level of words within the training input, a further aid in debugging.
Submitted 27 October, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Relativistic Self-Consistent $GW$: Exact Two-Component Formalism with One-Electron Approximation for Solids
Authors:
Chia-Nan Yeh,
Avijit Shee,
Qiming Sun,
Emanuel Gull,
Dominika Zgid
Abstract:
We present a formulation of relativistic self-consistent $GW$ for solids based on the exact two-component formalism with one-electron approximation (X2C1e) and non-relativistic Coulomb interactions. Our theory allows us to study scalar relativistic effects, spin-orbit coupling, and the interplay of relativistic effects with electron correlation without adjustable parameters. Our all-electron implementation is fully $ab$ $initio$ and does not require a pseudopotential constructed from atomic calculations. We examine the effect of the X2C1e approximation by comparison to the established four-component formalism and reach excellent agreement. The simplicity of X2C1e enables the construction of higher order theories, such as embedding theories, on top of perturbative calculations.
Submitted 19 May, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
-
MeltpoolNet: Melt pool Characteristic Prediction in Metal Additive Manufacturing Using Machine Learning
Authors:
Parand Akbari,
Francis Ogoke,
Ning-Yu Kao,
Kazem Meidani,
Chun-Yu Yeh,
William Lee,
Amir Barati Farimani
Abstract:
Characterizing meltpool shape and geometry is essential in metal Additive Manufacturing (MAM) to control the printing process and avoid defects. Predicting meltpool flaws based on process parameters and powder material is difficult due to the complex nature of the MAM process. Machine learning (ML) techniques can be useful in connecting process parameters to the type of flaws in the meltpool. In this work, we introduce a comprehensive framework for benchmarking ML for meltpool characterization. An extensive experimental dataset has been collected from more than 80 MAM articles containing MAM processing conditions, materials, meltpool dimensions, meltpool modes and flaw types. We introduce physics-aware MAM featurization, versatile ML models, and evaluation metrics to create a comprehensive learning framework for meltpool defect and geometry prediction. This benchmark can serve as a basis for meltpool control and process optimization. In addition, data-driven explicit models have been identified to estimate meltpool geometry from process parameters and material properties; these outperform Rosenthal estimation for meltpool geometry while maintaining interpretability.
Submitted 25 January, 2022;
originally announced January 2022.
-
Learning-From-Disagreement: A Model Comparison and Visual Analytics Framework
Authors:
Junpeng Wang,
Liang Wang,
Yan Zheng,
Chin-Chia Michael Yeh,
Shubham Jain,
Wei Zhang
Abstract:
With the fast-growing number of classification models being produced every day, numerous model interpretation and comparison solutions have also been introduced. For example, LIME and SHAP can interpret what input features contribute more to a classifier's output predictions. Different numerical metrics (e.g., accuracy) can be used to easily compare two classifiers. However, few works can interpret the contribution of a data feature to a classifier in comparison with its contribution to another classifier. This comparative interpretation can help to disclose the fundamental difference between two classifiers, select classifiers in different feature conditions, and better ensemble two classifiers. To accomplish it, we propose a learning-from-disagreement (LFD) framework to visually compare two classification models. Specifically, LFD identifies data instances with disagreed predictions from two compared classifiers and trains a discriminator to learn from the disagreed instances. As the two classifiers' training features may not be available, we train the discriminator through a set of meta-features proposed based on certain hypotheses of the classifiers to probe their behaviors. Interpreting the trained discriminator with the SHAP values of different meta-features, we provide actionable insights into the compared classifiers. Also, we introduce multiple metrics to profile the importance of meta-features from different perspectives. With these metrics, one can easily identify meta-features with the most complementary behaviors in two classifiers, and use them to better ensemble the classifiers. We focus on binary classification models in the financial services and advertising industry to demonstrate the efficacy of our proposed framework and visualizations.
Submitted 19 January, 2022;
originally announced January 2022.
-
POPPINS : A Population-Based Digital Spiking Neuromorphic Processor with Integer Quadratic Integrate-and-Fire Neurons
Authors:
Zuo-Wei Yeh,
Chia-Hua Hsu,
Alexander White,
Chen-Fu Yeh,
Wen-Chieh Wu,
Cheng-Te Wang,
Chung-Chuan Lo,
Kea-Tiong Tang
Abstract:
The inner operations of the human brain as a biological processing system remain largely a mystery. Inspired by the function of the human brain and based on the analysis of simple neural network systems in other species, such as Drosophila, neuromorphic computing systems have attracted considerable interest. In cellular-level connectomics research, we can identify the characteristics of a biological neural network, called a population, which comprises not only recurrent full connections in the network but also an external stimulus and a self-connection in each neuron. Relying on the low data bandwidth of spike transmission in the network and input data, Spiking Neural Networks exhibit low-latency and low-power design. In this study, we propose a configurable population-based digital spiking neuromorphic processor in 180nm process technology with two configurable hierarchy populations. The neurons in the processor can also be configured as a novel model, the integer quadratic integrate-and-fire neuron model, which contains an unsigned 8-bit membrane potential value. The processor can implement intelligent decision making for avoidance in real time. Moreover, the proposed approach enables the development of biomimetic neuromorphic systems and various low-power, low-latency inference processing applications.
Submitted 19 January, 2022;
originally announced January 2022.
-
Tree-based Regression for Interval-valued Data
Authors:
Chih-Ching Yeh,
Yan Sun,
Adele Cutler
Abstract:
Regression methods for interval-valued data have been increasingly studied in recent years. As most of the existing works focus on linear models, it is important to note that many problems in practice are nonlinear in nature and therefore development of nonlinear regression tools for interval-valued data is crucial. In this paper, we propose a tree-based regression method for interval-valued data, which is well applicable to both linear and nonlinear problems. Unlike linear regression models that usually require additional constraints to ensure positivity of the predicted interval length, the proposed method estimates the regression function in a nonparametric way, so the predicted length is naturally positive without any constraints. A simulation study is conducted that compares our method to popular existing regression models for interval-valued data under both linear and nonlinear settings. Furthermore, a real data example is presented where we apply our method to analyze price range data of the Dow Jones Industrial Average index and its component stocks.
Submitted 9 January, 2022;
originally announced January 2022.
-
Constraints on sub-GeV dark matter boosted by cosmic rays from the CDEX-10 experiment at the China Jinping Underground Laboratory
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
M. Agartioglu,
H. P. An,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
X. Y. Guo,
Q. J. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang,
H. B. Li
, et al. (60 additional authors not shown)
Abstract:
We present new constraints on light dark matter boosted by cosmic rays (CRDM) using the 205.4 kg day data of the CDEX-10 experiment conducted at the China Jinping Underground Laboratory. The Monte Carlo simulation package CJPL\_ESS was employed to evaluate the Earth shielding effect. Several key factors have been introduced and discussed in our CRDM analysis, including the contributions from heavier CR nuclei than proton and helium, the inhomogeneity of CR distribution, and the impact of the form factor in the Earth attenuation calculation. Our result excludes the dark matter--nucleon elastic scattering cross-section region from $1.7\times 10^{-30}$ to $10^{-26}~\rm cm^2$ for dark matter of 10 keV$/c^2$ to 1 GeV$/c^2$.
Submitted 16 September, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Error-bounded Approximate Time Series Joins Using Compact Dictionary Representations of Time Series
Authors:
Chin-Chia Michael Yeh,
Yan Zheng,
Junpeng Wang,
Huiyuan Chen,
Zhongfang Zhuang,
Wei Zhang,
Eamonn Keogh
Abstract:
The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using intra-similarity join (i.e., self-join) or join a time series with another time series using inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both conserved and anomalous structures in the data. Since the introduction of the matrix profile five years ago, multiple efforts have been made to speed up the computation with approximate joins; however, the majority of these efforts only focus on self-joins. In this work, we show that it is possible to efficiently perform approximate inter-time series similarity joins with error bounded guarantees by creating a compact "dictionary" representation of time series. Using the dictionary representation instead of the original time series, we are able to improve the throughput of an anomaly mining system by at least 20X, with essentially no decrease in accuracy. As a side effect, the dictionaries also summarize the time series in a semantically meaningful way and can provide intuitive and actionable insights. We demonstrate the utility of our dictionary-based inter-time series similarity joins on domains as diverse as medicine and transportation.
Submitted 5 November, 2023; v1 submitted 24 December, 2021;
originally announced December 2021.
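To make the inter-similarity join from the matrix profile entry above concrete, here is a minimal brute-force sketch: for every length-`m` subsequence of one series it reports the distance to its nearest neighbor in another series. This is the exact quadratic-time baseline, not the dictionary-based approximate join the paper proposes. The use of z-normalized Euclidean distance is an assumption (the abstract does not name the distance), and the names `znorm` and `ab_join` and the sample series are hypothetical.

```python
from math import sqrt


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def ab_join(series_a, series_b, m):
    """Distance from each length-m subsequence of A to its nearest neighbor in B."""
    # Normalize every subsequence of B once, then reuse it for each query from A.
    subs_b = [znorm(series_b[j:j + m]) for j in range(len(series_b) - m + 1)]
    profile = []
    for i in range(len(series_a) - m + 1):
        qa = znorm(series_a[i:i + m])
        best = min(
            sqrt(sum((x - y) ** 2 for x, y in zip(qa, qb))) for qb in subs_b
        )
        profile.append(best)
    return profile


# Series A repeats the pattern found in B except for one spike (the value 5).
a = [0, 1, 2, 3, 2, 1, 0, 5, 0, 1, 2, 3]
b = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1]
profile = ab_join(a, b, 4)
print(profile)
```

Low values in the resulting profile mark subsequences of A that are conserved in B, while high values mark candidate anomalies, here the windows that overlap the spike.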
-
Iterative subspace algorithms for finite-temperature solution of Dyson equation
Authors:
Pavel Pokhilko,
Chia-Nan Yeh,
Dominika Zgid
Abstract:
One-particle Green's functions obtained from the self-consistent solution of the Dyson equation can be employed in evaluation of spectroscopic and thermodynamic properties for both molecules and solids. However, typical acceleration techniques used in the traditional quantum chemistry self-consistent algorithms cannot be easily deployed for the Green's function methods, because of non-convex grand potential functional and non-idempotent density matrix. Moreover, the inclusion of correlation effects in the form of the self-energy matrix and changing chemical potential or fluctuations in the number of particles can make the optimization problem more difficult. In this paper, we study acceleration techniques to target the self-consistent solution of the Dyson equation directly. We use the direct inversion in the iterative subspace (DIIS), the least-squared commutator in the iterative subspace (LCIIS), and the Krylov space accelerated inexact Newton method (KAIN). We observe that the definition of the residual has a significant impact on the convergence of the iterative procedure. Based on the Dyson equation, we generalize the concept of the commutator residual used in DIIS (CDIIS) and LCIIS, and compare it with the difference residual used in DIIS and KAIN. The commutator residuals outperform the difference residuals for all considered molecular and solid systems within both GW and GF2. The generalized CDIIS and LCIIS methods successfully converged restricted GF2 calculations for a number of strongly correlated systems, which could not be converged before. We also provide practical recommendations to guide convergence in such pathological cases.
Submitted 16 December, 2021;
originally announced December 2021.
-
Studies of the Earth shielding effect to direct dark matter searches at the China Jinping Underground Laboratory
Authors:
Z. Z. Liu,
L. T. Yang,
Q. Yue,
C. H. Yeh,
K. J. Kang,
Y. J. Li,
M. Agartioglu,
H. P. An,
J. P. Chang,
J. H. Chen,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
X. Y. Guo,
Q. J. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia
, et al. (58 additional authors not shown)
Abstract:
Dark matter direct detection experiments mostly operate at deep underground laboratories. It is necessary to consider the shielding effect of the Earth, especially for dark matter particles interacting with a large cross section. We analyzed and simulated the Earth shielding effect for dark matter at the China Jinping Underground Laboratory (CJPL) with a simulation package, CJPL Earth Shielding Simulation code (CJPL\_ESS), which is applicable to other underground locations. Further constraints on the $χ$-N cross section exclusion regions are derived based on the studies with CDEX experiment data.
Submitted 9 March, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
-
Response of a CMS HGCAL silicon-pad electromagnetic calorimeter prototype to 20-300 GeV positrons
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
F. Alam Khan,
M. Alhusseini,
J. Alison,
A. Alpana,
G. Altopp,
M. Alyari,
S. An,
S. Anagul,
I. Andreev,
P. Aspell,
I. O. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
S. Bannerjee,
P. Bargassa,
D. Barney,
F. Beaudette
, et al. (364 additional authors not shown)
Abstract:
The Compact Muon Solenoid Collaboration is designing a new high-granularity endcap calorimeter, HGCAL, to be installed later this decade. As part of this development work, a prototype system was built, with an electromagnetic section consisting of 14 double-sided structures, providing 28 sampling layers. Each sampling layer has a hexagonal module, where a multipad large-area silicon sensor is glued between an electronics circuit board and a metal baseplate. The sensor pads of approximately 1 cm$^2$ are wire-bonded to the circuit board and are read out by custom integrated circuits. The prototype was extensively tested with beams at CERN's Super Proton Synchrotron in 2018. Based on the data collected with beams of positrons, with energies ranging from 20 to 300 GeV, measurements of the energy resolution and linearity, the position and angular resolutions, and the shower shapes are presented and compared to a detailed Geant4 simulation.
Submitted 31 March, 2022; v1 submitted 12 November, 2021;
originally announced November 2021.
-
SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning
Authors:
Christopher Yeh,
Chenlin Meng,
Sherrie Wang,
Anne Driscoll,
Erik Rozi,
Patrick Liu,
Jihyeon Lee,
Marshall Burke,
David B. Lobell,
Stefano Ermon
Abstract:
Progress toward the United Nations Sustainable Development Goals (SDGs) has been hindered by a lack of data on key environmental and socioeconomic indicators, which historically have come from ground surveys with sparse temporal and spatial coverage. Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media, to provide insights into progress toward SDGs. Despite promising early results, approaches to using such data for SDG measurement thus far have largely evaluated on different datasets or used inconsistent evaluation metrics, making it hard to understand whether performance is improving and where additional research would be most fruitful. Furthermore, processing satellite and ground survey data requires domain knowledge that many in the machine learning community lack. In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs, including tasks related to economic development, agriculture, health, education, water and sanitation, climate action, and life on land. Datasets for 11 of the 15 tasks are released publicly for the first time. Our goals for SustainBench are to (1) lower the barriers to entry for the machine learning community to contribute to measuring and achieving the SDGs; (2) provide standard benchmarks for evaluating machine learning models on tasks across a variety of SDGs; and (3) encourage the development of novel machine learning methods where improved model performance facilitates progress towards the SDGs.
Submitted 8 November, 2021;
originally announced November 2021.
-
TorchAudio: Building Blocks for Audio and Speech Processing
Authors:
Yao-Yuan Yang,
Moto Hira,
Zhaoheng Ni,
Anjali Chourdia,
Artyom Astafurov,
Caroline Chen,
Ching-Feng Yeh,
Christian Puhrsch,
David Pollack,
Dmitriy Genzel,
Donny Greenberg,
Edward Z. Yang,
Jason Lian,
Jay Mahadeokar,
Jeff Hwang,
Ji Chen,
Peter Goldsborough,
Prabhat Roy,
Sean Narenthiran,
Shinji Watanabe,
Soumith Chintala,
Vincent Quenneville-Bélair,
Yangyang Shi
Abstract:
This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically differentiable, and production-ready. TorchAudio can be easily installed from Python Package Index repository and the source code is publicly available under a BSD-2-Clause License (as of September 2021) at https://github.com/pytorch/audio. In this document, we provide an overview of the design principles, functionalities, and benchmarks of TorchAudio. We also benchmark our implementation of several audio and speech operations and models. We verify through the benchmarks that our implementations of various operations and models are valid and perform similarly to other publicly available implementations.
Submitted 16 February, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
A Non-linear Differentiable Model for Stormwater-based Irrigation of a Green Roof in Toronto
Authors:
Chia-Hui Yeh,
Margaret P. Chapman
Abstract:
Green infrastructure has potential to alleviate the environmental impact of rapidly growing cities. This potential has inspired laws in Toronto that require the inclusion of rooftops with large vegetation beds, called green roofs, into sufficiently sized construction projects. We study the problem of reusing stormwater to irrigate a green roof in Toronto, where potable water is the current irrigation source. The vision is that widespread reuse of stormwater runoff for irrigation of green roofs and other purposes can reduce sewer overflow volumes without over-building (with the added benefit of conserving potable water). Towards this vision, our goal is to develop and evaluate two pump controllers for transporting stormwater to the green roof of interest in simulation. A key contribution is our development of a site-specific non-linear model for stormwater flow using smoothing techniques that permits linearization and a standard model predictive controller (MPC). We compare the efficacy of the MPC, which anticipates the weather, and an on/off controller, which is reactive rather than anticipative, for the site in simulation. With further study, we are hopeful that this research will advance control systems technology to improve the performance of green and stormwater infrastructure in growing urban areas.
Submitted 26 October, 2021;
originally announced October 2021.
-
Decoupled Contrastive Learning
Authors:
Chun-Hsiao Yeh,
Cheng-Yao Hong,
Yen-Chi Hsu,
Tyng-Luh Liu,
Yubei Chen,
Yann LeCun
Abstract:
Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented "views" of the same image as positive to be pulled closer, and all other images as negative to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on heavy-computation settings, including large sample batches, extensive training epochs, etc. We are thus motivated to tackle these issues and establish a simple, efficient, yet competitive baseline of contrastive learning. Specifically, we identify, from theoretical and empirical studies, a noticeable negative-positive-coupling (NPC) effect in the widely used InfoNCE loss, leading to unsuitable learning efficiency concerning the batch size. By removing the NPC effect, we propose decoupled contrastive learning (DCL) loss, which removes the positive term from the denominator and significantly improves the learning efficiency. DCL achieves competitive performance with less sensitivity to sub-optimal hyperparameters, requiring neither large batches in SimCLR, momentum encoding in MoCo, nor large epochs. We demonstrate this on various benchmarks, where DCL proves robust and much less sensitive to suboptimal hyperparameters. Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch size 256 within 200 epochs pre-training, outperforming its SimCLR baseline by 6.4%. Further, DCL can be combined with the SOTA contrastive learning method, NNCLR, to achieve 72.3% ImageNet-1K top-1 accuracy with 512 batch size in 400 epochs, which represents a new SOTA in contrastive learning. We believe DCL provides a valuable baseline for future contrastive SSL studies.
Submitted 29 July, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
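The difference between InfoNCE and the decoupled loss described in the abstract above (the positive term is dropped from the denominator) can be sketched as follows. This is a simplified, one-directional illustration, not the authors' implementation: it assumes negatives are drawn only from the other view within the batch, and the function names and the temperature value `tau=0.1` are hypothetical.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def contrastive_losses(z1, z2, tau=0.1):
    """Return (InfoNCE, decoupled) mean losses for two views of a batch.

    z1, z2: arrays of shape (n, d) with n >= 2; row i of z1 and row i of z2
    are the two augmented views of the same sample.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau            # cosine similarities scaled by temperature
    pos = np.diag(logits)               # positive-pair logits
    n = logits.shape[0]
    neg_logits = np.where(~np.eye(n, dtype=bool), logits, -np.inf)
    # InfoNCE: the denominator contains the positive and all negatives.
    info_nce = -pos + _logsumexp(logits, axis=1)
    # Decoupled variant: the denominator contains negatives only.
    dcl = -pos + _logsumexp(neg_logits, axis=1)
    return info_nce.mean(), dcl.mean()
```

Because the decoupled denominator omits one positive summand, the decoupled value is always strictly below the InfoNCE value for the same batch in this sketch; the abstract's claims about learning efficiency concern the gradients, which this example does not measure.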
-
Attack as the Best Defense: Nullifying Image-to-image Translation GANs via Limit-aware Adversarial Attack
Authors:
Chin-Yuan Yeh,
Hsi-Wen Chen,
Hong-Han Shuai,
De-Nian Yang,
Ming-Syan Chen
Abstract:
With the successful creation of high-quality image-to-image (Img2Img) translation GANs come the unethical applications of DeepFake and DeepNude. Such misuses of img2img techniques present a challenging problem for society. In this work, we tackle the problem by introducing the Limit-Aware Self-Guiding Gradient Sliding Attack (LaS-GSA). LaS-GSA follows the Nullifying Attack to cancel the img2img translation process under a black-box setting. In other words, by processing input images with the proposed LaS-GSA before publishing, any targeted img2img GANs can be nullified, preventing the model from maliciously manipulating the images. To improve efficiency, we introduce the limit-aware random gradient-free estimation and the gradient sliding mechanism to estimate the gradient that adheres to the adversarial limit, i.e., the pixel value limitations of the adversarial example. Theoretical justifications validate how the above techniques prevent inefficiency caused by the adversarial limit in both the direction and the step length. Furthermore, an effective self-guiding prior is extracted solely from the threat model and the target image to efficiently leverage the prior information and guide the gradient estimation process. Extensive experiments demonstrate that LaS-GSA requires fewer queries to nullify the image translation process with higher success rates than 4 state-of-the-art black-box methods.
Submitted 6 October, 2021;
originally announced October 2021.
-
Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks
Authors:
Chin-Chia Michael Yeh,
Zhongfang Zhuang,
Junpeng Wang,
Yan Zheng,
Javid Ebrahimi,
Ryan Mercer,
Liang Wang,
Wei Zhang
Abstract:
Predicting metrics associated with entities' transactional behavior within payment processing networks is essential for system monitoring. Multivariate time series, aggregated from the past transaction history, can provide valuable insights for such prediction. The general multivariate time series prediction problem has been well studied and applied across several domains, including manufacturing, medical, and entomology. However, new domain-related challenges associated with the data such as concept drift and multi-modality have surfaced in addition to the real-time requirements of handling the payment transaction data at scale. In this work, we study the problem of multivariate time series prediction for estimating transaction metrics associated with entities in the payment transaction database. We propose a model with five unique components to estimate the transaction metrics from multi-modality data. Four of these components capture interaction, temporal, scale, and shape perspectives, and the fifth component fuses these perspectives together. We also propose a hybrid offline/online training scheme to address concept drift in the data and fulfill the real-time requirements. Combining the estimation model with a graphical user interface, the prototype transaction metric estimation system has demonstrated its potential benefit as a tool for improving a payment processing company's system monitoring capability.
Submitted 22 September, 2021; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Unsupervised Person Re-Identification: A Systematic Survey of Challenges and Solutions
Authors:
Xiangtan Lin,
Pengzhen Ren,
Chung-Hsing Yeh,
Lina Yao,
Andy Song,
Xiaojun Chang
Abstract:
Person re-identification (Re-ID) has been a significant research topic in the past decade due to its real-world applications and research significance. While supervised person Re-ID methods achieve superior performance over unsupervised counterparts, they cannot scale to large unlabelled datasets and new domains due to the prohibitive labelling cost. Therefore, unsupervised person Re-ID has drawn increasing attention for its potential to address the scalability issue in person Re-ID. Unsupervised person Re-ID is challenging primarily due to lacking identity labels to supervise person feature learning. The corresponding solutions are diverse and complex, with various merits and limitations. Therefore, comprehensive surveys on this topic are essential to summarise challenges and solutions to foster future research. Existing person Re-ID surveys have focused on supervised methods from classifications and applications but lack detailed discussion on how the person Re-ID solutions address the underlying challenges. This survey reviews recent works on unsupervised person Re-ID from the perspective of challenges and solutions. Specifically, we provide an in-depth analysis of highly influential methods considering the four significant challenges in unsupervised person Re-ID: 1) lacking ground-truth identity labels to supervise person feature learning; 2) learning discriminative person features with pseudo-supervision; 3) learning cross-camera invariant person features; and 4) the domain shift between datasets. We summarise and analyse evaluation results and provide insights on the effectiveness of the solutions. Finally, we discuss open issues and suggest some promising future research directions.
Submitted 1 October, 2021; v1 submitted 31 August, 2021;
originally announced September 2021.
-
Exploring Coupled Cluster Green's function as a method for treating system and environment in Green's function embedding methods
Authors:
Avijit Shee,
Chia-Nan Yeh,
Dominika Zgid
Abstract:
Within the self-energy embedding theory (SEET) framework, we study the coupled cluster Green's function (GFCC) method in two different contexts: as a method to treat either the system or environment present in the embedding construction. Our study reveals that when GFCC is used to treat the environment we do not see improvement in total energies in comparison to the coupled cluster method itself. To rationalize this puzzling result, we analyze the performance of GFCC as an impurity solver with a series of transition metal oxides. These studies shed light on the strengths and weaknesses of such a solver and demonstrate that such a solver gives very accurate results when the size of the impurity is small. We investigate if it is possible to achieve a systematic accuracy of the embedding solution when we increase the size of the impurity problem. We found that in such a case, the performance of the solver worsens, both in terms of finding the ground state solution of the impurity problem as well as the self-energies produced. We concluded that increasing the rank of the GFCC solver is necessary to be able to enlarge impurity problems and achieve a reliable accuracy. We also have shown that natural orbitals from weakly correlated perturbative methods are better suited than symmetrized atomic orbitals (SAO) when the total energy of the system is the target quantity.
Submitted 16 July, 2021;
originally announced July 2021.
-
NTIRE 2021 Multi-modal Aerial View Object Classification Challenge
Authors:
Jerrick Liu,
Nathan Inkawhich,
Oliver Nina,
Radu Timofte,
Sahil Jain,
Bob Lee,
Yuru Duan,
Wei Wei,
Lei Zhang,
Songzheng Xu,
Yuxuan Sun,
Jiaqi Tang,
Xueli Geng,
Mengru Ma,
Gongzhe Li,
Xueli Geng,
Huanqia Cai,
Chengxue Cai,
Sol Cummings,
Casian Miron,
Alexandru Pasarica,
Cheng-Yen Yang,
Hung-Min Hsu,
Jiarui Cai,
Jie Mei
, et al. (9 additional authors not shown)
Abstract:
In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. Both EO and SAR sensors possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition.
Submitted 6 April, 2022; v1 submitted 2 July, 2021;
originally announced July 2021.
-
Analytical Continuation of Matrix-Valued Functions: Carathéodory Formalism
Authors:
Jiani Fei,
Chia-Nan Yeh,
Dominika Zgid,
Emanuel Gull
Abstract:
Finite-temperature quantum field theories are formulated in terms of Green's functions and self-energies on the Matsubara axis. In multi-orbital systems, these quantities are related to positive semidefinite matrix-valued functions of the Carathéodory and Schur class. Analysis, interpretation and evaluation of derived quantities such as real-frequency response functions requires analytic continuation of the off-diagonal elements to the real axis. We derive the criteria under which such functions exist for given Matsubara data and present an interpolation algorithm that intrinsically respects their mathematical properties. For small systems with precise Matsubara data, we find that the continuation exactly recovers all off-diagonal and diagonal elements. In real-materials systems, we show that the precision of the continuation is sufficient for the analytic continuation to commute with the Dyson equation, and we show that the commonly used truncation of off-diagonal self-energy elements leads to considerable approximation artifacts. Our method paves the way for the systematic evaluation of Matsubara data with equations of many-body theory on the real-frequency axis.
Submitted 1 July, 2021;
originally announced July 2021.
-
Beyond 5G URLLC Evolution: New Service Modes and Practical Considerations
Authors:
Hirley Alves,
Gweon Do Jo,
JaeSheung Shin,
Choongil Yeh,
Nurul Huda Mahmood,
Carlos Lima,
Chanho Yoon,
Nandana Rahatheva,
Ok-Sun Park,
Seokki Kim,
Eunah Kim,
Ville Niemelä,
Hyeon Woo Lee,
Ari Pouttu,
Hyun Kyu Chung,
Matti Latva-aho
Abstract:
Ultra-reliable low latency communications (URLLC) arose to serve industrial IoT (IIoT) use cases within 5G. Currently, it has inherent limitations in supporting future services. Based on state-of-the-art research and practical deployment experience, in this article, we introduce and advocate for three variants: broadband, scalable and extreme URLLC. We discuss use cases and key performance indicators and identify technology enablers for the new service modes. We bring practical considerations from the IIoT testbed and provide an outlook toward some new research directions.
Submitted 16 June, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Improvement of generalization of Larman-Rogers-Seidel's theorem
Authors:
Cheng-Jui Yeh,
Wei-Hsuan Yu
Abstract:
A finite set $X$ in the $d$-dimensional Euclidean space is called an $s$-distance set if the set of distances between any two distinct points of $X$ has size $s$. In 1977, Larman-Rogers-Seidel proved that if the cardinality of a two-distance set is large enough, then there exists an integer $k$ such that the two distances $α$, $β$ $(α< β)$ satisfy the integer condition, namely, $\frac{α^2}{β^2}=\frac{k-1}{k}$. In 2011, Nozaki generalized Larman-Rogers-Seidel's theorem to the case of $s$-distance sets, i.e. if the cardinality of an $s$-distance set $|X|\geqslant 2N$ with distances $α_1,α_2,\cdots,α_s$, where $N=\binom{d+s-1}{s-1}+\binom{d+s-2}{s-2}$, then the numbers $k_i=\prod_{j=1,2,\cdots,s,\text{ }j\neq i}\frac{α_{j}^{2}}{α_{j}^{2}-α_{i}^{2}}$ are integers. In this note, we reduce the lower bound on $|X|$ required for the integer condition of $s$-distance sets in $\mathbb{R}^d$. Furthermore, we can show that there are only finitely many $s$-distance sets $X$ in $\mathbb{R}^d$ with $|X|\geqslant 2\binom{d+s-1}{s-1}.$
Submitted 17 June, 2021;
originally announced June 2021.
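The quantities $k_i$ and $N$ quoted in the abstract above can be evaluated exactly with rational arithmetic. The sketch below only computes the formulas as stated; it does not implement the paper's improved bound, and the function names are hypothetical. Squared distances are passed in directly so that irrational distances with rational squares stay exact.

```python
from fractions import Fraction
from math import comb

def nozaki_threshold(d, s):
    """Return 2N with N = C(d+s-1, s-1) + C(d+s-2, s-2), for s >= 2."""
    if s < 2:
        raise ValueError("this sketch assumes s >= 2")
    return 2 * (comb(d + s - 1, s - 1) + comb(d + s - 2, s - 2))

def integer_condition_numbers(squared_distances):
    """Return k_i = prod_{j != i} a_j^2 / (a_j^2 - a_i^2) as exact fractions."""
    sq = [Fraction(x) for x in squared_distances]
    ks = []
    for i, ai in enumerate(sq):
        k = Fraction(1)
        for j, aj in enumerate(sq):
            if j != i:
                k *= aj / (aj - ai)   # distinct distances, so no zero division
        ks.append(k)
    return ks

# Two-distance example with squared distances 1 and 2 (ratio 1/2 = (k-1)/k, k = 2).
ks = integer_condition_numbers([1, 2])
all_integers = all(k.denominator == 1 for k in ks)
```

For $s=2$ the first number reduces to $k_1 = β^2/(β^2-α^2)$, which is the $k$ of the Larman-Rogers-Seidel condition $α^2/β^2 = (k-1)/k$.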
-
Quantitative mapping of the brain's structural connectivity using diffusion MRI tractography: a review
Authors:
Fan Zhang,
Alessandro Daducci,
Yong He,
Simona Schiavi,
Caio Seguin,
Robert Smith,
Chun-Hung Yeh,
Tengda Zhao,
Lauren J. O'Donnell
Abstract:
Diffusion magnetic resonance imaging (dMRI) tractography is an advanced imaging technique that enables in vivo mapping of the brain's white matter connections at macro scale. Over the last two decades, the study of brain connectivity using dMRI tractography has played a prominent role in the neuroimaging research landscape. In this paper, we provide a high-level overview of how tractography is used to enable quantitative analysis of the brain's structural connectivity in health and disease. We first provide a review of methodology involved in three main processing steps that are common across most approaches for quantitative analysis of tractography, including methods for tractography correction, segmentation and quantification. For each step, we aim to describe methodological choices, their popularity, and potential pros and cons. We then review studies that have used quantitative tractography approaches to study the brain's white matter, focusing on applications in neurodevelopment, aging, neurological disorders, mental disorders, and neurosurgery. We conclude that, while there have been considerable advancements in methodological technologies and breadth of applications, there nevertheless remains no consensus about the "best" methodology in quantitative analysis of tractography, and researchers should remain cautious when interpreting results in research and clinical applications.
Submitted 23 April, 2021;
originally announced April 2021.
-
Evaluation of two-particle properties within finite-temperature self-consistent one-particle Green's function methods: theory and application to GW and GF2
Authors:
Pavel Pokhilko,
Sergei Iskakov,
Chia-Nan Yeh,
Dominika Zgid
Abstract:
One-particle Green's function methods can model molecular and solid spectra at zero or non-zero temperatures. One-particle Green's functions directly provide electronic energies and one-particle properties, such as dipole moment. However, the evaluation of two-particle properties, such as $\langle{S^2}\rangle$ and $\langle{N^2}\rangle$ can be challenging, because they require a solution of the computationally expensive Bethe--Salpeter equation to find two-particle Green's functions. We demonstrate that the solution of the Bethe--Salpeter equation can be completely avoided. Applying the thermodynamic Hellmann--Feynman theorem to self-consistent one-particle Green's function methods, we derive expressions for two-particle density matrices in a general case and provide explicit expressions for GF2 and GW methods. Such density matrices can be decomposed into an antisymmetrized product of correlated one-electron density matrices and the two-particle electronic cumulant of the density matrix. Cumulant expressions reveal a deviation from ensemble representability for GW, explaining its known deficiencies. We analyze the temperature dependence of $\langle{S^2}\rangle$ and $\langle{N^2}\rangle$ for a set of small closed-shell systems. Interestingly, both GF2 and GW show a non-zero spin contamination and a non-zero fluctuation of the number of particles for closed-shell systems at the zero-temperature limit.
Submitted 20 April, 2021;
originally announced April 2021.
-
VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations
Authors:
Archit Rathore,
Sunipa Dev,
Jeff M. Phillips,
Vivek Srikumar,
Yan Zheng,
Chin-Chia Michael Yeh,
Junpeng Wang,
Wei Zhang,
Bei Wang
Abstract:
Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this paper, we utilize interactive visualization to increase the interpretability and accessibility of a collection of state-of-the-art debiasing techniques. To aid this, we present Visualization of Embedding Representations for deBiasing system ("VERB"), an open-source web-based visualization tool that helps the users gain a technical understanding and visual intuition of the inner workings of debiasing techniques, with a focus on their geometric properties. In particular, VERB offers easy-to-follow use cases in exploring the effects of these debiasing techniques on the geometry of high-dimensional word vectors. To help understand how various debiasing techniques change the underlying geometry, VERB decomposes each technique into interpretable sequences of primitive transformations and highlights their effect on the word vectors using dimensionality reduction and interactive visual exploration. VERB is designed to target natural language processing (NLP) practitioners who are designing decision-making systems on top of word embeddings, and also researchers working with fairness and ethics of machine learning systems in NLP. It can also serve as a visual medium for education, which helps an NLP novice to understand and mitigate biases in word embeddings.
Submitted 6 April, 2021;
originally announced April 2021.
-
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Authors:
Yangyang Shi,
Varun Nagaraja,
Chunyang Wu,
Jay Mahadeokar,
Duc Le,
Rohit Prabhavalkar,
Alex Xiao,
Ching-Feng Yeh,
Julian Chan,
Christian Fuegen,
Ozlem Kalinli,
Michael L. Seltzer
Abstract:
We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trade off accuracy and latency, DET assigns different encoders to decode different parts of an utterance. We apply and compare layer dropout and collaborative learning for DET training. The layer dropout method, which randomly drops out encoder layers in the training phase, can do on-demand layer dropout in decoding. Collaborative learning jointly trains multiple encoders with different depths in one single model. Experimental results on Librispeech and in-house data show that DET provides a flexible accuracy and latency trade-off. Results on Librispeech show that the full-size encoder in DET relatively reduces the word error rate of the same size baseline by over 8%. The lightweight encoder in DET trained with collaborative learning reduces the model size by 25% but still gets similar WER as the full-size baseline. DET gets similar accuracy as a baseline model with better latency on a large in-house data set by assigning a lightweight encoder for the beginning part of one utterance and a full-size encoder for the rest.
Submitted 5 April, 2021;
originally announced April 2021.
-
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
Authors:
Suyoun Kim,
Abhinav Arora,
Duc Le,
Ching-Feng Yeh,
Christian Fuegen,
Ozlem Kalinli,
Michael L. Seltzer
Abstract:
Word Error Rate (WER) has been the predominant metric used to evaluate the performance of automatic speech recognition (ASR) systems. However, WER is sometimes not a good indicator for downstream Natural Language Understanding (NLU) tasks, such as intent recognition, slot filling, and semantic parsing in task-oriented dialog systems. This is because WER takes into consideration only literal correctness instead of semantic correctness, the latter of which is typically more important for these downstream tasks. In this study, we propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems to address this issue. We define SemDist as the distance between a reference and hypothesis pair in a sentence-level embedding space. To represent the reference and hypothesis as a sentence embedding, we exploit RoBERTa, a state-of-the-art pre-trained deep contextualized language model based on the transformer architecture. We demonstrate the effectiveness of our proposed metric on various downstream tasks, including intent recognition, semantic parsing, and named entity recognition.
Submitted 5 April, 2021;
originally announced April 2021.
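The idea of scoring a reference and hypothesis pair by their distance in a sentence-embedding space can be illustrated with a minimal sketch. It assumes cosine distance as the distance measure and uses hard-coded, hypothetical embedding vectors; the abstract does not state which distance is used, and a real system would obtain the embeddings from a pre-trained encoder such as RoBERTa.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical sentence embeddings (assumption for illustration only);
# they stand in for the output of a sentence encoder.
reference_embedding = [1.0, 0.0, 1.0]
hypothesis_embedding = [1.0, 0.0, 1.0]   # same direction as the reference
unrelated_embedding = [0.0, 1.0, 0.0]    # orthogonal to the reference

close = cosine_distance(reference_embedding, hypothesis_embedding)
far = cosine_distance(reference_embedding, unrelated_embedding)
```

Under this assumed measure, a hypothesis whose embedding points in the same direction as the reference scores near 0, while an orthogonal one scores near 1, regardless of how many words differ literally.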
-
Resolvent analysis on the origin of two-dimensional transonic buffet
Authors:
Yoimi Kojima,
Chi-An Yeh,
Kunihiko Taira,
Masaharu Kameda
Abstract:
Resolvent analysis is performed to identify the origin of two-dimensional transonic buffet over an airfoil. The base flow for the resolvent analysis is the time-averaged flow over a NACA 0012 airfoil at a chord-based Reynolds number of 2000 and a free-stream Mach number of 0.85. We reveal that the mechanism of buffet is buried underneath the global low-Reynolds-number flow physics. At this low Reynolds number, the dominant flow feature is the von Karman shedding. However, we show that with the appropriate forcing input, buffet can appear even at a Reynolds number that is much lower than what is traditionally associated with transonic buffet. The source of buffet is identified to be at the shock foot from the windowed resolvent analysis, which is validated by companion simulations using sustained forcing inputs based on resolvent modes. We also comment on the role of perturbations in the vicinity of the trailing edge. The present study not only provides insights on the origin of buffet but also serves as a building block for low-Reynolds-number compressible aerodynamics in light of the growing interest in Martian flights.
Submitted 23 March, 2021;
originally announced March 2021.
-
Testing the GFCCSD impurity solver on real materials within the self-energy embedding theory framework
Authors:
Chia-Nan Yeh,
Avijit Shee,
Sergei Iskakov,
Dominika Zgid
Abstract:
We apply the Green's function coupled cluster singles and doubles (GFCCSD) impurity solver to realistic impurity problems arising for strongly correlated solids within the self-energy embedding theory (SEET) framework. We describe the details of our GFCC solver implementation, investigate its performance, and highlight potential advantages and problems on examples of impurities created during the self-consistent SEET for antiferromagnetic MnO and paramagnetic SrMnO$_{3}$. GFCCSD provides satisfactory descriptions for weakly and moderately correlated impurities with sizes that are intractable by existing accurate impurity solvers such as exact diagonalization (ED). However, our data also show that when correlations become strong, the singles and doubles approximation used in GFCC could lead to instabilities in searching for the particle number present in impurity problems. These instabilities appear especially severe when the impurity size gets larger and multiple degenerate orbitals with strong correlations are present. We conclude that to fully check the reliability of GFCCSD results and use them in fully {\em ab initio} calculations in the absence of experiments, a verification from a GFCC solver with higher order excitations is necessary.
Submitted 31 December, 2020;
originally announced December 2020.
-
Motional heating of spatially extended ion crystals
Authors:
D. Kalincev,
L. S. Dreissen,
A. P. Kulosa,
C-H. Yeh,
H. A. Fürst,
T. E. Mehlstäubler
Abstract:
We study heating of motional modes of a single ion and of extended ion crystals trapped in a linear radio frequency (rf) Paul trap with a precision of $Δ\dot{\bar{n}} \approx 0.2 $ phonons s$^{-1}$. Single-ion axial and radial heating rates are consistent and electric field noise has been stable over the course of four years. At a secular frequency of $ω_\mathrm{sec}=2π\times620$ kHz, we measure $\dot{\bar{n}} = 0.56(6)$ phonons s$^{-1}$ per ion for the center-of-mass (com) mode of linear chains of up to eleven ions and observe no significant heating of the out-of-phase (oop) modes. By displacing the ions away from the nodal line, inducing excess micromotion, rf noise heats the com mode quadratically as a function of radial displacement $r$ by $\dot{\bar{n}}(r)/ r^2 = 0.89(4)$ phonons s$^{-1}$ $μ$m$^{-2}$ per ion, while the oop modes are protected from rf-noise induced heating in linear chains. By changing the quality factor of the resonant rf circuit from $Q=542$ to $Q=204$, we observe an increase of rf noise by a factor of up to 3. We show that the rf-noise induced heating of motional modes of extended crystals also depends on the symmetry of the crystal and of the mode itself. As an example, we consider several 2D and 3D crystal configurations. Heating rates of up to 500 phonons s$^{-1}$ are observed for individual modes, giving rise to a total kinetic energy increase and thus a fractional time dilation shift of up to $-0.3\times 10^{-18}$ s$^{-1}$ of the total system. In addition, we detail how the excitation probability of the individual ions is reduced and decoherence is increased due to the Debye-Waller effect.
Submitted 28 May, 2021; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Construction and commissioning of CMS CE prototype silicon modules
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
M. Alhusseini,
J. Alison,
G. Altopp,
M. Alyari,
S. An,
S. Anagul,
I. Andreev,
M. Andrews,
P. Aspell,
I. A. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
P. Bargassa,
D. Barney,
E. Becheva,
P. Behera,
A. Belloni
, et al. (307 additional authors not shown)
Abstract:
As part of its HL-LHC upgrade program, the CMS Collaboration is developing a High Granularity Calorimeter (CE) to replace the existing endcap calorimeters. The CE is a sampling calorimeter with unprecedented transverse and longitudinal readout for both electromagnetic (CE-E) and hadronic (CE-H) compartments. The calorimeter will be built with $\sim$30,000 hexagonal silicon modules. Prototype modules have been constructed with 6-inch hexagonal silicon sensors with cell areas of 1.1 cm$^2$, and the SKIROC2-CMS readout ASIC. Beam tests of different sampling configurations were conducted with the prototype modules at DESY and CERN in 2017 and 2018. This paper describes the construction and commissioning of the CE calorimeter prototype, the silicon modules used in the construction, their basic performance, and the methods used for their calibration.
Submitted 10 December, 2020;
originally announced December 2020.
-
The DAQ system of the 12,000 Channel CMS High Granularity Calorimeter Prototype
Authors:
B. Acar,
G. Adamov,
C. Adloff,
S. Afanasiev,
N. Akchurin,
B. Akgün,
M. Alhusseini,
J. Alison,
G. Altopp,
M. Alyari,
S. An,
S. Anagul,
I. Andreev,
M. Andrews,
P. Aspell,
I. A. Atakisi,
O. Bach,
A. Baden,
G. Bakas,
A. Bakshi,
P. Bargassa,
D. Barney,
E. Becheva,
P. Behera,
A. Belloni
, et al. (307 additional authors not shown)
Abstract:
The CMS experiment at the CERN LHC will be upgraded to accommodate the 5-fold increase in the instantaneous luminosity expected at the High-Luminosity LHC (HL-LHC). Concomitant with this increase will be an increase in the number of interactions in each bunch crossing and a significant increase in the total ionising dose and fluence. One part of this upgrade is the replacement of the current endcap calorimeters with a high granularity sampling calorimeter equipped with silicon sensors, designed to manage the high collision rates. As part of the development of this calorimeter, a series of beam tests have been conducted with different sampling configurations using prototype segmented silicon detectors. In the most recent of these tests, conducted in late 2018 at the CERN SPS, the performance of a prototype calorimeter equipped with $\approx$12,000 channels of silicon sensors was studied with beams of high-energy electrons, pions and muons. This paper describes the custom-built scalable data acquisition system that was built with readily available FPGA mezzanines and low-cost Raspberry Pi computers.
Submitted 8 December, 2020; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition
Authors:
Ching-Feng Yeh,
Yongqiang Wang,
Yangyang Shi,
Chunyang Wu,
Frank Zhang,
Julian Chan,
Michael L. Seltzer
Abstract:
Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is the need for access to the full sequence and the quadratically growing computational cost with respect to the sequence length. These characteristics pose challenges, especially for low-latency scenarios, where the system is often required to be streaming. In this paper, we build a compact and streaming speech recognition system on top of the end-to-end neural transducer architecture with attention-based modules augmented with convolution. The proposed system equips the end-to-end models with the streaming capability and reduces the large footprint from the streaming attention-based model using augmented memory. On the LibriSpeech dataset, our proposed system achieves word error rates of 2.7% on test-clean and 5.8% on test-other, to the best of our knowledge the lowest among streaming approaches reported so far.
Submitted 2 November, 2020;
originally announced November 2020.
-
Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
Authors:
Xiaohui Zhang,
Frank Zhang,
Chunxi Liu,
Kjell Schubert,
Julian Chan,
Pradyot Prakash,
Jun Liu,
Ching-Feng Yeh,
Fuchun Peng,
Yatharth Saraf,
Geoffrey Zweig
Abstract:
In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T. In transcribing social media videos of 7 languages with 3K-14K hours of training data, we conduct large-scale controlled experimentation across each criterion using identical datasets and encoder model architecture. We find that RNN-T has consistent wins in ASR accuracy, while CTC models excel at inference efficiency. Moreover, we selectively examine various modeling strategies for different training criteria, including modeling units, encoder architectures, pre-training, etc. Given such a large-scale real-world streaming ASR application, to the best of our knowledge, we present the first comprehensive benchmark on these three widely used training criteria across a great many languages.
Submitted 9 November, 2020;
originally announced November 2020.
-
Alignment Restricted Streaming Recurrent Neural Network Transducer
Authors:
Jay Mahadeokar,
Yuan Shangguan,
Duc Le,
Gil Keren,
Hang Su,
Thong Le,
Ching-Feng Yeh,
Christian Fuegen,
Michael L. Seltzer
Abstract:
There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications. RNN-T is trained with a loss function that does not enforce temporal alignment of the training transcripts and audio. As a result, RNN-T models built with uni-directional long short-term memory (LSTM) encoders tend to wait for longer spans of input audio before streaming already decoded ASR tokens. In this work, we propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T (Ar-RNN-T) models, which utilize audio-text alignment information to guide the loss computation. We compare the proposed method with existing works, such as monotonic RNN-T, on LibriSpeech and in-house datasets. We show that the Ar-RNN-T loss provides a refined control to navigate the trade-offs between the token emission delays and the Word Error Rate (WER). The Ar-RNN-T models also improve downstream applications such as the ASR End-pointing by guaranteeing token emissions within any given range of latency. Moreover, the Ar-RNN-T loss allows for bigger batch sizes and 4 times higher throughput for our LSTM model architecture, enabling faster training and convergence on GPUs.
Submitted 5 November, 2020;
originally announced November 2020.
-
Holographic approach to thermalization in general anisotropic theories
Authors:
Po-Chun Sun,
Da-Shin Lee,
Chen-Pin Yeh
Abstract:
We employ the holographic approach to study the thermalization in the quenched strongly-coupled field theories with very general anisotropic scalings including Lifshitz and hyperscaling violating fixed points. The holographic dual is a Vaidya-like time-dependent geometry where the asymptotic metric has general anisotropic scaling isometries. We find the Ryu-Takayanagi extremal surface and use it to calculate the time-dependent entanglement entropy between a strip region with width $2R$ and its outside region. In the special case with an isotropic metric, we also explore the entanglement entropy for a spherical region of radius $R$. The growth of the entanglement entropy characterizes the thermalization rate after a quench. We study the thermalization process at early times and late times in both large $R$ and small $R$ limits. The allowed scaling parameter regions are constrained by the null energy conditions as well as the condition for the existence of the Ryu-Takayanagi extremal surfaces. This generalizes the previous works on this subject. All obtained results can be compared with experiments and other methods of probing thermalization.
Submitted 31 March, 2021; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Merchant Category Identification Using Credit Card Transactions
Authors:
Chin-Chia Michael Yeh,
Zhongfang Zhuang,
Yan Zheng,
Liang Wang,
Junpeng Wang,
Wei Zhang
Abstract:
Digital payment volume has proliferated in recent years with the rapid growth of small businesses and online shops. When processing these digital transactions, recognizing each merchant's real identity (i.e., business type) is vital to ensure the integrity of payment processing systems. Conventionally, this problem is formulated as a time series classification problem solely using the merchant transaction history. However, with the large scale of the data, and changing behaviors of merchants and consumers over time, it is extremely challenging to achieve satisfying performance from off-the-shelf classification methods. In this work, we approach this problem from a multi-modal learning perspective, where we use not only the merchant time series data but also the information of merchant-merchant relationship (i.e., affinity) to verify the self-reported business type (i.e., merchant category) of a given merchant. Specifically, we design two individual encoders, where one is responsible for encoding temporal information and the other is responsible for affinity information, and a mechanism to fuse the outputs of the two encoders to accomplish the identification task. Our experiments on real-world credit card transaction data between 71,668 merchants and 433,772,755 customers have demonstrated the effectiveness and efficiency of the proposed model.
Submitted 4 November, 2020;
originally announced November 2020.
-
Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications
Authors:
Yongqiang Wang,
Yangyang Shi,
Frank Zhang,
Chunyang Wu,
Julian Chan,
Ching-Feng Yeh,
Alex Xiao
Abstract:
In this paper, we summarize the application of acoustic models based on the transformer and its streamable variant, Emformer, to large scale speech recognition applications. We compare the transformer based acoustic models with their LSTM counterparts on industrial scale tasks. Specifically, we compare Emformer with latency-controlled BLSTM (LCBLSTM) on medium latency tasks and LSTM on low latency tasks. On a low latency voice assistant task, Emformer gets 24% to 26% relative word error rate reductions (WERRs). For medium latency scenarios, compared with LCBLSTM with similar model size and latency, Emformer gets significant WERR across four languages in video captioning datasets with a 2-3 times reduction in inference real-time factor.
Submitted 29 October, 2020; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Electron correlations in cubic paramagnetic perovskite Sr(V,Mn)O$_{3}$ -- Results from fully self-consistent self-energy embedding calculations
Authors:
Chia-Nan Yeh,
Sergei Iskakov,
Dominika Zgid,
Emanuel Gull
Abstract:
In this work, we use the thermodynamically consistent and conserving self-energy embedding theory (SEET) to study the spectra of the prototypical undistorted cubic perovskites SrVO$_3$ and SrMnO$_3$. In the strongly correlated metallic SrVO$_3$ we find that the usual attribution of the satellite peaks at -1.8 eV to Hund or Hubbard physics in the $t_{2g}$ orbitals is inconsistent with our calculations. In the strongly correlated insulator SrMnO$_3$ we recover insulating behavior due to a feedback effect between the strongly correlated orbitals and the weakly correlated environment. Our calculation shows a systematic convergence of spectral features as the space of strongly correlated orbitals is enlarged, paving the way to a systematic parameter-free study of correlated perovskites.
Submitted 1 June, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Authors:
Yangyang Shi,
Yongqiang Wang,
Chunyang Wu,
Ching-Feng Yeh,
Julian Chan,
Frank Zhang,
Duc Le,
Mike Seltzer
Abstract:
This paper proposes an efficient memory transformer, Emformer, for low latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce self-attention's computation complexity. A cache mechanism saves the computation for the key and value in self-attention for the left context. Emformer applies a parallelized block processing in training to support low latency models. We carry out experiments on benchmark LibriSpeech data. Under average latency of 960 ms, Emformer gets WER $2.50\%$ on test-clean and $5.62\%$ on test-other. Compared with a strong baseline augmented memory transformer (AM-TRF), Emformer gets a $4.6$-fold training speedup and $18\%$ relative real-time factor (RTF) reduction in decoding with relative WER reduction $17\%$ on test-clean and $9\%$ on test-other. For a low latency scenario with an average latency of 80 ms, Emformer achieves WER $3.01\%$ on test-clean and $7.09\%$ on test-other. Compared with the LSTM baseline with the same latency and model size, Emformer gets relative WER reduction $9\%$ and $16\%$ on test-clean and test-other, respectively.
Submitted 30 December, 2020; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Nevanlinna Analytical Continuation
Authors:
Jiani Fei,
Chia-Nan Yeh,
Emanuel Gull
Abstract:
Simulations of finite temperature quantum systems provide imaginary frequency Green's functions that correspond one-to-one to experimentally measurable real-frequency spectral functions. However, due to the bad conditioning of the continuation transform from imaginary to real frequencies, established methods tend to either wash out spectral features at high frequencies or produce spectral functions with unphysical negative parts. Here, we show that explicitly respecting the analytic `Nevanlinna' structure of the Green's function leads to intrinsically positive and normalized spectral functions, and we present a continued fraction expansion that yields all possible functions consistent with the analytic structure. Application to synthetic trial data shows that sharp, smooth, and multi-peak data is resolved accurately. Application to the band structure of silicon demonstrates that high energy features are resolved precisely. Continuations in a realistic correlated setup reveal additional features that were previously unresolved. By substantially increasing the resolution of real frequency calculations our work overcomes one of the main limitations of finite-temperature quantum simulations.
Submitted 9 October, 2020;
originally announced October 2020.
-
Evaluating real-time probabilistic forecasts with application to National Basketball Association outcome prediction
Authors:
Chi-Kuang Yeh,
Gregory Rice,
Joel A. Dubin
Abstract:
Motivated by the goal of evaluating real-time forecasts of home team win probabilities in the National Basketball Association, we develop new tools for measuring the quality of continuously updated probabilistic forecasts. This includes introducing calibration surface plots, and simple graphical summaries of them, to evaluate at a glance whether a given continuously updated probability forecasting method is well-calibrated, as well as developing statistical tests and graphical tools to evaluate the skill, or relative performance, of two competing continuously updated forecasting methods. These tools are studied by means of a Monte Carlo simulation study of simulated basketball games, and demonstrated in an application to evaluate the continuously updated forecasts published by the United States-based multinational sports network ESPN on its principal webpage {\tt espn.com}. This application lends statistical evidence that the forecasts published there are well-calibrated, and exhibit improved skill over several naïve models, but do not demonstrate significantly improved skill over simple logistic regression models based solely on a measurement of each team's relative strength, and the evolving score difference throughout the game.
Submitted 2 October, 2020;
originally announced October 2020.
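What it means for a probability forecast to be well-calibrated can be illustrated with a minimal sketch. It is a simplified, single-time-point reliability table, not the calibration surface or the statistical tests developed in the paper: forecasts are grouped into equal-width probability bins and each bin's mean forecast is compared with the observed win frequency. The function name, the bin count, and the sample data are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean forecast, observed frequency, count) per non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins: Dict[int, List[Tuple[float, int]]] = {}
    for p, y in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("forecast probabilities must lie in [0, 1]")
        # Equal-width bins; a forecast of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(idx, []).append((p, y))
    table = []
    for idx in sorted(bins):
        members = bins[idx]
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_freq = sum(y for _, y in members) / len(members)
        table.append((mean_forecast, observed_freq, len(members)))
    return table


# Hypothetical forecasts at one fixed game time, with 1 = home team won.
sample_probs = [0.1, 0.1, 0.9, 0.9]
sample_outcomes = [0, 0, 1, 1]
sample_table = reliability_table(sample_probs, sample_outcomes, n_bins=2)
```

For a well-calibrated forecaster, the mean forecast and observed frequency in each bin should be close once the bins hold enough games; repeating the table at successive game times would give a rough discrete analogue of a surface over time and forecast probability.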