Exotic and physics-informed support vector machines for high energy physics
Abstract
In this article, we explore machine learning techniques using support vector machines with two novel approaches: exotic and physics-informed support vector machines. Exotic support vector machines employ unconventional techniques such as genetic algorithms and boosting. Physics-informed support vector machines integrate the physics dynamics of a given high-energy physics process in a straightforward manner. The goal is to efficiently distinguish signal and background events in high-energy physics collision data. To test our algorithms, we perform computational experiments with simulated Drell-Yan events in proton-proton collisions. Our results highlight the superiority of the physics-informed support vector machines, emphasizing their potential in high-energy physics and promoting the inclusion of physics information in machine learning algorithms for future research.
I Introduction
Machine learning techniques have proven to be extremely powerful when applied to high energy physics phenomena, in both theoretical and experimental studies [1, 2, 3]. Several algorithms have been applied to distinguish signals in high energy collider data [4, 5]. For instance, the discovery of the Higgs boson was aided by the so-called boosted decision trees algorithm [6]. Other popular machine learning algorithms that have been successful in high energy physics are neural networks [7, 8, 9, 10], linear regressions [8, 11, 12], and deep learning [13, 14, 15, 16].
To continue exploiting the potential of machine learning techniques, the idea that physics insights can help design a better machine learning algorithm has recently been used across several fields, yielding excellent results. This field is known as physics-informed machine learning [17]. Most physics-informed machine learning studies rely on advanced neural network architectures. Support vector machines (SVMs), which are based on kernel methods, have also benefited from these physics insights: the physics information is introduced via their kernels, which improves the SVM performance [18].
In the realm of high energy physics, physics-informed neural networks and deep learning techniques have been proposed to tackle the most challenging data analysis tasks in high energy physics experiments, ranging from searches for new physics phenomena to jet tagging [19, 20, 21, 22]. SVMs have also been useful and of interest to the high energy physics community [23, 24, 25, 26, 1, 27]. However, there are no reports of physics-informed support vector machines applied to high energy physics phenomena. This invites the exploration of SVMs in the context of physics-informed machine learning.
This paper is focused on the application and interpretation of the SVMs in experimental high energy physics. The use of support vector machines is motivated by their relatively simple geometric interpretation, especially for binary discrimination of signal events against background events. First, we study what we call exotic support vector machines. These SVMs are exotic in the sense that we utilize unconventional techniques to build them. That is, we use genetic and boosting algorithms to construct more efficient classifiers. Moreover, we use somewhat unconventional kernels. The construction of the exotic SVMs is guided by our previous studies [28]. Second, we study physics-informed support vector machines. To include high energy physics information in our SVMs, we propose kernels that define the SVM and aim to capture the dynamical properties of the underlying theory that intends to describe the observed/expected data in high energy experiments.
We perform a case study: the Drell-Yan $Z$ boson production in high energy proton-proton collisions. We simulate data for the process $q\bar{q} \to Z \to \ell^{+}\ell^{-}$, where $q$ and $\bar{q}$ are the quarks coming from the colliding protons and $\ell^{\pm}$ are the final state oppositely charged leptons. Using the kinematic variables of these final state leptons, we construct the kernels that define the SVM in every case. We then perform formal statistical tests to compare the performance of each SVM. These tests help us assess the usefulness of introducing the dynamics into a support vector machine algorithm.
In Sect. II we summarize the formalism of support vector machines, the basic kernel theory, and the definition of the considered kernels. Furthermore, we describe the genetic algorithms and boosting techniques used, and the approach used to introduce the physics dynamics of a given high energy physics process into a support vector machine algorithm. In Sect. III we present the computational experiments to train and test our proposed support vector machines. In Sect. IV we present our results and discussion. Finally, in Sect. V we present our conclusions.
II Methodology
We propose that if the theory underlying the dynamics of a physics process studied in high-energy experiments is included in the construction of the kernel that defines the support vector machine, then the discrimination capabilities of the support vector machine binary classifier will be significantly enhanced. We then compare the physics-informed SVMs with state-of-the-art SVMs. The following sections describe the ingredients of this proposal.
II.1 Support vector machines
In a binary SVM classifier, an optimal hyperplane separating two classes in the feature space is found [29]. Binary classification is important in experimental high energy physics, as it helps discriminate signals of interest from background. During optimization, the SVM model selects a subset of support vectors (SVs) from the training samples to establish the location of the decision surface. To simplify the search for SVs, the training samples are mapped into a high-dimensional space using kernel functions, which are expressed as inner products of the training samples or their mappings. In this feature space, a specific kernel produces a hyperplane that assigns a prediction to each element of the sample based on which side of the hyperplane it lies. The kernel functions solve the optimization problem without explicitly using the actual mappings, a technique known as the kernel trick. Since data may not be perfectly separable and some points may lie within the margin or be misclassified, SVM implementations allow for a certain degree of misclassification by introducing an adjustable penalty cost [29, 28].

An SVM classifier is defined by its kernel and the parameters that describe the kernel. Kernel theory in machine learning allows the construction of a broad diversity of kernels employing elemental kernel properties. Let $K_1$ and $K_2$ be kernels over $X \times X$, where $X \subseteq \mathbb{R}^{n}$, and let $K_3$ be a kernel over $\mathbb{R}^{N} \times \mathbb{R}^{N}$. Then the following functions are kernels as well [30]:
(1) | |||||
where, .
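The kernel trick mentioned above can be illustrated with a minimal sketch. It assumes a toy two-dimensional input and a homogeneous degree-2 polynomial kernel; the names `phi`, `dot`, and `quadratic_kernel` are hypothetical helpers introduced only for this illustration and are not taken from the paper's code. The point is that the kernel evaluated in the input space returns the same number as the inner product in the explicit feature space.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2D input (toy example)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Euclidean inner product of two equally sized sequences."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2


x = (1.0, 2.0)
z = (3.0, -1.0)

# Inner product computed after mapping both points to the 3D feature space.
explicit = dot(phi(x), phi(z))
# Same quantity computed directly in the 2D input space, with no mapping.
implicit = quadratic_kernel(x, z)
```

Both `explicit` and `implicit` agree up to floating-point rounding, which is why an SVM only needs the kernel values and never the mapping itself.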
II.2 Basic kernels
In this context, a kernel is a Hermitian and positive semidefinite Gram matrix defined as , where the vectors live in a vector space that contains an inner product [31]. To make the notation more compact, we write , with . This paper considers the kernels:
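- Linear kernel
  $$K(x_i, x_j) = x_i^{\top} x_j, \tag{2}$$
  with no hyper-parameters.
- Radial Basis Function (RBF) kernel
  $$K(x_i, x_j) = \exp\left(-\gamma\, \lVert x_i - x_j \rVert^{2}\right), \tag{3}$$
  with hyper-parameter $\gamma$.
- Sigmoid kernel
  $$K(x_i, x_j) = \tanh\left(\gamma\, x_i^{\top} x_j + c\right), \tag{4}$$
  with hyper-parameters $\gamma$ and $c$.
- Polynomial kernel
  $$K(x_i, x_j) = \left(\gamma\, x_i^{\top} x_j + c\right)^{d}, \tag{5}$$
  with hyper-parameters $\gamma$, $c$, and $d$.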
For the sigmoid kernel . For the polynomial kernel and . Finally, we set a high value, , to provide a non-negligible impact of each training vector. The chosen values of the hyper-parameters , , and enforce a good behavior when fitting a SVM [33, 32].
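The four basic kernels, and a sum of two of them, can be sketched in a few lines. This is a minimal pure-Python illustration that assumes the LIBSVM-style parameterization (a scale `gamma`, an offset `c`, and a degree `d`); the function names and the numerical hyper-parameter values below are placeholders chosen for the example and are not the values used in this paper.

```python
import math


def dot(x, z):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def rbf_kernel(x, z, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, gamma, c):
    return math.tanh(gamma * dot(x, z) + c)


def polynomial_kernel(x, z, gamma, c, d):
    return (gamma * dot(x, z) + c) ** d


def sum_kernel(k1, k2):
    """The sum of two kernels is again a kernel."""
    return lambda x, z: k1(x, z) + k2(x, z)


def gram_matrix(kernel, data):
    """Matrix of kernel values for every pair of points in `data`."""
    return [[kernel(x, z) for z in data] for x in data]


# Placeholder hyper-parameters, for illustration only.
GAMMA, C, D = 0.5, 1.0, 2

data = [(1.0, 2.0), (3.0, -1.0), (0.0, 0.5)]
rbf_plus_linear = sum_kernel(lambda x, z: rbf_kernel(x, z, GAMMA), linear_kernel)
K = gram_matrix(rbf_plus_linear, data)
```

A precomputed matrix such as `K` is the object a kernel-based SVM solver consumes; swapping the kernel function is enough to switch between the models compared later.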
II.3 Exotic support vector machines
To construct exotic support vector machines we use and combine three elements:
- •
- Ensembles of classifiers. An ensemble of classifiers is a collection of single weak classifiers that, when combined, provide a strong classifier [34, 35]. In this work, we use the AdaBoost algorithm [36] to construct ensembles. This adaptive method updates the vector weights (in this context, a vector is a point of the data sample) based on the training error of a given binary classifier. These weights are used to train the next classifier to be added to the ensemble. Correctly classified vectors are assigned lower weights, whilst misclassified vectors are given higher weights. Thus, vectors that are harder to classify receive more focus from the algorithm. The AdaBoost algorithm is repeated $T$ times, $t = 1, \dots, T$. First, for the true label $y_i$ and the base classifier prediction $h_t(x_i)$, the training error is calculated
  $$\epsilon_t = \sum_{i} w_i^{t}\, I\left(y_i \neq h_t(x_i)\right), \tag{6}$$
  where $w_i^{t}$ are the weights of each vector utilized to train the classifier and $I$ is the indicator function. Then, the score is defined as
  $$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right). \tag{7}$$
  The weights are updated for the next iteration with
  $$w_i^{t+1} = \frac{w_i^{t} \exp\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}, \tag{8}$$
  where $Z_t$ is a normalization factor. The weights in Eq. (8) are applied to train and add a new classifier to the ensemble. When the $T$ iterations are completed, the predicted label of the total ensemble is the weighted sum of the predictions of the individual classifiers within the ensemble
  $$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right). \tag{9}$$
- Genetic algorithms. Genetic algorithms are optimization techniques inspired by the principles of biological evolution. Selections are performed using simple operators based on genetic recombinations and mutations. In this work, we use genetic algorithms to select a small subset of the training data that is likely to contain the support vectors needed to solve the binary classification problem [37]. To determine whether a subgroup of vectors is indeed likely to contain the support vectors, a fitness function is calculated to check whether this subgroup is good at classifying data outside the subgroup. This is repeated for several subgroups of vectors, and a selection of subgroups is performed using the high-low method [38]. The selected subgroups are recombined and the previous steps are repeated until a given stop criterion is satisfied. For more details, see Ref. [28].
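The boosting loop described in the list above can be sketched as follows. This is a minimal illustration that assumes the standard discrete AdaBoost update and, to stay self-contained, swaps the SVM base classifier for a one-dimensional decision stump; `train_stump`, `adaboost`, and `predict` are hypothetical names, and the toy data are invented for the example.

```python
import math


def train_stump(xs, ys, w):
    """Weak learner (stand-in for an SVM): best weighted threshold on 1D inputs."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, thr, sign = best

    def stump(x):
        return sign if x >= thr else -sign

    return stump


def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n  # start from uniform weights
    ensemble = []
    for _ in range(rounds):
        h = train_stump(xs, ys, w)
        preds = [h(x) for x in xs]
        # Weighted training error of the current base classifier.
        eps = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
        eps = min(max(eps, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        # Score of the base classifier.
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        # Raise the weights of misclassified vectors, lower the others, normalize.
        w = [wi * math.exp(-alpha * y * p) for wi, p, y in zip(w, preds, ys)]
        norm = sum(w)
        w = [wi / norm for wi in w]
        ensemble.append((alpha, h))
    return ensemble


def predict(ensemble, x):
    """Sign of the score-weighted sum of the base predictions."""
    total = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if total >= 0 else -1


# Toy labels that no single stump can reproduce.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
single = adaboost(xs, ys, rounds=1)
```

On this toy sample a single stump misclassifies two points, whereas the three-round ensemble reproduces all labels, which is the behavior boosting is meant to provide.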
II.4 The Drell-Yan process
Based on the parton model and the quark-antiquark annihilation mechanism, Sidney D. Drell and Tung-Mow Yan [39] predicted the production of two oppositely charged leptons in hadron-hadron collisions. The neutral dilepton pair was predicted to appear with a large invariant mass. This production is the well-known neutral current Drell-Yan process. For proton-proton collisions, the partons participating in the Drell-Yan production are the quarks and antiquarks that constitute the protons. The tree-level, or leading-order, partonic cross-section of the process is found to be [40]
$$\hat{\sigma}^{q\bar{q} \to Z} = \frac{\pi}{3}\sqrt{2}\, G_F M_Z^{2} \left(v_q^{2} + a_q^{2}\right) \delta\left(\hat{s} - M_Z^{2}\right), \tag{10}$$
where $G_F$ is the Fermi weak coupling constant, $M_Z$ the invariant mass of the $Z$ boson, $v_q$ ($a_q$) is the vector (axial vector) coupling of the $Z$ to the quarks, and $\hat{s}$ is the square of the center-of-mass energy of the quark-antiquark pair.
A quark with charge inside a proton is described by a parton distribution function . Considering all the proton parton distribution functions and with the aid of the QCD factorization theorem, it is found that the hadronic (proton-proton) cross-section for the Drell-Yan process is
(11) | |||||
where is the color factor. are defined in terms of the four-momentum of each parton
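$$p_1^{\mu} = \frac{\sqrt{s}}{2}\left(x_1, 0, 0, x_1\right), \tag{12}$$
$$p_2^{\mu} = \frac{\sqrt{s}}{2}\left(x_2, 0, 0, -x_2\right). \tag{13}$$
From Eqs. (12)-(13) it is found that $\hat{s} = x_1 x_2 s$, where $\sqrt{s}$ is the proton-proton center-of-mass energy. For the produced lepton pair, the rapidity is given by $y = \frac{1}{2}\ln\left(x_1/x_2\right)$, and hence
$$x_1 = \frac{M}{\sqrt{s}}\, e^{y}, \qquad x_2 = \frac{M}{\sqrt{s}}\, e^{-y}, \tag{14}$$
with $M^{2} = \hat{s}$ the squared invariant mass of the lepton pair.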
The cross-section of Eq. (11) is multiplied by the branching ratio for the particular hadronic or leptonic final state of interest, which for this paper is the dielectron final state, namely, $Z \to e^{+}e^{-}$.
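The relation between the dilepton kinematics and the parton momentum fractions can be checked numerically. The sketch below assumes the standard leading-order relations, namely that the partonic center-of-mass energy squared equals the product of the two momentum fractions and the hadronic one, and that the pair rapidity is half the logarithm of the ratio of the fractions. The function name `parton_fractions` and the collision energy `SQRT_S` are hypothetical choices made only for this illustration.

```python
import math


def parton_fractions(mass, rapidity, sqrt_s):
    """Momentum fractions (x1, x2) for a pair of given mass and rapidity."""
    root_tau = mass / sqrt_s
    x1 = root_tau * math.exp(rapidity)
    x2 = root_tau * math.exp(-rapidity)
    return x1, x2


M_Z = 91.1876      # GeV, Z boson mass quoted in the text
SQRT_S = 13000.0   # GeV, assumed collision energy for this example only

x1, x2 = parton_fractions(M_Z, rapidity=1.5, sqrt_s=SQRT_S)

# Consistency checks: both relations are recovered from the fractions.
s_hat = x1 * x2 * SQRT_S ** 2
y_back = 0.5 * math.log(x1 / x2)
```

Because the fractions depend only on the invariant mass and rapidity of the lepton pair, these two observables carry the information on the initial partons that a kernel built from the final state kinematics can exploit.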
The proton-proton cross section in Eq. (11) is a function of the kinematics of the outgoing leptons. The kernel for our support vector machines is therefore constructed in accordance with Eqs. (10)-(14) in the following way: First, we identify the matrix of the proton-proton collision data as the kernel. Then, we perform operations on this kernel according to the relevant kinematic variables of the final state leptons in the cross-section. With this information, the kernel is expected to discriminate Drell-Yan events against backgrounds. Taking into account the kernel properties in Eq. (1), we propose a physics-informed kernel
(15) |
III Experiments
To test our proposed methodology, we perform computational experiments on a well-known Standard Model process, namely, the production of a $Z$ boson decaying to an electron-positron pair (Drell-Yan production). We train, test, and compare several support vector machine binary classifiers to characterize their power to discriminate the Drell-Yan process from backgrounds.
III.1 Data simulation
In this work, we consider simulated Drell-Yan signal and background events. The simulated data are at the generator level, that is, no detector effects are taken into account. The simulation is carried out with PYTHIA 8.3 [41]. The event generation uses the PYTHIA configuration for the production of weak single and double bosons in proton-proton collisions at center-of-mass energy TeV. For the signal events, we require that the event contains particles with the PDGid [42] corresponding to the $Z$ boson. Then, we require that this particle's invariant mass lies within a window of width 40 GeV around the $Z$ boson mass (91.1876 GeV). We also require two oppositely charged leptons in the final state whose mother particle is the selected $Z$. The kinematics of these charged leptons are the variables used to construct the kernels of the support vector machines.

We consider the backgrounds that are most important for the Drell-Yan production as reported by the ATLAS and CMS experiments at the Large Hadron Collider [43, 44]. The considered backgrounds are the diboson ($WW$, $WZ$, $ZZ$), $t\bar{t}$, and single top productions. These backgrounds are expected, as their final states may mimic the final state charged leptons of the single $Z$ boson production. The event selection for the backgrounds is similar to that for the single $Z$ boson. We do not consider backgrounds coming from multijet production, as they are expected to be negligible () [43, 44]. Since the events are simulated with no detector effects, the samples contain a high purity of events and there is no need to consider variables that are used to handle mismodelling, particle identification, lepton isolation, or acceptance effects.

Figure 1 shows the invariant mass of the $Z$ boson calculated from the kinematics of the final state electron-positron pair. Furthermore, we consider the electron-positron kinematic quantities: energy, momentum, transverse momentum, rapidity, and azimuthal angle. These quantities are used to build the kernels for the SVMs.
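The kinematic quantities listed above follow directly from the lepton four-momenta. The sketch below is a minimal illustration that assumes four-momenta stored as `(E, px, py, pz)` tuples in GeV and interprets the 40 GeV width as a symmetric window of plus or minus 20 GeV around the $Z$ mass; both the data layout and that interpretation are assumptions, and all function names are hypothetical.

```python
import math

Z_MASS = 91.1876  # GeV, value quoted in the text


def transverse_momentum(p):
    return math.hypot(p[1], p[2])


def rapidity(p):
    energy, pz = p[0], p[3]
    return 0.5 * math.log((energy + pz) / (energy - pz))


def azimuthal_angle(p):
    return math.atan2(p[2], p[1])


def invariant_mass(p1, p2):
    """Invariant mass of the two-particle system."""
    energy = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    m2 = energy ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    return math.sqrt(max(m2, 0.0))  # clip tiny negative rounding errors


def in_mass_window(mass, center=Z_MASS, width=40.0):
    """Assumed selection: |mass - center| within half the window width."""
    return abs(mass - center) <= width / 2.0


# Toy event: two massless, back-to-back leptons from a Z decaying at rest.
electron = (45.5938, 45.5938, 0.0, 0.0)
positron = (45.5938, -45.5938, 0.0, 0.0)
m_ee = invariant_mass(electron, positron)
selected = in_mass_window(m_ee)
```

Each event then contributes one feature vector assembled from these per-lepton quantities, which is the input from which the kernels are built.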
III.2 Data splitting
In high energy physics, class imbalance in the data sample is a common challenge. Hence, in this study, we consider different levels of imbalance between the signal and background events. Conventionally, in binary classification for high energy physics, a positive value labels a signal event and a negative value labels a background event; here these values are $\pm 1$. We consider the case in which the data sample is fully balanced and the cases in which the signal:background ratio is 1:3, 1:10, 3:1, and 10:1. This is summarized in Table 1.
Sample | Class $+1$ (signal) | Class $-1$ (background) | Imbalance (signal:background) |
---|---|---|---|
half_half | 5000 | 5000 | 1:1 |
1quart_3quart | 2500 | 7500 | 1:3 |
3quart_1quart | 7500 | 2500 | 3:1 |
1dec_10dec | 1000 | 10000 | 1:10 |
10dec_1dec | 10000 | 1000 | 10:1 |
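The samples in the table above can be assembled with a short helper. This is a minimal sketch that assumes pools of simulated signal and background events are already available as Python lists; the function `build_sample`, the dictionary `SAMPLES`, and the integer stand-ins for events are hypothetical and serve only to show the labelling and the class counts.

```python
import random

# (number of signal events, number of background events) per sample
SAMPLES = {
    "half_half": (5000, 5000),
    "1quart_3quart": (2500, 7500),
    "3quart_1quart": (7500, 2500),
    "1dec_10dec": (1000, 10000),
    "10dec_1dec": (10000, 1000),
}


def build_sample(signal_pool, background_pool, n_signal, n_background, seed=0):
    """Draw events without replacement and attach the labels +1 and -1."""
    rng = random.Random(seed)
    signal = rng.sample(signal_pool, n_signal)
    background = rng.sample(background_pool, n_background)
    labelled = [(event, 1) for event in signal]
    labelled += [(event, -1) for event in background]
    rng.shuffle(labelled)  # avoid any ordering by class
    return labelled


# Integer stand-ins for simulated events, for illustration only.
signal_pool = list(range(10000))
background_pool = list(range(10000, 20000))

datasets = {
    name: build_sample(signal_pool, background_pool, n_sig, n_bkg)
    for name, (n_sig, n_bkg) in SAMPLES.items()
}
```

Fixing the seed keeps each sample reproducible, so that every classifier is compared on exactly the same events.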
III.3 Support vector machine models
The support vector machines we study in this paper are summarized in Table 2. The models listed in this table are based on the definitions in Sections II.1-II.4. The phys-DY model employs a kernel that incorporates the Drell-Yan dynamics, as detailed in Eqs. (10)-(14) and summarized in Eq. (15). Models with lin, rbf, pol, or sig in their names utilize the kernels specified in Eq. (2), Eq. (3), Eq. (5), and Eq. (4), respectively. Models featuring adaboost are ensembles constructed using the AdaBoost algorithm described in Sec. II.3, following Eqs. (6)-(9). Models marked with gen use genetic selection as discussed in Sec. II.3. Finally, single and sum indicate that the kernel consists of a single element or the sum of two kernels, respectively. In addition to the physics-informed support vector machine, the classifiers listed in this table are chosen for their outstanding performance in preliminary tests in agreement with our previous study in Ref. [28].
Name | Description |
---|---|
phys-DY | Single with physics-informed kernel |
adaboost-gen-rbf | AdaBoost ensemble with genetic selection and RBF kernel |
adaboost-gen-pol | AdaBoost ensemble with genetic selection and polynomial kernel |
adaboost-gen-sig | AdaBoost ensemble with genetic selection and sigmoid kernel |
single-rbf | Single RBF kernel |
single-lin | Single linear kernel |
single-pol | Single polynomial kernel |
single-sig | Single sigmoid kernel |
single-sum-rbf-lin | Sum of RBF and linear kernels |
single-sum-rbf-pol | Sum of RBF and polynomial kernels |
adaboost-rbf | AdaBoost ensemble with RBF kernel |
adaboost-pol | AdaBoost ensemble with polynomial kernel |
adaboost-lin | AdaBoost ensemble with linear kernel |
adaboost-sig | AdaBoost ensemble with sigmoid kernel |
Model/Sample | -val. | R. | -val. | R. | -val. | R. | -val. | R. | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
half_half | ||||||||||||
phys-DY | 0.98 (0.01) | 0.86 (0.02) | 0.97 (0.01) | 0.91 (0.01) | ||||||||
adaboost-gen-rbf | 0.67 (0.01) | 0.0 | ✓✓ | 0.73 (0.04) | 0.0 | ✓✓ | 0.64 (0.02) | 0.0 | ✓✓ | 0.67 (0.01) | 0.0 | ✓✓ |
single-rbf | 0.84 (0.01) | 0.0 | ✓✓ | 0.83 (0.02) | 0.0 | ✓✓ | 0.72 (0.03) | 0.0 | ✓✓ | 0.76 (0.02) | 0.0 | ✓✓ |
single-sum-rbf-pol | 0.86 (0.01) | 0.0 | ✓✓ | 0.77 (0.03) | 0.0 | ✓✓ | 0.74 (0.03) | 0.0 | ✓✓ | 0.75 (0.02) | 0.0 | ✓✓ |
single-sum-rbf-lin | 0.85 (0.01) | 0.0 | ✓✓ | 0.74 (0.03) | 0.0 | ✓✓ | 0.74 (0.03) | 0.0 | ✓✓ | 0.74 (0.02) | 0.0 | ✓✓ |
1quart_3quart | ||||||||||||
phys-DY | 0.96 (0.01) | 0.83 (0.04) | 0.91 (0.02) | 0.89 (0.01) | ||||||||
adaboost-gen-rbf | 0.55 (0.05) | 0.0 | ✓✓ | 0.75 (0.29) | 0.36224 | ✗ | 0.88 (0.02) | 0.0 | ✓✓ | 0.87 (0.01) | 0.0 | ✓✓ |
single-rbf | 0.81 (0.02) | 0.0 | ✓✓ | 0.72 (0.06) | 0.0 | ✓✓ | 0.81 (0.02) | 0.0 | ✓✓ | 0.80 (0.02) | 0.0 | ✓✓ |
single-sum-rbf-pol | 0.85 (0.02) | 0.0 | ✓✓ | 0.87 (0.05) | 2e-05 | ✓ | 0.82 (0.02) | 0.0 | ✓✓ | 0.83 (0.01) | 0.0 | ✓✓ |
single-sum-rbf-lin | 0.87 (0.02) | 0.0 | ✓✓ | 0.86 (0.05) | 0.00025 | ✓ | 0.84 (0.02) | 0.0 | ✓✓ | 0.84 (0.01) | 0.0 | ✓✓ |
3quart_1quart | ||||||||||||
phys-DY | 0.98 (0.01) | 0.92 (0.01) | 0.99 (0.01) | 0.94 (0.01) | ||||||||
adaboost-gen-rbf | 0.57 (0.06) | 0.0 | ✓✓ | 0.85 (0.03) | 0.0 | ✓✓ | 0.29 (0.23) | 0.0 | ✓✓ | 0.77 (0.08) | 0.0 | ✓✓ |
single-rbf | 0.83 (0.02) | 0.0 | ✓✓ | 0.77 (0.02) | 0.0 | ✓✓ | 0.64 (0.08) | 0.0 | ✓✓ | 0.77 (0.02) | 0.0 | ✓✓ |
single-sum-rbf-pol | 0.84 (0.02) | 0.0 | ✓✓ | 0.82 (0.02) | 0.0 | ✓✓ | 0.85 (0.04) | 0.0 | ✓✓ | 0.83 (0.02) | 0.0 | ✓✓ |
single-sum-rbf-lin | 0.83 (0.02) | 0.0 | ✓✓ | 0.83 (0.02) | 0.0 | ✓✓ | 0.91 (0.04) | 0.0 | ✓✓ | 0.84 (0.01) | 0.0 | ✓✓ |
1dec_10dec | ||||||||||||
phys-DY | 0.96 (0.01) | 0.85 (0.07) | 0.95 (0.01) | 0.94 (0.01) | ||||||||
adaboost-gen-rbf | 0.59 (0.06) | 0.0 | ✓✓ | 0.54 (0.33) | 0.0 | ✓✓ | 0.96 (0.01) | 0.0 | ✓ | 0.94 (0.04) | 0.59846 | ✗ |
single-rbf | 0.78 (0.03) | 0.0 | ✓✓ | 0.58 (0.15) | 0.0 | ✓✓ | 0.91 (0.01) | 0.0 | ✓✓ | 0.90 (0.01) | 0.0 | ✓✓ |
single-sum-rbf-pol | 0.86 (0.02) | 0.0 | ✓✓ | 0.95 (0.06) | 0.0 | ✓ | 0.93 (0.01) | 0.0 | ✓✓ | 0.93 (0.01) | 0.0 | ✓✓ |
single-sum-rbf-lin | 0.86 (0.02) | 0.0 | ✓✓ | 0.96 (0.06) | 0.0 | ✓ | 0.92 (0.01) | 0.0 | ✓✓ | 0.92 (0.01) | 0.0 | ✓✓ |
10dec_1dec | ||||||||||||
phys-DY | 0.96 (0.02) | 0.96 (0.01) | 0.99 (0.01) | 0.96 (0.01) | ||||||||
adaboost-gen-rbf | 0.55 (0.05) | 0.0 | ✓✓ | 0.94 (0.01) | 0.0 | ✓✓ | 0.27 (0.32) | 0.0 | ✓✓ | 0.86 (0.09) | 0.0 | ✓✓ |
single-rbf | 0.84 (0.02) | 0.0 | ✓✓ | 0.90 (0.01) | 0.0 | ✓✓ | 0.55 (0.26) | 0.0 | ✓✓ | 0.90 (0.01) | 0.0 | ✓✓ |
single-sum-rbf-pol | 0.61 (0.06) | 0.0 | ✓✓ | 0.92 (0.01) | 0.0 | ✓✓ | 1.00 (0.01) | 0.17971 | ✗ | 0.92 (0.01) | 0.0 | ✓✓ |
single-sum-rbf-lin | 0.61 (0.06) | 0.0 | ✓✓ | 0.93 (0.01) | 0.0 | ✓✓ | 0.98 (0.04) | 0.01646 | ✓✓ | 0.93 (0.01) | 0.0 | ✓✓ |
III.4 Support vector machines training and testing
To evaluate the efficiency of the proposed support vector machines, we perform training and testing experiments with the data described in Sec. III.1. In the training phase, a subset of the data is used to fit the model. In the testing phase, the fitted model predicts, for each of the remaining data points, a label indicating whether it is signal or background. To ensure reliable performance metrics for each support vector machine, we implement a repeated $k$-fold cross-validation. We divide the data into $k$ folds, where each fold is used once as the test set while the remaining $k-1$ folds serve as the training set. The entire $k$-fold cross-validation is then repeated $n$ times, with a different random split for each repetition. Overall, this results in $k \times n$ training and testing cycles. The reported metrics are the mean values of the obtained distributions, with one standard deviation as the associated error [45, 28].
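The splitting scheme described above can be sketched as an index generator. This is a minimal pure-Python illustration; the function name `repeated_kfold` is hypothetical, and the choice of ten folds and ten repetitions below is an assumption made only so that the example yields 100 training and testing cycles.

```python
import random


def repeated_kfold(n_samples, k, n_repeats, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a different random split per repetition
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test


# Assumed values for illustration: 10 folds repeated 10 times.
splits = list(repeated_kfold(n_samples=50, k=10, n_repeats=10))
```

Each element of `splits` corresponds to one training and testing cycle; collecting a metric over all of them gives the distribution whose mean and standard deviation are reported.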
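The classifier metrics are defined in terms of the error matrix elements: TP is the number of true positive values, TN is the number of true negative values, FP is the number of false positive values, and FN is the number of false negative values [46]. The metrics considered in this paper are the accuracy ACC,
$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}, \tag{19}$$
the positive precision PREC+,
$$\mathrm{PREC}^{+} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \tag{20}$$
the negative precision PREC-,
$$\mathrm{PREC}^{-} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}, \tag{21}$$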
and the Area Under the Receiver Operating Characteristic Curve AUC. The AUC is the area under the curve of the true positive rate plotted against the false positive rate at different thresholds [47]. In SVMs, these thresholds are obtained by varying the offset of the hyperplane from the origin to produce different predictions. The values of these metrics are within the range [0,1], where 0 corresponds to the worst performance and 1 to the best performance.
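The four metrics can be computed from the true labels, the predicted labels, and the decision scores. The sketch below is a minimal illustration that assumes labels in the set {+1, -1}, returns 0 when a precision is undefined, and evaluates the AUC through its equivalent pairwise-ranking form (the fraction of signal-background pairs ranked correctly, with ties counted as one half). The function names and the toy inputs are hypothetical.

```python
def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) for labels in {+1, -1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def positive_precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention


def negative_precision(tn, fn):
    return tn / (tn + fn) if (tn + fn) else 0.0  # assumed convention


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example, invented for illustration.
y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

tp, tn, fp, fn = confusion(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
prec_pos = positive_precision(tp, fp)
prec_neg = negative_precision(tn, fn)
area = auc(y_true, scores)
```

Since the accuracy is an average of the two precisions weighted by the number of predicted signal and background events, it always lies between them, which is a quick consistency check on any reported set of values.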
III.5 Computing implementation
IV Results
IV.1 Cross validation and statistical tests
In Fig. 2, we display our ACC, PREC+, PREC-, and AUC defined in Eqs. (19)-(21). The latter are calculated for the data samples listed in Table 1 and the support vector machines described in Table 2. Each point in these plots represents the mean value of the metric calculated following the cross-validation procedure described in Sec. III.4. Here we set and , that is, we compute 100 times the training and testing phases for each sample and support vector machine, and obtain a distribution for each metric. The displayed ACC, PREC+, PREC-, and AUC are calculated using the predicted classes from the test samples excluded during the training phase. In this plot, a line of a given color can show the behavior across all the proposed support vector machines for a specific data sample.
We compare the support vector machine containing the Drell-Yan kernel against the four best-performing support vector machines in Fig. 2 (we consider that the rest of the classifiers are evidently outperformed by our proposed physics-informed kernel). These are the single-sum-rbf-pol, single-sum-rbf-lin, single-rbf, and adaboost-gen-rbf support vector machines, whose kernels are described in Table 2. In this work, we use a paired Wilcoxon signed-rank test [52], a nonparametric counterpart of the paired Student's $t$-test for distributions with non-Gaussian behavior. This test determines whether the difference between the metrics of the physics-informed support vector machine and those of the others is statistically significant. Let $H_0$ be the null hypothesis that the metrics of the classifiers are equal. The purpose is to reject or fail to reject $H_0$ in light of the distributions of the metrics obtained in the cross-validation procedure. We reject $H_0$ at a statistical significance level of $\alpha = 0.05$, meaning we conclude that the ACC, PREC+, PREC-, and AUC of two classifiers are indeed not equal if the $p$-value from the Wilcoxon test is below 0.05. Table 3 summarizes these tests: for each metric we display the mean value of its distribution along with the associated error given by the standard deviation of this distribution. Moreover, in the column named R. we display the results of the Wilcoxon test: check marks, ✓ or ✓✓, indicate that the test rejects the null hypothesis, and a cross mark, ✗, indicates that $H_0$ is not rejected. Table 3 contains the results for the samples described in Table 1.
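The test procedure can be sketched in pure Python. This minimal illustration assumes the large-sample normal approximation of the signed-rank statistic, without tie or continuity corrections, so its $p$-values may differ slightly from those of a statistics library; the function name `wilcoxon_signed_rank` and the toy metric values are hypothetical.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided paired Wilcoxon test; returns (W_plus, p_value)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Group equal absolute differences and give them the average rank.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, p_value


# Toy metric distributions from 20 cross-validation cycles (invented numbers).
metric_a = [0.90 + 0.001 * i for i in range(20)]
metric_b = [value - (0.05 + 0.001 * i) for i, value in enumerate(metric_a)]

w_plus, p_value = wilcoxon_signed_rank(metric_a, metric_b)
reject_null = p_value < 0.05
```

Because the test is paired, both classifiers must be evaluated on the same cross-validation splits so that the differences are taken cycle by cycle.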
IV.2 Discussion
The first feature to note in Fig. 2 is that the values of ACC are stable across the different samples. The reason is that this metric averages over both the signal and background classification results. This metric is appropriate when describing a balanced data sample. A similar pattern is observed in the values of the AUC. Conversely, large fluctuations arise in the signal precision, PREC+, and the background precision, PREC-. In Fig. 2, the most noticeable feature is the behavior of the lines corresponding to the samples with imbalance 3:1 and 10:1. This poor behavior of most of the classifiers is expected: when there are not enough samples of one kind during the training phase, the support vector machine fails to describe both classes. Note that the assessment of a given classifier can be misleading, as the classifier can achieve high AUC, ACC, and PREC+ while its PREC- is near zero. This suggests that the most important metrics are the positive and negative precisions, and that a good classifier is expected to be robust against imbalances in data samples, which are typical in high energy physics. Remarkably, our proposed physics-informed classifier phys-DY shows high values for all the metrics presented here. The reason for this could be that we have effectively captured the intrinsic properties of the data samples by incorporating physics information into the kernel of the support vector machine. Other classifiers also exhibit stable metrics across the samples, which can be explained by the fact that their kernels are similar to the one inspired by the Drell-Yan process.
From Table 3, we can quantitatively compare the physics-informed kernel against the best-performing kernels. The first notable feature is that, in most cases, we can reject $H_0$. This is evident as the ✓ or ✓✓ appears in almost every case. Upon inspecting the metric values, when we reject $H_0$, there are two scenarios. First, the physics-informed kernel outperforms the exotic kernel, indicated by a double check mark (✓✓). Second, the exotic kernel outperforms the physics-informed kernel, indicated by a single check mark (✓). In almost all of the metrics presented in this study, our proposed physics-informed kernel in Eq. (15) performs better than the other kernels. Specifically, when analyzing PREC+, our physics-informed kernel performs excellently for the most imbalanced data samples, 1dec_10dec. PREC+ is the metric that provides information about the performance of a classifier at finding signal events in the sample. Therefore, the PREC+ attained by the physics-informed kernel demonstrates that this kernel is useful for high energy physics data. Moreover, the physics-informed kernel presents a stable PREC- when describing all the samples, demonstrating the robustness of this kernel against imbalance in data samples. There are two other kernels that show competitive metrics, namely, the single-sum-rbf-pol and single-sum-rbf-lin kernels. These kernels are the sums of the individual kernels defined in Eqs. (3) and (5), and Eqs. (3) and (2), respectively. When comparing them with the physics-informed kernel in Eq. (15), we conclude that both of these kernels can capture the dynamical properties of the Drell-Yan cross-section.
V Conclusions
In this work, we analyze several types of kernels that define a support vector machine. A physics-informed kernel is proposed to describe simulated data of a simple and well-known Standard Model process. The physics of this process is introduced to the kernel in a simple and straightforward manner, by considering the functional form of the kinematic variables found in the cross-section and then transforming the matrix that represents the data according to these functional forms. To test the effectiveness of this method, we construct unconventional kernels that a priori can overcome the typical challenges of high energy physics. We carry out statistical tests to determine if the physics-informed kernel is competitive compared to kernels constructed with sophisticated machine-learning algorithms. Remarkably, it is found that our proposed physics-informed kernel outperformed these algorithms. This finding motivates further investigation into the improvement of machine learning algorithms for more complex high energy physics data using the proposed approach. This simple method of introducing physics insights to kernel methods is proven to be effective, and since there is a connection between kernel methods and neural networks [53], the techniques we study in this paper can be extended to more modern machine learning algorithms based on kernel methods.
Acknowledgements.
This work was funded by the CONAHCYT project I1200/311/2023. T. C. P. thanks a CONAHCYT postdoctoral fellowship. A. G. R. thanks SNII (México).
References
- [1] S. Whiteson, D. Whiteson, Eng. Appl. Artif. Intell 22, 8 (2009)
- [2] P.T. Komiske, E.M. Metodiev, J. Thaler, JHEP 01, 121 (2019).
- [3] K.K. Sharma, MPLA. 36, 02 (2021).
- [4] P. Baldi, P. Sadowski, D. Whiteson, Nat. Commun. 05, 4308 (2014).
- [5] A. Alves, JINST 12, 05 (2017)
- [6] T. Biswas, A. Datta, JHEP 05, 104 (2023).
- [7] P.C. Bhat, R. Gilmartin, H.B. Prosper, Phys. Rev. D 62, 074022 (2000)
- [8] P. Baldi, K. Cranmer, T. Faucett, P. Sadowski, D. Whiteson, Eur. Phys. J. C 76, 235 (2016).
- [9] A. Aurisano et al, JINST 11, P09001 (2016).
- [10] F. Bishara, A. Paul, J. Dy, Sci. Rep 14, 5294 (2024).
- [11] C.W.Murphy, Phys. Rev. D 97, 015007 (2018).
- [12] H.B. Prosper, Phys. Rev. 37, 1153 (1988)
- [13] P. Baldi, P. Sadowski, and D. Whiteson, Phys. Rev. Lett. 114, 111801 (2015)
- [14] G.C. Strong, Mach. Learn.: Sci. Technol. 1, 045006 (2020).
- [15] E. Barberio, B. Le, E. Richter-Was, Z. Was, J. Zaremba, D. Zanzi, Phys. Rev. 96, 073002 (2017).
- [16] J. Amacker, W. Balunas, L. Beresford, D. Bortoletto, J. Frost, C. Issever, J. Liu, J. McKee, A. Micheli, S.P. Saenz, M. Spannowsky, B. Stanislaus, JHEP 12, 115 (2020).
- [17] G.E. Karniadakis, I.G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang. Nat. Rev. Phys. 03, 422-440 (2021).
- [18] K. Mudunuru, S. Karra, Comput. Methods Appl. Mech. Eng. 374, 113560 (2021).
- [19] V.S. Ngairangbam, M. Spannowsky, JHEP 05, 004 (2024).
- [20] C. Li, H. Qu, S. Qian, Q. Meng, S. Gong, J. Zhang, TY. Liu, Q. Li, Phys. Rev. D 109, 056003 (2024).
- [21] Z. Hao, R. Kansal, J. Duarte, N. Chernyavskaya, Eur. Phys. J. C 83, 485, (2023).
- [22] O. Atkinson, A. Bhardwaj, C. Englert, P. Konar, V.S Ngairangbam, M. Spannowsky, Front. Artif. Intell. 5, 943135 (2022).
- [23] M.Ö Sahin, D. Krücker, I.A. Melzer-Pellmann, Nucl. Instrum. Methods Phys. Res., Sect. A. 838, 137-146 (2016).
- [24] M. Aaboud et al. (ATLAS Collaboration) Phys. Rev. D 108, 032014 (2023).
- [25] A. Vaiciulis, Nucl. Instrum. Methods Phys. Res., Sect. A. 502, 2-3 (2003).
- [26] F. Sforza, V. Lippi, Nucl. Instrum. Methods Phys. Res., Sect. A. 722, 11-19 (2013).
- [27] S.L. Wu, S. Sun, W. Guan, C. Zhou, J. Chan, C.L. Cheng, T. Pham, Y. Qian, A.Z. Wang, R. Zhang, M. Livny, J. Glick, P. Kl. Barkoutsos, S. Woerner, I. Tavernelli, F. Carminati, A.D. Meglio, A.C.Y. Li, J. Lykken, P. Spentzouris, S.Y. Chen, S. Yoo, T. Wei, Phys. Rev. Research 3, 033221 (2021).
- [28] A. Ramirez-Morales, J.U. Salmon-Gamboa, J Li, A.G. Sanchez-Reyna, A. Palli-Valappil, Appl. Intell 53, 4996–5012 (2023).
- [29] C. Cortes, V. Vapnik, Mach. Learn. 20, 273–297 (1995).
- [30] J. Shawe-Taylor, N. Cristianini, Cambridge University Press (2004).
- [31] R.A. Horn, C.R. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press (2012).
- [32] H.T. Lin, C.J. Lin, Neural Comput. 3, 1-32 (2003).
- [33] Y.W. Chang, C.J. Hsieh, K.W. Chang, M. Ringgaard, C.J. Lin, JMLR 11,4 (2010).
- [34] C. Zhang, Y. Ma, Springer 144 (2012).
- [35] O. Sagi, L. Rokach, WIRES DMKD 8, 1249 (2018).
- [36] R.E. Schapire, Y. Singer, COLT 37 (3), 297-336 (1999).
- [37] J.H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press 2ed (1992).
- [38] E.E.E. Ali, E. Elamin, King Saud Univ., Coll. of Comput. and Inf. Sci. In Proceedings of the 1st NITS (2006).
- [39] S.D. Drell, T.M. Yan, Phys. Rev. Lett. 25, 316-320 (1970).
- [40] J. M. Campbell et al, Rep. Prog. Phys. 70, 89 (2007).
- [41] C. Bierlich, et al, SciPost Phys. Codebases, 8, (2022).
- [42] R. L. Workman et al. [Particle Data Group], PTEP 2022, 083C01 (2022).
- [43] The ATLAS collaboration., M. Aaboud, G. Aad, et al. J. High Energ. Phys. 12, 59 (2017).
- [44] The CMS collaboration, A.M. Sirunyan, A. Tumasyan, et al. J. High Energ. Phys. 12, 59 (2019).
- [45] M.Kuhn, K.Johnson, Applied Predictive Modeling, Springer 26 (2013).
- [46] D.M.W. Powers, J. Mach. Learn. Technol. 2 (2008).
- [47] A.P. Bradley, Pattern Recognition 30(7), 1145-1159 (1997).
- [48] C.R. Harris, K.J.Millman, S.J. van der Walt, et al. Nature 585, 357–362 (2020).
- [49] C.C. Chang, C.J. Lin, ACM TIST 2, 3 (2011).
- [50] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion , O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, J. Mach. Learn. Res. 12, 2825–2830 (2011).
- [51] A. Ramirez-Morales, A. Davila-Rivera, Github: SVM-physics code. https://github.com/andrex-naranjas/SVM-physics.
- [52] F. Wilcoxon, Biometrics Bulletin 1 (6), 80–83 (1945).
- [53] Wang, S., Yu, X. and Perdikaris, P., Preprint at arXiv https://arxiv.org/abs/2007.14527 (2020).