-
Trials Factor for Semi-Supervised NN Classifiers in Searches for Narrow Resonances at the LHC
Authors:
Benjamin Lieberman,
Andreas Crivellin,
Salah-Eddine Dahbi,
Finn Stevenson,
Nidhi Tripathi,
Mukesh Kumar,
Bruce Mellado
Abstract:
To mitigate the model dependencies of searches for new narrow resonances at the Large Hadron Collider (LHC), semi-supervised Neural Networks (NNs) can be used. Unlike fully supervised classifiers, these models introduce an additional look-elsewhere effect in the process of optimising thresholds on the response distribution. We perform a frequentist study to quantify this effect, in the form of a trials factor. As an example, we consider simulated $Z\gamma$ data to perform narrow resonance searches using semi-supervised NN classifiers. The results of this analysis provide evidence that the look-elsewhere effect induced by the semi-supervised NN is under control.
Submitted 27 June, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
COVID-19 South African Vaccine Hesitancy Models Show Boost in Performance Upon Fine-Tuning on M-pox Tweets
Authors:
Nicholas Perikli,
Srimoy Bhattacharya,
Blessing Ogbuokiri,
Zahra Movahedi Nia,
Benjamin Lieberman,
Nidhi Tripathi,
Salah-Eddine Dahbi,
Finn Stevenson,
Nicola Bragazzi,
Jude Kong,
Bruce Mellado
Abstract:
Very large numbers of M-pox cases have, since the start of May 2022, been reported in non-endemic countries, leading many to fear that the M-pox outbreak would rapidly transition into another pandemic while the COVID-19 pandemic rages on. Given the similarities of M-pox with COVID-19, we chose to test the performance of COVID-19 models trained on South African Twitter data on a hand-labelled M-pox dataset before and after fine-tuning. More than 20k M-pox-related tweets from South Africa were hand-labelled as being either positive, negative or neutral. After fine-tuning these COVID-19 models on the M-pox dataset, the F1-scores increased by more than 8%, falling just short of 70%, but still outperforming state-of-the-art models and well-known classification algorithms. An LDA-based topic modelling procedure was used to compare the misclassified M-pox tweets of the original COVID-19 RoBERTa model with those of its fine-tuned version, and from this analysis, we were able to draw conclusions on how to build more sophisticated models.
Submitted 4 October, 2023;
originally announced October 2023.
-
Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning
Authors:
Nicholas Perikli,
Srimoy Bhattacharya,
Blessing Ogbuokiri,
Zahra Movahedi Nia,
Benjamin Lieberman,
Nidhi Tripathi,
Salah-Eddine Dahbi,
Finn Stevenson,
Nicola Bragazzi,
Jude Kong,
Bruce Mellado
Abstract:
Very few social media studies have been done on South African user-generated content (UGC) during the COVID-19 pandemic, and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa was extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and RoBERTa-base, whose hyperparameters were carefully chosen and tuned using the WandB platform. For comparison, we pre-processed the tweets in our dataset using two different approaches: one semantics-based, the other corpus-based. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa, which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the misclassified tweets of the RoBERTa model to gain insight into how to further improve model accuracy.
Submitted 12 July, 2023;
originally announced July 2023.
-
An investigation of over-training within semi-supervised machine learning models in the search for heavy resonances at the LHC
Authors:
Benjamin Lieberman,
Joshua Choma,
Salah-Eddine Dahbi,
Bruce Mellado,
Xifeng Ruan
Abstract:
In particle physics, semi-supervised machine learning is an attractive option to reduce model dependencies in searches beyond the Standard Model. When utilizing semi-supervised techniques in training machine learning models in the search for bosons at the Large Hadron Collider, the over-training of the model must be investigated. Internal fluctuations of the phase space and bias in training can cause semi-supervised models to label false signals within the phase space due to over-fitting. The issue of false signal generation in semi-supervised models has not been fully analyzed; therefore, the probability of such situations occurring must be quantified using a toy Monte Carlo model. This investigation of $Z\gamma$ resonances is performed using a pure background Monte Carlo sample. Through unique pure background samples extracted to mimic ATLAS data in a background-plus-signal region, multiple runs enable the probability of these fake signals occurring due to over-training to be thoroughly investigated.
Submitted 15 September, 2021;
originally announced September 2021.
-
Machine learning approach for the search of resonances with topological features at the Large Hadron Collider
Authors:
Salah-eddine Dahbi,
Joshua Choma,
Bruce Mellado,
Gaogalalwe Mokgatitswane,
Xifeng Ruan,
Benjamin Lieberman,
Turgay Celik
Abstract:
The observation of resonances is unequivocal evidence of new physics beyond the Standard Model (SM) at the Large Hadron Collider (LHC). So far, inclusive and model-dependent searches have not provided evidence of new resonances, indicating that these could be driven by subtle topologies. Here, we use machine learning techniques based on weak supervision to perform searches. Weak supervision based on mixed samples can be used to search for resonances with little or no prior knowledge of the production mechanism. It also offers the advantage that sidebands or control regions can be used to effectively model backgrounds with minimal reliance on simulations. However, weak supervision alone is found to be highly inefficient in identifying corners of the multi-dimensional space of interest. Instead, we propose an approach to search for new resonances that involves a classification procedure that is signature and topology based. A combination of weak supervision with Deep Neural Network algorithms is applied following this classification. The performance of this strategy is evaluated on the production of the SM Higgs boson decaying to a pair of photons, inclusively and in exclusive regions of phase space tailored for specific production modes at the LHC. After verifying the ability of the methodology to extract different SM Higgs boson signal mechanisms, a search for new phenomena in high-mass final states is set up for the LHC.
Submitted 27 October, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Prediction Regions for Poisson and Over-Dispersed Poisson Regression Models with Applications to Forecasting Number of Deaths during the COVID-19 Pandemic
Authors:
T. Kim,
B. Lieberman,
G. Luta,
E. Pena
Abstract:
Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting daily deaths and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and under an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance is compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. An over-dispersed Poisson regression model is therefore proposed. This new model builds on frailty ideas in Survival Analysis, and over-dispersion is quantified through an additional parameter. The Poisson regression model is a hidden model in this over-dispersed Poisson regression model and obtains as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented. Finally, the paper discusses limitations of the proposed procedures and mentions open research problems, as well as the dangers and pitfalls of forecasting on a long horizon, with a focus on this pandemic, where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.
Submitted 6 July, 2020; v1 submitted 4 July, 2020;
originally announced July 2020.
-
Supernova Triggers for End-Devonian Extinctions
Authors:
Brian D. Fields,
Adrian L. Melott,
John Ellis,
Adrienne F. Ertel,
Brian J. Fry,
Bruce S. Lieberman,
Zhenghai Liu,
Jesse A. Miller,
Brian C. Thomas
Abstract:
The Late Devonian was a protracted period of low speciation resulting in biodiversity decline, culminating in extinction events near the Devonian-Carboniferous boundary. Recent evidence indicates that the final extinction event may have coincided with a dramatic drop in stratospheric ozone, possibly due to a global temperature rise. Here we study an alternative possible cause for the postulated ozone drop: a nearby supernova explosion that could inflict damage by accelerating cosmic rays that can deliver ionizing radiation for up to $\sim 100$ kyr. We therefore propose that the end-Devonian extinctions were triggered by supernova explosions at $\sim 20$ pc, somewhat beyond the "kill distance" that would have precipitated a full mass extinction. Such nearby supernovae are likely due to core-collapses of massive stars; these are concentrated in the thin Galactic disk where the Sun resides. Detecting either of the long-lived radioisotopes Sm-146 or Pu-244 in one or more end-Devonian extinction strata would confirm a supernova origin, point to the core-collapse explosion of a massive star, and probe supernova nucleosynthesis. Other possible tests of the supernova hypothesis are discussed.
Submitted 25 August, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
The anomalous production of multi-lepton and its impact on the measurement of $Wh$ production at the LHC
Authors:
Yesenia Hernandez,
Mukesh Kumar,
Alan S. Cornell,
Salah-Eddine Dahbi,
Yaquan Fang,
Benjamin Lieberman,
Bruce Mellado,
Kgomotso Monnakgotla,
Xifeng Ruan,
Shuiting Xin
Abstract:
Anomalies in multi-lepton final states at the Large Hadron Collider (LHC) have been reported in Refs.~\cite{vonBuddenbrock:2017gvy,vonBuddenbrock:2019ajh}. These can be interpreted in terms of the production of a heavy boson, $H$, decaying into a Standard Model (SM) Higgs boson, $h$, and a singlet scalar, $S$, which is treated as a SM Higgs-like boson. This process would naturally affect the measurement of the $Wh$ signal strength at the LHC, where $h$ is produced in association with leptons and di-jets. Here, $h$ would be produced with lower transverse momentum, $p_{Th}$, compared to SM processes. Corners of the phase-space are fixed according to the model parameters derived in Refs.~\cite{vonBuddenbrock:2016rmr,vonBuddenbrock:2017gvy} without additional tuning, thus nullifying potential look-elsewhere effects or selection biases. Provided that no stringent requirements are made on $p_{Th}$ or related observables, the signal strength of $Wh$ is $\mu(Wh)=2.41 \pm 0.37$. This corresponds to a deviation from the SM of $3.8\sigma$. This result further strengthens the need to measure with precision the SM Higgs boson couplings in $e^+e^-$ and $e^-p$ collisions, in addition to $pp$ collisions.
Submitted 13 April, 2021; v1 submitted 2 December, 2019;
originally announced December 2019.
-
Does the Planetary Dynamo Go Cycling On? Re-examining the Evidence for Cycles in Magnetic Reversal Rate
Authors:
Adrian L. Melott,
Anthony Pivarunas,
Joseph G. Meert,
Bruce S. Lieberman
Abstract:
The record of reversals of the geomagnetic field has played an integral role in the development of plate tectonic theory. Statistical analyses of the reversal record are aimed at detailing patterns and linking those patterns to core-mantle processes. The geomagnetic polarity timescale is a dynamic record, and new paleomagnetic and geochronologic data provide additional detail. In this paper, we examine the periodicity revealed in the reversal record back to 375 million years ago (Ma) using Fourier analysis. Four significant peaks were found in the reversal power spectra within the 16-40 million year (Myr) range. Plotting the function constructed from the sum of the frequencies of the proximal peaks yields a transient 26 Myr periodicity, suggesting chaotic motion with a periodic attractor. The possible 16 Myr periodicity, a previously recognized result, may be correlated with pulsation of mantle plumes and perhaps, more tentatively, with core-mantle dynamics originating near the large low shear velocity layers in the Pacific and Africa. Planetary magnetic fields shield against charged particles which can give rise to radiation at the surface and ionize the atmosphere, which is a loss mechanism particularly relevant to M stars. Understanding the origin and development of planetary magnetic fields can shed light on the habitable zone.
Submitted 18 January, 2017; v1 submitted 25 August, 2016;
originally announced August 2016.
-
Declining Volatility, a General Property of Disparate Systems: From Fossils, to Stocks, to the Stars
Authors:
Bruce S. Lieberman,
Adrian L. Melott
Abstract:
There may be structural principles pertaining to the general behavior of systems that lead to similarities in a variety of different contexts. Classic examples include the descriptive power of fractals, the importance of surface area to volume constraints, the universality of entropy in systems, and mathematical rules of growth and form. Documenting such overarching principles may represent a rejoinder to the Neodarwinian synthesis that emphasizes adaptation and competition. Instead, these principles could indicate the importance of constraint and structure on form and evolution. Here we document a potential example of a phenomenon suggesting congruent behavior of very different systems. We focus on the notion that universally there has been a tendency for more volatile entities to disappear from systems such that the net volatility in these systems tends to decline. We specifically focus on origination and extinction rates in the marine animal fossil record, the performance of stocks in the stock market, and the characters of stars and stellar systems. We consider the evidence that each is experiencing declining volatility, and also consider the broader significance of this.
Submitted 17 December, 2013; v1 submitted 8 June, 2012;
originally announced June 2012.
-
Whilst this Planet Has Gone Cycling On: What Role for Periodic Astronomical Phenomena in Large Scale Patterns in the History of Life?
Authors:
B. S. Lieberman,
A. L. Melott
Abstract:
One of the longstanding debates in the history of paleontology focuses on the issue of whether or not there have been long-term cycles (operating over tens of millions of years) in biodiversity and extinction. Here we consider the history of this debate by connecting the skein from Grabau up to 2008. We focus on the evidence for periodicity that has emerged thus far, and conclude that there is indeed some evidence that periodicity may be real, though of course more work is needed. We also comment on possible causal mechanisms, focusing especially on the motion of our solar system in the Galaxy. Moreover, we consider the reasons why some scientists have opposed periodicity over the years. Finally, we consider the significance of this for our understanding of evolution and the history of life.
Submitted 8 June, 2012; v1 submitted 20 January, 2009;
originally announced January 2009.
-
Considering the Case for Biodiversity Cycles: Reexamining the Evidence for Periodicity in the Fossil Record
Authors:
Bruce S. Lieberman,
Adrian L. Melott
Abstract:
Medvedev and Melott (2007) have suggested that periodicity in fossil biodiversity may be induced by cosmic rays which vary as the Solar System oscillates normal to the galactic disk. We re-examine the evidence for a 62 million year (Myr) periodicity in biodiversity throughout the Phanerozoic history of animal life reported by Rohde & Mueller (2005), as well as related questions of periodicity in origination and extinction. We find that the signal is robust against variations in methods of analysis, and is based on fluctuations in the Paleozoic and a substantial part of the Mesozoic. Examination of origination and extinction is somewhat ambiguous, with results depending upon procedure. Origination and extinction intensity as defined by RM may be affected by an artifact at 27 Myr in the duration of stratigraphic intervals. Nevertheless, when a procedure free of this artifact is implemented, the 27 Myr periodicity appears in origination, suggesting that the artifact may ultimately be based on a signal in the data. A 62 Myr feature appears in extinction, when this same procedure is used. We conclude that evidence for a periodicity at 62 Myr is robust, and evidence for periodicity at approximately 27 Myr is also present, albeit more ambiguous.
Submitted 22 August, 2007; v1 submitted 22 April, 2007;
originally announced April 2007.
-
Fossil Biodiversity: Red Noise Plus Signal
Authors:
Adrian L. Melott,
Bruce S. Lieberman
Abstract:
We have examined the Fourier power spectrum as well as the Hurst exponent of extinction, origination, and total biodiversity in the marine fossil record, using a recently improved geologic timescale. We find all of them strongly inconsistent with past claims of self-similarity as well as inconsistent with random walk behavior. Instead, they are dominated by low-frequency power, with approximate $f^{-2}$ power over one decade in frequency. The spectrum turns over at about $10^8$ y, lending plausibility to connections with galactic dynamics. Even in the background of this low-frequency dominance, a previously noted 62 My biodiversity cycle stands out with better than 99% confidence above the noise level, accounting for about 35% of the total variance in the fossil biodiversity record.
Submitted 14 June, 2006; v1 submitted 13 June, 2006;
originally announced June 2006.
-
Did a gamma-ray burst initiate the late Ordovician mass extinction?
Authors:
A. Melott,
B. Lieberman,
C. Laird,
L. Martin,
M. Medvedev,
B. Thomas,
J. Cannizzo,
N. Gehrels,
C. Jackman
Abstract:
Gamma-ray bursts (hereafter GRB) produce a flux of radiation detectable across the observable Universe, and at least some of them are associated with galaxies. A GRB within our own Galaxy could do considerable damage to the Earth's biosphere; rate estimates suggest that a dangerously near GRB should occur on average two or more times per billion years. At least five times in the history of life, the Earth experienced mass extinctions that eliminated a large percentage of the biota. Many possible causes have been documented, and GRB may also have contributed. The late Ordovician mass extinction approximately 440 million years ago may be at least partly the result of a GRB. A special feature of GRB in terms of terrestrial effects is a nearly impulsive energy input of order 10 s. Due to expected severe depletion of the ozone layer, intense solar ultraviolet radiation would result from a nearby GRB, and some of the patterns of extinction and survivorship at this time may be attributable to elevated levels of UV radiation reaching the Earth. In addition, a GRB could trigger the global cooling which occurs at the end of the Ordovician period that follows an interval of relatively warm climate. Intense rapid cooling and glaciation at that time, previously identified as the probable cause of this mass extinction, may have resulted from a GRB.
Submitted 23 April, 2004; v1 submitted 15 September, 2003;
originally announced September 2003.