Search | arXiv e-print repository

Permutation Entropy for Signal Analysis

Authors: Bill Kay, Audun Myers, Thad Boydston, Emily Ellwein, Cameron Mackenzie, Iliana Alvarez, Erik Lentz

Abstract: Shannon Entropy is the preeminent tool for measuring the level of uncertainty (and conversely, information content) in a random variable. In the field of communications, entropy can be used to express the information content of given signals (represented as time series) by considering random variables which sample from specified subsequences. In this paper, we will discuss how an entropy variant, the \textit{permutation entropy}, can be used to study and classify radio frequency signals in a noisy environment. The permutation entropy is the entropy of the random variable which samples occurrences of permutation patterns from time series given a fixed window length, making it a function of the distribution of permutation patterns. Since the permutation entropy is a function of the relative order of data, it is (global) amplitude agnostic and thus allows for comparison between signals at different scales. This article is intended to describe a permutation patterns approach to a data driven problem in radio frequency communications research, and includes a primer on all non-permutation pattern specific background. An empirical analysis of the methods herein on radio frequency data is included. No prior knowledge of signals analysis is assumed, and permutation pattern specific notation will be included. This article serves as a self-contained introduction to the relationship between permutation patterns, entropy, and signals analysis for studying radio frequency signals and includes results on a classification task.

Submitted 24 June, 2024; v1 submitted 1 December, 2023; originally announced December 2023.
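
Illustration (not from the paper): the abstract above defines permutation entropy as the Shannon entropy of the distribution of permutation patterns seen in fixed-length windows of a time series. A minimal sketch of that definition follows. The function names `ordinal_pattern` and `permutation_entropy`, the parameter names `n` and `normalize`, the base-2 logarithm, and the tie-breaking by index are all assumptions made for this example; the authors' implementation may differ.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Return the permutation pattern of a window as a tuple of indices.

    The pattern is the argsort of the window: the indices of its values
    in increasing order. Ties are broken by position (an assumption).
    """
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))


def permutation_entropy(series, n=3, normalize=False):
    """Shannon entropy (in bits) of the patterns in all length-n windows."""
    if n < 2 or len(series) < n:
        raise ValueError("need n >= 2 and at least n samples")
    # Count how often each pattern occurs over the sliding windows.
    counts = Counter(
        ordinal_pattern(series[i:i + n]) for i in range(len(series) - n + 1)
    )
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    if normalize:
        # Divide by the maximum possible entropy, log2(n!).
        entropy /= math.log2(math.factorial(n))
    return entropy


if __name__ == "__main__":
    signal = [4, 7, 9, 10, 6, 11, 3]
    print(permutation_entropy(signal, n=2))
    # Rescaling and shifting the signal leaves the relative order unchanged,
    # so the value is the same (the "amplitude agnostic" property).
    print(permutation_entropy([10 * v + 5 for v in signal], n=2))
```

Because only the relative order inside each window matters, any strictly increasing transformation of the samples yields the same entropy, which is the scale-independence the abstract refers to.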

arXiv:2312.00023 [pdf, other]

Hypergraph Topological Features for Autoencoder-Based Intrusion Detection for Cybersecurity Data

Authors: Bill Kay, Sinan G. Aksoy, Molly Baird, Daniel M. Best, Helen Jenne, Cliff Joslyn, Christopher Potvin, Gregory Henselman-Petrusek, Garret Seppala, Stephen J. Young, Emilie Purvine

Abstract: In this position paper, we argue that when hypergraphs are used to capture multi-way local relations of data, their resulting topological features describe global behaviour. Consequently, these features capture complex correlations that can then serve as high fidelity inputs to autoencoder-driven anomaly detection pipelines. We propose two such potential pipelines for cybersecurity data, one that uses an autoencoder directly to determine network intrusions, and one that de-noises input data for a persistent homology system, PHANTOM. We provide heuristic justification for the use of the methods described therein for an intrusion detection pipeline for cyber data. We conclude by showing a small example over synthetic cyber attack data.

Submitted 9 November, 2023; originally announced December 2023.

MSC Class: 55N31

arXiv:2311.16154 [pdf]

Stepping out of Flatland: Discovering Behavior Patterns as Topological Structures in Cyber Hypergraphs

Authors: Helen Jenne, Sinan G. Aksoy, Daniel Best, Alyson Bittner, Gregory Henselman-Petrusek, Cliff Joslyn, Bill Kay, Audun Myers, Garret Seppala, Jackson Warley, Stephen J. Young, Emilie Purvine

Abstract: Data breaches and ransomware attacks occur so often that they have become part of our daily news cycle. This is due to a myriad of factors, including the increasing number of internet-of-things devices, shift to remote work during the pandemic, and advancement in adversarial techniques, which all contribute to the increase in both the complexity of data captured and the challenge of protecting our networks. At the same time, cyber research has made strides, leveraging advances in machine learning and natural language processing to focus on identifying sophisticated attacks that are known to evade conventional measures. While successful, the shortcomings of these methods, particularly the lack of interpretability, are inherent and difficult to overcome. Consequently, there is an ever-increasing need to develop new tools for analyzing cyber data to enable more effective attack detection. In this paper, we present a novel framework based in the theory of hypergraphs and topology to understand data from cyber networks through topological signatures, which are both flexible and can be traced back to the log data. While our approach's mathematical grounding requires some technical development, this pays off in interpretability, which we will demonstrate with concrete examples in a large-scale cyber network dataset. These examples are an introduction to the broader possibilities that lie ahead; our goal is to demonstrate the value of applying methods from the burgeoning fields of hypernetwork science and applied topology to understand relationships among behaviors in cyber data.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 18 pages, 11 figures. This paper is written for a general audience

MSC Class: 55N31

arXiv:2309.08010 [pdf, other]

Malicious Cyber Activity Detection Using Zigzag Persistence

Authors: Audun Myers, Alyson Bittner, Sinan Aksoy, Daniel M. Best, Gregory Henselman-Petrusek, Helen Jenne, Cliff Joslyn, Bill Kay, Garret Seppala, Stephen J. Young, Emilie Purvine

Abstract: In this study we synthesize zigzag persistence from topological data analysis with autoencoder-based approaches to detect malicious cyber activity and derive analytic insights. Cybersecurity aims to safeguard computers, networks, and servers from various forms of malicious attacks, including network damage, data theft, and activity monitoring. Here we focus on the detection of malicious activity using log data. To do this we consider the dynamics of the data by exploring the changing topology of a hypergraph representation gaining insights into the underlying activity. Hypergraphs provide a natural representation of cyber log data by capturing complex interactions between processes. To study the changing topology we use zigzag persistence which captures how topological features persist at multiple dimensions over time. We observe that the resulting barcodes represent malicious activity differently than benign activity. To automate this detection we implement an autoencoder trained on a vectorization of the resulting zigzag persistence barcodes. Our experimental results demonstrate the effectiveness of the autoencoder in detecting malicious activity in comparison to standard summary statistics. Overall, this study highlights the potential of zigzag persistence and its combination with temporal hypergraphs for analyzing cybersecurity log data and detecting malicious behavior.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.04537 [pdf, other]

Community Detection in Hypergraphs via Mutual Information Maximization

Authors: Jurgen Kritschgau, Daniel Kaiser, Oliver Alvarado Rodriguez, Ilya Amburg, Jessalyn Bolkema, Thomas Grubb, Fangfei Lan, Sepideh Maleki, Phil Chodrow, Bill Kay

Abstract: The hypergraph community detection problem seeks to identify groups of related nodes in hypergraph data. We propose an information-theoretic hypergraph community detection algorithm which compresses the observed data in terms of community labels and community-edge intersections. This algorithm can also be viewed as maximum-likelihood inference in a degree-corrected microcanonical stochastic blockmodel. We perform the inference/compression step via simulated annealing. Unlike several recent algorithms based on canonical models, our microcanonical algorithm does not require inference of statistical parameters such as node degrees or pairwise group connection rates. Through synthetic experiments, we find that our algorithm succeeds down to recently-conjectured thresholds for sparse random hypergraphs. We also find competitive performance in cluster recovery tasks on several hypergraph data sets.

Submitted 8 August, 2023; originally announced August 2023.

Comments: Submitted

arXiv:2303.11464 [pdf, other]

Seven open problems in applied combinatorics

Authors: Sinan G. Aksoy, Ryan Bennink, Yuzhou Chen, José Frías, Yulia R. Gel, Bill Kay, Uwe Naumann, Carlos Ortiz Marrero, Anthony V. Petyuk, Sandip Roy, Ignacio Segovia-Dominguez, Nate Veldt, Stephen J. Young

Abstract: We present and discuss seven different open problems in applied combinatorics. The application areas relevant to this compilation include quantum computing, algorithmic differentiation, topological data analysis, iterative methods, hypergraph cut algorithms, and power systems.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 43 pages, 5 figures

MSC Class: 05C90; 65Y04; 65D25; 05C65; 81P68; 62R40; 55N31; 65F10

arXiv:2302.02857 [pdf, other]

Topological Analysis of Temporal Hypergraphs

Authors: Audun Myers, Cliff Joslyn, Bill Kay, Emilie Purvine, Gregory Roek, Madelyn Shapiro

Abstract: In this work we study the topological properties of temporal hypergraphs. Hypergraphs provide a higher dimensional generalization of a graph that is capable of capturing multi-way connections. As such, they have become an integral part of network science. A common use of hypergraphs is to model events as hyperedges in which the event can involve many elements as nodes. This provides a more complete picture of the event, which is not limited by the standard dyadic connections of a graph. However, a common attribution to events is temporal information as an interval for when the event occurred. Consequently, a temporal hypergraph is born, which accurately captures both the temporal information of events and their multi-way connections. Common tools for studying these temporal hypergraphs typically capture changes in the underlying dynamics with summary statistics of snapshots sampled in a sliding window procedure. However, these tools do not characterize the evolution of hypergraph structure over time, nor do they provide insight on persistent components which are influential to the underlying system. To alleviate this need, we leverage zigzag persistence from the field of Topological Data Analysis (TDA) to study the change in topological structure of time-evolving hypergraphs. We apply our pipeline to both a cyber security and social network dataset and show how the topological structure of their temporal hypergraphs changes and can be used to understand the underlying dynamics.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2104.13983 [pdf, other]

Neuromorphic Computing is Turing-Complete

Authors: Prasanna Date, Catherine Schuman, Bill Kay, Thomas Potok

Abstract: Neuromorphic computing is a non-von Neumann computing paradigm that performs computation by emulating the human brain. Neuromorphic systems are extremely energy-efficient and known to consume thousands of times less power than CPUs and GPUs. They have the potential to drive critical use cases such as autonomous vehicles, edge computing and internet of things in the future. For this reason, they are sought to be an indispensable part of the future computing landscape. Neuromorphic systems are mainly used for spike-based machine learning applications, although there are some non-machine learning applications in graph theory, differential equations, and spike-based simulations. These applications suggest that neuromorphic computing might be capable of general-purpose computing. However, general-purpose computability of neuromorphic computing has not been established yet. In this work, we prove that neuromorphic computing is Turing-complete and therefore capable of general-purpose computing. Specifically, we present a model of neuromorphic computing, with just two neuron parameters (threshold and leak), and two synaptic parameters (weight and delay). We devise neuromorphic circuits for computing all the μ-recursive functions (i.e., constant, successor and projection functions) and all the μ-recursive operators (i.e., composition, primitive recursion and minimization operators). Given that the μ-recursive functions and operators are precisely the ones that can be computed using a Turing machine, this work establishes the Turing-completeness of neuromorphic computing.

Submitted 28 April, 2021; originally announced April 2021.

MSC Class: 68Q07; 03D10 ACM Class: F.1.1; D.1.m

arXiv:2012.14600 [pdf, other]

doi 10.1371/journal.pone.0296879

doi 10.5281/zenodo.10462795

A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset

Authors: Miki E. Verma, Robert A. Bridges, Michael D. Iannacone, Samuel C. Hollifield, Pablo Moriano, Steven C. Hespeler, Bill Kay, Frank L. Combs

Abstract: Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions on CANs. Producing vehicular CAN data with a variety of intrusions is out of reach for most researchers as it requires expensive assets and expertise. To assist researchers, we present the first comprehensive guide to the existing open CAN intrusion datasets, including a quality analysis of each dataset and an enumeration of each's benefits, drawbacks, and suggested use case. Current public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks often in synthetic data, which lack fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data but not a corresponding raw binary version. Overall, the available data pigeon-holes CAN IDS works into testing on limited, often inappropriate data (usually with attacks that are too easily detectable to truly test the method), and this lack of data has stymied comparability and reproducibility of results. As our primary contribution, we present the ROAD (Real ORNL Automotive Dynamometer) CAN Intrusion Dataset, consisting of over 3.5 hours of one vehicle's CAN data. ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth with multiple variants and instances of real fuzzing, fabrication, and unique advanced attacks, as well as simulated masquerade attacks. To facilitate benchmarking CAN IDS methods that require signal-translated inputs, we also provide the signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and needed comparability in the CAN IDS field.

Submitted 7 February, 2024; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: title changed and author added from original version

Journal ref: PLoS one 19, no. 1 (2024): e0296879

arXiv:2005.04171 [pdf, other]

Hyperparameter Optimization in Binary Communication Networks for Neuromorphic Deployment

Authors: Maryam Parsa, Catherine D. Schuman, Prasanna Date, Derek C. Rose, Bill Kay, J. Parker Mitchell, Steven R. Young, Ryan Dellana, William Severa, Thomas E. Potok, Kaushik Roy

Abstract: Training neural networks for neuromorphic deployment is non-trivial. There have been a variety of approaches proposed to adapt back-propagation or back-propagation-like algorithms appropriate for training. Considering that these networks often have very different performance characteristics than traditional neural networks, it is often unclear how to set either the network topology or the hyperparameters to achieve optimal performance. In this work, we introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware. We show that by optimizing the hyperparameters on this algorithm for each dataset, we can achieve improvements in accuracy over the previous state-of-the-art for this algorithm on each dataset (by up to 15 percent). This jump in performance continues to emphasize the potential when converting traditional neural networks to binary communication applicable to neuromorphic hardware.

Submitted 20 April, 2020; originally announced May 2020.

Comments: 9 pages, 3 figures, To appear in WCCI 2020

arXiv:1903.04523 [pdf, other]

The Iterated Local Model for Social Networks

Authors: Anthony Bonato, Huda Chuangpishit, Sean English, Bill Kay, Erin Meger

Abstract: On-line social networks, such as Facebook and Twitter, are often studied from the perspective of friendship ties between agents in the network. Adversarial ties, however, also play an important role in the structure and function of social networks, but are often hidden. Underlying generative mechanisms of social networks are predicted by structural balance theory, which postulates that triads of agents prefer to be transitive, where friends of friends are more likely friends, or anti-transitive, where adversaries of adversaries become friends. The previously proposed Iterated Local Transitivity (ILT) and Iterated Local Anti-Transitivity (ILAT) models incorporated transitivity and anti-transitivity, respectively, as evolutionary mechanisms. These models resulted in graphs with many observable properties of social networks, such as low diameter, high clustering, and densification. We propose a new, generative model, referred to as the Iterated Local Model (ILM) for social networks synthesizing both transitive and anti-transitive triads over time. In ILM, we are given a countably infinite binary sequence as input, and that sequence determines whether we apply a transitive or an anti-transitive step. The resulting model exhibits many properties of complex networks observed in the ILT and ILAT models. In particular, for any input binary sequence, we show that asymptotically the model generates finite graphs that densify, have clustering coefficient bounded away from 0, have diameter at most 3, and exhibit bad spectral expansion. We also give a thorough analysis of the chromatic number, domination number, Hamiltonicity, and isomorphism types of induced subgraphs of ILM graphs.

Submitted 11 March, 2019; originally announced March 2019.

arXiv:1211.2151 [pdf, ps, other]

Graph Odometry

Authors: Aaron Dutle, Bill Kay

Abstract: We address the problem of determining edge weights on a graph using non-backtracking closed walks from a vertex. We show that the weights of all of the edges can be determined from any starting vertex exactly when the graph has minimum degree at least three. We also determine the minimum number of walks required to reveal all edge weights.

Submitted 9 November, 2012; originally announced November 2012.

Comments: 14 pages, 5 figures

MSC Class: 05C22

arXiv:1109.0522 [pdf, ps, other]

Graham's Tree Reconstruction Conjecture and a Waring-Type Problem on Partitions

Authors: Joshua Cooper, Bill Kay, Anton Swifton

Abstract: Suppose $G$ is a tree. Graham's "Tree Reconstruction Conjecture" states that $G$ is uniquely determined by the integer sequence $|G|$, $|L(G)|$, $|L(L(G))|$, $|L(L(L(G)))|$, $\ldots$, where $L(H)$ denotes the line graph of the graph $H$. Little is known about this question apart from a few simple observations. We show that the number of trees on $n$ vertices which can be distinguished by their associated integer sequences is $e^{\Omega((\log n)^{3/2})}$. The proof strategy involves constructing a large collection of caterpillar graphs using partitions arising from the Prouhet-Tarry-Escott problem.

Submitted 23 August, 2017; v1 submitted 2 September, 2011; originally announced September 2011.

Comments: 18 pages, 1 figure

MSC Class: 05C76 (Primary) 05C05; 05C60; 11P05; 11P81 (Secondary) ACM Class: G.2.2

Showing 1–13 of 13 results for author: Kay, B