-
Are LLMs Naturally Good at Synthetic Tabular Data Generation?
Authors:
Shengzhe Xu,
Cho-Ting Lee,
Mandar Sharma,
Raquib Bin Yousuf,
Nikhil Muralidhar,
Naren Ramakrishnan
Abstract:
Large language models (LLMs) have demonstrated their prowess in generating synthetic text and images; however, their potential for generating tabular data -- arguably the most common data type in business and scientific applications -- is largely underexplored. This paper demonstrates that LLMs, used as-is, or after traditional fine-tuning, are severely inadequate as synthetic table generators. Due to the autoregressive nature of LLMs, fine-tuning with random order permutation runs counter to the importance of modeling functional dependencies, and renders LLMs unable to model conditional mixtures of distributions (key to capturing real world constraints). We showcase how LLMs can be made to overcome some of these deficiencies by making them permutation-aware.
Submitted 21 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Information Guided Regularization for Fine-tuning Language Models
Authors:
Mandar Sharma,
Nikhil Muralidhar,
Shengzhe Xu,
Raquib Bin Yousuf,
Naren Ramakrishnan
Abstract:
The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is affected by these task-sensitive parameters through an information-theoretic lens. We then leverage the findings from our investigations to devise a novel approach to dropout for improved model regularization and better downstream generalization. This approach, named guided dropout, is both task & architecture agnostic and adds no computational overhead to the fine-tuning process. Through empirical evaluations, we showcase that our approach to regularization yields consistently better performance, even in scenarios of data paucity, compared to standardized baselines.
Submitted 21 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Laying Anchors: Semantically Priming Numerals in Language Modeling
Authors:
Mandar Sharma,
Rutuja Murlidhar Taware,
Pravesh Koirala,
Nikhil Muralidhar,
Naren Ramakrishnan
Abstract:
Off-the-shelf pre-trained language models have become the de facto standard in NLP pipelines for a multitude of downstream tasks. However, the inability of these models to properly encode numerals limits their performance on tasks requiring numeric comprehension. We introduce strategies to semantically prime numerals in any corpus by generating anchors governed by the distribution of numerals in said corpus, thereby enabling mathematically grounded representations of these numeral tokens. We establish the superiority of our proposed techniques through evaluation on a range of numeracy tasks for both in-domain (seen) and out-domain (unseen) numerals. Further, we expand our empirical evaluations to numerals ranging from 1 to 10 billion, a significantly broader range compared to previous studies of the same nature, and we demonstrate significant improvements in the mathematical grounding of our learned embeddings.
Submitted 7 August, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems
Authors:
Shengzhe Xu,
Christo Kurisummoottil Thomas,
Omar Hashash,
Nikhil Muralidhar,
Walid Saad,
Naren Ramakrishnan
Abstract:
Large language models (LLMs) and foundation models have been recently touted as a game-changer for 6G systems. However, recent efforts on LLMs for wireless networks are limited to a direct application of existing language models that were designed for natural language processing (NLP) applications. To address this challenge and create wireless-centric foundation models, this paper presents a comprehensive vision on how to design universal foundation models that are tailored towards the deployment of artificial intelligence (AI)-native networks. Diverging from NLP-based foundation models, the proposed framework promotes the design of large multi-modal models (LMMs) fostered by three key capabilities: 1) processing of multi-modal sensing data, 2) grounding of physical symbol representations in real-world wireless systems using causal reasoning and retrieval-augmented generation (RAG), and 3) enabling instructibility from the wireless environment feedback to facilitate dynamic network adaptation thanks to logical and mathematical reasoning facilitated by neuro-symbolic AI. In essence, these properties enable the proposed LMM framework to build universal capabilities that cater to various cross-layer networking tasks and alignment of intents across different domains. Preliminary results from experimental evaluation demonstrate the efficacy of grounding using RAG in LMMs, and showcase the alignment of LMMs with wireless system designs. Furthermore, the enhanced rationale exhibited in the responses to mathematical questions by LMMs, compared to vanilla LLMs, demonstrates the logical and mathematical reasoning capabilities inherent in LMMs. Building on those results, we present a series of open questions and challenges for LMMs. We then conclude with a set of recommendations that ignite the path towards LMM-empowered AI-native systems.
Submitted 7 February, 2024; v1 submitted 29 January, 2024;
originally announced February 2024.
-
Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency
Authors:
Mandar Sharma,
Nikhil Muralidhar,
Naren Ramakrishnan
Abstract:
The field of Math-NLP has witnessed significant growth in recent years, motivated by the desire to expand LLM performance to the learning of non-linguistic notions (numerals, and subsequently, arithmetic reasoning). However, non-linguistic skill injection typically comes at a cost for LLMs: it leads to catastrophic forgetting of core linguistic skills, a consequence that often remains unaddressed in the literature. As Math-NLP has been able to create LLMs that can closely approximate the mathematical skills of a grade-schooler or the arithmetic reasoning skills of a calculator, the practicality of these models fails if they concomitantly shed their linguistic capabilities. In this work, we take a closer look into the phenomenon of catastrophic forgetting as it pertains to LLMs and subsequently offer a novel framework for non-linguistic skill injection for LLMs based on information-theoretic interventions and skill-specific losses that enable the learning of strict arithmetic reasoning. Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention, and does so with a fraction of the non-linguistic training data (1/4) and zero additional synthetic linguistic training data.
Submitted 14 May, 2023;
originally announced May 2023.
-
Channel Simulation: Finite Blocklengths and Broadcast Channels
Authors:
Michael X. Cao,
Navneeth Ramakrishnan,
Mario Berta,
Marco Tomamichel
Abstract:
We study channel simulation under common randomness assistance in the finite-blocklength regime and identify the smooth channel max-information as a linear program one-shot converse on the minimal simulation cost for fixed error tolerance. We show that this one-shot converse can be achieved exactly using no-signaling-assisted codes, and approximately achieved using common randomness-assisted codes. Our one-shot converse thus takes on an analogous role to the celebrated meta-converse in the complementary problem of channel coding, and we find tight relations between these two bounds. We asymptotically expand our bounds on the simulation cost for discrete memoryless channels, leading to the second-order as well as the moderate deviation rate expansion, which can be expressed in terms of the channel capacity and channel dispersion known from noisy channel coding. Our bounds imply the well-known fact that the optimal asymptotic rate of one channel to simulate another under common randomness assistance is given by the ratio of their respective capacities. Additionally, our higher-order asymptotic expansion shows that this reversibility falls apart in the second order. Our techniques extend to discrete memoryless broadcast channels. In stark contrast to the elusive broadcast channel capacity problem, we show that the reverse problem of broadcast channel simulation under common randomness assistance allows for an efficiently computable single-letter characterization of the asymptotic rate region in terms of the broadcast channel's multipartite mutual information.
Submitted 5 August, 2024; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Authors:
Mandar Sharma,
Nikhil Muralidhar,
Naren Ramakrishnan
Abstract:
Through their transfer learning abilities, highly-parameterized large pre-trained language models have dominated the NLP landscape for a multitude of downstream language tasks. Though linguistically proficient, the inability of these models to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning) limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general purpose language model that also happens to be proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
Submitted 3 November, 2022;
originally announced November 2022.
-
Minutiae-Guided Fingerprint Embeddings via Vision Transformers
Authors:
Steven A. Grosz,
Joshua J. Engelsma,
Rajeev Ranjan,
Naveen Ramakrishnan,
Manoj Aggarwal,
Gerard G. Medioni,
Anil K. Jain
Abstract:
Minutiae matching has long dominated the field of fingerprint recognition. However, deep networks can be used to extract fixed-length embeddings from fingerprints. To date, the few studies that have explored the use of CNN architectures to extract such embeddings have shown extreme promise. Inspired by these early works, we propose the first use of a Vision Transformer (ViT) to learn a discriminative fixed-length fingerprint embedding. We further demonstrate that by guiding the ViT to focus on local, minutiae-related features, we can boost the recognition performance. Finally, we show that by fusing embeddings learned by CNNs and ViTs we can reach near parity with a commercial state-of-the-art (SOTA) matcher. In particular, we obtain a TAR=94.23% @ FAR=0.1% on the NIST SD 302 public-domain dataset, compared to a SOTA commercial matcher which obtains TAR=96.71% @ FAR=0.1%. Additionally, our fixed-length embeddings can be matched orders of magnitude faster than the commercial system (2.5 million matches/second compared to 50K matches/second). We make our code and models publicly available to encourage further research on this topic: https://github.com/tba.
Submitted 25 October, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Detecting Irregular Network Activity with Adversarial Learning and Expert Feedback
Authors:
Gopikrishna Rathinavel,
Nikhil Muralidhar,
Timothy O'Shea,
Naren Ramakrishnan
Abstract:
Anomaly detection is a ubiquitous and challenging task relevant across many disciplines. With the vital role communication networks play in our daily lives, the security of these networks is imperative for smooth functioning of society. To this end, we propose a novel self-supervised deep learning framework CAAD for anomaly detection in wireless communication systems. Specifically, CAAD employs contrastive learning in an adversarial setup to learn effective representations of normal and anomalous behavior in wireless networks. We conduct rigorous performance comparisons of CAAD with several state-of-the-art anomaly detection techniques and verify that CAAD yields a mean performance improvement of 92.84%. We also augment CAAD, enabling it to systematically incorporate expert feedback through a novel contrastive learning feedback loop to improve the learned representations and thereby reduce prediction uncertainty (CAAD-EF). We view CAAD-EF as a novel, holistic and widely applicable solution to anomaly detection.
Submitted 15 October, 2022; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Memetic algorithms for Spatial Partitioning problems
Authors:
Subhodip Biswas,
Fanglan Chen,
Zhiqian Chen,
Chang-Tien Lu,
Naren Ramakrishnan
Abstract:
Spatial optimization problems (SOPs) are characterized by spatial relationships governing the decision variables, objectives, and/or constraint functions. In this article, we focus on a specific type of SOP called spatial partitioning, which is a combinatorial problem due to the presence of discrete spatial units. Exact optimization methods do not scale with the size of the problem, especially within practicable time limits. This motivated us to develop population-based metaheuristics for solving such SOPs. However, the search operators employed by these population-based methods are mostly designed for real-parameter continuous optimization problems. For adapting these methods to SOPs, we apply domain knowledge in designing spatially-aware search operators for efficiently searching through the discrete search space while preserving the spatial constraints. To this end, we put forward a simple yet effective algorithm called swarm-based spatial memetic algorithm (SPATIAL) and test it on the school (re)districting problem. Detailed experimental investigations are performed on real-world datasets to evaluate the performance of SPATIAL. Besides, ablation studies are performed to understand the role of the individual components of SPATIAL. Additionally, we discuss how SPATIAL is helpful in the real-life planning process and its applicability to different scenarios and motivate future research directions.
Submitted 4 August, 2022;
originally announced August 2022.
-
Scrutinizing Shipment Records To Thwart Illegal Timber Trade
Authors:
Debanjan Datta,
Sathappan Muthiah,
John Simeone,
Amelia Meadows,
Naren Ramakrishnan
Abstract:
Timber and forest products made from wood, like furniture, are valuable commodities, and like the global trade of many highly-valued natural resources, face challenges of corruption, fraud, and illegal harvesting. These grey and black market activities in the wood and forest products sector are not limited to the countries where the wood was harvested, but extend throughout the global supply chain and have been tied to illicit financial flows, like trade-based money laundering, document fraud, species mislabeling, and other illegal activities. The task of finding such fraudulent activities using trade data, in the absence of ground truth, can be modelled as an unsupervised anomaly detection problem. However, existing approaches suffer from certain shortcomings in their applicability towards large-scale trade data. Trade data is heterogeneous, with both categorical and numerical attributes in a tabular format. The overall challenge lies in the complexity, volume and velocity of data, with a large number of entities and a lack of ground truth labels. To mitigate these, we propose a novel unsupervised anomaly detection method -- Contrastive Learning based Heterogeneous Anomaly Detection (CHAD) -- that is generally applicable to large-scale heterogeneous tabular data. We demonstrate that our model CHAD performs favorably against multiple comparable baselines on public benchmark datasets, and outperforms them in the case of trade data. More importantly, we demonstrate that our approach reduces the assumptions and effort required for hyperparameter tuning, which is a key challenge in an unsupervised training paradigm. Specifically, our overarching objective pertains to detecting suspicious timber shipments and patterns using Bill of Lading trade record data. Detecting anomalous transactions in shipment records can enable further investigation by government agencies and supply chain constituents.
Submitted 31 July, 2022;
originally announced August 2022.
-
Innovations in Neural Data-to-text Generation: A Survey
Authors:
Mandar Sharma,
Ajay Gogineni,
Naren Ramakrishnan
Abstract:
The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.
Submitted 1 April, 2024; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions
Authors:
Raquib Bin Yousuf,
Subhodip Biswas,
Kulendra Kumar Kaushal,
James Dunham,
Rebecca Gelles,
Sathappan Muthiah,
Nathan Self,
Patrick Butler,
Naren Ramakrishnan
Abstract:
Understanding key insights from full-text scholarly articles is essential as it enables us to determine interesting trends, give insight into the research and development, and build knowledge graphs. However, some of the interesting key insights are only available when considering full-text. Although researchers have made significant progress in information extraction from short documents, extraction of scientific entities from full-text scholarly literature remains a challenging problem. This work presents an automated End-to-end Research Entity Extractor called EneRex to extract technical facets such as dataset usage, objective task, and method from full-text scholarly research articles. Additionally, we extract three novel facets from full-text articles: links to source code, computing resources, and programming languages/libraries. We demonstrate how EneRex is able to extract key insights and trends from a large-scale dataset in the domain of computer science. We further test our pipeline on multiple datasets and find that EneRex improves upon a state-of-the-art model. We highlight how the existing datasets are limited in their capacity and how EneRex may fit into an existing knowledge graph. We also present a detailed discussion with pointers for future research. Our code and data are publicly available at https://github.com/DiscoveryAnalyticsCenter/EneRex.
Submitted 8 July, 2022;
originally announced July 2022.
-
Framing Algorithmic Recourse for Anomaly Detection
Authors:
Debanjan Datta,
Feng Chen,
Naren Ramakrishnan
Abstract:
The problem of algorithmic recourse has been explored for supervised machine learning models, to provide more interpretable, transparent and robust outcomes from decision support systems. An unexplored area is that of algorithmic recourse for anomaly detection, specifically for tabular data with only discrete feature values. Here the problem is to present a set of counterfactuals that are deemed normal by the underlying anomaly detection model so that applications can utilize this information for explanation purposes or to recommend countermeasures. We present an approach -- Context preserving Algorithmic Recourse for Anomalies in Tabular data (CARAT) -- that is effective, scalable, and agnostic to the underlying anomaly detection model. CARAT uses a transformer-based encoder-decoder model to explain an anomaly by finding features with low likelihood. Subsequently, semantically coherent counterfactuals are generated by modifying the highlighted features, using the overall context of features in the anomalous instance(s). Extensive experiments help demonstrate the efficacy of CARAT.
Submitted 28 June, 2022;
originally announced June 2022.
-
Sampling-based techniques for designing school boundaries
Authors:
Subhodip Biswas,
Fanglan Chen,
Zhiqian Chen,
Chang-Tien Lu,
Naren Ramakrishnan
Abstract:
Recently, an increasing number of researchers, especially in the realm of political redistricting, have proposed sampling-based techniques to generate a subset of plans from the vast space of districting plans. These techniques have been increasingly adopted by U.S. courts of law and independent commissions as a tool for identifying partisan gerrymanders. Motivated by these recent developments, we develop a set of similar sampling techniques for designing school boundaries based on the flip proposal. Note that the flip proposal here refers to the change in the districting plan by a single assignment. These sampling-based techniques serve a dual purpose. They can be used as a baseline for comparing redistricting algorithms based on local search. Additionally, these techniques can help to infer the problem characteristics that may be further used for developing efficient redistricting methods. We empirically touch on both these aspects with regard to the problem of school redistricting.
Submitted 8 June, 2022;
originally announced June 2022.
-
Improving Zero-Shot Event Extraction via Sentence Simplification
Authors:
Sneha Mehta,
Huzefa Rangwala,
Naren Ramakrishnan
Abstract:
The success of sites such as ACLED and Our World in Data has demonstrated the massive utility of extracting events in structured formats from large volumes of textual data in the form of news, social media, blogs and discussion forums. Event extraction can provide a window into ongoing geopolitical crises and yield actionable intelligence. With the proliferation of large pretrained language models, Machine Reading Comprehension (MRC) has emerged as a new paradigm for event extraction in recent times. In this approach, event argument extraction is framed as an extractive question-answering task. One of the key advantages of the MRC-based approach is its ability to perform zero-shot extraction. However, long-range dependencies (i.e., large lexical distance between trigger and argument words) and the difficulty of processing syntactically complex sentences plague MRC-based approaches. In this paper, we present a general approach to improve the performance of MRC-based event extraction by performing unsupervised sentence simplification guided by the MRC model itself. We evaluate our approach on the ICEWS geopolitical event extraction dataset, with specific attention to 'Actor' and 'Target' argument roles. We show how such context simplification can improve the performance of MRC-based event extraction by more than 5% for actor extraction and more than 10% for target extraction.
Submitted 5 April, 2022;
originally announced April 2022.
-
PseudoProp: Robust Pseudo-Label Generation for Semi-Supervised Object Detection in Autonomous Driving Systems
Authors:
Shu Hu,
Chun-Hao Liu,
Jayanta Dutta,
Ming-Ching Chang,
Siwei Lyu,
Naveen Ramakrishnan
Abstract:
Semi-supervised object detection methods are widely used in autonomous driving systems, where only a fraction of objects are labeled. To propagate information from the labeled objects to the unlabeled ones, pseudo-labels for unlabeled objects must be generated. Although pseudo-labels have proven to improve the performance of semi-supervised object detection significantly, the application of image-based methods to video frames results in numerous missed or false detections using such generated pseudo-labels. In this paper, we propose a new approach, PseudoProp, to generate robust pseudo-labels by leveraging motion continuity in video frames. Specifically, PseudoProp uses a novel bidirectional pseudo-label propagation approach to compensate for misdetection. A feature-based fusion technique is also used to suppress inference noise. Extensive experiments on the large-scale Cityscapes dataset demonstrate that our method outperforms the state-of-the-art semi-supervised object detection methods by 7.4% on mAP75.
Submitted 16 April, 2022; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Contrastive Graph Convolutional Networks for Hardware Trojan Detection in Third Party IP Cores
Authors:
Nikhil Muralidhar,
Abdullah Zubair,
Nathanael Weidler,
Ryan Gerdes,
Naren Ramakrishnan
Abstract:
The availability of wide-ranging third-party intellectual property (3PIP) cores enables integrated circuit (IC) designers to focus on designing high-level features in ASICs/SoCs. The massive proliferation of ICs brings with it an increased number of bad actors seeking to exploit those circuits for various nefarious reasons. This is not surprising as integrated circuits affect every aspect of society. Thus, malicious logic (Hardware Trojans, HT) being surreptitiously injected by untrusted vendors into 3PIP cores used in IC design is an ever-present threat. In this paper, we explore methods for identification of trigger-based HT in designs containing synthesizable IP cores without a golden model. Specifically, we develop methods to detect hardware trojans by detecting triggers embedded in ICs purely based on netlists acquired from the vendor. We propose GATE-Net, a deep learning model based on graph-convolutional networks (GCN) trained using supervised contrastive learning, for flagging designs containing randomly-inserted triggers using only the corresponding netlist. Our proposed architecture achieves significant improvements over state-of-the-art learning models, yielding an average 46.99% improvement in detection performance for combinatorial triggers and 21.91% improvement for sequential triggers across a variety of circuit types. Through rigorous experimentation and qualitative and quantitative performance evaluations, we demonstrate the effectiveness of GATE-Net and of its supervised contrastive training for HT detection.
Submitted 3 March, 2022;
originally announced March 2022.
-
EINNs: Epidemiologically-informed Neural Networks
Authors:
Alexander Rodríguez,
Jiaming Cui,
Naren Ramakrishnan,
Bijaya Adhikari,
B. Aditya Prakash
Abstract:
We introduce EINNs, a framework crafted for epidemic forecasting that builds upon the theoretical grounds provided by mechanistic models as well as the data-driven expressibility afforded by AI models, and their capabilities to ingest heterogeneous information. Although neural forecasting models have been successful in multiple tasks, predictions well-correlated with epidemic trends and long-term predictions remain open challenges. Epidemiological ODE models contain mechanisms that can guide us in these two tasks; however, they have limited capability of ingesting data sources and modeling composite signals. Thus, we propose to leverage work in physics-informed neural networks to learn latent epidemic dynamics and transfer relevant knowledge to another neural network which ingests multiple data sources and has more appropriate inductive bias. In contrast with previous work, we do not assume the observability of complete dynamics and do not need to numerically solve the ODE equations during training. Our thorough experiments on all US states and HHS regions for COVID-19 and influenza forecasting showcase the clear benefits of our approach in both short-term and long-term forecasting as well as in learning the mechanistic dynamics over other non-trivial alternatives.
Submitted 10 January, 2023; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Moderate deviation expansion for fully quantum tasks
Authors:
Navneeth Ramakrishnan,
Marco Tomamichel,
Mario Berta
Abstract:
The moderate deviation regime is concerned with the finite block length trade-off between communication cost and error for information processing tasks in the asymptotic regime, where the communication cost approaches a capacity-like quantity and the error vanishes at the same time. We find exact characterisations of these trade-offs for a variety of fully quantum communication tasks, including quantum source coding, quantum state splitting, entanglement-assisted quantum channel coding, and entanglement-assisted quantum channel simulation. The main technical tool we derive is a tight relation between the partially smoothed max-information and the hypothesis testing relative entropy. This allows us to obtain the expansion of the partially smoothed max-information for i.i.d. states in the moderate deviation regime.
Submitted 8 October, 2023; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Deep diffusion-based forecasting of COVID-19 by incorporating network-level mobility information
Authors:
Padmaksha Roy,
Shailik Sarkar,
Subhodip Biswas,
Fanglan Chen,
Zhiqian Chen,
Naren Ramakrishnan,
Chang-Tien Lu
Abstract:
Modeling the spatiotemporal nature of the spread of infectious diseases can provide useful intuition in understanding the time-varying aspect of the disease spread and the underlying complex spatial dependency observed in people's mobility patterns. Besides, information from multiple related county-level time series can be leveraged to make a forecast on an individual time series. Adding to this challenge is the fact that real-time data often deviates from the unimodal Gaussian distribution assumption and may show some complex mixed patterns. Motivated by this, we develop a deep learning-based time-series model for probabilistic forecasting called Auto-regressive Mixed Density Dynamic Diffusion Network (ARM3Dnet), which considers both people's mobility and disease spread as a diffusion process on a dynamic directed graph. The Gaussian Mixture Model layer is implemented to consider the multimodal nature of the real-time data while learning from multiple related time series. We show that our model, when trained with the best combination of dynamic covariate features and mixture components, can outperform both traditional statistical and deep learning models in forecasting the number of COVID-19 deaths and cases at the county level in the United States.
Submitted 9 November, 2021;
originally announced November 2021.
-
TCube: Domain-Agnostic Neural Time-series Narration
Authors:
Mandar Sharma,
John S. Brownstein,
Naren Ramakrishnan
Abstract:
The task of generating rich and fluent narratives that aptly describe the characteristics, trends, and anomalies of time-series data is invaluable to the sciences (geology, meteorology, epidemiology) or finance (trades, stocks, or sales and inventory). The efforts for time-series narration hitherto are domain-specific and use predefined templates that offer consistency but lead to mechanical narratives. We present TCube (Time-series-to-text), a domain-agnostic neural framework for time-series narration, that couples the representation of essential time-series elements in the form of a dense knowledge graph and the translation of said knowledge graph into rich and fluent narratives through the transfer-learning capabilities of PLMs (Pre-trained Language Models). TCube's design primarily addresses the challenge that lies in building a neural framework in the complete paucity of annotated training data for time-series. The design incorporates knowledge graphs as an intermediary for the representation of essential time-series elements which can be linearized for textual translation. To the best of our knowledge, TCube is the first investigation of the use of neural strategies for time-series narration. Through extensive evaluations, we show that TCube can improve the lexical diversity of the generated narratives by up to 65.38% while still maintaining grammatical integrity. The practicality and deployability of TCube are further validated through an expert review (n=21) where 76.2% of participating experts wary of auto-generated narratives favored TCube as a deployable system for time-series narration due to its richer narratives. Our code-base, models, and datasets, with detailed instructions for reproducibility, are publicly hosted at https://github.com/Mandar-Sharma/TCube.
Submitted 11 October, 2021;
originally announced October 2021.
-
Using AntiPatterns to avoid MLOps Mistakes
Authors:
Nikhil Muralidhar,
Sathappan Muthiah,
Patrick Butler,
Manish Jain,
Yu Yu,
Katy Burne,
Weipeng Li,
David Jones,
Prakash Arunachalam,
Hays 'Skip' McCormick,
Naren Ramakrishnan
Abstract:
We describe lessons learned from developing and deploying machine learning models at scale across the enterprise in a range of financial analytics applications. These lessons are presented in the form of antipatterns. Just as design patterns codify best software engineering practices, antipatterns provide a vocabulary to describe defective practices and methodologies. Here we catalog and document numerous antipatterns in financial ML operations (MLOps). Some antipatterns are due to technical errors, while others are due to not having sufficient knowledge of the surrounding context in which ML results are used. By providing a common vocabulary to discuss these situations, our intent is that antipatterns will support better documentation of issues, rapid communication between stakeholders, and faster resolution of problems. In addition to cataloging antipatterns, we describe solutions, best practices, and future directions toward MLOps maturity.
Submitted 30 June, 2021;
originally announced July 2021.
-
Detecting Anomalies Through Contrast in Heterogeneous Data
Authors:
Debanjan Datta,
Sathappan Muthiah,
Naren Ramakrishnan
Abstract:
Detecting anomalies has been a fundamental approach in detecting potentially fraudulent activities. Tasked with detection of illegal timber trade, which threatens ecosystems and economies and is associated with other illegal activities, we formulate our problem as one of anomaly detection. Among other challenges, annotations that could assist in building automated systems to detect fraudulent transactions are unavailable for our large-scale trade data with heterogeneous features (categorical and continuous). Modelling the task as unsupervised anomaly detection, we propose a novel model, Contrastive Learning based Heterogeneous Anomaly Detector, to address shortcomings of prior models. Our model uses an asymmetric autoencoder that can effectively handle large arity categorical variables, but avoids assumptions about structure of data in low-dimensional latent space and is robust to changes to hyper-parameters. The likelihood of data is approximated through an estimator network, which is jointly trained with the autoencoder, using negative sampling. Further, the details and intuition for an effective negative sample generation approach for heterogeneous data are outlined. We provide a qualitative study to showcase the effectiveness of our model in detecting anomalies in timber trade.
Submitted 2 April, 2021;
originally announced April 2021.
-
Incorporating Expert Guidance in Epidemic Forecasting
Authors:
Alexander Rodríguez,
Bijaya Adhikari,
Naren Ramakrishnan,
B. Aditya Prakash
Abstract:
Forecasting influenza-like illnesses (ILI) has rapidly progressed in recent years from an art to a science with a plethora of data-driven methods. While these methods have achieved qualified success, their applicability is limited due to their inability to incorporate expert feedback and guidance systematically into the forecasting framework. We propose a new approach leveraging the Seldonian optimization framework from AI safety and demonstrate how it can be adapted to epidemic forecasting. We study two types of guidance: smoothness and regional consistency of errors. We show that, by successfully incorporating this guidance, we are able not only to bound the probability of undesirable behavior, but also to reduce RMSE on test data by up to 17%.
Submitted 24 December, 2020;
originally announced January 2021.
-
Better call Surrogates: A hybrid Evolutionary Algorithm for Hyperparameter optimization
Authors:
Subhodip Biswas,
Adam D Cobb,
Andreea Sistrunk,
Naren Ramakrishnan,
Brian Jalaian
Abstract:
In this paper, we propose a surrogate-assisted evolutionary algorithm (EA) for hyperparameter optimization of machine learning (ML) models. The proposed STEADE model initially estimates the objective function landscape using Radial Basis Function interpolation, and then transfers the knowledge to an EA technique called Differential Evolution that is used to evolve new solutions guided by a Bayesian optimization framework. We empirically evaluate our model on the hyperparameter optimization problems that form part of the black box optimization challenge at NeurIPS 2020 and demonstrate the improvement brought about by STEADE over the vanilla EA.
Submitted 11 December, 2020;
originally announced December 2020.
-
STAN: Synthetic Network Traffic Generation with Generative Neural Models
Authors:
Shengzhe Xu,
Manish Marwah,
Martin Arlitt,
Naren Ramakrishnan
Abstract:
Deep learning models have achieved great success in recent years but progress in some domains like cybersecurity is stymied due to a paucity of realistic datasets. Organizations are reluctant to share such data, even internally, due to privacy reasons. An alternative is to use synthetically generated data but existing methods are limited in their ability to capture complex dependency structures, between attributes and across time. This paper presents STAN (Synthetic network Traffic generation with Autoregressive Neural models), a tool to generate realistic synthetic network traffic datasets for subsequent downstream applications. Our novel neural architecture captures both temporal dependencies and dependence between attributes at any given time. It integrates convolutional neural layers with mixture density neural layers and softmax layers, and models both continuous and discrete variables. We evaluate the performance of STAN in terms of the quality of data generated, by training it on both a simulated dataset and a real network traffic data set. Finally, to answer the question of whether real network traffic data can be substituted with synthetic data to train models of comparable accuracy, we train two anomaly detection models based on self-supervision. The results show only a small decline in the accuracy of models trained solely on synthetic data. While current results are encouraging in terms of quality of data generated and absence of any obvious data leakage from training data, in the future we plan to further validate this fact by conducting privacy attacks on the generated data. Other future work includes validating capture of long term dependencies and making model training
Submitted 2 August, 2021; v1 submitted 27 September, 2020;
originally announced September 2020.
-
Steering a Historical Disease Forecasting Model Under a Pandemic: Case of Flu and COVID-19
Authors:
Alexander Rodríguez,
Nikhil Muralidhar,
Bijaya Adhikari,
Anika Tabassum,
Naren Ramakrishnan,
B. Aditya Prakash
Abstract:
Forecasting influenza in a timely manner aids health organizations and policymakers in adequate preparation and decision making. However, effective influenza forecasting still remains a challenge despite increasing research interest. It is even more challenging amidst the COVID pandemic, when the influenza-like illness (ILI) counts are affected by various factors such as symptomatic similarities with COVID-19 and shift in healthcare seeking patterns of the general population. Under the current pandemic, historical influenza models carry valuable expertise about the disease dynamics but face difficulties adapting. Therefore, we propose CALI-Net, a neural transfer learning architecture which allows us to 'steer' a historical disease forecasting model to new scenarios where flu and COVID co-exist. Our framework enables this adaptation by automatically learning when it should emphasize learning from COVID-related signals and when it should learn from the historical model. Thus, we exploit representations learned from historical ILI data as well as the limited COVID-related signals. Our experiments demonstrate that our approach is successful in adapting a historical forecasting model to the current pandemic. In addition, we show that success in our primary goal, adaptation, does not sacrifice overall performance as compared with state-of-the-art influenza forecasting approaches.
Submitted 23 December, 2020; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Once Upon A Time In Visualization: Understanding the Use of Textual Narratives for Causality
Authors:
Arjun Choudhry,
Mandar Sharma,
Pramod Chundury,
Thomas Kapler,
Derek W. S. Gray,
Naren Ramakrishnan,
Niklas Elmqvist
Abstract:
Causality visualization can help people understand temporal chains of events, such as messages sent in a distributed system, cause and effect in a historical conflict, or the interplay between political actors over time. However, as the scale and complexity of these event sequences grow, even these visualizations can become overwhelming to use. In this paper, we propose the use of textual narratives as a data-driven storytelling method to augment causality visualization. We first propose a design space for how textual narratives can be used to describe causal data. We then present results from a crowdsourced user study where participants were asked to recover causality information from two causality visualizations--causal graphs and Hasse diagrams--with and without an associated textual narrative. Finally, we describe CAUSEWORKS, a causality visualization system for understanding how specific interventions influence a causal model. The system incorporates an automatic textual narrative mechanism based on our design space. We validate CAUSEWORKS through interviews with experts who used the system for understanding complex events.
Submitted 6 September, 2020;
originally announced September 2020.
-
Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis
Authors:
Bing He,
Caleb Ziems,
Sandeep Soni,
Naren Ramakrishnan,
Diyi Yang,
Srijan Kumar
Abstract:
The spread of COVID-19 has sparked racism and hate on social media targeted towards Asian communities. However, little is known about how racial hate spreads during a pandemic and the role of counterspeech in mitigating this spread. In this work, we study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months, containing over 206 million tweets, and a social network with over 127 million nodes. By creating a novel hand-labeled dataset of 3,355 tweets, we train a text classifier to identify hate and counterspeech tweets that achieves an average macro-F1 score of 0.832. Using this dataset, we conduct longitudinal analysis of tweets and users. Analysis of the social network reveals that hateful and counterspeech users interact and engage extensively with one another, instead of living in isolated polarized communities. We find that nodes were highly likely to become hateful after being exposed to hateful content. Notably, counterspeech messages may discourage users from turning hateful, potentially suggesting a solution to curb hate on web and social media platforms. Data and code are available at http://claws.cc.gatech.edu/covid.
Submitted 10 November, 2021; v1 submitted 25 May, 2020;
originally announced May 2020.
-
Low Rank Factorization for Compact Multi-Head Self-Attention
Authors:
Sneha Mehta,
Huzefa Rangwala,
Naren Ramakrishnan
Abstract:
Effective representation learning from text has been an active area of research in the fields of NLP and text mining. Attention mechanisms have been at the forefront in order to learn contextual sentence representations. Current state-of-the-art approaches for many NLP tasks use large pre-trained language models such as BERT, XLNet and so on for learning representations. These models are based on the Transformer architecture that involves recurrent blocks of computation consisting of multi-head self-attention and feedforward networks. One of the major bottlenecks largely contributing to the computational complexity of the Transformer models is the self-attention layer, which is both computationally expensive and parameter intensive. In this work, we introduce a novel multi-head self-attention mechanism operating on GRUs that is shown to be computationally cheaper and more parameter efficient than the self-attention mechanism proposed in Transformers for text classification tasks. The efficiency of our approach mainly stems from two optimizations: 1) we use low-rank matrix factorization of the affinity matrix to efficiently get multiple attention distributions instead of having separate parameters for each head; 2) attention scores are obtained by querying a global context vector instead of densely querying all the words in the sentence. We evaluate the performance of the proposed model on tasks such as sentiment analysis from movie reviews, predicting business ratings from reviews and classifying news articles into topics. We find that the proposed approach matches or outperforms a series of strong baselines and is more parameter efficient than comparable multi-head approaches. We also perform qualitative analyses to verify that the proposed approach is interpretable and captures context-dependent word importance.
Submitted 9 August, 2020; v1 submitted 26 November, 2019;
originally announced December 2019.
-
Physics-guided Design and Learning of Neural Networks for Predicting Drag Force on Particle Suspensions in Moving Fluids
Authors:
Nikhil Muralidhar,
Jie Bu,
Ze Cao,
Long He,
Naren Ramakrishnan,
Danesh Tafti,
Anuj Karpatne
Abstract:
Physics-based simulations are often used to model and understand complex physical systems and processes in domains like fluid dynamics. Such simulations, although used frequently, have many limitations which could arise either due to the inability to accurately model a physical process owing to incomplete knowledge about certain facets of the process or due to the underlying process being too complex to accurately encode into a simulation model. In such situations, it is often useful to rely on machine learning methods to fill in the gap by learning a model of the complex physical process directly from simulation data. However, as data generation through simulations is costly, we need to develop models while being cognizant of data paucity issues. In such scenarios it is often helpful if the rich physical knowledge of the application domain is incorporated in the architectural design of machine learning models. Further, we can also use information from physics-based simulations to guide the learning process using aggregate supervision to favorably constrain the learning process. In this paper, we propose PhyDNN, a deep learning model using physics-guided structural priors and physics-guided aggregate supervision for modeling the drag forces acting on each particle in a Computational Fluid Dynamics-Discrete Element Method (CFD-DEM). We conduct extensive experiments in the context of drag force prediction and showcase the usefulness of including physics knowledge in our deep learning formulation both in the design and through the learning process. Our proposed PhyDNN model has been compared to several state-of-the-art models and achieves a significant performance improvement of 8.46% on average across all baseline models. The source code has been made available and the dataset used is detailed in [1, 2].
Submitted 5 November, 2019;
originally announced November 2019.
-
Mitigating Uncertainty in Document Classification
Authors:
Xuchao Zhang,
Fanglan Chen,
Chang-Tien Lu,
Naren Ramakrishnan
Abstract:
The uncertainty measurement of classifiers' predictions is especially important in applications such as medical diagnoses that need to ensure limited human resources can focus on the most uncertain predictions returned by machine learning models. However, few existing uncertainty models attempt to improve overall prediction accuracy where human resources are involved in the text classification task. In this paper, we propose a novel neural-network-based model that applies a new dropout-entropy method for uncertainty measurement. We also design a metric learning method on feature representations, which can boost the performance of dropout-based uncertainty methods with smaller prediction variance in accurate prediction trials. Extensive experiments on real-world data sets demonstrate that our method can achieve a considerable improvement in overall prediction accuracy compared to existing approaches. In particular, our model improved the accuracy from 0.78 to 0.92 when 30% of the most uncertain predictions were handed over to human experts in "20NewsGroup" data.
Submitted 17 July, 2019;
originally announced July 2019.
-
Patent Citation Dynamics Modeling via Multi-Attention Recurrent Networks
Authors:
Taoran Ji,
Zhiqian Chen,
Nathan Self,
Kaiqun Fu,
Chang-Tien Lu,
Naren Ramakrishnan
Abstract:
Modeling and forecasting forward citations to a patent is a central task for the discovery of emerging technologies and for measuring the pulse of inventive progress. Conventional methods for forecasting these forward citations cast the problem as analysis of temporal point processes which rely on the conditional intensity of previously received citations. Recent approaches model the conditional intensity as a chain of recurrent neural networks to capture memory dependency in hopes of reducing the restrictions of the parametric form of the intensity function. For the problem of patent citations, we observe that forecasting a patent's chain of citations benefits from not only the patent's history itself but also from the historical citations of assignees and inventors associated with that patent. In this paper, we propose a sequence-to-sequence model which employs an attention-of-attention mechanism to capture the dependencies of these multiple time sequences. Furthermore, the proposed model is able to forecast both the timestamp and the category of a patent's next citation. Extensive experiments on a large patent citation dataset collected from USPTO demonstrate that the proposed model outperforms state-of-the-art models at forward citation forecasting.
Submitted 22 May, 2019;
originally announced May 2019.
-
Neural Abstractive Text Summarization with Sequence-to-Sequence Models
Authors:
Tian Shi,
Yaser Keneshloo,
Naren Ramakrishnan,
Chandan K. Reddy
Abstract:
In the past few years, neural abstractive text summarization with sequence-to-sequence (seq2seq) models has gained a lot of popularity. Many interesting techniques have been proposed to improve seq2seq models, making them capable of handling different challenges, such as saliency, fluency and human readability, and of generating high-quality summaries. Generally speaking, most of these techniques differ in one of these three categories: network structure, parameter inference, and decoding/generation. There are also other concerns, such as efficiency and parallelism for training a model. In this paper, we provide a comprehensive literature survey on different seq2seq models for abstractive text summarization from the viewpoint of network structures, training strategies, and summary generation algorithms. Several models were first proposed for language modeling and generation tasks, such as machine translation, and later applied to abstractive text summarization. Hence, we also provide a brief review of these models. As part of this survey, we also develop an open source library, namely, the Neural Abstractive Text Summarizer (NATS) toolkit, for abstractive text summarization. An extensive set of experiments has been conducted on the widely used CNN/Daily Mail dataset to examine the effectiveness of several different neural network components. Finally, we benchmark two models implemented in NATS on two recently released datasets, namely, Newsroom and Bytecup.
Submitted 18 September, 2020; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Deep Transfer Reinforcement Learning for Text Summarization
Authors:
Yaser Keneshloo,
Naren Ramakrishnan,
Chandan K. Reddy
Abstract:
Deep neural networks are data-hungry models and thus face difficulties when attempting to train on small text datasets. Transfer learning is a potential solution, but its effectiveness in the text domain is not as explored as in areas such as image analysis. In this paper, we study the problem of transfer learning for text summarization and discuss why existing state-of-the-art models fail to generalize well on other (unseen) datasets. We propose a reinforcement learning framework based on a self-critic policy gradient approach which achieves good generalization and state-of-the-art results on a variety of datasets. Through an extensive set of experiments, we also show the ability of our proposed framework to fine-tune the text summarization model using only a few training samples. To the best of our knowledge, this is the first work that studies transfer learning in text summarization and provides a generic solution that works well on unseen data.
Submitted 24 January, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
-
Deep Reinforcement Learning For Sequence to Sequence Models
Authors:
Yaser Keneshloo,
Tian Shi,
Naren Ramakrishnan,
Chandan K. Reddy
Abstract:
In recent times, sequence-to-sequence (seq2seq) models have gained a lot of popularity and provide state-of-the-art performance in a wide variety of tasks such as machine translation, headline generation, text summarization, speech to text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder-decoder models produce competitive results, many researchers have proposed additional improvements over these sequence-to-sequence models, e.g., using an attention-based model over the input, pointer-generation models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely novel point of view has emerged in addressing these two problems in seq2seq models, leveraging methods from reinforcement learning (RL). In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the power of RL methods in decision-making with sequence-to-sequence models that enable remembering long-term memories. We present some of the most recent frameworks that combine concepts from RL and deep neural networks and explain how these two areas could benefit from each other in solving complex seq2seq tasks. Our work aims to provide insights into some of the problems that inherently arise with current approaches and how we can address them with better RL models. We also provide the source code for implementing most of the RL models discussed in this paper to support the complex task of abstractive text summarization.
Submitted 15 April, 2019; v1 submitted 23 May, 2018;
originally announced May 2018.
-
New methods to generate massive synthetic networks
Authors:
Malay Chakrabarti,
Lenwood Heath,
Naren Ramakrishnan
Abstract:
One of the biggest needs in network science research is access to large realistic datasets. As data analytics methods permeate a range of diverse disciplines (e.g., computational epidemiology, sustainability, social media analytics, biology, and transportation), network datasets that exhibit the characteristics encountered in each of these disciplines become paramount. The key technical issue is to be able to generate synthetic topologies with pre-specified, arbitrary degree distributions. Existing methods are limited in their ability to faithfully reproduce macro-level characteristics of networks while at the same time respecting particular degree distributions. We present a suite of three algorithms that exploit the principle of residual degree attenuation to generate synthetic topologies that adhere to macro-level real-world characteristics. By evaluating these algorithms w.r.t. several real-world datasets we demonstrate their ability to faithfully reproduce network characteristics such as node degree, clustering coefficient, hop length, and k-core structure distributions.
Submitted 23 May, 2017;
originally announced May 2017.
-
A Framework for Evaluating Epidemic Forecasts
Authors:
Farzaneh Sadat Tabataba,
Prithwish Chakraborty,
Naren Ramakrishnan,
Srinivasan Venkatramanan,
Jiangzhuo Chen,
Bryan Lewis,
Madhav Marathe
Abstract:
Background: Over the past few decades, numerous forecasting methods have been proposed in the field of epidemic forecasting. Such methods can be classified into different categories such as deterministic vs. probabilistic, comparative methods vs. generative methods, and so on. In some of the more popular comparative methods, researchers compare observed epidemiological data from early stages of an outbreak with the output of proposed models to forecast the future trend and prevalence of the pandemic. A significant problem in this area is the lack of standard well-defined evaluation measures to select the best algorithm among different ones, as well as for selecting the best possible configuration for a particular algorithm.
Results: In this paper, we present an evaluation framework which allows for combining different features, error measures, and ranking schema to evaluate forecasts. We describe the various epidemic features (Epi-features) included to characterize the output of forecasting methods and provide suitable error measures that could be used to evaluate the accuracy of the methods with respect to these Epi-features. We focus on long-term predictions rather than short-term forecasting and demonstrate the utility of the framework by evaluating six forecasting methods for predicting influenza in the United States. Our results demonstrate that different error measures lead to different rankings even for a single Epi-feature. Further, our experimental analyses show that no single method dominates the rest in predicting all Epi-features, when evaluated across error measures. As an alternative, we provide various consensus ranking schema that summarizes individual rankings, thus accounting for different error measures. We believe that a comprehensive evaluation framework, as presented in this paper, will add value to the computational epidemiology community.
Submitted 28 March, 2017;
originally announced March 2017.
-
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Authors:
Rupinder Paul Khandpur,
Taoran Ji,
Steve Jan,
Gang Wang,
Chang-Tien Lu,
Naren Ramakrishnan
Abstract:
Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDOS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods.
Submitted 24 February, 2017;
originally announced February 2017.
-
Distributed Representation of Subgraphs
Authors:
Bijaya Adhikari,
Yao Zhang,
Naren Ramakrishnan,
B. Aditya Prakash
Abstract:
Network embeddings have become very popular in learning effective feature representations of networks. Motivated by the recent successes of embeddings in natural language processing, researchers have tried to find network embeddings in order to exploit machine learning algorithms for mining tasks like node classification and edge prediction. However, most of the work focuses on finding distributed representations of nodes, which are inherently ill-suited to tasks such as community detection which are intuitively dependent on subgraphs.
Here, we propose sub2vec, an unsupervised scalable algorithm to learn feature representations of arbitrary subgraphs. We provide means to characterize similarities between subgraphs, give a theoretical analysis of sub2vec, and demonstrate that it preserves the so-called local proximity. We also highlight the usability of sub2vec by leveraging it for network mining tasks, like community detection. We show that sub2vec gets significant gains over state-of-the-art methods and node-embedding methods. In particular, sub2vec offers an approach to generate a richer vocabulary of features of subgraphs to support representation and reasoning.
Submitted 22 February, 2017;
originally announced February 2017.
-
Distributed Representations of Signed Networks
Authors:
Mohammad Raihanul Islam,
B. Aditya Prakash,
Naren Ramakrishnan
Abstract:
Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community detection. Such network embedding methods are largely focused on finding distributed representations for unsigned networks and are unable to discover embeddings that respect polarities inherent in edges. We propose SIGNet, a fast scalable embedding method suitable for signed networks. Our proposed objective function aims to carefully model the social structure implicit in signed networks by reinforcing the principles of social balance theory. Our method builds upon the traditional word2vec family of embedding approaches and adds a new targeted node sampling strategy to maintain structural balance in higher-order neighborhoods. We demonstrate the superiority of SIGNet over state-of-the-art methods proposed for both signed and unsigned networks on several real world datasets from different domains. In particular, SIGNet offers an approach to generate a richer vocabulary of features of signed networks to support representation and reasoning.
Submitted 7 April, 2019; v1 submitted 22 February, 2017;
originally announced February 2017.
-
Guided Deep List: Automating the Generation of Epidemiological Line Lists from Open Sources
Authors:
Saurav Ghosh,
Prithwish Chakraborty,
Bryan L. Lewis,
Maimuna S. Majumder,
Emily Cohn,
John S. Brownstein,
Madhav V. Marathe,
Naren Ramakrishnan
Abstract:
Real-time monitoring and responses to emerging public health threats rely on the availability of timely surveillance data. During the early stages of an epidemic, the ready availability of line lists with detailed tabular information about laboratory-confirmed cases can assist epidemiologists in making reliable inferences and forecasts. Such inferences are crucial to understand the epidemiology of a specific disease early enough to stop or control the outbreak. However, construction of such line lists requires considerable human supervision, and they are therefore difficult to generate in real-time. In this paper, we motivate Guided Deep List, the first tool for building automated line lists (in near real-time) from open source reports of emerging disease outbreaks. Specifically, we focus on deriving epidemiological characteristics of an emerging disease and the affected population from reports of illness. Guided Deep List uses distributed vector representations (à la word2vec) to discover a set of indicators for each line list feature. This discovery of indicators is followed by the use of dependency-parsing-based techniques for final extraction in tabular form. We evaluate the performance of Guided Deep List against a human annotated line list provided by HealthMap corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided Deep List extracts line list features with increased accuracy compared to a baseline method. We further show how these automatically extracted line list features can be used for making epidemiological inferences, such as inferring demographics and symptoms-to-hospitalization period of affected individuals.
Submitted 21 February, 2017;
originally announced February 2017.
-
Deep Symbolic Representation Learning for Heterogeneous Time-series Classification
Authors:
Shengdong Zhang,
Soheil Bahrampour,
Naveen Ramakrishnan,
Mohak Shah
Abstract:
In this paper, we consider the problem of event classification with multi-variate time series data consisting of heterogeneous (continuous and categorical) variables. The complex temporal dependencies between the variables, combined with sparsity of the data, make the event classification problem particularly challenging. Most state-of-the-art approaches address this either by designing hand-engineered features or by breaking up the problem over homogeneous variates. In this work, we propose and compare three representation learning algorithms over symbolized sequences which enable classification of heterogeneous time-series data using a deep architecture. The proposed representations are trained jointly along with the rest of the network architecture in an end-to-end fashion that makes the learned features discriminative for the given task. Experiments on three real-world datasets demonstrate the effectiveness of the proposed approaches.
Submitted 5 December, 2016;
originally announced December 2016.
-
Can Self-Censorship in News Media be Detected Algorithmically? A Case Study in Latin America
Authors:
Rongrong Tao,
Baojian Zhou,
Feng Chen,
Naifeng Liu,
David Mares,
Patrick Butler,
Naren Ramakrishnan
Abstract:
Censorship in social media has been well studied and provides insight into how governments stifle freedom of expression online. Comparatively less (or no) attention has been paid to detecting (self) censorship in traditional media (e.g., news) using social media as a bellwether. We present a novel unsupervised approach that views social media as a sensor to detect censorship in news media wherein statistically significant differences between information published in the news media and the correlated information published in social media are automatically identified as candidate censored events. We develop a hypothesis testing framework to identify and evaluate censored clusters of keywords, and a new near-linear-time algorithm (called GraphDPD) to identify the highest scoring clusters as indicators of censorship. We outline extensive experiments on semi-synthetic data as well as real datasets (with Twitter and local news media) from Mexico and Venezuela, highlighting the capability to accurately detect real-world self-censorship events.
Submitted 17 March, 2017; v1 submitted 21 November, 2016;
originally announced November 2016.
-
Universum Learning for Multiclass SVM
Authors:
Sauptik Dhar,
Naveen Ramakrishnan,
Vladimir Cherkassky,
Mohak Shah
Abstract:
We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose a span bound for MU-SVM that can be used for model selection thereby avoiding resampling. Empirical results demonstrate the effectiveness of MU-SVM and the proposed bound.
Submitted 28 September, 2016;
originally announced September 2016.
-
Interactive and Iterative Discovery of Entity Network Subgraphs
Authors:
Hao Wu,
Maoyuan Sun,
Jilles Vreeken,
Nikolaj Tatti,
Chris North,
Naren Ramakrishnan
Abstract:
Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mine graphs are not interactive and cannot incorporate user feedback in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions.
Submitted 12 August, 2016;
originally announced August 2016.
-
Temporal Topic Modeling to Assess Associations between News Trends and Infectious Disease Outbreaks
Authors:
Saurav Ghosh,
Prithwish Chakraborty,
Elaine O. Nsoesie,
Emily Cohn,
Sumiko R. Mekaru,
John S. Brownstein,
Naren Ramakrishnan
Abstract:
In retrospective assessments, internet news reports have been shown to capture early reports of unknown infectious disease transmission prior to official laboratory confirmation. In general, media interest and reporting peak and wane during the course of an outbreak. In this study, we quantify the extent to which media interest during infectious disease outbreaks is indicative of trends of reported incidence. We introduce an approach that uses supervised temporal topic models to transform large corpora of news articles into temporal topic trends. The key advantages of this approach include applicability to a wide range of diseases and the ability to capture disease dynamics, including seasonality and abrupt peaks and troughs. We evaluated the method using data from multiple infectious disease outbreaks reported in the United States of America (U.S.), China and India. We noted that temporal topic trends extracted from disease-related news reports successfully captured the dynamics of multiple outbreaks such as whooping cough in the U.S. (2012), dengue outbreaks in India (2013) and China (2014). Our observations also suggest that efficient modeling of temporal topic trends using time-series regression techniques can estimate disease case counts with increased precision before official reports by health organizations.
Submitted 1 June, 2016;
originally announced June 2016.
-
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System
Authors:
Sathappan Muthiah,
Patrick Butler,
Rupinder Paul Khandpur,
Parang Saraf,
Nathan Self,
Alla Rozovskaya,
Liang Zhao,
Jose Cadena,
Chang-Tien Lu,
Anil Vullikanti,
Achla Marathe,
Kristen Summers,
Graham Katz,
Andy Doyle,
Jaime Arredondo,
Dipak K. Gupta,
David Mares,
Naren Ramakrishnan
Abstract:
EMBERS is an anticipatory intelligence system forecasting population-level events in multiple countries of Latin America. Deployed since 2012, EMBERS has been generating alerts 24x7 by ingesting a broad range of data sources including news, blogs, tweets, machine coded events, currency rates, and food prices. In this paper, we describe our experiences operating EMBERS continuously for nearly 4 years, with specific attention to the discoveries it has enabled, correct as well as missed forecasts, and lessons learnt from participating in a forecasting tournament, including our perspectives on the limits of forecasting and ethical considerations.
Submitted 31 March, 2016;
originally announced April 2016.
-
Hierarchical Quickest Change Detection via Surrogates
Authors:
Prithwish Chakraborty,
Sathappan Muthiah,
Ravi Tandon,
Naren Ramakrishnan
Abstract:
Change detection (CD) in time series data is a critical problem as it reveals changes in the underlying generative processes driving the time series. Despite having received significant attention, one important unexplored aspect is how to efficiently utilize additional correlated information to improve the detection and the understanding of changepoints. We propose hierarchical quickest change detection (HQCD), a framework that formalizes the process of incorporating additional correlated sources for early changepoint detection. The core ideas behind HQCD are rooted in the theory of quickest detection and HQCD can be regarded as its novel generalization to a hierarchical setting. The sources are classified into targets and surrogates, and HQCD leverages this structure to systematically assimilate observed data to update changepoint statistics across layers. Decisions on actual changepoints are made by minimizing the delay while still maintaining reliability bounds. In addition, HQCD also uncovers interesting relations between changes at targets and changes across surrogates. We validate HQCD for reliability and performance against several state-of-the-art methods on both a synthetic dataset (known changepoints) and several real-life examples (unknown changepoints). Our experiments indicate that we gain significant robustness without loss of detection delay through HQCD. Our real-life experiments also showcase the usefulness of the hierarchical setting by connecting the surrogate sources (such as Twitter chatter) to target sources (such as employment-related protests that ultimately lead to major uprisings).
Submitted 31 March, 2016;
originally announced March 2016.