Proceedings of the 2022 ACM on International Workshop on Security and Privacy Analytics
We are making public a dataset of 21 disjoint graphs representing communications among machines running different distributed applications in various enterprises. We provide a ground truth grouping for one graph. The grouping is useful for evaluating tasks such as clustering hosts based on network communications. We describe the graphs and present a brief exploratory analysis to illustrate some of the properties, possible uses of the data, and some of the challenges. CCS CONCEPTS: • Computing methodologies → Unsupervised learning; • Networks → Network monitoring; Network management.
We present a system for bottom-up cumulative learning of myriad concepts corresponding to meaningful character strings (n-grams), and their part-related and prediction edges. The learning is self-supervised in that the concepts discovered are used as predictors as well as targets of prediction. We devise an objective for segmenting with the learned concepts, derived from comparing to a baseline (reference) prediction system, that promotes making and using larger concepts, which in turn allows for predicting larger spans of text, and we describe a simple technique to promote exploration, i.e. trying out newly generated concepts in the segmentation process. We motivate and explain a layering of the concepts, to help separate the (conditional) distributions learnt among concepts. The layering of the concepts roughly corresponds to a part-whole concept hierarchy in this work. With rudimentary segmentation and learning algorithms, the system is promising in that it acquires many concepts...
An important task of community discovery in networks is assessing significance of the results and robust ranking of the generated candidate groups. Often in practice, numerous candidate communities are discovered, and focusing the analyst’s time on the most salient and promising findings is crucial. We develop simple, efficient group scoring functions derived from tail probabilities using binomial models. Experiments on synthetic and numerous real-world data provide evidence that binomial scoring leads to a more robust ranking than other inexpensive scoring functions, such as conductance. Furthermore, we obtain confidence values (p-values) that can be used for filtering and labeling the discovered groups. Our analyses shed light on various properties of the approach. The binomial tail is simple and versatile, and we describe two other applications for community analysis: degree of community membership (which in turn yields group-scoring functions), and the discovery of significant e...
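As a rough, hypothetical illustration of binomial-tail group scoring of the kind sketched above (not necessarily the paper's exact background model or parameterization), a candidate group's internal edge count can be compared against a background edge probability; the helper name and the example numbers below are made up.

```python
from scipy.stats import binom

def binomial_tail_score(internal_edges, possible_pairs, background_p):
    """Tail probability (p-value) of observing at least `internal_edges`
    edges among `possible_pairs` node pairs under a background edge
    probability `background_p`. Smaller values indicate groups that are
    denser than chance, giving a score usable for ranking and filtering."""
    # P(X >= k) equals the binomial survival function evaluated at k - 1.
    return binom.sf(internal_edges - 1, possible_pairs, background_p)

# Hypothetical example: a 10-node candidate group with 30 internal edges
# in a graph whose overall edge density is 0.05.
n_nodes = 10
pairs = n_nodes * (n_nodes - 1) // 2  # 45 possible undirected pairs
print(f"binomial tail p-value: {binomial_tail_score(30, pairs, 0.05):.3e}")
```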
Classical learning assumes the learner is given a labeled data sample, from which it learns a model. The field of Active Learning deals with the situation where the learner begins not with a training sample, but instead with resources that it can use to obtain information to help identify the optimal model. To better understand this task, this paper presents and analyses the simplified "(budgeted) active model selection" version, which captures the pure exploration aspect of many active learning problems in a clean and simple problem formulation. Here the learner can use a fixed budget of "model probes" (where each probe evaluates the specified model on a random indistinguishable instance) to identify which of a given set of possible models has the highest expected accuracy. Our goal is a policy that sequentially determines which model to probe next, based on the information observed so far. We present a formal description of this task, and show that it is NP-hard...
Markov decision processes (MDPs) are models of dynamic decision making under uncertainty. These models arise in diverse applications and have been developed extensively in fields such as operations research, control engineering, and the decision sciences in general. Recent research, especially in artificial intelligence, has highlighted the significance of studying the computational properties of MDP problems. We address...
In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the “variance of prediction error”, or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in a...
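For concreteness, a minimal sketch of measuring disagreement between two independently trained classifiers on unlabeled data is given below; the models, synthetic data, and split sizes are placeholder assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def disagreement_rate(model_a, model_b, unlabeled_X):
    """Fraction of unlabeled instances on which the two models disagree."""
    return float(np.mean(model_a.predict(unlabeled_X) != model_b.predict(unlabeled_X)))

# Hypothetical usage: two models trained on disjoint labeled samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=600) > 0).astype(int)

model_a = LogisticRegression().fit(X[:200], y[:200])
model_b = LogisticRegression().fit(X[200:400], y[200:400])

# Treat the remaining instances as unlabeled; in the co-validation analysis,
# disagreement lower-bounds the (generalization) error.
print("disagreement:", disagreement_rate(model_a, model_b, X[400:]))
```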
This paper explores the formulation of image interpretation as a Markov Decision Process (MDP) problem, highlighting the important assumptions in the MDP formulation. Furthermore, state abstraction, value function and action approximations, as well as lookahead search, are presented as necessary solution methodologies. We view the task of image interpretation as a dynamic control problem where the optimal vision operator is selected responsively based on the problem-solving state at hand. The control policy, therefore, maps problem-solving states to operators in an attempt to minimize the total problem-solving time while reliably interpreting the image. Real-world domains, like that of image interpretation, usually have incredibly large state spaces which require methods of abstraction in order to be manageable by today’s information processing systems. In addition, an optimal value function (V*) used to evaluate state quality is also generally unavailable, requiring approxim...
Many learning tasks, such as large-scale text categorization and word prediction, can benefit from efficient training and classification when the number of classes, in addition to instances and features, is large, that is, in the thousands and beyond. We investigate the learning of sparse class indices to address this challenge. An index is a mapping from features to classes. We compare the index-learning methods against other techniques, including one-versus-rest and top-down classification using perceptrons and support vector machines. We find that index learning is highly advantageous for space and time efficiency, at both training and classification times. Moreover, this approach yields similar and at times better accuracies. On problems with hundreds of thousands of instances and thousands of classes, the index is learned in minutes, while other methods can take hours or days. As we explain, the design of the learning update enables conveniently constraining each feature to con...
We describe techniques for the construction of term co-occurrence graphs and explore an application of such graphs to the discovery of tens of thousands of fine-grained (that is, specific rather than broad) topics. A topic corresponds to a small dense subgraph in our work. We discover topics by randomized local searches (constrained random walks) initiated at each term (node) in the graph. The mined topics are highly interpretable, and reveal the different meanings of a term in the corpus. We explore document tagging via the induced topics, and demonstrate the information-theoretic utility of the topics when they are used as features in supervised learning. Such features lead to consistent improvements in text classification accuracy over the standard bag-of-words tfidf representation, using SVM classification, even at high training proportions when it is difficult to improve over the tfidf representation. We investigate the effect of various options and parameters, including window ...
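The sketch below shows one simple way to accumulate sliding-window term co-occurrence counts, which could then be thresholded or reweighted to form graph edges; the tokenization, window size, and counting scheme are illustrative assumptions rather than the construction used in the paper.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, window=5):
    """Count unordered term pairs that fall within the same sliding window
    of `window` tokens. The resulting counts are raw material for the edges
    of a term co-occurrence graph (after thresholding or normalization)."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(len(tokens)):
            span = set(tokens[i:i + window])  # distinct terms in this window
            for a, b in combinations(sorted(span), 2):
                counts[(a, b)] += 1
    return counts

# Hypothetical toy corpus.
docs = ["random walks on term graphs reveal fine grained topics",
        "topics as dense subgraphs of a term cooccurrence graph"]
for pair, c in cooccurrence_counts(docs, window=4).most_common(3):
    print(pair, c)
```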
A fundamental activity of intelligence is to efficiently detect to which of myriad categories a given entity belongs. The problem occurs in many incarnations and applications, including: (1) categorizing web pages into the Yahoo! topic hierarchy...
Depending on a web searcher’s familiarity with a query’s target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defined topic familiarity as meta-data associated with a user’s query. We instead define a user-independent and query-independent model of the topic familiarity required to read a document, so it can be matched to a given user in response to a query. An introductory web page is defined as "a web page that doesn’t presuppose any background knowledge of the topic it is on, and to an extent introduces or defines the key terms in the topic," while an advanced web page is defined as "a web page that assumes sufficient background knowledge of the topic it is on, and familiarity with the key technical/important terms in the topic, and potentially builds on them." We develop a method for biasing the initial mix of documents returned by a search engine to increase the number of documents of the desired familiarity level up to position ...
We introduce and motivate the task of learning under a budget. We focus on a basic problem in this space: selecting the optimal bandit after a period of experimentation in a multi-armed bandit setting, where each experiment is costly, our total costs cannot exceed a fixed pre-specified budget, and there is no reward collection during the learning period. We address the computational complexity of the problem, propose a number of algorithms, and report on the performance of the algorithms, including their (worst-case) approximation properties, as well as their empirical performance on various problem instances. Our...
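As a point of reference for this problem setting (a fixed budget of costly pulls, no reward collection during learning, and a single arm selected at the end), the sketch below implements a plain round-robin allocation baseline; it is a generic baseline under assumed Bernoulli arms, not one of the paper's proposed algorithms.

```python
import random

def budgeted_arm_selection(arms, budget, seed=0):
    """Spend `budget` pulls round-robin across `arms` (each a callable
    returning a stochastic reward), then return the index of the arm with
    the highest empirical mean. Pulls only buy information; no reward is
    accumulated during the exploration phase."""
    random.seed(seed)
    sums = [0.0] * len(arms)
    counts = [0] * len(arms)
    for t in range(budget):
        i = t % len(arms)  # uniform (round-robin) allocation of the budget
        sums[i] += arms[i]()
        counts[i] += 1
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(len(arms)), key=lambda i: means[i])

# Hypothetical Bernoulli arms with unknown success probabilities.
arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in (0.3, 0.5, 0.7)]
print("selected arm:", budgeted_arm_selection(arms, budget=300))
```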
We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorization tasks indicate that there is significant potential in improving classifier performance by feature re-weighting, beyond that achieved via membership queries alone (traditional active learning), if we have access to an oracle that can point to the important (most predictive) features. Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). We find that, on average, labeling a feature takes much less time than labeling a document. We devise an algorithm that interleaves labeling features and documents, which significantly accelerates standard active learning in our s...
Value iteration is a commonly used and empirically competitive method for solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision process (DMDP) problems. We show that the basic value iteration procedure converges to the highest average-reward cycle on a DMDP problem in a number of iterations, and total time, that are polynomial in the number of states and the number of edges. We give two extensions of value iteration that also solve the DMDP in polynomial time. We explore the analysis of policy iteration algorithms and report on an empirical study of value iteration showing that its convergence is much faster on random sparse graphs.
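A minimal sketch of value iteration specialized to a deterministic MDP is shown below, where each action out of a state corresponds to a single (reward, successor) edge; the graph encoding and the fixed iteration count are assumptions for illustration and do not reproduce the paper's convergence analysis or its accelerated variants.

```python
def dmdp_value_iteration(transitions, num_iters):
    """`transitions[s]` lists the (reward, next_state) edges out of state s.
    Runs synchronous, finite-horizon value iteration and returns the value
    vector; in a DMDP the greedy choice at each state is simply the best
    single edge under the current values."""
    n = len(transitions)
    values = [0.0] * n
    for _ in range(num_iters):
        values = [max(r + values[s2] for r, s2 in transitions[s])
                  for s in range(n)]
    return values

# Tiny 3-state example: state 0 can move to 1 (reward 1.0) or 2 (reward 0.0), etc.
transitions = {
    0: [(1.0, 1), (0.0, 2)],
    1: [(2.0, 2)],
    2: [(0.5, 0)],
}
print(dmdp_value_iteration(transitions, num_iters=10))
```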
We describe the functionality of a large-scale system that, given a stream of characters from a rich source, such as the pages on the web, engages in repeated prediction and learning. Its activity includes adding, removing, and updating connection weights and category nodes. Over time, the system learns to predict better and acquires new useful categories. In this work, categories are strings of characters. The system scales well and the learning is massive: in the course of hundreds of millions of learning episodes, a few hours on a single machine, hundreds of thousands of categories and millions of prediction connections among them are learned.
A number of tasks, such as large-scale text categorization and word prediction, can benefit from efficient learning and classification when the number of classes (categories), in addition to instances and features, is large, that is, in the thousands and beyond. We investigate learning of sparse category indices to address this challenge. An index is a weighted bipartite graph mapping features to categories. On presentation of an instance, the index retrieves and scores a small set of candidate categories. The candidates can then be ranked and the ranking or the scores can be used for category assignment. We present novel online index learning algorithms. When compared to other approaches, including one-versus-rest and top-down learning and classification using support vector machines, we find that indexing is highly advantageous in terms of space and time efficiency, at both training and classification times, while yielding similar and often better accuracies. On problems with h...
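The sketch below illustrates classification with a feature-to-category index along the lines described (a weighted bipartite graph from features to categories, with only the categories reachable from an instance's features being scored); the dictionary representation and summed-weight scoring are illustrative assumptions, and the online learning updates that build the index are not shown.

```python
from collections import defaultdict

def index_classify(index, instance_features, top_k=3):
    """`index` maps feature -> {category: weight}. Score only the categories
    reachable from the instance's features (a small candidate set), then
    return the top-ranked (category, score) pairs."""
    scores = defaultdict(float)
    for feature, value in instance_features.items():
        for category, weight in index.get(feature, {}).items():
            scores[category] += value * weight
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Hypothetical sparse index over a few features.
index = {
    "goal":    {"sports": 1.2, "business": 0.1},
    "striker": {"sports": 0.9},
    "merger":  {"business": 1.5},
}
print(index_classify(index, {"goal": 2, "striker": 1}))
```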
Efficient learning and categorization in the face of myriad categories and instances is an important challenge. We investigate algorithms that efficiently learn sparse but accurate category indices. An index is a weighted bipartite graph mapping features to categories. Given an instance, the index retrieves, scores, and ranks a set of candidate categories. The ranking or the scores can then be used for category assignment. We compare index learning against other classification approaches, including one-versus-rest and top-down classification using support vector machines. We find that the indexing approach is highly advantageous in terms of space and time efficiency, at both training and classification times, while retaining competitive accuracy. On problems with hundreds of thousands of instances and thousands of categories, the index is learned in minutes, while other methods can take orders of magnitude longer.