We present a real-world online scheduling application. In this application, the problem input is fed incrementally to the scheduler, and the scheduler has only a portion of the entire job available before it has to start making scheduling decisions. In fact, job submission, scheduling, and execution may all happen in parallel and at different speeds.
In many applications, there are a variety of ways of referring to the same underlying real-world entity. For example, J. Doe, Jonathan Doe, and Jon Doe may all refer to the same person. In addition, entity references may be linked or grouped together. For example, Jonathan Doe may be married to Jeanette Doe and may have dependents James Doe, Jason Doe, and Jacqueline Doe, and Jon Doe may be married to Jean Doe and J. Doe may have dependents Jim Doe, Jason Doe, and Jackie Doe.
In this paper we present a new approach to feature selection for sequence data. We identify general feature categories and give construction algorithms for each of them. We show how they can be integrated in a system that tightly couples feature construction and feature selection. This integrated process, which we refer to as feature generation, allows us to systematically search a large space of potential features.
This work investigates design choices in modeling a discourse scheme for improving opinion polarity classification. For this, two diverse global inference paradigms are used: a supervised collective classification framework and an unsupervised optimization framework. Both approaches perform substantially better than baseline approaches, establishing the efficacy of the methods and the underlying discourse scheme.
There has been a recent, growing interest in classification and link prediction in structured domains. Methods such as CRFs (Lafferty et al., 2001) and RMNs (Taskar et al., 2002) support flexible mechanisms for modeling correlations due to the link structure. In addition, in many structured domains, there is an interesting structure in the risk or cost function associated with different misclassifications. There is a rich tradition of cost-sensitive learning applied to unstructured (IID) data.
Over the past few years, a number of approximate inference algorithms for networked data have been put forth. We empirically compare the performance of three popular algorithms: loopy belief propagation, mean-field relaxation labeling, and iterative classification. We rate each algorithm in terms of its robustness to noise, both in attribute values and in correlations across links. We also compare them across varying types of correlations across links.
The goal of entity resolution is to reconcile database references corresponding to the same real-world entities. Given the abundance of publicly available databases where entities are not resolved, we motivate the problem of quickly processing queries that require resolved entities from such 'unclean' databases. We propose a two-stage collective resolution strategy for processing queries.
Recently, much attention has been given to extracting tables from Web data. In this problem, the column definitions and tuples (such as what "company" is headquartered in what "city") are extracted from Web text, structured Web data such as lists, or results of querying the deep Web, creating the table of interest. In this paper, we examine the problem of extracting and discovering multiple tables in a given domain, generating a truly multi-relational database as output.
Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach to learning complex structures is to integrate many smaller, incomplete, and noisy structure fragments. In this work, we present an unsupervised probabilistic approach that extends affinity propagation [7] to combine the small ontological fragments into a collection of integrated, consistent, and larger folksonomies.
Statistical relational learning is a newly emerging area of machine learning that combines statistical modeling with relational representations. Here we argue that it provides a unified framework for the discovery of structural information that can be exploited by a data management system.
Interest in XML databases has been growing over the last few years. In this paper, we study the problem of incorporating probabilistic information into XML databases. We propose the Probabilistic Interval XML (PIXml for short) data model. Using this data model, users can express probabilistic information within XML markups. In addition, we provide two alternative formal model-theoretic semantics for PIXml data. The first semantics is a "global" semantics, which is relatively intuitive but is not directly amenable to computation.
Many social Web sites allow users to annotate content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured metadata provide valuable evidence for learning how a community organizes knowledge. For instance, we can aggregate many personal hierarchies into a common taxonomy, also known as a folksonomy, that will aid users in visualizing and browsing social content and help them organize their own content.
Probabilistic graphical models, in particular Bayesian networks, are useful models for representing statistical patterns in propositional domains. Recent work (Cooper and Herskovits 1992; Heckerman 1998) develops effective techniques for learning these models directly from data. The learning algorithms have been quite successful; however, the techniques apply only to attribute-value, or flat, representations of the data. Any richer relational structure in the domain cannot be modeled.
Introduction: New technology is creating large stores of digital video. Real-time processing of these huge data sets is extremely challenging; retrospective or forensic analysis creates even greater problems when one must rapidly examine hours or days of video from thousands of cameras. We develop a new method for controlling processing, so that available resources are directed at the most relevant portions of the video.
Statistical Relational Learning and Link Mining. Lise Getoor, University of Maryland, College Park. Students: Indrajit Bhattacharya, Mustafa Bilgic, Rezarta Islamaj, Louis Licamele, and Prithviraj Sen.
There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks in order to uncover general laws that govern their structure and evolution, as well as patterns and predictive models that can inform better policies and practices.
Our work has focused on developing new cost-sensitive feature acquisition and classification algorithms, mapping these algorithms onto camera networks, and creating a test bed of video data and implemented vision algorithms that we can use to evaluate them. First, we will describe a new algorithm that we have developed for feature acquisition in Hidden Markov Models (HMMs).
While extensive work has been done on evaluating queries over tuple-independent probabilistic databases, query evaluation over correlated data has received much less attention, even though support for correlations is essential for many natural applications of probabilistic databases, e.g., information extraction, data integration, and computer vision.
In this paper, we introduce a method for analyzing the temporal dynamics of affiliation networks. We define affiliation groups, which describe temporally related subsets of actors, and describe an approach for exploring changing memberships in these affiliation groups over time.
In this project, I design and implement a system for querying large uncertain graphs with identity uncertainty. We use novel indexing techniques and query optimization methods to enable querying probabilistic graphs of millions of edges on a single machine and reporting the results in seconds.
Papers by Lise Getoor