Lise Getoor

University of Maryland, Computer Science, Faculty Member

Papers by Lise Getoor

Online Scheduling for Reprographic Machines

We present a real-world online scheduling application. In this application, the problem input is fed incrementally to the scheduler, and the scheduler has only a portion of the entire job available before it has to start making scheduling decisions. In fact, job submission, scheduling, and execution may all happen in parallel and at different speeds.

Entity resolution in graphs

In many applications, there are a variety of ways of referring to the same underlying real-world entity. For example, J. Doe, Jonathan Doe, and Jon Doe may all refer to the same person. In addition, entity references may be linked or grouped together. For example, Jonathan Doe may be married to Jeanette Doe and may have dependents James Doe, Jason Doe, and Jacqueline Doe; Jon Doe may be married to Jean Doe; and J. Doe may have dependents Jim Doe, Jason Doe, and Jackie Doe.

A feature generation algorithm for sequences with application to splice-site prediction

In this paper we present a new approach to feature selection for sequence data. We identify general feature categories and give construction algorithms for each of them. We show how they can be integrated in a system that tightly couples feature construction and feature selection. This integrated process, which we refer to as feature generation, allows us to systematically search a large space of potential features.

Supervised and unsupervised methods in employing discourse relations for improving opinion polarity classification

This work investigates design choices in modeling a discourse scheme for improving opinion polarity classification. For this, two diverse global inference paradigms are used: a supervised collective classification framework and an unsupervised optimization framework. Both approaches perform substantially better than baseline approaches, establishing the efficacy of the methods and the underlying discourse scheme.

Cost-sensitive learning with conditional Markov networks

There has been a recent, growing interest in classification and link prediction in structured domains. Methods such as CRFs (Lafferty et al., 2001) and RMNs (Taskar et al., 2002) support flexible mechanisms for modeling correlations due to the link structure. In addition, in many structured domains, there is an interesting structure in the risk or cost function associated with different misclassifications. There is a rich tradition of cost-sensitive learning applied to unstructured (IID) data.

Link-based classification

Over the past few years, a number of approximate inference algorithms for networked data have been put forth. We empirically compare the performance of three of the popular algorithms: loopy belief propagation, mean field relaxation labeling, and iterative classification. We rate each algorithm in terms of its robustness to noise, both in attribute values and in correlations across links. We also compare them across varying types of correlations across links.

Query-time entity resolution

The goal of entity resolution is to reconcile database references corresponding to the same real-world entities. Given the abundance of publicly available databases where entities are not resolved, we motivate the problem of quickly processing queries that require resolved entities from such 'unclean' databases. We propose a two-stage collective resolution strategy for processing queries.

Materializing multi-relational databases from the web using taxonomic queries

Recently, much attention has been given to extracting tables from Web data. In this problem, the column definitions and tuples (such as what "company" is headquartered in what "city") are extracted from Web text, structured Web data such as lists, or results of querying the deep Web, creating the table of interest. In this paper, we examine the problem of extracting and discovering multiple tables in a given domain, generating a truly multi-relational database as output.

A probabilistic approach for learning folksonomies from structured data

Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach to learning complex structures is to integrate many smaller, incomplete, and noisy structure fragments. In this work, we present an unsupervised probabilistic approach that extends affinity propagation [7] to combine the small ontological fragments into a collection of integrated, consistent, and larger folksonomies.

Structure discovery using statistical relational learning

Statistical relational learning is a newly emerging area of machine learning that combin...

Probabilistic interval XML

Interest in XML databases has been growing over the last few years. In this paper, we study the problem of incorporating probabilistic information into XML databases. We propose the Probabilistic Interval XML (PIXml for short) data model. Using this data model, users can express probabilistic information within XML markups. In addition, we provide two alternative formal model-theoretic semantics for PIXml data. The first semantics is a "global" semantics which is relatively intuitive, but is not directly amenable to computation.

Growing a tree in the forest: Constructing folksonomies by integrating structured metadata

Many social Web sites allow users to annotate the content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured metadata provide valuable evidence for learning how a community organizes knowledge. For instance, we can aggregate many personal hierarchies into a common taxonomy, also known as a folksonomy, that will aid users in visualizing and browsing social content, and also help them in organizing their own content.

From Bayesian Networks to Probabilistic Relational Models: Bridging the Gap

Probabilistic graphical models, in particular Bayesian networks, are useful models for representing statistical patterns in propositional domains. Recent work (Cooper and Herskovits 1992; Heckerman 1998) develops effective techniques for learning these models directly from data. The learning algorithms have been quite successful; however, the techniques apply only to attribute-value, or flat, representations of the data. Any richer relational structure in the domain cannot be modeled.

Efficient Resource-constrained Retrospective Analysis of Long Video Sequences

New technology is creating large stores of digital video. Real-time processing of these huge data sets is extremely challenging; retrospective or forensic analysis creates even greater problems when one must rapidly examine hours or days of video from thousands of cameras. We develop a new method for controlling processing, so that available resources are directed at the most relevant portions of the video.

Statistical Relational Learning and Link Mining

Statistical Relational Learning and Link Mining. Lise Getoor, University of Maryland, Colle...

Identifying graphs from noisy and incomplete data

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices.

Statistical Relational Learning as an Enabling Technology for Data Acquisition and Data Fusion in Heterogeneous Sensor Networks

Our work has focused on developing new cost-sensitive feature acquisition and classification algorithms, mapping these algorithms onto camera networks, and creating a test bed of video data and implemented vision algorithms that we can use to evaluate them. First, we will describe a new algorithm that we have developed for feature acquisition in Hidden Markov Models (HMMs).

Local structure and determinism in probabilistic databases

While extensive work has been done on evaluating queries over tuple-independent probabilistic databases, query evaluation over correlated data has received much less attention, even though support for correlations is essential for many natural applications of probabilistic databases, e.g., information extraction, data integration, and computer vision.

The dynamics of actor loyalty to groups in affiliation networks

In this paper, we introduce a method for analyzing the temporal dynamics of affiliation ...

Walaa Eldin M. Moustafa

In this project, I design and implement a system for querying large uncertain graphs with identit...
