Pointwise mutual information (PMI) is a widely used word similarity measure, but it lacks a clear explanation of how it works. We explore how PMI differs from distributional similarity, and we introduce a novel metric, PMI_max, that augments PMI with information about a word's number of senses. The coefficients of PMI_max are determined empirically by maximizing a utility function based on the performance of automatic thesaurus generation.
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence-Phrase, Phrase-Word, and Word-Sense subtasks and second in the Paragraph-Sentence subtask.
We demonstrate an end-to-end use case of the semantic web's utility for synthesizing ecological and environmental data. ELVIS (the Ecosystem Location Visualization and Information System) is a suite of tools for constructing food webs for a given location. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with other semantic web resources. In particular, we describe using a Triple Shop application to answer SPARQL queries from a collection of semantic web documents.
We describe our ongoing work in using the semantic web in support of ecological informatics, and demonstrate a distributed platform for constructing end-to-end use cases. Specifically, we describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface
OWL (the Web Ontology Language) and the related RDF (Resource Description Framework) are XML-based languages designed to represent the semantics of data. These languages enable systems to go beyond simple controlled vocabularies and specify the contexts and logical relationships among terms. Formal ontologies use classes (e.g., Species A) and properties (e.g., is a member of, eats, or has body mass) to represent concepts and relationships as assertions. For example, two assertions might be “Species A is a member ...
Proceedings of the Eighth International Semantic Web Conference, Springer, 2009
The Semantic Web was designed to unambiguously define and use ontologies to encode data and knowledge on the Web. Many people find it difficult, however, to write complex RDF statements and queries because doing so requires familiarity with the appropriate ontologies and the terms they define. We describe a framework that eases authoring and querying RDF data, focusing on automatically finding a set of appropriate Semantic Web ontology terms from a set of words used as the labels of nodes ...
Proc. 21st National Conference on Artificial Intelligence. Menlo Park, CA, USA: AAAI Press, 2006
We describe RDF123, a highly flexible open-source tool for transforming spreadsheet data to RDF. Existing spreadsheet-to-RDF tools typically map only to star-shaped RDF graphs, i.e., each spreadsheet row is an instance, with each column representing a property. RDF123, on the other hand, allows users to define mappings to arbitrary graphs, thus allowing much richer spreadsheet semantics to be expressed. Further, each row in the spreadsheet can be mapped with a different RDF scheme. Two interfaces are available. The first is a graphical ...
8th International Workshop on Semantic Evaluation (SemEval 2014)
We describe UMBC's systems developed for the SemEval 2014 tasks on Multilingual Semantic Textual Similarity (Task 10) and Cross-Level Semantic Similarity (Task 3). Our best submission in the Multilingual task ranked second in both the English and Spanish subtasks using an unsupervised approach. Our best systems for the Cross-Level task ranked second in the Paragraph-Sentence subtask and first in both the Sentence-Phrase and Word-Sense subtasks. The system ranked first for the Phrase-Word subtask but was not included in the official results due to a late submission.
Papers by Lushan Han