In this work we propose a metric to assess academic productivity based on publication outputs. We are interested in knowing how well a research group in an area of knowledge is doing relative to a pre-selected set of reference groups, where each group is composed of academics or researchers. To assess academic productivity we propose a new metric, which we call P-score. P-score assigns weights to venues using only the publication patterns of the selected reference groups. This implies that P-score does not depend on citation data and is thus simpler to compute, particularly in contexts in which citation data is not easily available. Also, preliminary experiments suggest that P-score preserves a strong correlation with citation-based metrics.
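The core idea can be sketched in a few lines: derive venue weights from where the reference groups publish, then score any group by summing the weights of its publication venues. This is a minimal illustration, assuming weights are simply normalized publication counts; the paper's actual P-score weighting scheme may differ.

```python
from collections import Counter

def venue_weights(reference_pubs):
    """Derive venue weights from the publication patterns of reference groups.

    reference_pubs: list of (group, venue) pairs for the reference groups.
    Here a venue's weight is its normalized publication count among the
    reference groups -- an illustrative stand-in, not the paper's formula.
    """
    counts = Counter(venue for _, venue in reference_pubs)
    total = sum(counts.values())
    return {venue: n / total for venue, n in counts.items()}

def p_score(group_venues, weights):
    """Score a group by summing the weights of the venues it published in."""
    return sum(weights.get(venue, 0.0) for venue in group_venues)
```

Note that no citation counts appear anywhere: only publication records are needed, which is what makes the metric cheap to compute.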
Proceedings of the 18th Brazilian symposium on Multimedia and the web - WebMedia '12, 2012
In this paper we present three new methods to extract keywords from web pages using Wikipedia as an external source of information. The information used from Wikipedia includes the titles of articles, the co-occurrence of keywords, and the categories associated with each Wikipedia definition. We compare our methods with three keyword extraction methods used as baselines: (i) all the terms of a web page, (ii) a TF-IDF implementation that extracts single weighted words of a web page, and (iii) a Wikipedia-based keyword extraction method previously presented in the literature. We compare our three keyword extraction methods with the baseline methods in three distinct scenarios, all related to our target application, which is the selection of ads in a context-based advertising system. In the first scenario, the target pages for placing ads were extracted from Wikipedia articles, whereas the target pages in the other two scenarios were extracted from a news web site. Experimental results show that our methods are quite competitive solutions for the task of selecting good keywords to represent target web pages, while remaining simple, effective, and time efficient. For instance, in the first scenario our best method for extracting keywords from Wikipedia articles achieved an improvement of 33% over the second best baseline, and a gain of 26% over the all-terms baseline.
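For readers unfamiliar with baseline (ii), a toy TF-IDF keyword extractor looks like the following. This is a generic sketch of the baseline technique, not the paper's Wikipedia-based methods; tokenization and smoothing choices are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus_tokens, k=5):
    """Rank the single words of a page by TF-IDF against a background corpus.

    doc_tokens: token list of the target page.
    corpus_tokens: list of token lists, one per background document.
    Returns the k highest-scoring words of the page.
    """
    # Document frequency: in how many corpus documents each word appears.
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    n_docs = len(corpus_tokens)

    # Score each distinct word of the page by tf * smoothed idf.
    tf = Counter(doc_tokens)
    scores = {w: tf[w] * math.log((1 + n_docs) / (1 + df[w])) for w in tf}
    ranked = sorted(scores, key=lambda w: -scores[w])
    return ranked[:k]
```

Common function words score near zero because their document frequency is high, so content-bearing words surface as keywords.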
This work presents a new query expansion method that improves the quality of query results in information retrieval systems. Query expansion methods are useful in these systems because they avoid the need for an exact word match between the query and the relevant documents. Besides this important advantage, our method obtained an improvement of up to 14.9 in average precision when compared against the system without query expansion. The gain in precision for the top ten results was roughly the same, which indicates that our method may be a good alternative for use in information retrieval systems.
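To make the idea of query expansion concrete, here is a minimal co-occurrence-based sketch: terms that frequently appear in documents alongside the query terms are appended to the query, so relevant documents can match even without the original wording. This is a generic illustration, not the expansion method proposed in the paper.

```python
from collections import Counter

def expand_query(query_terms, corpus_docs, n_extra=2):
    """Append to the query the terms that most often co-occur with it.

    query_terms: list of query words.
    corpus_docs: list of token lists used as the expansion source.
    n_extra: how many co-occurring terms to add (illustrative default).
    """
    qset = set(query_terms)
    cooc = Counter()
    for doc in corpus_docs:
        tokens = set(doc)
        if tokens & qset:                 # document shares a query term
            for t in tokens - qset:       # count its other terms
                cooc[t] += 1
    extra = [t for t, _ in cooc.most_common(n_extra)]
    return list(query_terms) + extra
```

A query for "retrieval" would then also match documents that only say "ranking" or "index", relaxing the exact-match requirement the abstract describes.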
Museums can make their entire collections available to the world via the Internet. The Thinker ImageBase, the San Francisco Fine Arts Museums' online art image database, demonstrates the issues involved in managing large storage systems and delivering their contents to users.
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02, 2002
The objective of this paper is to present a new technique for computing weights for index terms, which leads to a new ranking mechanism, referred to as the set-based model. The components in our model are no longer terms, but termsets. The novelty is that we compute term weights using a data mining technique called association rules, which is time efficient.
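The termset idea can be sketched as follows: instead of weighting individual query terms, enumerate subsets of the query whose terms co-occur in enough documents, in the spirit of association-rule support counting. This is a simplified illustration of the counting step, not the paper's full mining algorithm or weighting formula.

```python
from itertools import combinations

def frequent_termsets(query_terms, docs, min_support=2):
    """Enumerate query termsets whose terms co-occur in at least
    min_support documents (association-rule-style support counting).

    docs: list of token lists. Returns (termset, support) pairs.
    The min_support threshold is an illustrative parameter.
    """
    result = []
    for r in range(1, len(query_terms) + 1):
        for ts in combinations(query_terms, r):
            support = sum(1 for d in docs if set(ts) <= set(d))
            if support >= min_support:
                result.append((ts, support))
    return result
```

Each surviving termset, rather than each lone term, then becomes a weighted component of the ranking function.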
This paper builds on the set-based model (SBM), an effective technique for computing term weights based on co-occurrence patterns, by employing information about the proximity among query terms in documents. The intuition that semantically related terms often occur close to each other is taken into consideration, leading to a new information retrieval model called the proximity set-based model (PSBM). The novelty is that proximity information is used as a pruning strategy to retain only related co-occurring term patterns. This technique is time efficient and yet yields solid improvements in retrieval effectiveness. Experimental results show that PSBM improves the average precision of the answer set for all four collections evaluated. For the CFC collection, PSBM leads to gains relative to the standard vector space model (VSM) of 23% in average precision and 55% in average precision for the top 10 documents. PSBM is also competitive in terms of computational performance, reducing the execution time of SBM by 21% for the CISI collection.
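The proximity pruning idea can be sketched like this: a termset only counts toward support in a document if one occurrence of each term falls within a small token window. The window size and the exact pruning rule here are illustrative assumptions, not the constants used in the paper.

```python
from itertools import product

def within_window(positions_lists, window):
    """True if one occurrence of each term fits inside `window` tokens.

    positions_lists: one list of token positions per term.
    """
    for combo in product(*positions_lists):
        if max(combo) - min(combo) < window:
            return True
    return False

def proximity_support(termset, docs, window=10):
    """Count documents where all termset members occur close together.

    Documents containing the terms only far apart are pruned, which is
    the PSBM intuition that related occurrences cluster together.
    """
    support = 0
    for doc in docs:
        pos = [[i for i, t in enumerate(doc) if t == term] for term in termset]
        if all(pos) and within_window(pos, window):
            support += 1
    return support
```

Pruning distant co-occurrences both shrinks the set of patterns to weigh (hence the speedup over SBM) and keeps only the semantically related ones (hence the precision gains).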
Papers by Nivio Ziviani