
Award Abstract # 0958143
Collaborative Research: CI-ADDO-EN: Semantic CiteSeerX

Division Of Computer and Network Systems
Initial Amendment Date: June 22, 2010
Latest Amendment Date: September 13, 2012
Award Number: 0958143
Award Instrument: Continuing Grant
Program Manager: Nan Zhang
 Division Of Computer and Network Systems
 Direct For Computer & Info Scie & Enginr
Start Date: July 1, 2010
End Date: June 30, 2016 (Estimated)
Total Intended Award Amount: $897,639.00
Total Awarded Amount to Date: $913,639.00
Funds Obligated to Date: FY 2010 = $299,772.00
FY 2011 = $215,843.00
FY 2012 = $398,024.00
History of Investigator:
  • C. Lee Giles (Principal Investigator)
  • Prasenjit Mitra (Co-Principal Investigator)
  • Bernard Jansen (Co-Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
PA  US  16802-1503
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
PA  US  16802-1503
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Special Projects - CNS,
CCRI-CISE Cmnty Rsrch Infrstrc,
SciSIP-Sci of Sci Innov Policy
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT

Program Reference Code(s): 9178, 9218, 9251, HPCC
Program Element Code(s): 171400, 735900, 762600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070


Access to the scientific literature has changed significantly due to the immediate availability of most new research over the World Wide Web. Taking advantage of this opportunity, new search tools such as Google Scholar and CiteSeer, followed by CiteSeerX, have been developed. CiteSeerX has become one of the most comprehensive and widely used online public resources for the CISE research community. The Semantic CiteSeerX project builds upon the work of CiteSeerX and will continue its original goal of research assistance while providing more tools and features. Using the established CiteSeerX infrastructure, Semantic CiteSeerX extends and enhances this resource for community development. Semantic CiteSeerX will expand the CiteSeerX architecture to increase use, performance, reliability, and applications while continuing to expand the breadth and depth of CiteSeerX's collection. In addition, new metadata such as algorithms, figures, tables, and equations will be extracted and indexed, and this data will be provided as an RDF resource. Personalized CiteSeerX search and author recommendations will be developed through an extension of MyCiteSeerX, using individual search histories combined with patterns of citations and searches within the community. The impact of CiteSeerX and its new features will be evaluated, and methods will be explored to increase the availability of CiteSeerX as a community resource. As in the past, all software will be released as open source. For more information, please see: http://citeseerx.ist.psu.edu



(Showing: 1 - 10 of 13)
Sumit Bhatia, Cornelia Caragea, Hung-Hsuan Chen, Jian Wu, Pucktada Treeratpituk, Zhaohui Wu, Madian Khabsa, Prasenjit Mitra, C. Lee Giles "Specialized Research Datasets in the CiteSeerX Digital Library." D-Lib Magazine , v.18 (7/8) , 2012
Madian Khabsa, C. Lee Giles "The Number of Scholarly Documents on the Public Web" PLoS ONE , v.9 , 2014 10.1371/journal.pone.0093949
Madian Khabsa, C. Lee Giles "Chemical entity extraction using CRF and an ensemble of extractors" Journal of Cheminformatics , v.7 , 2015 10.1186/1758-2946-7-S1-S12
Cornelia Caragea, Jian Wu, Sujatha Das Gollapalli, C. Lee Giles "Document Type Classification in Online Digital Libraries." Association for the Advancement of Artificial Intelligence (AAAI) , 2016 , p.3997
Hung-Hsuan Chen, C. Lee Giles "ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank." ACM Transactions on Knowledge Discovery from Data (TKDD) , v.10 , 2015 , p.15
Kunho Kim, Madian Khabsa, C. Lee Giles "Inventor Name Disambiguation for a Patent Database Using a Random Forest and DBSCAN" Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016 , 2016 , p.269
Kyle Williams, C. Lee Giles "Improving Similar Document Retrieval Using a Recursive Pseudo Relevance Feedback Strategy" Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016 , 2016 , p.275
Madian Khabsa, Zhaohui Wu, C. Lee Giles "Towards Better Understanding of Academic Search." Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016 , 2016 , p.111 978-1-4503-4229-2
Sagnik Ray Choudhury, Shuting Wang, C. Lee Giles "Curve Separation for Line Graphs in Scholarly Documents" Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016 , 2016 , p.277



This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

CiteSeerX is a scientific literature digital library and search engine that has focused primarily on the literature in computer and information science. However, it has recently broadened its coverage.

CiteSeerX has given researchers in the USA and throughout the world access to freely available full-text scholarly and scientific papers. Such access has significantly benefited researchers, who can better perform their research when they have access to the data and knowledge inside those papers. This also helps avoid duplication of existing work.

CiteSeerX aims to improve the dissemination of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in the access of scientific and scholarly knowledge.

Rather than creating just another digital library, CiteSeerX attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeerX has developed new methods and algorithms to index PostScript and PDF scholarly articles on the Web.

CiteSeerX provides the following features.

Autonomous Citation Indexing (ACI) - CiteSeerX uses ACI to automatically create a citation index that can be used for literature search and evaluation. Compared to traditional citation indices, ACI provides improvements in cost, availability, comprehensiveness, efficiency, and timeliness.

Citation Statistics - CiteSeerX computes citation statistics and related documents for all articles cited in the database, not just the indexed articles.

Reference Linking - As with many online publishers, CiteSeerX allows browsing the database using citation links. However, CiteSeerX performs this automatically.

Citation context - CiteSeerX can show the context of citations to a given paper, allowing a researcher to quickly and easily see what other researchers have to say about an article of interest.

Awareness and Tracking - CiteSeerX provides automatic notification of new citations to given papers, and new papers matching a user profile.

Related documents - CiteSeerX locates related documents using citation and word based measures and displays an active and continuously updated bibliography for each document.

Full-Text Indexing - CiteSeerX indexes the full text of entire articles and citations. Full Boolean, phrase, and proximity search is supported.

Query-Sensitive Summaries - CiteSeerX provides the context of how query terms are used in articles instead of a generic summary, improving the efficiency of search.

Up-To-Date - CiteSeerX is regularly updated based on user submissions and regular crawls.

Powerful Search - CiteSeerX uses fielded search to allow complex queries over content, and allows the use of author initials to provide more flexible name search.

Harvesting of Articles - CiteSeerX automatically harvests freely available scholarly papers from the Web.

Metadata Of Articles - CiteSeerX automatically extracts and provides metadata from all indexed articles.

Personal Content Portal - Personal collections, RSS-like notifications, social bookmarking, social network facilities.
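
As an illustration of the Boolean, phrase, and proximity search named in the feature list above, the following is a minimal sketch of a positional inverted index. It is not CiteSeerX code: the class `PositionalIndex`, its method names, and the three sample documents are all assumptions made up for this example, and a production system would use a dedicated search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Toy in-memory index (hypothetical): term -> {doc_id: [token positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def docs(self, term):
        return set(self.postings.get(term.lower(), {}))

    def boolean_and(self, *terms):
        sets = [self.docs(t) for t in terms]
        return set.intersection(*sets) if sets else set()

    def boolean_or(self, *terms):
        return set().union(*(self.docs(t) for t in terms))

    def near(self, a, b, max_gap):
        """Documents where a and b occur within max_gap tokens, in either order."""
        a, b = a.lower(), b.lower()
        hits = set()
        for doc_id in self.boolean_and(a, b):
            pa, pb = self.postings[a][doc_id], self.postings[b][doc_id]
            if any(abs(i - j) <= max_gap for i in pa for j in pb):
                hits.add(doc_id)
        return hits

    def phrase(self, text):
        """Documents containing the tokens of text at consecutive positions."""
        terms = tokenize(text)
        if not terms:
            return set()
        hits = set()
        for doc_id in self.boolean_and(*terms):
            starts = self.postings[terms[0]][doc_id]
            if any(
                all(s + k in self.postings[t][doc_id] for k, t in enumerate(terms))
                for s in starts
            ):
                hits.add(doc_id)
        return hits


# Made-up sample documents, for illustration only.
index = PositionalIndex()
index.add("d1", "Autonomous citation indexing for scientific literature")
index.add("d2", "Indexing tables and figures in the scientific digital library")
index.add("d3", "A citation is a link; indexing comes later")

print(sorted(index.boolean_and("citation", "indexing")))  # ['d1', 'd3']
print(sorted(index.phrase("citation indexing")))          # ['d1']
print(sorted(index.near("citation", "indexing", 4)))      # ['d1', 'd3']
```

Storing token positions rather than only document identifiers is what makes phrase and proximity queries possible; a plain term-to-document map would support only the Boolean operators.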

CiteSeerX has also provided other features such as citation indexing, author name disambiguation, and extraction of data from tables, to mention a few. CiteSeerX data has been used by many researchers throughout the world.
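
To make the citation statistics and citation context features more concrete, here is a small sketch that counts citations and collects the text surrounding each citation marker. It assumes that reference extraction has already happened: the input format, the function `citation_stats`, the bracketed `[n]` marker style, and the keys `cited-1` and `cited-2` are all hypothetical and do not describe how CiteSeerX itself stores or parses documents.

```python
import re
from collections import Counter, defaultdict


def citation_stats(documents, window=40):
    """Count citing documents per cited work and collect citation contexts.

    documents: {doc_id: {"text": str, "refs": {marker: cited_key}}}
    (an assumed, simplified input format).
    """
    counts = Counter()
    contexts = defaultdict(list)
    for doc_id, doc in documents.items():
        # Each citing document counts once per cited work, whether or not
        # the cited work is itself in the collection.
        for cited in set(doc["refs"].values()):
            counts[cited] += 1
        # Find in-text markers such as "[1]" and keep the surrounding text.
        for m in re.finditer(r"\[(\d+)\]", doc["text"]):
            cited = doc["refs"].get(m.group(1))
            if cited is None:
                continue
            start = max(0, m.start() - window)
            end = min(len(doc["text"]), m.end() + window)
            contexts[cited].append((doc_id, doc["text"][start:end].strip()))
    return counts, contexts


# Made-up sample data; "cited-2" is cited but not part of the collection.
documents = {
    "paperA": {
        "text": "We build on autonomous citation indexing [1] and extend it to tables [2].",
        "refs": {"1": "cited-1", "2": "cited-2"},
    },
    "paperB": {
        "text": "Earlier systems [1] indexed citations automatically.",
        "refs": {"1": "cited-1"},
    },
}

counts, contexts = citation_stats(documents)
print(counts["cited-1"], counts["cited-2"])
for citing_doc, snippet in contexts["cited-1"]:
    print(citing_doc, "->", snippet)
```

Because the counts are keyed by the cited work rather than by the documents in the collection, works that are cited but not indexed still receive statistics, which mirrors the behavior described under Citation Statistics.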

CiteSeerX code and data are freely available under a Creative Commons license.

CiteSeerX is available at:


Last Modified: 10/03/2016
Modified by: C. L Giles

Please report errors in award information by writing to: awardsearch@nsf.gov.
