research-article

STEWARD: architecture of a spatio-textual search engine

Authors:

Michael D. Lieberman,

Jagan Sankaranarayanan,

Jon Sperling

GIS '07: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems

Article No.: 25, Pages 1 - 8

https://doi.org/10.1145/1341012.1341045

Published: 07 November 2007

Abstract

STEWARD ("Spatio-Textual Extraction on the Web Aiding Retrieval of Documents"), a system for extracting, querying, and visualizing textual references to geographic locations in unstructured text documents, is presented. Methods for retrieving and processing web documents, extracting and disambiguating georeferences, and identifying geographic focus are described. A brief overview of STEWARD's querying capabilities is provided, along with the design of an intuitive user interface. Finally, several application scenarios and future extensions to STEWARD are discussed.
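
To make the terms "georeference extraction" and "geographic focus" concrete, here is a minimal sketch. It is not STEWARD's method, which the abstract does not spell out. It assumes a tiny hypothetical gazetteer (`GAZETTEER`), matches single lowercase tokens against it, and takes the most frequently mentioned name as a naive focus. The function names are illustrative, and disambiguation between candidates (for example, the two entries for "paris") is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (an assumption for illustration only):
# place name -> list of candidate interpretations (label, latitude, longitude).
GAZETTEER = {
    "paris": [("Paris, France", 48.86, 2.35), ("Paris, Texas, USA", 33.66, -95.56)],
    "london": [("London, UK", 51.51, -0.13)],
    "dallas": [("Dallas, Texas, USA", 32.78, -96.80)],
}


def extract_georeferences(text):
    """Return gazetteer names found in the text, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token in GAZETTEER]


def geographic_focus(text):
    """Return the most frequently mentioned gazetteer name, or None if none occur."""
    counts = Counter(extract_georeferences(text))
    if not counts:
        return None
    name, _count = counts.most_common(1)[0]
    return name
```

A name with several gazetteer candidates, such as "paris" here, is exactly the case that a disambiguation step would have to resolve before any spatial query could use it.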

Index Terms

STEWARD: architecture of a spatio-textual search engine
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published In

GIS '07: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems

November 2007

439 pages

ISBN:9781595939142

DOI:10.1145/1341012

General Chairs: Hanan Samet (University of Maryland), Cyrus Shahabi (University of Southern California)

Program Chair: Markus Schneider (University of Florida)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ESRI
Google Inc.
Oak Ridge National Laboratory
Microsoft

In-Cooperation

SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GIS07: 15th International Symposium on Advances in Geographic Information Systems

Sponsor: Microsoft

November 7 - 9, 2007

Seattle, Washington

Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%
