DOI: 10.1145/1341012.1341045
research-article

STEWARD: architecture of a spatio-textual search engine

Published: 07 November 2007

Abstract

STEWARD ("Spatio-Textual Extraction on the Web Aiding Retrieval of Documents") is a system for extracting, querying, and visualizing textual references to geographic locations in unstructured text documents. The paper describes methods for retrieving and processing web documents, extracting and disambiguating georeferences, and identifying a document's geographic focus. It gives a brief overview of STEWARD's querying capabilities and of the design of its intuitive user interface, and closes with several application scenarios and future extensions to STEWARD.
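
The abstract names three processing steps (extracting georeferences, disambiguating them, and identifying geographic focus) without giving their details. The sketch below only illustrates what such a pipeline could look like and is not STEWARD's method: the toy gazetteer, the region-voting heuristic, the "most frequent location" focus rule, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (an assumption, not STEWARD's data):
# lowercase place name -> list of (resolved label, containing region).
GAZETTEER = {
    "sheffield": [("Sheffield, England", "UK"), ("Sheffield, Alabama", "USA")],
    "london": [("London, England", "UK"), ("London, Ontario", "Canada")],
    "leeds": [("Leeds, England", "UK")],
}


def extract_georeferences(text):
    """Return gazetteer names found in the text, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token in GAZETTEER]


def disambiguate(names):
    """Resolve each name to the candidate whose region gets the most votes.

    Every mention votes once for each region it could belong to, so a
    region shared by many mentions in the document wins ambiguous cases.
    """
    region_votes = Counter()
    for name in names:
        for region in {region for _, region in GAZETTEER[name]}:
            region_votes[region] += 1
    resolved = []
    for name in names:
        label, _ = max(GAZETTEER[name], key=lambda cand: region_votes[cand[1]])
        resolved.append(label)
    return resolved


def geographic_focus(resolved):
    """Return the most frequently mentioned resolved location, or None."""
    if not resolved:
        return None
    return Counter(resolved).most_common(1)[0][0]


if __name__ == "__main__":
    text = ("Trains from Leeds to Sheffield were delayed; "
            "Sheffield station closed. London unaffected.")
    names = extract_georeferences(text)
    resolved = disambiguate(names)
    print(names)
    print(resolved)
    print(geographic_focus(resolved))
```

In this toy document the unambiguous mention of Leeds adds weight to the UK region, which is what pulls both "Sheffield" and "London" toward their English readings. A real system would need multi-word names, a full gazetteer such as GNIS or GNS, and a stronger focus measure than raw frequency.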

      Published In

      GIS '07: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems
      November 2007
      439 pages
      ISBN:9781595939142
      DOI:10.1145/1341012

      Sponsors

      • ESRI
      • Google Inc.
      • Oak Ridge National Laboratory
      • Microsoft

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. STEWARD
      2. geocoding
      3. spatio-textual search engine

      Conference

      GIS07
      Sponsor:
      • Microsoft

      Acceptance Rates

      Overall Acceptance Rate 257 of 1,238 submissions, 21%

      Cited By

      • (2024) Quantum space-efficient large language models for Prolog query translation. Quantum Information Processing, 23:10. DOI: 10.1007/s11128-024-04559-8. Online publication date: 16-Oct-2024.
      • (2022) A foundation for spatio-textual-temporal cube analytics. Information Systems, 108:C. DOI: 10.1016/j.is.2022.102009. Online publication date: 3-Jun-2022.
      • (2022) Multi-feature, multi-modal, and multi-source social event detection. Information Fusion, 79:C (279-308). DOI: 10.1016/j.inffus.2021.10.013. Online publication date: 1-Mar-2022.
      • (2022) A novel framework for multiclass supervised classification of location-sensitive events. Multimedia Tools and Applications, 82:7 (9667-9692). DOI: 10.1007/s11042-021-11842-8. Online publication date: 16-Feb-2022.
      • (2021) MusicStand. Proceedings of the 29th International Conference on Advances in Geographic Information Systems (446-449). DOI: 10.1145/3474717.3484211. Online publication date: 2-Nov-2021.
      • (2021) Geographical Labeling of Web Objects Through Maximum Marginal Classification. Advances in Data Science and Information Engineering (713-724). DOI: 10.1007/978-3-030-71704-9_52. Online publication date: 30-Oct-2021.
      • (2020) Event Geoparser with Pseudo-Location Entity Identification and Numerical Argument Extraction Implementation and Evaluation in Indonesian News Domain. ISPRS International Journal of Geo-Information, 9:12 (712). DOI: 10.3390/ijgi9120712. Online publication date: 28-Nov-2020.
      • (2020) Could spatial features help the matching of textual data? Intelligent Data Analysis, 24:5 (1043-1064). DOI: 10.3233/IDA-194749. Online publication date: 30-Sep-2020.
      • (2020) Using Animation to Visualize Spatio-Temporal Varying COVID-19 Data. Proceedings of the 1st ACM SIGSPATIAL International Workshop on Modeling and Understanding the Spread of COVID-19 (53-62). DOI: 10.1145/3423459.3430761. Online publication date: 3-Nov-2020.
      • (2019) Scalable Processing of Spatial-Keyword Queries. Synthesis Lectures on Data Management, 11:1 (1-116). DOI: 10.2200/S00892ED1V01Y201901DTM056. Online publication date: 7-Feb-2019.
