DOI: 10.1145/2505515.2508215
demonstration

READFAST: high-relevance search-engine for big text

Published: 27 October 2013

    Abstract

    Relevance of search results is a key factor for any search engine. To return and rank the Web pages most relevant to a query, contemporary search engines use complex ranking functions that depend on hundreds of features. The presence or absence of the query keywords on the page, their proximity, their frequencies, and the HTML markup are just a few examples. Additional features might include fonts, tags, hyperlinks, metadata, and parts of the Web-page description. The search engine uses all this information to rank the HTML Web pages returned to the user, but it is unfortunately absent in free text, which has no HTML markup, tags, hyperlinks, or other metadata, only implicit natural-language structure.
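    To make the feature-based ranking described above concrete, here is a minimal sketch that scores plain text using only three of the features the abstract names: keyword presence, frequency, and proximity. The function names, the linear scoring rule, and the weights are illustrative assumptions, not the ranking function of any real search engine.

```python
import re

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"presence": 1.0, "frequency": 1.0, "proximity": 1.0}

def keyword_features(query, text):
    """Compute three toy ranking features of `text` for `query`."""
    terms = re.findall(r"\w+", query.lower())
    tokens = re.findall(r"\w+", text.lower())
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t] for t in terms}
    # Presence: fraction of query terms that occur at least once.
    presence = sum(1 for t in terms if positions[t]) / max(len(terms), 1)
    # Frequency: query-term occurrences divided by the text length in tokens.
    frequency = sum(len(positions[t]) for t in terms) / max(len(tokens), 1)
    # Proximity: inverse of the smallest gap between two distinct query terms.
    gaps = [abs(i - j)
            for a in terms for b in terms if a < b
            for i in positions[a] for j in positions[b]]
    proximity = 1.0 / min(gaps) if gaps else 0.0
    return {"presence": presence, "frequency": frequency, "proximity": proximity}

def score(query, text):
    """Linear combination of the toy features (an assumed scoring rule)."""
    feats = keyword_features(query, text)
    return sum(WEIGHTS[name] * value for name, value in feats.items())

def rank(query, docs):
    """Return docs ordered from highest to lowest toy score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

docs = ["A search engine ranks pages.", "Engine parts, and a guide to search."]
ranked = rank("search engine", docs)
```

    Both texts contain both query words, so presence alone cannot separate them; in this sketch the text where the words are adjacent ranks first because of its proximity and frequency values.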
    Here we demonstrate one of the first Big text search engines that leverages the hidden structure of natural-language sentences to process user queries and return more relevant search results than a standard keyword search. It provides a structured index, extracted from the text using Natural Language Processing (NLP), that can be used to browse and query free text.
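    The following toy sketch illustrates the general idea of querying a structured index instead of raw keywords. It is not the READFAST implementation: the abstract does not describe the extraction method, so the naive verb-list extractor, the (subject, verb, object) triple format, and all function names here are assumptions standing in for a real NLP parser.

```python
import re
from collections import defaultdict

# Hypothetical toy verb list; a real system would use an NLP parser instead.
VERBS = {"acquired", "founded", "cites"}

def extract_triples(text):
    """Split each sentence at its first known verb into (subject, verb, object)."""
    triples = []
    for sentence in re.split(r"[.!?]\s*", text):
        words = sentence.split()
        for i, word in enumerate(words):
            if word.lower() in VERBS and 0 < i < len(words) - 1:
                triples.append((" ".join(words[:i]), word.lower(),
                                " ".join(words[i + 1:])))
                break
    return triples

def build_index(triples):
    """Index triples by role so a query can ask for a subject or an object."""
    index = defaultdict(list)
    for subj, verb, obj in triples:
        index[("subject", subj.lower())].append((subj, verb, obj))
        index[("object", obj.lower())].append((subj, verb, obj))
    return index

def keyword_search(text, words):
    """Baseline: sentences that contain every query word, in any role."""
    sentences = [s for s in re.split(r"[.!?]\s*", text) if s]
    wanted = {w.lower() for w in words}
    return [s for s in sentences if wanted <= {w.lower() for w in s.split()}]

text = "Alpha acquired Beta. Beta acquired Gamma."
index = build_index(extract_triples(text))
by_keyword = keyword_search(text, ["Beta", "acquired"])
by_structure = index[("subject", "beta")]
```

    In this made-up example the keyword query for "Beta" and "acquired" matches both sentences, while the structured lookup for Beta as the subject returns only the sentence in which Beta is the acquirer.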

      Published In

      CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
      October 2013
      2612 pages
      ISBN:9781450322638
      DOI:10.1145/2505515
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data integration
      2. information retrieval
      3. natural language processing
      4. search
      5. structure extraction

      Qualifiers

      • Demonstration

      Conference

      CIKM'13
      CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
      October 27 - November 1, 2013
      San Francisco, California, USA

      Acceptance Rates

      CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 7
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 10 Aug 2024

      Cited By

      • (2022) Simplifying Access to Large-scale Structured Datasets by Meta-Profiling with Scalable Training Set Enrichment. Proceedings of the 2022 International Conference on Management of Data, pp. 2377-2380. DOI: 10.1145/3514221.3520156. Online publication date: 10-Jun-2022
      • (2021) Scalable Tabular Metadata Location and Classification in Large-Scale Structured Datasets. Database and Expert Systems Applications, pp. 35-50. DOI: 10.1007/978-3-030-86472-9_4. Online publication date: 31-Aug-2021
      • (2020) WebLens. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3425-3428. DOI: 10.1145/3340531.3417443. Online publication date: 19-Oct-2020
      • (2020) Towards Tabular Embeddings, Training the Relational Models. 2020 IEEE International Conference on Big Data (Big Data), pp. 5724-5726. DOI: 10.1109/BigData50022.2020.9377769. Online publication date: 10-Dec-2020
      • (2019) Hybrid.Poly: A Consolidated Interactive Analytical Polystore System. 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1996-1999. DOI: 10.1109/ICDE.2019.00223. Online publication date: Apr-2019
      • (2018) Hybrid.AI. Companion Proceedings of the The Web Conference 2018, pp. 1507-1514. DOI: 10.1145/3184558.3191600. Online publication date: 23-Apr-2018
      • (2017) CognitiveDB. Proceedings of the 26th International Conference on World Wide Web Companion, pp. 207-211. DOI: 10.1145/3041021.3054735. Online publication date: 3-Apr-2017
      • (2017) Generating Unified Famous Objects (UFOs) from the classified object tables. 2017 IEEE International Conference on Big Data (Big Data), pp. 4771-4773. DOI: 10.1109/BigData.2017.8258537. Online publication date: Dec-2017
      • (2014) Text and structured data fusion in data tamer at scale. 2014 IEEE 30th International Conference on Data Engineering, pp. 1258-1261. DOI: 10.1109/ICDE.2014.6816755. Online publication date: Mar-2014
