research-article

Structured data on the web

Published: 01 February 2011
    Abstract

    Google's Web Tables and Deep Web Crawler identify and deliver this otherwise inaccessible resource directly to end users.


    Published In

    Communications of the ACM, Volume 54, Issue 2
    February 2011
    115 pages
    ISSN: 0001-0782
    EISSN: 1557-7317
    DOI: 10.1145/1897816
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • Research-article
    • Popular
    • Refereed
