Chapter
DOI: 10.5555/2427336.2427339

A domain independent framework for extracting linked semantic data from tables

Published: 01 January 2012 Publication History

Abstract

Vast amounts of information are encoded in tables found in documents, on the Web, and in spreadsheets or databases. Integrating or searching over this information benefits from understanding its intended meaning and making it explicit in a semantic representation language like RDF. Most current approaches to generating Semantic Web representations from tables require human input to create schemas and often result in graphs that do not follow best practices for linked data. Evidence for a table's meaning can be found in its column headers, cell values, implicit relations between columns, caption, and surrounding text, but interpreting that evidence also requires general and domain-specific background knowledge. Approaches that work well for one domain may not work well for others. We describe a domain-independent framework for interpreting the intended meaning of tables and representing it as Linked Data. At the core of the framework are techniques grounded in graphical models and probabilistic reasoning to infer the meaning associated with a table. Using background knowledge from resources in the Linked Open Data cloud, we jointly infer the semantics of column headers, table cell values (e.g., strings and numbers), and relations between columns, and represent the inferred meaning as a graph of RDF triples. A table's meaning is thus captured by mapping columns to classes in an appropriate ontology, linking cell values to literal constants, implied measurements, or entities in the linked data cloud (existing or new), and discovering or identifying relations between columns.
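The final representation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the joint inference over a graphical model is not shown, and the column-to-class, cell-to-entity, and column-pair-to-relation mappings are supplied by hand. The function name `table_to_triples`, the `http://example.org/` URIs, and the sample row are all assumptions made for illustration; only the `rdf:type` URI is a real vocabulary term. Output is written as N-Triples-style lines.

```python
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, not a real ontology


def term(value: str, cell_links: Dict[str, str]) -> str:
    """Render a cell as a URI if it was linked to an entity, else as a literal."""
    uri = cell_links.get(value)
    if uri is not None:
        return f"<{uri}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def table_to_triples(
    rows: List[List[str]],
    column_classes: Dict[int, str],          # column index -> class URI
    cell_links: Dict[str, str],              # cell string -> entity URI
    relations: Dict[Tuple[int, int], str],   # (subject col, object col) -> property URI
) -> List[str]:
    triples = []
    for row in rows:
        # Type every linked cell with the class assigned to its column.
        for col, class_uri in column_classes.items():
            uri = cell_links.get(row[col])
            if uri is not None:
                triples.append(f"<{uri}> <{RDF_TYPE}> <{class_uri}> .")
        # Emit one triple per column relation; the subject must be a linked entity.
        for (s_col, o_col), prop in relations.items():
            subj = cell_links.get(row[s_col])
            if subj is None:
                continue
            triples.append(f"<{subj}> <{prop}> {term(row[o_col], cell_links)} .")
    return triples


# Hand-written mappings standing in for what an inference step would produce.
rows = [["Baltimore", "Maryland", "620961"]]
column_classes = {0: EX + "City", 1: EX + "State"}
cell_links = {"Baltimore": EX + "Baltimore", "Maryland": EX + "Maryland"}
relations = {(0, 1): EX + "locatedIn", (0, 2): EX + "population"}

triples = table_to_triples(rows, column_classes, cell_links, relations)
for t in triples:
    print(t)
```

In this sketch a cell with no entity link falls back to a plain string literal, which loosely mirrors the abstract's distinction between linked entities and literal constants; typed literals and newly minted entities are left out for brevity.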


Published In

Search Computing: broadening web search
January 2012
255 pages
ISBN: 9783642342127
Editors: Stefano Ceri, Marco Brambilla

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. RDF
  2. entity linking
  3. graphical models
  4. linked data
  5. machine learning
  6. semantic web
  7. tables

Qualifiers

  • Chapter

Cited By

  • (2017) Combining information on structure and content to automatically annotate natural science spreadsheets. International Journal of Human-Computer Studies 103:C, 63-76. DOI: 10.1016/j.ijhcs.2017.02.006. Online publication date: 1-Jul-2017
  • (2015) Declarative Data Exchange. Proceedings of the 19th International Database Engineering & Applications Symposium, 96-105. DOI: 10.1145/2790755.2790764. Online publication date: 13-Jul-2015
  • (2013) Semantic extraction of geographic data from web tables for big data integration. Proceedings of the 7th Workshop on Geographic Information Retrieval, 19-26. DOI: 10.1145/2533888.2533939. Online publication date: 5-Nov-2013
  • (2013) A Graph-Based Approach to Learn Semantic Descriptions of Data Sources. Proceedings of the 12th International Semantic Web Conference - Part I, 607-623. DOI: 10.1007/978-3-642-41335-3_38. Online publication date: 21-Oct-2013
  • (2013) Semantic Message Passing for Generating Linked Data from Tables. Proceedings of the 12th International Semantic Web Conference - Part I, 363-378. DOI: 10.1007/978-3-642-41335-3_23. Online publication date: 21-Oct-2013
