Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

From Web Tables to Concepts: A Semantic Normalization Approach

  • Conference paper
  • First Online:
Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9381))

Included in the following conference series:

Abstract

Relational Web tables, embedded in HTML or published on data platforms, have become an important resource for many applications, including question answering or entity augmentation. To utilize the data, we require some understanding of what the tables are about. Previous research on recovering Web table semantics has largely focused on simple tables, which only describe a single semantic concept. However, there is also a significant number of de-normalized multi-concept tables on the Web. Treating these as single-concept tables results in many incorrect relations being extracted. In this paper, we propose a normalization approach to decompose multi-concept tables into smaller single-concept tables. First, we identify columns that represent keys or identifiers of entities. Then, we utilize the table schema as well as intrinsic data correlations to identify concept boundaries and split the tables accordingly. Experimental results on real Web tables show that our approach is feasible and effectively identifies semantic concepts.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    https://nycopendata.socrata.com/.

  2. 2.

    https://opendata.socrata.com/.

References

  1. Bahmani, A., Naghibzadeh, M., Bahmani, B.: Automatic database normalization and primary key generation. In: Canadian Conference on Electrical and Computer Engineering, CCECE 2008, pp. 000011–000016, May 2008

    Google Scholar 

  2. Cafarella, M.J., Halevy, A.Y., Khoussainova, N.: Data integration for the relational web. Proc. VLDB Endow. 2, 1090–1101 (2009)

    Article  Google Scholar 

  3. Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: Webtables: exploring the power of tables on the web. Proc. VLDB Endow. 1(1), 538–549 (2008)

    Article  Google Scholar 

  4. Cafarella, M.J., Halevy, A.Y., Zhang, Y., Wang, D.Z., Wu, E.: Uncovering the relational web. In: WebDB (2008)

    Google Scholar 

  5. Das Sarma, A., Fang, L., Gupta, N., Halevy, A.Y., Lee, H., Wu, F., Xin, R., Yu, C.: Finding related tables. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 817–828 (2012)

    Google Scholar 

  6. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

    Article  Google Scholar 

  7. Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: Tane: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)

    Article  MATH  Google Scholar 

  8. Ilyas, I.F., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: Cords: automatic discovery of correlations and soft functional dependencies. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD 2004, New York, NY, USA, pp. 647–658. ACM (2004)

    Google Scholar 

  9. Sorrentino, S., Bergamaschi, B., Gawinecki, M., Po, L.: Schema normalization for improving schema matching. In: Laender, A.H.F., Castano, S., Dayal, U., Casati, F., de Oliveira, J.P.M. (eds.) ER 2009. LNCS, vol. 5829, pp. 280–293. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  10. Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C.: Recovering semantics of tables on the web. Proc. VLDB Endow. 4(9), 528–538 (2011)

    Article  Google Scholar 

  11. Wang, D.Z., Dong, X.L., Sarma, A.D., Franklin, M.J., Halevy, A.Y.: Functional dependency generation and applications in pay-as-you-go data integration systems. In: 12th International Workshop on the Web and Databases, WebDB 2009, Providence, Rhode Island, USA, 28 June 2009

    Google Scholar 

  12. Wang, J., Wang, H., Wang, Z., Zhu, K.Q.: Understanding tables on the web. In: Atzeni, P., Cheung, D., Ram, S. (eds.) ER 2012. LNCS, vol. 7532, pp. 141–155. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  13. Yakout, M., Ganjam, K., Chakrabarti, K., Chaudhuri, S.: Infogather: entity augmentation and attribute discovery by holistic matching with web tables. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, New York, NY, USA, pp. 97–108. ACM (2012)

    Google Scholar 

  14. Zhang, M., Chakrabarti, K.: Infogather+: Semantic matching and annotation of numeric and time-varying attributes in web tables. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, pp. 145–156. ACM (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Katrin Braunschweig .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Braunschweig, K., Thiele, M., Lehner, W. (2015). From Web Tables to Concepts: A Semantic Normalization Approach. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science(), vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-25264-3_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25263-6

  • Online ISBN: 978-3-319-25264-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics