DOI: 10.1145/584931.584940
Article

A framework for web table mining

Published: 08 November 2002

Abstract

Web table mining is the extraction of information from tables published in web pages as HTML text. Most previous work on this subject uses the HTML tags to discover the components of a table. Our work treats the web as a distinct publication medium, in two ways. First, we argue that new types of table formats have been developed specifically for the web. Second, we argue that authors use the visual cues embedded in the HTML text to direct viewers on how to read the contents of a web table properly. We develop a framework for comprehensively analyzing the structural aspects of a web table, within which rules are devised to process and extract attribute-value pairs from the table. Good experimental results validate this approach to web table mining.
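The tag-based baseline that the abstract contrasts itself with can be sketched in a few lines. This is a minimal Python sketch, not the authors' framework: it assumes a well-formed table whose first row holds the attribute names and that uses no `rowspan` or `colspan`, and it reads only the `<tr>`, `<td>`, and `<th>` tags, ignoring the visual cues the paper argues for. The names `extract_attribute_value_pairs` and `_TableCollector` are hypothetical.

```python
from html.parser import HTMLParser


class _TableCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the cell currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.rows:  # ignore stray cells that appear outside any <tr>
                # Join text split across inline tags and collapse whitespace.
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_attribute_value_pairs(html):
    """Tag-only baseline (assumption: first row = attribute names, no row/col spans).

    Returns one list of (attribute, value) tuples per data row.
    """
    parser = _TableCollector()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [list(zip(header, row)) for row in body]


if __name__ == "__main__":
    sample = """
    <table>
      <tr><th>Model</th><th>Price</th></tr>
      <tr><td>A100</td><td>$5</td></tr>
      <tr><td>B200</td><td>$7</td></tr>
    </table>
    """
    print(extract_attribute_value_pairs(sample))
    # [[('Model', 'A100'), ('Price', '$5')], [('Model', 'B200'), ('Price', '$7')]]
```

Tables that put attribute names in the first column, span cells across rows or columns, or mark headers only through styling would be mispaired by this sketch; those are the layouts where the abstract argues visual cues are needed.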

Published In

WIDM '02: Proceedings of the 4th international workshop on Web information and data management
November 2002
116 pages
ISBN:1581135939
DOI:10.1145/584931
  • Program Chairs: Roger Chiang, Ee-Peng Lim
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data extraction
  2. information extraction
  3. table mining
  4. web pages

Qualifiers

  • Article

Conference

CIKM02

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 1
Reflects downloads up to 21 Sep 2024

Cited By

  • (2022) Classification of Layout vs. Relational Tables on the Web: Machine Learning with Rendered Pages. ACM Transactions on the Web 17:1 (1-23). DOI: 10.1145/3555349. Online publication date: 20-Dec-2022
  • (2022) On extracting data from tables that are encoded using HTML. Knowledge-Based Systems 190:C. DOI: 10.1016/j.knosys.2019.105157. Online publication date: 22-Apr-2022
  • (2022) TOMATE. Information Sciences: an International Journal 577:C (49-68). DOI: 10.1016/j.ins.2021.04.087. Online publication date: 22-Apr-2022
  • (2022) A coral-reef approach to extract information from HTML tables. Applied Soft Computing 115:C. DOI: 10.1016/j.asoc.2021.107980. Online publication date: 6-May-2022
  • (2022) A hybrid quantum approach to leveraging data from HTML tables. Knowledge and Information Systems. DOI: 10.1007/s10115-021-01636-7. Online publication date: 8-Jan-2022
  • (2022) Table understanding: Problem overview. WIREs Data Mining and Knowledge Discovery 13:1. DOI: 10.1002/widm.1482. Online publication date: 21-Nov-2022
  • (2021) A clustering approach to extract data from HTML tables. Information Processing and Management: an International Journal 58:6. DOI: 10.1016/j.ipm.2021.102683. Online publication date: 1-Nov-2021
  • (2020) TULIP: A Five-Star Table and List - From Machine-Readable to Machine-Understandable Systems. Linked Open Data - Applications, Trends and Future Developments. DOI: 10.5772/intechopen.91406. Online publication date: 19-Nov-2020
  • (2020) Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning. Proceedings of The Web Conference 2020 (951-961). DOI: 10.1145/3366423.3380174. Online publication date: 20-Apr-2020
  • (2019) Tablepedia: Automating PDF Table Reading in an Experimental Evidence Exploration and Analytic System. The World Wide Web Conference (3615-3619). DOI: 10.1145/3308558.3314118. Online publication date: 13-May-2019