research-article

Proposing of modular system for web information extraction

Authors:

CompSysTech '09: Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing

Article No.: 99, Pages 1 - 4

https://doi.org/10.1145/1731740.1731847

Published: 18 June 2009


Abstract

The paper discusses the dimensions and parameters of web information extraction, such as the level of automation, the type of source document with respect to its degree of structuring, and the dependence on the extraction domain. In the second half, we analyze the extraction process in more detail according to these dimensions, propose its phases, and discuss the modules for each phase and the interfaces between phases.
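As a rough illustration of the modular, phase-based organization the abstract describes, the Python sketch below chains interchangeable modules into ordered phases that communicate only through one shared record type. The phase names (`fetch`, `segment`, `extract`), the `Document` record, the `Pipeline` class and all module functions are hypothetical, chosen for illustration; they are not the phases, modules or interfaces proposed in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Document:
    """Hypothetical record passed between phases (the inter-phase interface)."""
    source: str
    content: str = ""
    segments: List[str] = field(default_factory=list)
    records: List[Tuple[str, str]] = field(default_factory=list)


# Any module of any phase: takes a Document, returns a Document.
Module = Callable[[Document], Document]


class Pipeline:
    """Runs phases in order; each phase holds one selected, swappable module."""

    def __init__(self) -> None:
        self.phases: List[Tuple[str, Module]] = []

    def add_phase(self, name: str, module: Module) -> "Pipeline":
        self.phases.append((name, module))
        return self  # allows chained add_phase calls

    def run(self, doc: Document) -> Document:
        for _name, module in self.phases:
            doc = module(doc)
        return doc


def fetch_static(doc: Document) -> Document:
    # Hypothetical stand-in for a crawler module: the source string is the page.
    doc.content = doc.source
    return doc


def segment_by_lines(doc: Document) -> Document:
    # Hypothetical segmenter: one candidate segment per non-blank line.
    doc.segments = [line.strip() for line in doc.content.splitlines() if line.strip()]
    return doc


def extract_key_values(doc: Document) -> Document:
    # Hypothetical domain-dependent extractor: "attribute: value" -> (attribute, value).
    for seg in doc.segments:
        if ":" in seg:
            key, value = seg.split(":", 1)
            doc.records.append((key.strip(), value.strip()))
    return doc


if __name__ == "__main__":
    pipeline = (
        Pipeline()
        .add_phase("fetch", fetch_static)
        .add_phase("segment", segment_by_lines)
        .add_phase("extract", extract_key_values)
    )
    page = "Title: Example\nPrice: 10 EUR\n\nno attribute here"
    print(pipeline.run(Document(source=page)).records)
```

Because every module reads and writes the same `Document` record, swapping one module (for example, a different segmenter for less structured sources) leaves the other phases untouched; this separation is an assumption of the sketch, not a claim about the paper's design.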

Cited By

Tseng C. (2012). Implementing web of data with VWBE & H2X. 2012 7th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 558-563. Online publication date: Jul-2012. https://doi.org/10.1109/ICIEA.2012.6360790
Tseng C. (2011). Techniques for Building a Read/Write/Execute-Able Web. Applied Mechanics and Materials, vol. 145, pp. 138-142. Online publication date: Dec-2011. https://doi.org/10.4028/www.scientific.net/AMM.145.138

Information

Published In

CompSysTech '09: Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing

June 2009

653 pages

ISBN:9781605589862

DOI:10.1145/1731740

Editors:
Boris Rachev, Technical University of Varna, Bulgaria
Angel Smrikarov, University of Ruse, Bulgaria

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CompSysTech '09

CompSysTech '09: Proceedings of the 10th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing

June 18 - 19, 2009

Ruse, Bulgaria

Acceptance Rates

Overall Acceptance Rate 241 of 492 submissions, 49%

Bibliometrics

Total Citations: 2
Total Downloads: 133
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 13 Sep 2024