DOI: 10.1145/1571941.1572132
poster

Template-independent wrapper for web forums

Published: 19 July 2009

Abstract

This paper presents novel work on extracting data from Web forums. Millions of users contribute rich information to Web forums every day, making them an important resource for many Web applications, such as product opinion retrieval and social network analysis. The novelty of the proposed algorithm is that it not only extracts the pure text but also distinguishes the original post from the replies. Experimental results on a large number of real Web forums indicate that the proposed algorithm can correctly ex…


Cited By

  • (2012) FODEX -- Towards Generic Data Extraction from Web Forums. Proceedings of the 2012 26th International Conference on Advanced Information Networking and Applications Workshops, pp. 821-826. DOI: 10.1109/WAINA.2012.134. Online publication date: 26-Mar-2012
  • (2012) Complete-Thread extraction from web forums. Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications, pp. 727-734. DOI: 10.1007/978-3-642-29253-8_70. Online publication date: 11-Apr-2012
  • (2010) Blog extraction with template-independent wrapper. 2010 2nd IEEE International Conference on Network Infrastructure and Digital Content, pp. 313-317. DOI: 10.1109/ICNIDC.2010.5657967. Online publication date: Sep-2010
  • (2010) Semi-automatic information extraction from discussion boards with applications for anti-spam technology. Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part II, pp. 370-382. DOI: 10.1007/978-3-642-12165-4_30. Online publication date: 23-Mar-2010


    Published In

    ACM Conferences
    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009
    896 pages
    ISBN:9781605584836
    DOI:10.1145/1571941


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. conditional random fields
    2. crawler

    Qualifiers

    • Poster

    Conference

    SIGIR '09

    Acceptance Rates

    Overall acceptance rate: 792 of 3,983 submissions (20%)

    Article Metrics

    • Downloads (last 12 months): 3
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 10 Oct 2024
