research-article

Using structured text for large-scale attribute extraction

Authors:

Marius Paşca

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Pages 1183 - 1192

https://doi.org/10.1145/1458082.1458238

Published: 26 October 2008

Abstract

We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is around 30% higher than with previous methods operating on Web documents. In addition to attribute extraction, this approach also automatically identifies values for a subset of the extracted class attributes.

Index Terms

Using structured text for large-scale attribute extraction
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation

Recommendations

The role of documents vs. queries in extracting class attributes from text
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources of data in textual information extraction. The differences are quantified ...
Extraction of open-domain class attributes from text: building blocks for faceted search
SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Knowledge automatically extracted from text captures instances, classes of instances and relations among them. In particular, the acquisition of class attributes (e.g., "top speed", "body style" and "number of cylinders" for the class of "sports cars") ...
Feature Extraction for Large-Scale Text Collections
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management

Feature engineering is a fundamental but poorly documented component in Learning-to-Rank (LTR) search engines. Such features are commonly used to construct learning models for web and product search engines, recommender systems, and question-answering ...

Information

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

October 2008

1562 pages

ISBN:9781595939913

DOI:10.1145/1458082

General Chair:
James G. Shanahan (Church and Duncan Group Inc, USA)

Program Chairs:
Sihem Amer-Yahia (Yahoo! Research, USA)
Ioana Manolescu (INRIA, France)
Yi Zhang (University of California, Santa Cruz, USA)
David A. Evans (JustSystems Evans Research, USA)
Alek Kolcz (Microsoft Live Labs, USA)
Key-Sun Choi (KAIST, Korea)
Abdur Chowdury (Twitter, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 October 2008

Qualifiers

Research-article

Conference

CIKM08: Conference on Information and Knowledge Management

October 26 - 30, 2008

Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

Total Citations: 23
Total Downloads: 667
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 1

Reflects downloads up to 08 Feb 2025.

Cited By

Singh N, Gunjan V, Zurada J (2022). Domain Modeling. In Cognitive Tutor (eds. Singh N, Gunjan V, Zurada J), pages 31-50. DOI: 10.1007/978-981-19-5197-8_2. Online publication date: 18-Sep-2022.
https://doi.org/10.1007/978-981-19-5197-8_2

Ren X, Han J (2018). Mining Structures of Factual Knowledge from Text: An Effort-Light Approach. Synthesis Lectures on Data Mining and Knowledge Discovery, 10(1), pages 1-199. DOI: 10.2200/S00860ED1V01Y201806DMK015. Online publication date: 22-Jun-2018.
https://doi.org/10.2200/S00860ED1V01Y201806DMK015

Jiang M, Shang J, Cassidy T, Ren X, Kaplan L, Hanratty T, Han J (2017). MetaPAD. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds. Matwin S, Yu S, Farooq F), pages 877-886. DOI: 10.1145/3097983.3098105. Online publication date: 13-Aug-2017.
https://dl.acm.org/doi/10.1145/3097983.3098105

Jamoussi Y, Nouira A (2017). An extracting model for constructing actions with improved part-of-speech tagging from social networking texts. In 2017 11th International Conference on Intelligent Systems and Control (ISCO), pages 77-81. DOI: 10.1109/ISCO.2017.7855957. Online publication date: Jan-2017.
https://doi.org/10.1109/ISCO.2017.7855957

Xu B, Xie C, Zhang Y, Xiao Y, Wang H, Wang W (2016). Learning defining features for categories. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 3924-3930. DOI: 10.5555/3061053.3061168. Online publication date: 9-Jul-2016.
https://dl.acm.org/doi/10.5555/3061053.3061168

Nouira A, Jamoussi Y, Hajjami H (2016). Extracting Actions with Improved Part of Speech Tagging for Social Networking Texts. In 2016 IEEE International Conference on Computer and Information Technology (CIT), pages 161-166. DOI: 10.1109/CIT.2016.109. Online publication date: Dec-2016.
https://doi.org/10.1109/CIT.2016.109

Fauconnier J, Kamel M, Rothenburger B (2015). A supervised machine learning approach for taxonomic relation recognition through non-linear enumerative structures. In Proceedings of the 30th Annual ACM Symposium on Applied Computing (eds. Wainwright R, Corchado J, Bechini A, Hong J), pages 423-425. DOI: 10.1145/2695664.2695988. Online publication date: 13-Apr-2015.
https://dl.acm.org/doi/10.1145/2695664.2695988

Saint-Dizier P (2014). Advanced Question-Answering and Discourse Semantics. In Computational Linguistics, pages 598-616. DOI: 10.4018/978-1-4666-6042-7.ch028. Online publication date: 2014.
https://doi.org/10.4018/978-1-4666-6042-7.ch028

Liu Q, Wu D, Liu Y, Cheng X (2014). Extracting Attributes and Synonymous Attributes from Online Encyclopedias. In Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Volume 01, pages 290-296. DOI: 10.1109/WI-IAT.2014.46. Online publication date: 11-Aug-2014.
https://dl.acm.org/doi/10.1109/WI-IAT.2014.46

Saint-Dizier P (2013). Advanced Question-Answering and Discourse Semantics. In Emerging Applications of Natural Language Processing, pages 130-148. DOI: 10.4018/978-1-4666-2169-5.ch006. Online publication date: 2013.
https://doi.org/10.4018/978-1-4666-2169-5.ch006
