2D Conditional Random Fields for Web Information Extraction

J Zhu, Z Nie, JR Wen, B Zhang, WY Ma
Proceedings of the 22nd International Conference on Machine Learning, 2005
The Web contains an abundance of useful semi-structured information about real-world objects, and our empirical study shows that Web information about objects of the same type exhibits strong sequence characteristics across different Web sites. Conditional Random Fields (CRFs) are the state-of-the-art approach for exploiting such sequence characteristics to improve labeling. However, because the information on a Web page is laid out in two dimensions, linear-chain CRFs are limited for Web information extraction. To better incorporate the two-dimensional neighborhood interactions, this paper presents a two-dimensional CRF model to automatically extract object information from the Web. We empirically compare the proposed model with existing linear-chain CRF models for product information extraction, and the results show the effectiveness of our model.
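The contrast between linear-chain and two-dimensional neighborhoods can be illustrated with a minimal sketch. This is not the paper's model or its inference procedure. It only assumes a generic pairwise score (node potentials plus edge potentials) and a plain grid in which each cell is linked to its right and lower neighbor. All function names, labels, and weights below are hypothetical and chosen for illustration.

```python
from itertools import product


def chain_edges(n):
    """Edges of a linear chain over n elements: each element links only to its successor."""
    return [(i, i + 1) for i in range(n - 1)]


def grid_edges(rows, cols):
    """Edges of a rows x cols grid: each cell links to its right and lower neighbor."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))  # horizontal neighbor
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))  # vertical neighbor
    return edges


def score(labels, edges, node_weight, edge_weight):
    """Unnormalized log-score: node potentials plus pairwise potentials over the given edges."""
    total = sum(node_weight(site, label) for site, label in labels.items())
    total += sum(edge_weight(labels[a], labels[b]) for a, b in edges)
    return total


# Hypothetical potentials: no node preference, reward neighbors that share a label.
def flat_node(site, label):
    return 0.0


def agree(label_a, label_b):
    return 1.0 if label_a == label_b else 0.0


# Four page elements read as a sequence versus as a 2 x 2 layout.
chain_labels = {0: "name", 1: "name", 2: "price", 3: "price"}
grid_labels = {(0, 0): "name", (0, 1): "name", (1, 0): "price", (1, 1): "price"}

chain_score = score(chain_labels, chain_edges(4), flat_node, agree)
grid_score = score(grid_labels, grid_edges(2, 2), flat_node, agree)
```

In this sketch the chain over four elements has three edges, while the 2 x 2 grid has four, so the grid also scores the vertical pairs that a linear ordering of the same elements never compares directly.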