Incorporating non-local information into information extraction systems by Gibbs sampling

Authors:

Jenny Rose Finkel,

Trond Grenager,

Christopher Manning

ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Pages 363 - 370

https://doi.org/10.3115/1219840.1219885

Published: 25 June 2005

Abstract

Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
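
The idea in the abstract, replacing Viterbi decoding with annealed Gibbs sampling so that a non-local term can enter the score, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the per-position score tables `emit` and `trans`, the pairwise `penalty` for giving identical tokens different labels, the linear cooling schedule, and the toy sentence are all assumptions made for illustration. Returning the best labeling seen, rather than the final sample, is also a convenience of this sketch.

```python
import math
import random


def log_score(tokens, labels, emit, trans, penalty):
    """Unnormalized log-score: local chain terms plus a non-local consistency term."""
    total = 0.0
    for i, lab in enumerate(labels):
        total += emit[i][lab]                      # local per-position score (assumed given)
        if i > 0:
            total += trans[labels[i - 1]][lab]     # local transition score (assumed given)
    # Non-local term (illustrative): penalize identical tokens with different labels.
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if tokens[i] == tokens[j] and labels[i] != labels[j]:
                total -= penalty
    return total


def anneal_decode(tokens, label_set, emit, trans, penalty,
                  sweeps=200, t_start=1.0, t_end=0.05, seed=0):
    """Approximate the best labeling by Gibbs sampling with a cooling temperature."""
    rng = random.Random(seed)
    labels = [rng.choice(label_set) for _ in tokens]
    best = list(labels)
    best_score = log_score(tokens, labels, emit, trans, penalty)
    for sweep in range(sweeps):
        # Linear cooling schedule from t_start down to t_end (an assumption).
        temp = t_start + (t_end - t_start) * sweep / max(1, sweeps - 1)
        for i in range(len(tokens)):
            # Score every label at position i with all other labels held fixed.
            # Recomputing the full score is wasteful but keeps the sketch short.
            scores = []
            for lab in label_set:
                labels[i] = lab
                scores.append(log_score(tokens, labels, emit, trans, penalty) / temp)
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]  # stable exponentiation
            labels[i] = rng.choices(label_set, weights=weights)[0]
        current = log_score(tokens, labels, emit, trans, penalty)
        if current > best_score:
            best, best_score = list(labels), current
    return best, best_score


# Toy example: the second "Paris" looks weakly like "O" from local evidence alone.
tokens = ["Paris", "visited", "Paris"]
label_set = ["LOC", "O"]
emit = [{"LOC": 2.0, "O": 0.0}, {"LOC": 0.0, "O": 1.0}, {"LOC": 0.0, "O": 0.3}]
trans = {a: {b: 0.0 for b in label_set} for a in label_set}
best, score = anneal_decode(tokens, label_set, emit, trans, penalty=1.0)
print(best, score)
```

In this toy setting the local scores alone prefer labeling the second "Paris" as `O`, while the consistency penalty makes the labeling that tags both occurrences as `LOC` the highest-scoring one. Because the penalty couples arbitrary positions, the score no longer decomposes over adjacent labels, which is why dynamic programming does not apply directly and sampling is used instead.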

Index Terms
1. Hardware
  1. Power and energy
    1. Power estimation and optimization
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic reasoning algorithms

Information

Published In

ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

June 2005

657 pages

General Chair:
Kevin Knight
University of Southern California

Publisher

Association for Computational Linguistics

United States

Qualifiers

Article

Acceptance Rates

ACL '05 Paper Acceptance Rate 77 of 423 submissions, 18%;

Overall Acceptance Rate 85 of 443 submissions, 19%
