
Transfer joint embedding for cross-domain named entity recognition

Published: 17 May 2013

Abstract

Named Entity Recognition (NER) is a fundamental task in information extraction from unstructured text. Most previous machine-learning-based NER systems are domain-specific: they may perform well on a particular domain (e.g., Newswire) but tend to adapt poorly to related but different domains (e.g., Weblog). Recently, transfer learning techniques have been applied to NER. However, most transfer learning approaches to NER are developed for binary classification, while NER is by nature a multiclass classification problem. Therefore, one has to first reduce the NER task to multiple binary classification tasks and solve them independently. In this article, we propose a new transfer learning method, named Transfer Joint Embedding (TJE), for cross-domain multiclass classification, which can fully exploit the relationships between classes (labels) and reduce the difference between the data distributions of the domains. More specifically, we aim to embed both labels (outputs) and high-dimensional features (inputs) from different domains (e.g., a source domain and a target domain) into a unified low-dimensional latent space, where 1) each label is represented by a prototype and the intrinsic relationships between labels can be measured by Euclidean distance; 2) the distance between the data distributions of the source and target domains is reduced; and 3) the labeled source domain data are closer to their corresponding label prototypes than to the other prototypes. After the latent space is learned, target domain data can be classified with the simple nearest neighbor rule in the latent space. Furthermore, in order to scale up TJE, we propose an efficient algorithm based on stochastic gradient descent (SGD). Finally, we apply the proposed TJE method to NER across different domains on the ACE 2005 dataset, which is a benchmark in Natural Language Processing (NLP). Experimental results demonstrate the effectiveness of TJE and show that TJE can outperform state-of-the-art transfer learning approaches to NER.
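
The classification step described in the abstract (embed an input into the latent space, then assign the label of the nearest prototype under Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projection matrix `W`, the function names, the label set, and all numeric values below are hypothetical, and the learning of `W` and the prototypes (which the abstract says is done with SGD) is not shown.

```python
import math

def project(W, x):
    """Map feature vector x (length d) into the latent space.

    W is a hypothetical projection matrix given as k rows of length d;
    a linear map is an assumption made for this illustration.
    """
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def euclidean(a, b):
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_prototype(W, prototypes, x):
    """Return the label whose prototype is closest to the embedding of x."""
    z = project(W, x)
    return min(prototypes, key=lambda label: euclidean(z, prototypes[label]))

# Toy, hand-picked values (assumed, not learned from data):
# 3 input features are embedded into a 2-dimensional latent space.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]

# One prototype per label, living in the same latent space as the inputs.
prototypes = {"PER": [1.0, 0.0], "ORG": [0.0, 1.0], "O": [0.0, 0.0]}

print(nearest_prototype(W, prototypes, [0.9, 0.1, 0.0]))
```

Because every label is a point in the same space as the embedded inputs, the distances between prototypes themselves can also be inspected with `euclidean`, which is how label relationships are measured in this kind of joint embedding.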


Information & Contributors

Information

Published In

ACM Transactions on Information Systems, Volume 31, Issue 2
May 2013
180 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2457465

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2013
Accepted: 01 January 2013
Revised: 01 July 2012
Received: 01 October 2011
Published in TOIS Volume 31, Issue 2


Author Tags

  1. Named entity recognition
  2. multiclass classification
  3. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed
