
Hierarchical Concept-Driven Language Model

Published: 19 May 2021

Abstract

Many semantic-driven methods have been proposed to guide natural language generation. While they clearly improve performance on end-to-end training tasks, existing semantic-driven methods still have notable limitations: (i) they exploit only shallow semantic signals (e.g., from topic models) with a single stochastic hidden layer in the data-generation process, which makes them susceptible to noise (especially on short texts) and hard to interpret; and (ii) they ignore sentence order and document context, treating each document as a bag of sentences and thus failing to capture long-distance dependencies and the global semantic meaning of a document. To overcome these problems, we propose a novel semantic-driven language modeling framework that jointly learns a hierarchical language model and a Recurrent Conceptualization-enhanced Gamma Belief Network. For scalable inference, we develop auto-encoding variational recurrent inference, which allows efficient end-to-end training while capturing global semantics from a text corpus. In particular, this article introduces concept information derived from the high-quality lexical knowledge graph Probase, which gives the proposed model strong interpretability and robustness to noise. Moreover, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence concept dependencies. Experiments on several NLP tasks validate the superiority of the proposed approach, which can effectively infer a meaningful hierarchical concept structure for a document and hierarchical multi-scale structures for sequences, even compared with the latest state-of-the-art Transformer-based models.
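To make the modeling idea in the abstract concrete, the following is a minimal NumPy sketch of one step of a concept-guided language model: a document's bag-of-words is encoded into Weibull parameters (a common auto-encoding surrogate for gamma-distributed topic weights), a concept-weight vector is sampled by reparameterization, and that vector biases the next-word distribution of an RNN step. All weights and names here are random, illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, H = 50, 8, 16  # vocabulary size, number of concept topics, RNN hidden size

# Hypothetical parameters: random stand-ins, not trained weights.
W_enc = rng.normal(scale=0.1, size=(V, 2 * K))   # doc bag-of-words -> Weibull params
W_xh  = rng.normal(scale=0.1, size=(V, H))       # input-to-hidden
W_hh  = rng.normal(scale=0.1, size=(H, H))       # hidden-to-hidden
W_hy  = rng.normal(scale=0.1, size=(H, V))       # hidden-to-vocabulary
W_ty  = rng.normal(scale=0.1, size=(K, V))       # concept-topic bias on vocabulary

def softplus(x):
    return np.log1p(np.exp(x))

def encode_weibull(bow):
    """Map a document's bag-of-words to positive Weibull shape/scale parameters."""
    h = bow @ W_enc
    return softplus(h[:K]) + 0.1, softplus(h[K:]) + 0.1

def sample_theta(k, lam):
    """Reparameterized Weibull draw approximating gamma-distributed concept weights."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=K)
    return lam * (-np.log1p(-u)) ** (1.0 / k)   # inverse-CDF transform, differentiable in (k, lam)

def lm_step(x_onehot, h_prev, theta):
    """One RNN step whose next-word distribution is biased by document-level concepts."""
    h = np.tanh(x_onehot @ W_xh + h_prev @ W_hh)
    logits = h @ W_hy + theta @ W_ty            # global semantics enter additively
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

bow = rng.integers(0, 3, size=V).astype(float)  # toy document word counts
k, lam = encode_weibull(bow)
theta = sample_theta(k, lam)
h, p = lm_step(np.eye(V)[0], np.zeros(H), theta)
```

Because the Weibull draw is a deterministic transform of uniform noise, gradients can flow through `theta` back to the encoder, which is what makes the end-to-end variational training described in the abstract feasible; a hierarchical version would stack several such latent layers.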




Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 6
June 2021, 474 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3465438
Editor: Charu Aggarwal
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 May 2021
Accepted: 01 February 2021
Revised: 01 December 2020
Received: 01 June 2020
Published in TKDD Volume 15, Issue 6


Author Tags

  1. Language modeling
  2. text generation
  3. concept semantic information
  4. interpretation
  5. recurrent conceptualization-enhanced gamma belief network
  6. hierarchical language modeling
  7. representation learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • New Generation of Artificial Intelligence Special Action Project
  • National Key Research and Development Project
  • National Integrated Big Data Center Pilot Project
  • Joint Advanced Research Foundation of China Electronics Technology Group Corporation (CETC)


Article Metrics

  • Downloads (last 12 months): 46
  • Downloads (last 6 weeks): 4

Reflects downloads up to 26 Sep 2024.


Cited By

  • (2023) Patent Phrase to Phrase Matching Based on Bert. BCP Business & Management 38, 1100–1107. DOI: 10.54691/bcpbm.v38i.3832. Online publication date: 2-Mar-2023.
  • (2023) MEGA: Meta-Graph Augmented Pre-Training Model for Knowledge Graph Completion. ACM Transactions on Knowledge Discovery from Data 18, 1, 1–24. DOI: 10.1145/3617379. Online publication date: 16-Oct-2023.
  • (2023) Resisting the Edge-Type Disturbance for Link Prediction in Heterogeneous Networks. ACM Transactions on Knowledge Discovery from Data 18, 2, 1–24. DOI: 10.1145/3614099. Online publication date: 13-Nov-2023.
  • (2023) TDAN: Transferable Domain Adversarial Network for Link Prediction in Heterogeneous Social Networks. ACM Transactions on Knowledge Discovery from Data 18, 1, 1–22. DOI: 10.1145/3610229. Online publication date: 6-Sep-2023.
  • (2023) Community-Based Influence Maximization Using Network Embedding in Dynamic Heterogeneous Social Networks. ACM Transactions on Knowledge Discovery from Data 17, 8, 1–21. DOI: 10.1145/3594544. Online publication date: 28-Jun-2023.
  • (2023) Characterizing and Forecasting Urban Vibrancy Evolution: A Multi-View Graph Mining Perspective. ACM Transactions on Knowledge Discovery from Data 17, 5, 1–24. DOI: 10.1145/3568683. Online publication date: 28-Feb-2023.
  • (2023) Multi-Concept Representation Learning for Knowledge Graph Completion. ACM Transactions on Knowledge Discovery from Data 17, 1, 1–19. DOI: 10.1145/3533017. Online publication date: 20-Feb-2023.
  • (2023) Learnable Multi-View Matrix Factorization With Graph Embedding and Flexible Loss. IEEE Transactions on Multimedia 25, 3259–3272. DOI: 10.1109/TMM.2022.3157997. Online publication date: 1-Jan-2023.
  • (2023) A Multi-Type Transferable Method for Missing Link Prediction in Heterogeneous Social Networks. IEEE Transactions on Knowledge and Data Engineering 35, 11, 10981–10991. DOI: 10.1109/TKDE.2022.3233481. Online publication date: 2-Jan-2023.
  • (2023) RETRACTED ARTICLE: Spatial-temporal deep learning model based rumor source identification in social networks. Journal of Combinatorial Optimization 45, 3. DOI: 10.1007/s10878-023-01018-5. Online publication date: 24-Mar-2023.
