Abstract
Multi-label document classification is a common challenge in many real-world applications. Multi-label ranking is a widely used approach, but existing studies usually disregard the effects of context and the relationships among labels during the scoring process. In this paper, we propose a Long Short-Term Memory (LSTM)-based multi-label ranking model for document classification, namely LSTM\(^2\), which consists of repLSTM, an adaptive data representation process, and rankLSTM, a unified learning-ranking process. In repLSTM, a supervised LSTM learns document representations by incorporating the document labels. In rankLSTM, the labels of each document are reordered according to a semantic tree whose semantics are compatible with, and appropriate to, the sequential learning of the LSTM. The model can be trained as a whole by sequentially predicting labels. Connectionist Temporal Classification (CTC) is applied in rankLSTM to address error propagation given the variable number of labels in each document. Experiments on document classification over three typical datasets demonstrate the strong performance of the proposed approach.
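The abstract gives no implementation details. As a hedged illustration of two ideas it names, the sketch below (i) turns a document's unordered label set into a fixed sequence by sorting labels according to a pre-order depth-first traversal of a label tree, used here only as a stand-in for the paper's semantic tree, and (ii) applies standard best-path (greedy) CTC decoding, which merges repeated outputs and drops blanks so that a sequence model can emit a variable number of labels. The tree, the label names, the blank index, and the helpers `dfs_order`, `order_labels`, and `ctc_greedy_decode` are all hypothetical; this is not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical label hierarchy (parent -> children); NOT taken from the paper.
TREE: Dict[str, List[str]] = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}

BLANK = 0  # assumed index of the CTC blank symbol


def dfs_order(tree: Dict[str, List[str]], root: str = "root") -> List[str]:
    """Return node names in pre-order depth-first order, excluding the root."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node != root:
            order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order


def order_labels(labels: Sequence[str], tree: Dict[str, List[str]]) -> List[str]:
    """Sort a document's label set by each label's position in the traversal."""
    rank = {name: i for i, name in enumerate(dfs_order(tree))}
    return sorted(set(labels), key=lambda lab: rank[lab])


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Best-path CTC decoding: argmax per step, merge repeats, drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    decoded: List[int] = []
    prev = None
    for idx in best:
        # A new symbol is emitted only when it differs from the previous step
        # and is not the blank; blanks separate genuine repeats.
        if idx != prev and idx != BLANK:
            decoded.append(idx)
        prev = idx
    return decoded


if __name__ == "__main__":
    print(order_labels(["tennis", "physics", "science"], TREE))
    # ['science', 'physics', 'tennis']
    scores = [
        [0.1, 0.8, 0.05, 0.05],
        [0.1, 0.7, 0.1, 0.1],
        [0.9, 0.05, 0.03, 0.02],
        [0.1, 0.1, 0.1, 0.7],
        [0.6, 0.2, 0.1, 0.1],
    ]
    print(ctc_greedy_decode(scores))  # [1, 3]
```

Here five output steps collapse into two labels, which is the property that lets a CTC-trained sequence model produce a different number of labels for each document without a fixed-length target.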
Cite this article
Yan, Y., Wang, Y., Gao, WC. et al. LSTM\(^{2}\): Multi-Label Ranking for Document Classification. Neural Process Lett 47, 117–138 (2018). https://doi.org/10.1007/s11063-017-9636-0
DOI: https://doi.org/10.1007/s11063-017-9636-0