Abstract
Citations play a fundamental role in supporting authors’ contribution claims throughout a scientific paper, and labelling citation instances with function labels is indispensable for understanding a scientific text. A single citation links two scientific papers in the citation network, and it carries rich native information: the context of the citation, its location, the titles of the citing and cited papers, the DOI, and the website’s URL. Nevertheless, previous studies have ignored this native information when constructing datasets, leaving the citation function classification task without a comprehensive set of valuable features. In this paper, we argue that such information should not be ignored, and accordingly we extract all of the native information features and integrate them into different neural text representation models via trainable embeddings and free text. We first construct a new dataset, NI-Cite, comprising a large number of labelled citations with five key native features (Citation Context, Section Name, Title, DOI, Web URL) for each instance. In addition, we exploit recently developed text representation models integrated with this information to evaluate performance on the citation function classification task. The experimental results demonstrate that the native information features suggested in this paper enhance overall classification performance.
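To make the two integration routes named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `CitationInstance` record holding the five native features, a hypothetical `[SEP]` separator for the free-text route, and a simple section-name vocabulary whose integer ids could index a trainable embedding table. The field names, feature order, and separator are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CitationInstance:
    """Hypothetical record for one labelled citation (field names are assumptions)."""
    context: str       # Citation Context
    section_name: str  # Section Name
    title: str         # Title
    doi: str           # DOI
    web_url: str       # Web URL


# Assumed separator; a real text encoder would define its own special token.
SEP = " [SEP] "


def to_free_text(inst: CitationInstance) -> str:
    """Free-text route: join the non-empty native features into one input string."""
    parts = [inst.context, inst.section_name, inst.title, inst.doi, inst.web_url]
    return SEP.join(p.strip() for p in parts if p and p.strip())


def build_section_vocab(instances) -> dict:
    """Embedding route: map each distinct section name to an integer id.

    Id 0 is reserved for unseen section names. The ids would index a
    trainable embedding table in a downstream model (not shown here).
    """
    vocab = {"<unk>": 0}
    for inst in instances:
        key = inst.section_name.strip().lower()
        if key not in vocab:
            vocab[key] = len(vocab)
    return vocab


def section_id(inst: CitationInstance, vocab: dict) -> int:
    """Look up the embedding index for an instance's section name."""
    return vocab.get(inst.section_name.strip().lower(), 0)
```

In this sketch a categorical feature such as the section name takes the embedding route, while longer textual features are appended to the citation context as free text; which feature goes down which route is a design choice left open here.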
Change history
18 July 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11192-022-04451-1
Acknowledgements
This research is funded by the Australian Research Council (ARC) Discovery Project DP200102298 and the National Social Science Fund of China (No. 18ZDA325).
Additional information
The original online version of this article was revised: In the original version the first affiliation was incorrectly linked to the author name, Adnan Mahmood.
About this article
Cite this article
Zhang, Y., Zhao, R., Wang, Y. et al. Towards employing native information in citation function classification. Scientometrics 127, 6557–6577 (2022). https://doi.org/10.1007/s11192-021-04242-0
DOI: https://doi.org/10.1007/s11192-021-04242-0