Abstract
BERT produces state-of-the-art solutions for many natural language processing tasks at the cost of interpretability. As related work debates the value of BERT’s attention weights for this purpose, we contribute to the field by examining this issue in the context of stance classification. We propose an interpretability framework to identify the words most influential for correctly predicting stances with BERT models. Unlike related work, we target a broader level of interpretability focused on overall model behaviour, aggregating token attentions into word attention weights that can be semantically related to the domain, and proposing metrics to measure word relevance in correct predictions. We designed a broad experimental setting to analyse the premises underlying our framework regarding word attention scores and its interpretability capabilities, adopting three case studies of stances expressed on Twitter about pandemic-related issues, and four pre-trained BERT models. We concluded that our method is not affected by the characteristics of the BERT models’ vocabularies, that words with high absolute attention have a higher probability of positively influencing correct classification, and that the influential words represent the domains. Compared to a baseline method, we observed many words in common, but the words yielded by our method were considered more relevant according to a qualitative assessment.
Availability of supporting data
Code and data are available (subject to Twitter policies) at https://github.com/cacsaenz/attention-based-interpretability.
Notes
If a tweet t does not contain a word w, we assume \(wa_{w,t}=0\).
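The convention above can be sketched in code. The following is a minimal, hypothetical illustration: the paper’s exact aggregation of token attentions into word attention weights is not reproduced here, and summing WordPiece sub-token attentions is assumed only for the sake of example.

```python
# Hypothetical sketch of the wa_{w,t} convention: merge BERT WordPiece
# tokens ('##' continuations) back into words, summing their attention
# scores, and default to 0 for words absent from the tweet.
# The paper's actual aggregation scheme may differ.

def word_attention(tokens, attentions):
    """Aggregate per-token attention scores into per-word scores."""
    words, scores = [], []
    for tok, att in zip(tokens, attentions):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]      # glue sub-token onto previous word
            scores[-1] += att         # accumulate its attention
        else:
            words.append(tok)
            scores.append(att)
    return dict(zip(words, scores))

def wa(word, tweet_attention):
    # If tweet t does not contain word w, wa_{w,t} = 0
    return tweet_attention.get(word, 0.0)
```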
Coronavac is referred to as the “Chinese vaccine” due to the research partnership between a Brazilian and a Chinese institution to develop it.
References
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems (NIPS’17), pp 6000–6010
Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, (NAACL-HLT), pp 4171–4186
Tenney I, Das D, Pavlick E (2019) BERT rediscovers the classical NLP pipeline. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 4593–4601. https://doi.org/10.18653/v1/P19-1452
Rogers A, Kovaleva O, Rumshisky A (2020) A primer in BERTology: what we know about how BERT works. Trans Assoc Comput Linguist 8:842–866. https://doi.org/10.1162/tacl_a_00349
Ventura F, Greco S, Apiletti D, Cerquitelli T (2022) Trusting deep learning natural-language models via local and global explanations. Knowl Inf Syst 64(7):1863–1907. https://doi.org/10.1007/s10115-022-01690-9
Molnar C (2019) Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/
Lundberg SM, Lee S (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems, pp 4765–4774
Kokalj E, Škrlj B, Lavrač N, Pollak S, Robnik-Šikonja M (2021) BERT meets shapley: extending SHAP explanations to transformer-based classifiers. In: Proceedings of the EACL hackashop on news media content analysis and automated report generation, pp 16–21. https://www.aclweb.org/anthology/2021.hackashop-1.3
Ayoub J, Yang XJ, Zhou F (2021) Combat COVID-19 infodemic using explainable natural language processing models. Inf Process Manag 58(4):102569. https://doi.org/10.1016/j.ipm.2021.102569
Anan R, Apon TS, Hossain ZT, Modhu EA, Mondal S, Alam MGR (2023) Interpretable Bangla sarcasm detection using BERT and explainable AI. In: 2023 IEEE 13th annual computing and communication workshop and conference (CCWC), pp 1272–1278. https://doi.org/10.1109/CCWC57344.2023.10099331
Novikova J, Shkaruta K (2022) DECK: behavioral tests to improve interpretability and generalizability of BERT models detecting depression from text. arXiv:2209.05286. https://doi.org/10.48550/arXiv.2209.05286
Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: Proceedings of the 34th international conference on machine learning (ICML)–volume 70, pp 3319–3328
Abnar S, Zuidema W (2020) Quantifying attention flow in transformers. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 4190–4197. https://doi.org/10.18653/v1/2020.acl-main.385
Chefer H, Gur S, Wolf L (2021) Transformer interpretability beyond attention visualization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 782–791
Vig J (2019) A multiscale visualization of attention in the transformer model. In: Proceedings of the 57th annual meeting of the association for computational linguistics: system demonstrations, pp 37–42. https://doi.org/10.18653/v1/P19-3007
Jain S, Wallace BC (2019) Attention is not Explanation. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1, pp 3543–3556. https://doi.org/10.18653/v1/N19-1357
Wiegreffe S, Pinter Y (2019) Attention is not not explanation. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 11–20. https://doi.org/10.18653/v1/D19-1002
Serrano S, Smith NA (2019) Is attention interpretable? In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 2931–2951. Association for Computational Linguistics, Florence, Italy. https://doi.org/10.18653/v1/P19-1282
Vashishth S, Upadhyay S, Tomar GS, Faruqui M (2019) Attention interpretability across NLP tasks. https://doi.org/10.48550/ARXIV.1909.11218
Bai B, Liang J, Zhang G, Li H, Bai K, Wang F (2021) Why attentions may not be interpretable? In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (KDD), pp 25–34. https://doi.org/10.1145/3447548.3467307
ALDayel A, Magdy W (2021) Stance detection on social media: state of the art and trends. Inf Process Manag 58(4):102597. https://doi.org/10.1016/j.ipm.2021.102597
Ghosh S, Singhania P, Singh S, Rudra K, Ghosh S (2019) Stance detection in web and social media: A comparative study. In: Crestani F, Braschler M, Savoy J, Rauber A, Müller H, Losada DE, Heinatz Bürki G, Cappellato L, Ferro N (eds.) Experimental IR meets multilinguality, multimodality, and interaction, pp 75–87. Springer, Berlin
Giorgioni S, Politi M, Salman S, Basili R, Croce D (2020) UNITOR@Sardistance2020: combining transformer-based architectures and transfer learning for robust stance detection. In: Proceedings of the seventh evaluation campaign of natural language processing and speech tools for Italian (EVALITA 2020). CEUR Workshop Proceedings, vol 2765
Kawintiranon K, Singh L (2021) Knowledge enhanced masked language model for stance detection. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 4725–4735. https://www.aclweb.org/anthology/2021.naacl-main.376
Ebeling R, Córdova Sáenz CA, Nobre J, Becker K (2021) The effect of political polarization on social distance stances in the Brazilian COVID-19 scenario. J Inf Data Manag 12(1):86–108. https://doi.org/10.5753/jidm.2021.1889
Ebeling R, Saenz CAC, Nobre JC, Becker K (2022) Analysis of the influence of political polarization in the vaccination stance: the Brazilian COVID-19 scenario. Proc Int AAAI Conf Web Soc Media 16(1):159–170
de Sousa AM, Becker K (2022) Comparing COVID vaccination stances in Brazil and the United States of America. In: Proceedings of the 37th Brazilian symposium on databases (SBBD), pp 65–77. https://doi.org/10.5753/sbbd.2022.224628 (in Portuguese)
Sáenz CAC, Becker K (2021) Interpreting BERT-based stance classification: a case study about the Brazilian COVID vaccination. In: SBC (ed) XXXVI Simpósio Brasileiro de Banco de Dados, 2021, p 12
Sáenz CAC, Becker K (2021) Assessing the use of attention weights to interpret BERT-based stance classification. In: Proceedings of the IEEE/WIC/ACM international joint conference on web intelligence and intelligent agent technology (WI/IAT)
Li X, Xiong H, Li X, Wu X, Zhang X, Liu J, Bian J, Dou D (2022) Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond. Knowl Inf Syst 64(12):3197–3234. https://doi.org/10.1007/s10115-022-01756-8
Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’16), pp 1135–1144. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2939672.2939778
Anelli VW, Biancofiore GM, Bellis AD, Noia TD, Sciascio ED (2022) Interpretability of BERT latent space through knowledge graphs. In: Proceedings of the 31st ACM international conference on information & knowledge management, Atlanta, GA, USA, October 17-21, 2022, pp 3806–3810. https://doi.org/10.1145/3511808.3557617
Ebeling R, Sáenz CAC, Nobre JC, Becker K (2020) Quarenteners vs. chloroquiners: a framework to analyze how political polarization affects the behavior of groups. In: IEEE/WIC/ACM international joint conference on web intelligence and intelligent agent technology (WI/IAT 2020), pp 203–210. https://doi.org/10.1109/WIIAT50758.2020.00031
Grootendorst M (2020) BERTopic: leveraging BERT and c-TF-IDF to create easily interpretable topics. https://doi.org/10.5281/zenodo.4381785
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
Mutlu EC, Oghaz T, Jasser J, Tutunculer E, Rajabi A, Tayebi A, Ozmen O, Garibay I (2020) A stance data set on polarized conversations on Twitter about the efficacy of hydroxychloroquine as a treatment for COVID-19. Data Brief 33:106401. https://doi.org/10.1016/j.dib.2020.106401
Souza F, Nogueira R, Lotufo R (2020) BERTimbau: pretrained BERT models for Brazilian Portuguese. In: Cerri R, Prati RC (eds) Intelligent systems. Springer, Cham, pp 403–417
Funding
This study was partially financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, CNPq (131178/2020-2) and FAPERGS (19/2551-0001862-2).
Author information
Authors and Affiliations
Contributions
CACS and KB contributed to conceptualization, methodology, validation, formal analysis, investigation, resources, writing and figures; CACS contributed to software; and KB supervised the study and administered the project.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Case studies: stances on issues about the COVID pandemic
We consider three case studies related to the COVID-19 pandemic to explore attention weights as the basis for interpreting stances in BERT classification models. Our research group developed two of them when we assessed the influence of political polarization on stances expressed on Twitter about social isolation [25] and vaccination [26]. These studies include a topic modelling analysis using BERTopic [34] to understand the main arguments supporting each stance. The third case study addresses stances on hydroxychloroquine as a treatment for COVID-19 [36]. Table 4 summarizes the main characteristics of each case study relevant to our purposes.
(a) Social isolation: This case study reflects Brazilians’ polarized positions at the beginning of the pandemic, in late March 2020. President Bolsonaro launched the campaign “Brazil cannot stop” in reaction to the social isolation measures, claiming that the damage to the economy outweighed the health benefits. Supporters of this campaign are referred to as “Chloroquiners”, whereas the opponents (“Quarentineers”) focus on the need to protect lives. We identified a strong political bias praising or criticizing the President and the central government. Further details can be found in [25, 33].
(b) Vaccination: This case study reflects Brazilians’ polarized positions regarding COVID-19 vaccination from January 2020 to April 2021. Pro-vaxxers praise science and express joy/anxiety/relief regarding vaccination, while anti-vaxxers regard vaccination as an individual choice. There was a heated discussion about mandatory vaccination to reach collective immunity, with a Supreme Federal Court (STF) ruling on the constitutionality of this measure. A strong political bias was also identified, including a specific anti-vaxxer stance referred to as “anti-sinovaxxers”. This subset represents the political dispute between President Bolsonaro and São Paulo’s governor João Dória, targeting the so-called Chinese vaccine Coronavac, developed in a partnership between a Brazilian and a Chinese institution. Coronavac was the first vaccine available to Brazilians, and it was politically exploited by Dória, who was then a prospective candidate for the 2022 presidential election. Details are provided in [26, 27].
(c) Hydroxychloroquine: This case study focuses on the polarization among Twitter users about using chloroquine and hydroxychloroquine to treat COVID-19. The data were collected in April 2020 and cover various events related to these drugs, such as India’s hydroxychloroquine export ban, the publication of clinical trial results, the accusations that the White House had political/economic interests behind the push for these drugs, and the FDA warning about their use. There are three stances: people in favour of using these drugs (Pro-chloroquine), people against it (Anti-chloroquine), and people expressing neither open opposition to nor acceptance of this treatment (Neutrals). More details on this dataset can be found in [36]. Since the original analysis is limited to the most frequent words in the dataset, we deployed BERTopic for topic modelling to gain further understanding.
In all three studies, stances were assigned according to the presence of specific pre-defined hashtags, which were removed from the datasets to avoid biasing stance classification. We applied typical pre-processing steps, such as removing mentions/URLs/special characters, lowercasing and discarding short tweets (fewer than three words). Details can be found in the original studies. We selected random samples from the original datasets for our experiments, as detailed in Sect. 5.
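As a rough sketch, the pre-processing steps above could be implemented as follows. The regular expressions are assumptions made for illustration; the exact rules used in the original studies may differ.

```python
import re

# Illustrative tweet pre-processing: strip URLs and @mentions, drop
# special characters, lowercase, and discard tweets with fewer than
# three words. The character class keeps Portuguese accented letters,
# an assumption for the Brazilian datasets.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^0-9a-záéíóúâêôãõç\s]", re.IGNORECASE)

def preprocess(tweets, min_words=3):
    cleaned = []
    for t in tweets:
        t = URL_RE.sub(" ", t)        # remove URLs
        t = MENTION_RE.sub(" ", t)    # remove @mentions
        t = NON_ALNUM_RE.sub(" ", t).lower()
        words = t.split()
        if len(words) >= min_words:   # discard short tweets
            cleaned.append(" ".join(words))
    return cleaned
```

For example, `preprocess(["@user Vacina funciona! https://t.co/x veja"])` keeps the tweet as `"vacina funciona veja"`, while a two-word tweet would be discarded.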
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sáenz, C.A.C., Becker, K. Understanding stance classification of BERT models: an attention-based framework. Knowl Inf Syst 66, 419–451 (2024). https://doi.org/10.1007/s10115-023-01962-y