Best of Touché 2023 Task 4: Testing Data Augmentation and Label Propagation for Multilingual Multi-target Stance Detection

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2024)

Abstract

Touché 2023 Task 4 evaluated stance detection in a multilingual, multi-target setting with a small annotated dataset. We therefore tested approaches that enlarge the training data by (1) adding samples obtained by back-translating the original training data and (2) adding samples from unlabeled data using label propagation. The results showed that back-translation was successful, while label propagation worsened performance. We obtained the best results with a transformer-based model fine-tuned in two steps: first on a related, larger dataset and then on the development data. This shows the usefulness of including related data in our approach and suggests further research on exploiting other datasets and data augmentation. Moreover, given that the current results were close to an F1 score of 0.35, there is still room for improvement in this task.
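As an illustration of the back-translation idea in (1), the following is a minimal sketch and not the authors' pipeline. The names `back_translate`, `to_pivot` and `from_pivot` are hypothetical, the stance labels are placeholders, and the toy string-replacing "translators" stand in for a real machine translation system.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]          # (text, stance label)
Translator = Callable[[str], str]  # any function mapping text to text


def back_translate(samples: List[Sample],
                   to_pivot: Translator,
                   from_pivot: Translator) -> List[Sample]:
    """Return the original samples plus new paraphrases.

    Each text is translated into a pivot language and back again. The
    round trip keeps the original stance label, and paraphrases that are
    identical to a text already in the set are skipped.
    """
    augmented = list(samples)
    seen = {text for text, _ in samples}
    for text, label in samples:
        paraphrase = from_pivot(to_pivot(text))
        if paraphrase and paraphrase not in seen:
            seen.add(paraphrase)
            augmented.append((paraphrase, label))
    return augmented


# Toy stand-ins for a translation system (assumption, for illustration only).
def toy_to_pivot(text: str) -> str:
    return text.replace("big", "grande")


def toy_from_pivot(text: str) -> str:
    return text.replace("grande", "large")


train = [("a big mistake", "against"), ("fine by me", "in favor")]
augmented = back_translate(train, toy_to_pivot, toy_from_pivot)
```

Here only the first sample yields a new paraphrase; the second comes back unchanged and is therefore not duplicated.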

Notes

  1. https://futureu.europa.eu/?locale=en

  2. https://en.wikipedia.org/wiki/Sardines_movement

  3. https://pypi.org/project/deep-translator/

  4. https://huggingface.co/Helsinki-NLP/opus-mt-ine-en

  5. https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2

  6. https://scikit-learn.org/stable/modules/generated/sklearn.semi_supervised.LabelSpreading.html

  7. The final model selected in each case was decided based on the results obtained during previous tests.

  8. https://huggingface.co/roberta-base

  9. https://huggingface.co/xlm-roberta-large

  10. https://huggingface.co/bert-base-uncased

  11. https://www.tira.io/

  12. It is important to note that the actual labels of the test set have never been made public and therefore, no checks or additional experiments could be performed.
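Notes 5 and 6 point to a multilingual sentence-embedding model and to scikit-learn's LabelSpreading. The sketch below is a dependency-free illustration of the kind of iteration that label spreading relies on (a symmetrically normalised affinity graph with soft clamping), not the authors' code and not the library implementation. The 2-D points stand in for sentence embeddings, and `spread_labels`, `alpha`, `gamma` and `n_iter` are hypothetical names and values chosen for the example.

```python
import math
from typing import List, Sequence


def rbf_affinity(points: Sequence[Sequence[float]], gamma: float) -> List[List[float]]:
    """Dense RBF affinity matrix with a zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-gamma * d2)
    return W


def spread_labels(points, labels, n_classes, alpha=0.2, gamma=1.0, n_iter=30):
    """Assign a class to every point; labels use -1 for unlabeled samples."""
    n = len(points)
    W = rbf_affinity(points, gamma)
    deg = [sum(row) for row in W]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot matrix of the known labels; unlabeled rows stay all-zero.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # Soft clamping: mix the neighbours' scores with the initial labels.
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Two well-separated toy clusters with one labeled sample each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = [0, -1, -1, -1, 1, -1, -1, -1]
predicted = spread_labels(points, labels, n_classes=2)
```

On clean clusters like these, every unlabeled point takes the label of its own cluster. With real embeddings the neighbourhood structure is much noisier, which is one plausible way for propagated labels to hurt rather than help.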

Acknowledgments

This work has been partially funded by the Spanish Research Agency (Agencia Estatal de Investigación), DeepInfo project PID2021-127777OB-C22 (MCIU/AEI/FEDER, UE) and the HOLISTIC ANALYSIS OF ORGANISED MISINFORMATION ACTIVITY IN SOCIAL NETWORKS project (PCI2022-135026-2).

Author information

Corresponding authors

Correspondence to Jorge Avila or Roberto Centeno.

A Hyperparameters

A.1 Run 1

  • subsample: 0.5

  • min_child_weight: 5

  • max_depth: 7

  • learning rate: 0.01

  • colsample_bytree: 0.5
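The parameter names above match those used by XGBoost-style gradient-boosted tree libraries, although this page does not name the model behind Run 1. Under that assumption, a configuration sketch could look as follows; `RUN1_PARAMS` and `build_run1_classifier` are hypothetical names, and `learning_rate` is the scikit-learn-style spelling of the "learning rate" entry.

```python
# Values copied from the list above. The key "learning_rate" is an
# assumed spelling of the "learning rate" entry.
RUN1_PARAMS = {
    "subsample": 0.5,
    "min_child_weight": 5,
    "max_depth": 7,
    "learning_rate": 0.01,
    "colsample_bytree": 0.5,
}


def build_run1_classifier():
    """Hypothetical helper: assumes an XGBoost classifier was used for Run 1."""
    # Imported lazily so the parameter dictionary is usable without xgboost.
    from xgboost import XGBClassifier
    return XGBClassifier(**RUN1_PARAMS)
```

Keeping the values in a plain dictionary makes it easy to log them with the run or pass them to a different library that accepts the same names.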

A.2 Runs 2, 3, 4, 5 and 6

(See Table 3).

Table 3. Hyperparameters for runs 2, 3, 4 and 5

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Avila, J., Rodrigo, Á., Centeno, R. (2024). Best of Touché 2023 Task 4: Testing Data Augmentation and Label Propagation for Multilingual Multi-target Stance Detection. In: Goeuriot, L., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024. Lecture Notes in Computer Science, vol 14958. Springer, Cham. https://doi.org/10.1007/978-3-031-71736-9_13

  • DOI: https://doi.org/10.1007/978-3-031-71736-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71735-2

  • Online ISBN: 978-3-031-71736-9

  • eBook Packages: Computer Science, Computer Science (R0)
