Abstract
This paper is a condensed overview of Touché: the fourth edition of the lab on argument and causal retrieval that was held at CLEF 2023. With the goal of creating a collaborative platform for research on computational argumentation and causality, we organized four shared tasks: (a) argument retrieval for controversial topics, where participants retrieve web documents that contain high-quality argumentation and detect the argument stance, (b) causal retrieval, where participants retrieve documents that contain causal statements from a generic web crawl and detect the causal stance, (c) image retrieval for arguments, where participants retrieve images from a focused web crawl that show support for or opposition to some stance, and (d) multilingual multi-target stance classification, where participants detect the stance of comments on proposals from an online multilingual participatory democracy platform.
L. Hemamou: Independent view, not influenced by Sanofi R&D France.
Notes
- 1.
The term ‘touché’ is commonly “used to acknowledge a hit in fencing or the success or appropriateness of an argument, an accusation, or a witty point.” [https://merriam-webster.com/dictionary/touche]
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
Pre-trained model: https://huggingface.co/facebook/bart-large-cnn; minimum length: 64; maximum length: 256.
- 9.
Pre-trained model: https://huggingface.co/google/flan-t5-base; maximum generated tokens: 3; the prompt is given in Appendix A.
- 10.
Pre-trained model: https://huggingface.co/facebook/bart-large-cnn; minimum length: 64; maximum length: 256.
- 11.
Pre-trained model: https://huggingface.co/google/flan-t5-base; maximum generated tokens: 3; the prompt is given in Appendix A.
- 12.
- 13.
As one of our suggested use cases for image retrieval for arguments is getting a quick overview, we excluded overly large images.
- 14.
- 15.
To sharpen our focus on images, this year we tried to exclude images that are actually screenshots of text documents.
- 16.
- 17.
Archived using https://github.com/webis-de/scriptor
- 18.
- 19.
Since no stance model convincingly outperformed naive baselines in their evaluation, we use the simple both-sides baseline that assigns each image to both stances.
- 20.
- 21.
- 22.
German, English, Greek, French, Italian, and Hungarian.
- 23.
- 24.
- 25.
roberta-base.
- 26.
xlm-roberta-large.
- 27.
bert-base-uncased.
- 28.
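The both-sides baseline mentioned in note 19 can be illustrated with a minimal sketch. The function name `both_sides_baseline`, the stance labels `PRO` and `CON`, and the image identifiers are assumptions made for this illustration and are not taken from the lab's code; the sketch only shows the idea of assigning every retrieved image to both stances.

```python
from typing import Dict, List

# Assumed stance labels for illustration; the lab's actual label names may differ.
STANCES = ("PRO", "CON")


def both_sides_baseline(ranked_images: List[str]) -> Dict[str, List[str]]:
    """Assign every retrieved image to both stances, preserving the rank order."""
    # Each stance gets its own copy so later edits to one list do not affect the other.
    return {stance: list(ranked_images) for stance in STANCES}


if __name__ == "__main__":
    # Hypothetical image identifiers for a single topic.
    result = both_sides_baseline(["img-a", "img-b", "img-c"])
    for stance, images in result.items():
        print(stance, images)
```

Because this baseline makes no stance decision at all, any stance model that is to be preferred over it has to do better than simply returning the same ranking for both sides.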
Acknowledgment
This work has been partially supported by the Deutsche Forschungsgemeinschaft (DFG) in the project “ACQuA 2.0: Answering Comparative Questions with Arguments” (project 376430233) as part of the priority program “RATIO: Robust Argumentation Machines” (SPP 1999). V. Barriere’s work was funded by the National Center for Artificial Intelligence CENIA FB210017, Basal ANID. This work has been partially supported by the OpenWebSearch.eu project (funded by the EU; GA 101070014).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bondarenko, A. et al. (2023). Overview of Touché 2023: Argument and Causal Retrieval. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_31
DOI: https://doi.org/10.1007/978-3-031-42448-9_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42447-2
Online ISBN: 978-3-031-42448-9
eBook Packages: Computer Science (R0)