Abstract
Idioms are an important part of natural languages and are often used to enhance the expressiveness and fluency of speech. However, finding a contextually appropriate idiom when writing an essay or crafting a headline for a news article can be difficult, especially for non-native speakers. This motivates an automated system that recommends an idiom for an input sentence. The goal of this study is to develop and compare methods that would make such a system possible. We used an existing collection of idioms and employed several configurations based on models from the Sentence-BERT family. We also automatically expanded the initial dataset and fine-tuned a pre-trained Sentence-BERT model on the idiom/context matching task. This approach achieved the highest mean reciprocal rank (MRR) score of 0.507. The data and the trained model are publicly available.
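To make the reported metric concrete, the following minimal sketch shows how MRR could be computed for an idiom recommender. It is an illustration under stated assumptions, not the evaluation code used in the study: the function names, the toy idioms, and the example rankings are all hypothetical, and scoring a query as 0 when the correct idiom is missing from the ranked list is one common convention that the abstract does not confirm.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold idiom in the ranked list, or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    # Assumption: a query whose gold idiom is never retrieved contributes 0.
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], golds: Sequence[str]
) -> float:
    """Average the reciprocal ranks over all queries."""
    if len(rankings) != len(golds):
        raise ValueError("rankings and golds must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds))
    return total / len(golds)


# Hypothetical toy data: one ranked candidate list per input context.
rankings = [
    ["break the ice", "spill the beans", "hit the sack"],  # gold at rank 1
    ["hit the sack", "spill the beans", "break the ice"],  # gold at rank 2
    ["break the ice", "hit the sack", "piece of cake"],    # gold not retrieved
]
golds = ["break the ice", "spill the beans", "spill the beans"]

score = mean_reciprocal_rank(rankings, golds)
print(f"MRR = {score:.3f}")
```

Under this definition, an MRR near 0.5 is consistent with the correct idiom appearing at rank 2 on average, although very different rank distributions can produce the same value.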
Acknowledgments
The study develops ideas partially derived from Anna Vysheslavova’s 2020 summer internship under Pavel Braslavski’s supervision. We would like to express gratitude to Yulia Badryzlova for fruitful discussion of the paper draft.
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhernokleev, D., Braslavski, P. (2024). Needle in a Haystack: Finding Suitable Idioms Based on Text Descriptions. In: Ignatov, D.I., et al. Analysis of Images, Social Networks and Texts. AIST 2023. Lecture Notes in Computer Science, vol 14486. Springer, Cham. https://doi.org/10.1007/978-3-031-54534-4_13
DOI: https://doi.org/10.1007/978-3-031-54534-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54533-7
Online ISBN: 978-3-031-54534-4
eBook Packages: Computer Science (R0)