Abstract
Developing task-oriented bots requires diverse sets of annotated user utterances to learn mappings between natural language utterances and user intents. Automated paraphrase generation offers a cost-effective and scalable way to produce varied training samples by creating different versions of the same utterance. However, the sequence-to-sequence models commonly used for automated paraphrasing often produce errors such as repetition and grammatical mistakes, and identifying these errors, particularly in transformer architectures, remains a challenge. In this paper, we propose a taxonomy of errors encountered in transformer-based paraphrase generation models, derived from a comprehensive error analysis of transformer-generated paraphrases. Leveraging this taxonomy, we introduce the Transformer-based Paraphrasing Model Errors dataset, consisting of 5880 annotated paraphrases labeled with error types and explanations. Additionally, we develop a novel multilabel paraphrase annotation model by fine-tuning a BERT model for the error-annotation task. Evaluation against human annotations demonstrates significant agreement, with the model showing robust performance in predicting error labels, even for unseen paraphrases.
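For concreteness, below is a minimal sketch of how such a multilabel error annotator could be set up on top of a BERT encoder, assuming the HuggingFace transformers library. The label set, checkpoint, and decision threshold are illustrative assumptions, not the paper's exact configuration, and the classification head would first need to be fine-tuned on the annotated paraphrase dataset before its predictions mean anything.

```python
# Sketch of a multilabel paraphrase-error annotator (assumptions: HuggingFace
# transformers, bert-base-uncased, and a hypothetical subset of error labels).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ERROR_LABELS = ["repetition", "grammar_error", "semantic_drift"]  # illustrative, not the paper's taxonomy

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(ERROR_LABELS),
    problem_type="multi_label_classification",  # per-label sigmoid + BCE loss
)

def annotate(utterance: str, paraphrase: str, threshold: float = 0.5) -> list[str]:
    """Return every error label whose predicted probability clears the threshold."""
    inputs = tokenizer(utterance, paraphrase, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)  # labels are scored independently
    return [label for label, p in zip(ERROR_LABELS, probs) if p >= threshold]

# Example: a paraphrase with an obvious repetition error (the untrained head
# above would need fine-tuning on the annotated dataset first).
print(annotate("book a flight to Paris", "book book a flight flight to Paris"))
```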
Notes
- 1.
- 2.
- 3.
- 4. The prompt we used can be found via the supplementary material link supplied.
- 5.
- 6. OR refers to the state of Oregon.
- 7. Paraphrase generation is a multi-step (word-by-word) prediction task, where a small error at an early time step may lead to poor predictions for the rest of the sentence, as the error compounds over subsequent token predictions [8]; a toy sketch follows these notes.
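The sketch below illustrates this word-by-word loop, assuming a generic HuggingFace seq2seq checkpoint (t5-small is a stand-in, not one of the paper's models): each greedy pick is fed back as decoder context, so a wrong token at an early step conditions every later prediction and is never revisited.

```python
# Toy illustration of compounding errors in autoregressive decoding.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def greedy_decode(text: str, max_new_tokens: int = 20) -> str:
    enc = tokenizer(text, return_tensors="pt")
    # The decoder grows one token at a time from the start token.
    out_ids = torch.tensor([[model.config.decoder_start_token_id]])
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids=enc.input_ids, decoder_input_ids=out_ids).logits
        next_id = logits[0, -1].argmax()  # greedy choice at this time step
        # The pick is appended to the context, so all later steps depend on it.
        out_ids = torch.cat([out_ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == model.config.eos_token_id:
            break  # the loop never reconsiders earlier choices
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)
```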
References
Alikaniotis, D., Raheja, V.: The unreasonable effectiveness of transformer language models in grammatical error correction. In: BEA@ACL (2019)
Bannard, C., Callison-Burch, C.: Paraphrasing with bilingual parallel corpora. In: ACL'05, pp. 597–604 (2005). https://aclanthology.org/P05-1074
Berro, A., Fard, M.A.Y.Z., et al.: An extensible and reusable pipeline for automated utterance paraphrases. In: PVLDB (2021)
Brown, T.B., et al.: Language models are few-shot learners. In: NeurIPS (2020)
Bui, T.C., Le, V.D., To, H.T., Cha, S.K.: Generative pre-training for paraphrase generation by representing and predicting spans in exemplars. In: 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 83–90. IEEE (2021)
Cegin, J., Simko, J., Brusilovsky, P.: ChatGPT to replace crowdsourcing of paraphrases for intent classification: Higher diversity and comparable model robustness (2023). arXiv preprint arXiv:2305.12947
Celikyilmaz, A., Clark, E., Gao, J.: Evaluation of text generation: A survey (2020)
Chen, D., Dolan, W.B.: Collecting highly parallel data for paraphrase evaluation. In: ACL-HLT, pp. 190–200 (2011). https://aclanthology.org/P11-1020
Chklovski, T.: Collecting paraphrase corpora from volunteer contributors. In: Proceedings of the 3rd International Conference on Knowledge Capture, pp. 115–120 (2005)
Dopierre, T., Gravier, C., Logerais, W.: ProtAugment: unsupervised diverse short-texts paraphrasing for intent detection meta-learning. In: ACL-IJCNLP (2021). https://aclanthology.org/2021.acl-long.191
Dou, Y., Forbes, M., et al.: Is GPT-3 text indistinguishable from human text? Scarecrow: a framework for scrutinizing machine text. In: ACL, pp. 7250–7274 (2022)
Ethayarajh, K.: How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In: EMNLP-IJCNLP (2019)
Freitag, M., Foster, G., et al.: Experts, errors, and context: a large-scale study of human evaluation for machine translation. Trans. Assoc. Comput. Linguist. 9, 1460–1474 (2021). https://aclanthology.org/2021.tacl-1.87
Fujita, A.: Automatic generation of syntactically well-formed and semantically appropriate paraphrases. Ph.D. thesis, Nara Institute of Science and Technology (2005). https://api.semanticscholar.org/CorpusID:16348044
Fujita, A., Furihata, K., Inui, K., Matsumoto, Y., Takeuchi, K.: Paraphrasing of Japanese light-verb constructions based on lexical conceptual structure (2004)
Goyal, T., Durrett, G.: Neural syntactic preordering for controlled paraphrase generation. In: ACL, pp. 238–252 (2020)
Hegde, C., Patil, S.: Unsupervised paraphrase generation using pre-trained language models (2020)
Huang, S., Wu, Y., Wei, F., Luan, Z.: Dictionary-guided editing networks for paraphrase generation. In: AAAI, vol. 33, pp. 6546–6553 (2019)
Huang, T.H., Chen, Y.N., Bigham, J.P.: Real-time on-demand crowd-powered entity extraction (2017). https://arxiv.org/abs/1704.03627
Iyyer, M., Wieting, J., Gimpel, K., Zettlemoyer, L.: Adversarial example generation with syntactically controlled paraphrase networks. In: NAACL-HLT, pp. 1875–1885 (2018)
Jiang, Y., Kummerfeld, J.K., Lasecki, W.S.: Understanding task design trade-offs in crowdsourced paraphrase collection. In: ACL 55th Annual Meeting, pp. 103–109. Vancouver, Canada (Jul 2017)
Koponen, M.: Assessing machine translation quality with error analysis (2010)
Larson, S., Cheung, A., Mahendran, A., et al.: Inconsistencies in crowdsourced slot-filling annotations: a typology and identification methods. In: COLING (2020). https://aclanthology.org/2020.coling-main.442
Li, Z., Jiang, X., Shang, L., Li, H.: Paraphrase generation with deep reinforcement learning. In: EMNLP (2018). https://aclanthology.org/D18-1421
Madnani, N., Dorr, B.J.: Generating phrasal and sentential paraphrases: A survey of data-driven methods. Comput. Linguist. 36(3), 341–387 (2010). https://aclanthology.org/J10-3003
Mallinson, J., Sennrich, R., Lapata, M.: Paraphrasing revisited with neural machine translation. In: EACL (2017). https://aclanthology.org/E17-1083
Metzler, D., Hovy, E., Zhang, C.: An empirical evaluation of data-driven paraphrase generation techniques. In: ACL 49th Annual Meeting, pp. 546–551. Portland, Oregon, USA (2011)
Negri, M., Mehdad, Y., Marchetti, A., Giampiccolo, D., Bentivogli, L.: Chinese whispers: Cooperative paraphrase acquisition. In: LREC’12, pp. 2659–2665. Istanbul, Turkey (2012)
Nilforoshan, H., Wang, J., Wu, E.: PreCog: Improving crowdsourced data quality before acquisition (2017). arXiv preprint arXiv:1704.02384
Popović, M.: On nature and causes of observed MT errors. In: MT Summit XVIII (2021)
Prakash, A., et al.: Neural paraphrase generation with stacked residual LSTM networks. In: COLING (2016)
Raffel, C., Shazeer, N., Roberts, A., Lee, K., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. In: JMLR (2020)
Ramírez, J., Berro, A., Baez, M., Benatallah, B., Casati, F.: Crowdsourcing diverse paraphrases for training task-oriented bots (2021)
Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks. In: EMNLP-IJCNLP (2019). https://aclanthology.org/D19-1410
Ribeiro, M.T., Wu, T., Guestrin, C., Singh, S.: Beyond accuracy: behavioral testing of NLP models with checklist. In: ACL, pp. 4902–4912 (2020). https://aclanthology.org/2020.acl-main.442
Su, Y., Awadallah, A.H., Khabsa, M., Pantel, P., Gamon, M., Encarnacion, M.: Building natural language interfaces to web APIs (2017)
Sun, X., Liu, J., Lyu, Y., et al.: Answer-focused and position-aware neural question generation. In: EMNLP (2018). https://aclanthology.org/D18-1427
Thompson, B., Post, M.: Automatic machine translation evaluation in many languages via zero-shot paraphrasing. In: EMNLP (2020)
Thomson, C., Reiter, E.: A gold standard methodology for evaluating accuracy in data-to-text systems. In: INLG (2020). https://aclanthology.org/2020.inlg-1.22
van Miltenburg, E., Clinciu, M., et al.: Underreporting of errors in NLG output, and what to do about it. In: INLG (2021). https://aclanthology.org/2021.inlg-1.14
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)
Witteveen, S., Andrews, M.: Paraphrasing with large language models (2019)
Yaghoub-Zadeh-Fard, M., Benatallah, B., et al.: Dynamic word recommendation to obtain diverse crowdsourced paraphrases of user utterances. In: IUI (2020)
Yaghoub-Zadeh-Fard, M.A., Benatallah, B., et al.: User utterance acquisition for training task-oriented bots: A review of challenges, techniques and opportunities (2020)
Yaghoub-Zadeh-Fard, M.A., Benatallah, B., et al.: A study of incorrect paraphrases in crowdsourced user utterances. In: NAACL (2019). https://aclanthology.org/N19-1026
Yaghoubzadehfard, M.: Scalable and Quality-Aware Training Data Acquisition for Conversational Cognitive Services. Ph.D. thesis, UNSW Sydney (2021)
Zamanirad, S.: Superimposition of natural language conversations over software enabled services. Ph.D. thesis, University of New South Wales, Sydney, Australia (2019)
Zeng, D., Zhang, H., Xiang, L., Wang, J., Ji, G.: User-oriented paraphrase generation with keywords controlled network. IEEE Access 7, 80542–80551 (2019)
Zhou, J., Bhat, S.: Paraphrase generation: a survey of the state of the art. In: EMNLP (2021). https://aclanthology.org/2021.emnlp-main.414
Acknowledgments
We acknowledge the financial support provided by the PICASSO Idex Lyon scholarship, which supported the research conducted by Auday Berro as part of his Ph.D. studies.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Berro, A., Benatallah, B., Gaci, Y., Benabdeslem, K. (2024). Error Types in Transformer-Based Paraphrasing Models: A Taxonomy, Paraphrase Annotation Model and Dataset. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14941. Springer, Cham. https://doi.org/10.1007/978-3-031-70341-6_20
DOI: https://doi.org/10.1007/978-3-031-70341-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70340-9
Online ISBN: 978-3-031-70341-6