COSYWA: Enhancing Semantic Integrity in Watermarking Natural Language Generation

  • Conference paper
  • In: Natural Language Processing and Chinese Computing (NLPCC 2023)

Abstract

With the increasing use of natural language generation (NLG) models, there is a growing need to differentiate between machine-generated text and human-written text. One promising approach is watermarking, which can help identify machine-generated text and protect against risks such as spam emails and academic dishonesty. However, existing watermarking methods can significantly alter the semantic meaning of the text, creating a need for techniques that maintain semantic integrity. In this paper, we propose a novel watermarking method called COntextual SYnonym WAtermarking (COSYWA) that embeds watermarks in text using a Masked Language Model (MLM) without significantly impairing its semantics. Specifically, we use post-processing to embed watermarks in the output of an NLG model: we generate a context-based synonym set with an MLM to embed the watermark information, and we use statistical hypothesis testing to detect the presence of the watermark. Our experimental results show that COSYWA substantially improves the watermarked text's ability to preserve its original meaning while effectively embedding a watermark, making it a promising approach for protecting against misinformation in NLG.
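The abstract outlines a three-step pipeline: an MLM proposes context-based synonym candidates for positions in the generated text, the watermark is embedded by steering substitutions toward a secret keyed subset of those candidates, and detection runs a one-sided hypothesis test on how often words fall in that subset. The sketch below illustrates this general recipe with a HuggingFace fill-mask pipeline; the hash-based "green list" partition, the candidate score cutoff, and the z-score threshold are illustrative assumptions, not the authors' published design.

```python
# Minimal sketch of MLM-based contextual-synonym watermarking with
# hypothesis-test detection. Illustrative only: the keyed green-list
# partition, candidate filtering, and thresholds are assumptions, not
# the COSYWA authors' released implementation.
import hashlib
import math

from transformers import pipeline  # pip install transformers

unmasker = pipeline("fill-mask", model="bert-base-uncased")
SECRET_KEY = "cosywa-demo-key"  # hypothetical watermark key


def is_green(word: str) -> bool:
    """Keyed pseudo-random bipartition of the vocabulary."""
    digest = hashlib.sha256((SECRET_KEY + word.lower()).encode()).digest()
    return digest[0] % 2 == 0


def candidate_synonyms(words, i, top_k=10):
    """Context-based substitute set for position i, proposed by the MLM."""
    masked = " ".join(words[:i] + ["[MASK]"] + words[i + 1:])
    preds = unmasker(masked, top_k=top_k)
    # Keep only in-context plausible substitutes (crude score cutoff).
    return [p["token_str"] for p in preds if p["score"] > 0.01]


def embed_watermark(text: str) -> str:
    """Replace words with green-list contextual substitutes where possible."""
    words = text.split()
    for i, w in enumerate(words):
        if is_green(w):
            continue  # word already carries the watermark bit
        for cand in candidate_synonyms(words, i):
            if cand.isalpha() and is_green(cand):
                words[i] = cand
                break
    return " ".join(words)


def detect_watermark(text: str, z_threshold: float = 4.0) -> bool:
    """One-proportion z-test: under H0 (no watermark), each word lands in
    the green list with probability 1/2, so a large z-score signals a
    watermark."""
    words = [w for w in text.split() if w.isalpha()]
    n = len(words)
    if n == 0:
        return False
    greens = sum(is_green(w) for w in words)
    z = (greens - 0.5 * n) / math.sqrt(0.25 * n)
    return z > z_threshold


if __name__ == "__main__":
    original = "the quick brown fox jumped over the lazy dog near the river bank"
    marked = embed_watermark(original)
    print(marked)
    print("watermarked?", detect_watermark(marked))
```

Under the null hypothesis that unwatermarked text lands in the keyed subset about half the time, the z-score grows with the number of successful substitutions, so longer passages are detected more reliably. A faithful implementation would additionally verify that each substitute genuinely preserves the sentence's meaning (for example, via a semantic similarity check) before accepting it, which is the semantic-integrity constraint the paper emphasizes.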



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62006138), the Key Support Project of NSFC-Liaoning Joint Foundation (No. U1908216), and the Major Scientific Research Project of the State Language Commission in the 13th Five-Year Plan (No. WT135-38). We thank all anonymous reviewers for their valuable suggestions on this work.

Author information


Corresponding author

Correspondence to Xiaodong Shi.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fang, J., Tan, Z., Shi, X. (2023). COSYWA: Enhancing Semantic Integrity in Watermarking Natural Language Generation. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_55

  • DOI: https://doi.org/10.1007/978-3-031-44693-1_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44692-4

  • Online ISBN: 978-3-031-44693-1

  • eBook Packages: Computer Science, Computer Science (R0)
