Paper 2025/558
Breaking and Fixing Content-Defined Chunking
Abstract
Content-defined chunking (CDC) algorithms split streams of data into smaller blocks, called chunks, in a way that preserves chunk boundaries when the data is partially changed. CDC is ubiquitous in applications that deduplicate data such as backup solutions, software patching systems, and file hosting platforms. Much like compression, CDC can introduce leakage when combined with encryption: fingerprinting attacks can exploit chunk length patterns to infer information about the data. To address these risks, many systems, mainly in the cloud backup setting, have developed bespoke mitigations by mixing a cryptographic key into the chunking process. We study these keyed CDC (KCDC) schemes “in the wild”, presenting efficient key recovery attacks against five different KCDC schemes, deployed in the backup solutions Borg, Bupstash, Duplicacy, Restic, and Tarsnap. Our attacks work in a realistic threat model that requires only weak known- or chosen-plaintext capabilities. This shows, in particular, that these schemes fail to protect against fingerprinting attacks. To demonstrate practical exploitability, we also present “end-to-end” attacks on three complete encrypted backup applications, namely Borg, Restic, and Tarsnap. These build on our attacks on the underlying KCDC schemes. In an effort to tackle these problems, we introduce the first formal treatment of KCDC schemes and propose a provably secure construction that fulfills a strong notion of security. We benchmark our construction against existing (broken) approaches, showing that it has competitive performance. In doing so, we take a step towards making real-world systems that rely on KCDC more resilient to attacks.
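To make the boundary-preservation property in the abstract concrete, here is a minimal, unkeyed CDC sketch in Python. It is not the paper's construction, nor the scheme of any of the backup tools named above: the gear-style rolling hash, the `GEAR` table, the function name `chunk`, and the parameters `mask_bits`, `min_size`, and `max_size` are all illustrative assumptions. A cut is placed wherever the low bits of a hash over the most recent bytes are zero, so cut positions depend on local content rather than on absolute offsets.

```python
import hashlib

# Hypothetical 256-entry table of pseudo-random 32-bit values, one per byte value.
# Real systems use their own tables; a keyed scheme would derive them from a secret.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]


def chunk(data: bytes, mask_bits: int = 6, min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Split data into chunks using a gear-style rolling hash (illustrative only)."""
    mask = (1 << mask_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The shift pushes old bytes out of the low bits, so the low mask_bits
        # bits of h depend only on the last mask_bits bytes seen.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Demonstration on deterministic pseudo-random data: prepend one byte and
# count how many chunks are identical before and after the edit.
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
original = chunk(data)
edited = chunk(b"X" + data)
shared = set(original) & set(edited)
print(f"{len(shared)} of {len(original)} chunks survive the one-byte insertion")
```

With fixed-size blocks, the same one-byte insertion would shift every later boundary and defeat deduplication. The sequence of chunk lengths produced by a function like `chunk` is also the kind of pattern the abstract says fingerprinting attacks exploit.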
Metadata
- Available format(s)
- PDF
- Category
- Applications
- Publication info
- Preprint.
- Keywords
- backup systems, content-defined chunking, cryptanalysis
- Contact author(s)
- kitruong @ ethz ch
- simerz @ ethz ch
- scmatteo @ ethz ch
- mail @ felixguenther info
- kenny paterson @ inf ethz ch
- History
- 2025-03-28: approved
- 2025-03-26: received
- Short URL
- https://ia.cr/2025/558
- License
- CC BY
BibTeX
@misc{cryptoeprint:2025/558,
  author = {Kien Tuong Truong and Simon-Philipp Merz and Matteo Scarlata and Felix Günther and Kenneth G. Paterson},
  title = {Breaking and Fixing Content-Defined Chunking},
  howpublished = {Cryptology {ePrint} Archive, Paper 2025/558},
  year = {2025},
  url = {https://eprint.iacr.org/2025/558}
}