
Paper 2025/558

Breaking and Fixing Content-Defined Chunking

Kien Tuong Truong, ETH Zurich
Simon-Philipp Merz, ETH Zurich
Matteo Scarlata, ETH Zurich
Felix Günther, IBM Research - Zurich
Kenneth G. Paterson, ETH Zurich
Abstract

Content-defined chunking (CDC) algorithms split streams of data into smaller blocks, called chunks, in a way that preserves chunk boundaries when the data is partially changed. CDC is ubiquitous in applications that deduplicate data, such as backup solutions, software patching systems, and file hosting platforms. Much like compression, CDC can introduce leakage when combined with encryption: fingerprinting attacks can exploit chunk length patterns to infer information about the data. To address these risks, many systems, mainly in the cloud backup setting, have developed bespoke mitigations by mixing a cryptographic key into the chunking process. We study these keyed CDC (KCDC) schemes “in the wild”, presenting efficient key recovery attacks against five different KCDC schemes, deployed in the backup solutions Borg, Bupstash, Duplicacy, Restic, and Tarsnap. Our attacks work in a realistic threat model that relies only on weak known- or chosen-plaintext capabilities. This shows, in particular, that these schemes fail to protect against fingerprinting attacks. To demonstrate practical exploitability, we also present “end-to-end” attacks on three complete encrypted backup applications, namely Borg, Restic, and Tarsnap. These build on our attacks on the underlying KCDC schemes. In an effort to tackle these problems, we introduce the first formal treatment of KCDC schemes and propose a provably secure construction that fulfills a strong notion of security. We benchmark our construction against existing (broken) approaches, showing that it has competitive performance. In doing so, we take a step towards making real-world systems that rely on KCDC more resilient to attacks.
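
To make the boundary-preservation property concrete, here is a minimal, unkeyed toy chunker in Python. It is only an illustrative sketch: it is not the chunker of any tool named above, nor the construction proposed in the paper. The polynomial rolling hash, the names `chunk`, `WINDOW`, `MASK`, `BASE`, `MOD`, and all parameter values are assumptions chosen for readability. A chunk ends wherever the hash of the last `WINDOW` bytes has its low bits equal to zero, so each cut depends only on nearby content and not on the absolute position in the stream.

```python
import random
from typing import List

# All values below are illustrative assumptions, not taken from the paper.
WINDOW = 16            # number of trailing bytes the rolling hash covers
MASK = (1 << 6) - 1    # cut when the low 6 bits of the hash are zero
BASE = 257             # base of the polynomial rolling hash
MOD = (1 << 61) - 1    # modulus of the polynomial rolling hash


def chunk(data: bytes) -> List[bytes]:
    """Split data after every position whose trailing-window hash matches MASK."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # take in the newest byte
        # Only cut once the window is full, so the decision is purely local.
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


# Prepend one byte to some pseudorandom data and compare the chunkings.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"!" + original

a, b = chunk(original), chunk(edited)
print("chunks before:", len(a), "after:", len(b), "shared:", len(set(a) & set(b)))
```

Because every cut depends only on the preceding `WINDOW` bytes, all chunks after the first cut point are identical in both chunkings, which is what lets a deduplicating store skip them. The same locality is why the sequence of chunk lengths reflects the content; a keyed scheme would additionally mix a secret into the boundary decision, which this sketch does not attempt.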

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint.
Keywords
backup systems, content-defined chunking, cryptanalysis
Contact author(s)
kitruong @ ethz ch
simerz @ ethz ch
scmatteo @ ethz ch
mail @ felixguenther info
kenny paterson @ inf ethz ch
History
2025-03-28: approved
2025-03-26: received
Short URL
https://ia.cr/2025/558
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2025/558,
      author = {Kien Tuong Truong and Simon-Philipp Merz and Matteo Scarlata and Felix Günther and Kenneth G. Paterson},
      title = {Breaking and Fixing Content-Defined Chunking},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/558},
      year = {2025},
      url = {https://eprint.iacr.org/2025/558}
}