Motivated by applications in DNA storage, we study a setting in which strings are affected by tandem-duplication errors. In particular, we look at two settings: disjoint tandem-duplication errors, and equal-length tandem-duplication errors. We construct codes, with positive asymptotic rate, for the two settings, as well as for their combination. Our constructions are duplication-free codes, comprising codewords that do not contain tandem duplications of specific lengths. Additionally, our codes generalize previous constructions, containing them as special cases.
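As a small illustration of the terms used above, the following Python sketch shows what a tandem-duplication error does to a string and how one might test whether a string is free of tandem duplications of given lengths. It is a minimal sketch under stated assumptions, not the construction from the paper: the function names (`tandem_duplicate`, `has_tandem_duplication`, `duplication_free_strings`) and the brute-force enumeration are hypothetical and chosen only for illustration.

```python
from itertools import product


def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Return s with a copy of s[start:start+length] inserted right after it."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("duplicated segment must lie inside the string")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


def has_tandem_duplication(s: str, lengths) -> bool:
    """Return True if s contains a factor ww with len(w) in `lengths`."""
    for k in lengths:
        for i in range(len(s) - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return True
    return False


def duplication_free_strings(alphabet: str, n: int, lengths) -> list:
    """Brute-force list of all length-n strings with no factor ww, len(w) in `lengths`.

    Illustrative only: exponential in n, not an encoder from the paper.
    """
    words = ("".join(t) for t in product(alphabet, repeat=n))
    return [w for w in words if not has_tandem_duplication(w, lengths)]


if __name__ == "__main__":
    corrupted = tandem_duplicate("ACGT", 1, 2)   # duplicate the segment "CG"
    print(corrupted)                              # ACGCGT
    print(has_tandem_duplication("ACGT", [1, 2]))     # False
    print(has_tandem_duplication(corrupted, [2]))     # True
    print(len(duplication_free_strings("ACGT", 3, [1])))  # 36
```

Here the duplication-free property is checked only for the lengths passed in `lengths`, mirroring the abstract's phrase "tandem duplications of specific lengths"; which lengths are relevant for each error model is specified in the paper, not in this sketch.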
Data availability
No datasets were generated or analysed during the current study.
The codes \(C_F\) in [11] consist of all duplication-free strings of length at most \(n\), each padded appropriately so that every codeword has length \(n\).
References
1. Ben-Tolila E., Schwartz M.: On the reverse-complement string-duplication system. IEEE Trans. Inform. Theory 68(11), 7184–7197 (2022).
2. Berstel J.: Growth of repetition-free words—a review. Theor. Comp. Sci. 340(2), 280–290 (2005).
3. Church G.M., Gao Y., Kosuri S.: Next-generation digital information storage in DNA. Science 337, 1628 (2012).
4. Elishco O.: On the long-term behavior of \(k\)-tuples frequencies in mutation systems. arXiv preprint arXiv:2401.04020 (2024).
5. Elishco O., Farnoud F., Schwartz M., Bruck J.: The entropy rate of some Pólya string models. IEEE Trans. Inform. Theory 65(12), 8180–8193 (2019).
6. Farnoud F., Schwartz M., Bruck J.: The capacity of string-duplication systems. IEEE Trans. Inform. Theory 62(2), 811–824 (2016).
7. Farnoud F., Schwartz M., Bruck J.: Estimation of duplication history under a stochastic model for tandem repeats. BMC Bioinform. 20(64), 1–11 (2019).
8. Goldman N., Bertone P., Chen S., Dessimoz C., LeProust E.M., Sipos B., Birney E.: Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494(7435), 77–80 (2013).
9. Goshkoder D., Polyanskii N., Vorobyev I.: Codes correcting a single long duplication error. In: Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT2023), Taipei, Taiwan, pp. 2708–2713 (2023).
10. Jain S., Farnoud F., Bruck J.: Capacity and expressiveness of genomic tandem duplication. IEEE Trans. Inform. Theory 63(10), 6129–6138 (2017).
11. Jain S., Farnoud F., Schwartz M., Bruck J.: Duplication-correcting codes for data storage in the DNA of living organisms. IEEE Trans. Inform. Theory 63(8), 4996–5010 (2017).
12. Kovačević M.: Zero-error capacity of duplication channels. IEEE Trans. Commun. 67(10), 6735–6742 (2019).
13. Lenz A., Wachter-Zeh A., Yaakobi E.: Duplication-correcting codes. Des. Codes Cryptogr. 87, 277–298 (2019).
14. Lind D., Marcus B.H.: An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, Cambridge (1995).
15. Marcus B.H., Roth R.M., Siegel P.H.: An Introduction to Coding for Constrained Systems (2001). Unpublished lecture notes. https://personal.math.ubc.ca/~marcus/Handbook/index.html.
16. Nguyen T.T., Cai K., Song W., Immink K.A.S.: Optimal single chromosome-inversion correcting codes for data storage in live DNA. In: Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT2022), Espoo, Finland, pp. 1791–1796 (2022).
17. Shipman S.L., Nivala J., Macklis J.D., Church G.M.: CRISPR-Cas encoding of digital movie into the genomes of a population of living bacteria. Nature 547, 345–349 (2017).
18. Tang Y., Farnoud F.: Error-correcting codes for short tandem duplication and edit errors. IEEE Trans. Inform. Theory 68(2), 871–880 (2021).
19. Tang Y., Yehezkeally Y., Schwartz M., Farnoud F.: Single-error detection and correction for duplication and substitution channels. IEEE Trans. Inform. Theory 66(11), 6908–6919 (2020).
20. Tang Y., Wang S., Lou H., Gabrys R., Farnoud F.: Low-redundancy codes for correcting multiple short-duplication and edit errors. IEEE Trans. Inform. Theory 69(5), 2940–2954 (2023).
21. Yohananov L., Schwartz M.: Optimal reverse-complement-duplication error-correcting codes. arXiv preprint arXiv:2312.00394 (2023).
22. Zeraatpisheh M., Esmaeili M., Gulliver T.A.: Construction of tandem duplication correcting codes. IET Commun. 13(15), 2217–2225 (2019).
23. Zeraatpisheh M., Esmaeili M., Gulliver T.A.: Construction of duplication correcting codes. IEEE Access 8, 96150–96161 (2020).
This work was supported in part by the Zhejiang Lab BioBit Program (grant no. 2022YFB507). The author M. Schwartz is currently on a leave of absence from Ben-Gurion University of the Negev.
Author information
All authors contributed to the design and implementation and wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Competing interest
The authors declare no competing interests.
Additional information
Communicated by T. Feng.
About this article
Cite this article
Yu, W., Schwartz, M. On duplication-free codes for disjoint or equal-length errors. Des. Codes Cryptogr. 92, 2845–2861 (2024). https://doi.org/10.1007/s10623-024-01417-7