Abstract
The past decade has witnessed rapid progress in deep learning for molecular design, owing to the availability of invertible and invariant molecular representations such as the simplified molecular-input line-entry system (SMILES), which has powered cheminformatics since the late 1980s. However, designing the elemental components of solid-state materials and their structural arrangement to achieve desired properties remains a long-standing challenge in physics, chemistry and biology. The primary obstacle, which molecular inverse design does not face, is the lack of an invertible crystallographic representation that satisfies symmetry invariances. To address this issue, we have developed a simplified line-input crystal-encoding system (SLICES), a string-based crystallographic representation that satisfies both invertibility and symmetry invariances. SLICES successfully reconstructed 94.95% of over 40,000 structurally and chemically diverse crystal structures, demonstrating unprecedented invertibility. Furthermore, by encoding only compositional and topological data, SLICES guarantees symmetry invariances. We demonstrate the application of SLICES in the inverse design of direct narrow-gap semiconductors for optoelectronic applications. As a string-based, invertible, invariant and efficient crystallographic representation, SLICES has the potential to become a standard tool for in-silico materials discovery.