research-article

Open access

Unboxed Data Constructors: Or, How cpp Decides a Halting Problem

Authors:

Nicolas Chataing,

Gabriel Scherer,

Jeremy Yallop

Proceedings of the ACM on Programming Languages, Volume 8, Issue POPL

Article No.: 51, Pages 1509 - 1539

https://doi.org/10.1145/3632893

Published: 05 January 2024

Abstract

We propose a new language feature for ML-family languages: the ability to selectively unbox certain data constructors, so that their runtime representation compiles away to the identity on their argument. An unboxing request must be statically rejected when it could introduce confusion, that is, when two distinct values would end up with the same representation.
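Current OCaml already supports this for one special case: a type with a single constructor holding a single argument can be marked with the existing `[@@unboxed]` attribute. The sketch below uses that existing attribute, not the syntax of the paper's prototype (which is not reproduced here), to show what "compiled away to the identity" means at runtime. The single-constructor case is trivially free of confusion; with several constructors, unboxing both `A of int` and `B of int` would make `A 1` and `B 1` indistinguishable, which is exactly what the static check must reject.

```ocaml
(* OCaml (since 4.04) accepts [@@unboxed] on a type with a single
   constructor holding a single argument: the constructor is compiled to
   the identity, so [W n] is represented exactly like [n]. *)
type wrapped = W of int [@@unboxed]

(* A regular constructor allocates a heap block around its argument. *)
type boxed = B of int

(* Inspect runtime representations (Obj is used for illustration only). *)
let wrapped_is_immediate = Obj.is_int (Obj.repr (W 42))
let boxed_is_block = Obj.is_block (Obj.repr (B 42))

(* The unboxed value and the plain integer are the same immediate. *)
let same_repr_as_int = Obj.repr (W 42) == Obj.repr 42
```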

We discuss the use case of big numbers, where unboxing makes it possible to write code that is both efficient and safe, replacing either a safe but slow version or a fast but unsafe version. We explain the static analysis needed to reject incorrect unboxing requests. We present our prototype implementation of this feature for the OCaml programming language, and discuss several design choices as well as the interaction with advanced features such as Guarded Algebraic Datatypes (GADTs).
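The kind of check involved can be pictured with a simplified model of OCaml's runtime values: each value is either an immediate integer or a pointer to a block carrying a tag. The sketch below is our own illustration, not the prototype's code: it assigns each constructor a hypothetical "head shape" (the immediates and block tags its values may have) and accepts a definition only when all constructors' shapes are pairwise disjoint. The names `head_shape`, `constructor_shapes` and `unboxing_accepted`, and the choice that unboxed constructors consume no tag, are assumptions. Big numbers are modelled as custom blocks (tag 255, OCaml's `Custom_tag`), as a GMP-backed representation typically would be.

```ocaml
(* Simplified model: a "head shape" over-approximates the heads
   (immediate values, block tags) that a type's values can have. *)
type 'a finite_or_all = All | Only of 'a list

type head_shape = {
  immediates : int finite_or_all;  (* possible immediate values *)
  tags : int finite_or_all;        (* possible block tags *)
}

(* Disjoint if either set is empty, or both are finite with no common
   element; [All] overlaps every non-empty set. *)
let disjoint a b =
  match (a, b) with
  | Only [], _ | _, Only [] -> true
  | All, _ | _, All -> false
  | Only xs, Only ys -> List.for_all (fun x -> not (List.mem x ys)) xs

let shapes_disjoint s1 s2 =
  disjoint s1.immediates s2.immediates && disjoint s1.tags s2.tags

(* Shapes of two argument types under this model. *)
let int_shape = { immediates = All; tags = Only [] }
let custom_block_shape = { immediates = Only []; tags = Only [ 255 ] }

type constructor =
  | Constant               (* e.g. Zero: the next immediate 0, 1, ... *)
  | Boxed                  (* e.g. Node of ...: a block with the next tag *)
  | Unboxed of head_shape  (* represented exactly like its argument *)

(* Assign shapes in declaration order; we assume unboxed constructors
   consume neither an immediate nor a tag. *)
let constructor_shapes cs =
  let rec go imm tag = function
    | [] -> []
    | Constant :: rest ->
        { immediates = Only [ imm ]; tags = Only [] } :: go (imm + 1) tag rest
    | Boxed :: rest ->
        { immediates = Only []; tags = Only [ tag ] } :: go imm (tag + 1) rest
    | Unboxed s :: rest -> s :: go imm tag rest
  in
  go 0 0 cs

(* Accept only if no two constructors can share a representation. *)
let unboxing_accepted cs =
  let rec pairwise = function
    | [] -> true
    | s :: rest -> List.for_all (shapes_disjoint s) rest && pairwise rest
  in
  pairwise (constructor_shapes cs)
```

In this model, a `Short of int` / `Long of <custom block>` pair can both be unboxed, whereas unboxing two `int` constructors, or an `int` constructor alongside a constant constructor such as `Zero`, is rejected because their immediates would collide.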

Our static analysis requires expanding type definitions in type expressions, which need not terminate in the presence of recursive type definitions. In other words, we must decide normalization of terms in the first-order λ-calculus with recursion. We provide an algorithm that detects non-termination on the fly during reduction, with proofs of correctness and completeness. Our algorithm turns out to be closely related to the normalization strategy for macro expansion in the cpp preprocessor.
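To make the termination problem concrete, the sketch below head-normalizes first-order type expressions and borrows the idea of cpp's hide sets: every application produced by unfolding a definition `f` is marked with `f`, and meeting an application of `f` that already carries that mark signals a loop. This is a simplified illustration, not the algorithm from the paper, whose precise formulation and proofs are in the article. In particular, the rule that substituted arguments keep their own hide sets is our assumption, and all names (`ty`, `def`, `head_normalize`) are hypothetical.

```ocaml
module S = Set.Make (String)

(* First-order type expressions: type variables, head constructors
   ("int", "*", ...) and applications of named, possibly recursive,
   definitions. Each application carries a cpp-style hide set. *)
type ty =
  | Var of string
  | Con of string * ty list
  | App of string * ty list * S.t

(* A definition:  type ('a1, ..., 'an) name = body  *)
type def = { params : string list; body : ty }

type outcome =
  | Head of string   (* the expression reduces to this head *)
  | Cycle of string  (* head reduction loops through this definition *)

let app f args = App (f, args, S.empty)

(* Instantiate a body: arguments are inserted unchanged, while every
   application written in the body itself is marked with [extra]. *)
let rec subst env extra = function
  | Var x -> List.assoc x env
  | Con (c, ts) -> Con (c, List.map (subst env extra) ts)
  | App (f, ts, hs) -> App (f, List.map (subst env extra) ts, S.union hs extra)

(* Unfold the head until a constructor or free variable appears, or until
   an application of [f] produced by unfolding [f] itself shows up. *)
let rec head_normalize defs = function
  | Var x -> Head x  (* free type variable: rigid head *)
  | Con (c, _) -> Head c
  | App (f, _, hs) when S.mem f hs -> Cycle f
  | App (f, args, hs) ->
      let d = List.assoc f defs in
      let env = List.combine d.params args in
      head_normalize defs (subst env (S.add f hs) d.body)

let int_ = Con ("int", [])

(* Example definitions *)
let defs =
  [ ("t", { params = []; body = app "t" [] });                     (* t = t *)
    ("id", { params = [ "a" ]; body = Var "a" });                  (* 'a id = 'a *)
    ("loop", { params = [ "a" ]; body = app "loop" [ Var "a" ] }); (* 'a loop = 'a loop *)
    ("t2", { params = []; body = app "id" [ app "t2" [] ] });      (* t2 = t2 id *)
    ("stream",                                  (* 'a stream = 'a * 'a stream *)
     { params = [ "a" ]; body = Con ("*", [ Var "a"; app "stream" [ Var "a" ] ]) }) ]
```

On these definitions, `int id id` reduces to `int` even though `id` is unfolded twice (the inner `id` comes from an argument, not from `id`'s own body), `t2 = t2 id` is reported as a cycle through `t2`, and the productive `'a stream` stops at its head constructor `*`, since only the head has to be exposed.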

Cited By

Teo B, Titzer B. 2024. Unboxing Virgil ADTs for Fun and Profit. In Proceedings of the Workshop Dedicated to Jens Palsberg on the Occasion of His 60th Birthday (eds. Naik M, Pereira F, Titzer B), 43-52. Online publication date: 22-Oct-2024. https://dl.acm.org/doi/10.1145/3694848.3694857

Elsman M. 2024. Double-Ended Bit-Stealing for Algebraic Data Types. Proceedings of the ACM on Programming Languages 8, ICFP, 88-120. Online publication date: 15-Aug-2024. https://dl.acm.org/doi/10.1145/3674628

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language features
        1. Data types and structures
      2. Language types
        1. Functional languages
2. Theory of computation
  1. Semantics and reasoning
    1. Program constructs
      1. Type structures

Published In

Proceedings of the ACM on Programming Languages, Volume 8, Issue POPL

January 2024

2820 pages

EISSN: 2475-1421

DOI: 10.1145/3554315

Editor: Michael Hicks, Amazon, USA

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States
