Instant pickles: generating object-oriented pickler combinators for fast and extensible serialization

Published: 29 October 2013

Abstract

As more applications migrate to the cloud, and as "big data" edges into even more production environments, the performance and simplicity of exchanging data between compute nodes/devices are becoming increasingly important. An issue central to distributed programming, yet often under-considered, is serialization or pickling, i.e., persisting runtime objects by converting them into a binary or text representation. Pickler combinators are a popular approach from functional programming; their composability alleviates some of the tedium of writing pickling code by hand, but they don't translate well to object-oriented programming due to qualities like open class hierarchies and subtyping polymorphism. Furthermore, both functional pickler combinators and popular, Java-based serialization frameworks tend to be tied to a specific pickle format, leaving programmers with no choice of how their data is persisted. In this paper, we present object-oriented pickler combinators and a framework for generating them at compile-time, called scala/pickling, designed to be the default serialization mechanism of the Scala programming language. The static generation of OO picklers enables significant performance improvements, outperforming Java serialization and Kryo in most of our benchmarks. In addition to high performance and the need for little to no boilerplate, our framework is extensible: using the type class pattern, users can provide both (1) custom, easily interchangeable pickle formats and (2) custom picklers, to override the default behavior of the pickling framework. In benchmarks, we compare scala/pickling with other popular industrial frameworks, and present results on time, memory usage, and size when pickling/unpickling a number of data types used in real-world, large-scale distributed applications and frameworks.
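
The type class pattern mentioned in the abstract can be illustrated with a minimal, hand-written sketch. This is not the scala/pickling API: every name below (`PickleFormat`, `Pickler`, `JsonLike`, `SExpr`, `Pickling`, `Demo`) is hypothetical and chosen only for illustration, and the picklers are written by hand rather than generated at compile time. The sketch assumes Scala 2 implicits and shows three ideas: a pickler for pairs composed from picklers for its components, a pickle format that can be swapped without touching the picklers, and a locally defined pickler that overrides a default one.

```scala
// Hypothetical sketch; none of these names belong to scala/pickling.

// A pickle format decides how values are rendered as text.
trait PickleFormat {
  def int(n: Int): String
  def str(s: String): String
  def pair(a: String, b: String): String
}

object JsonLike extends PickleFormat {
  def int(n: Int): String = n.toString
  def str(s: String): String = "\"" + s + "\""
  def pair(a: String, b: String): String = "[" + a + "," + b + "]"
}

object SExpr extends PickleFormat {
  def int(n: Int): String = n.toString
  def str(s: String): String = "'" + s + "'"
  def pair(a: String, b: String): String = "(" + a + " " + b + ")"
}

// Type class: evidence that values of type T can be pickled in any format.
trait Pickler[T] {
  def pickle(value: T, format: PickleFormat): String
}

object Pickler {
  // Default picklers, found through the companion object's implicit scope.
  implicit val intPickler: Pickler[Int] = new Pickler[Int] {
    def pickle(value: Int, format: PickleFormat): String = format.int(value)
  }
  implicit val stringPickler: Pickler[String] = new Pickler[String] {
    def pickle(value: String, format: PickleFormat): String = format.str(value)
  }
  // Combinator: a pickler for pairs built from picklers for both components.
  implicit def pairPickler[A, B](implicit pa: Pickler[A], pb: Pickler[B]): Pickler[(A, B)] =
    new Pickler[(A, B)] {
      def pickle(value: (A, B), format: PickleFormat): String =
        format.pair(pa.pickle(value._1, format), pb.pickle(value._2, format))
    }
}

object Pickling {
  // The compiler selects the Pickler[T] instance; the caller selects the format.
  def pickle[T](value: T, format: PickleFormat)(implicit p: Pickler[T]): String =
    p.pickle(value, format)
}

object Demo {
  // Uses only the default picklers from the Pickler companion.
  def defaults(format: PickleFormat): String = Pickling.pickle((1, "a"), format)

  // A local implicit takes precedence over the default Pickler[Int].
  def overridden(format: PickleFormat): String = {
    implicit val hexInt: Pickler[Int] = new Pickler[Int] {
      def pickle(value: Int, format: PickleFormat): String =
        format.str(Integer.toHexString(value))
    }
    Pickling.pickle((255, "a"), format)
  }
}
```

In this sketch, `Demo.defaults(JsonLike)` and `Demo.defaults(SExpr)` pickle the same value with the same picklers but different formats, while `Demo.overridden` changes how integers are pickled without modifying the pair combinator.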

Published In

ACM SIGPLAN Notices, Volume 48, Issue 10
OOPSLA '13
October 2013
867 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2544173

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
October 2013
904 pages
ISBN: 9781450323741
DOI: 10.1145/2509136

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed programming
  2. meta-programming
  3. pickling
  4. scala
  5. serialization

Qualifiers

  • Research-article

Cited By

  • Cloud-based video analytics using convolutional neural networks. Software: Practice and Experience, 49:4 (565-583), 2018. DOI: 10.1002/spe.2636. Online publication date: 13-Sep-2018.
  • A comprehensive deep learning method for empirical spectral prediction and its quantitative validation of nano-structured dimers. Scientific Reports, 13:1, 2023. DOI: 10.1038/s41598-023-28076-3. Online publication date: 20-Jan-2023.
  • Artificial Neural Network Modelling for Optimizing the Optical Parameters of Plasmonic Paired Nanostructures. Nanomaterials, 12:1 (170), 2022. DOI: 10.3390/nano12010170. Online publication date: 4-Jan-2022.
  • Enhancing closures in scala 3 with spores3. Proceedings of the Scala Symposium (22-27), 2022. DOI: 10.1145/3550198.3550428. Online publication date: 6-Jun-2022.
  • FlashByte: Improving Memory Efficiency with Lightweight Native Storage. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (61-70), 2021. DOI: 10.1109/CCGrid51090.2021.00016. Online publication date: May-2021.
  • A specialized architecture for object serialization with applications to big data analytics. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (322-334), 2020. DOI: 10.1109/ISCA45697.2020.00036. Online publication date: 30-May-2020.
  • Scala implicits are everywhere: a large-scale study of the use of Scala implicits in the wild. Proceedings of the ACM on Programming Languages, 3:OOPSLA (1-28), 2019. DOI: 10.1145/3360589. Online publication date: 10-Oct-2019.
  • Deca. ACM Transactions on Computer Systems, 36:1 (1-47), 2019. DOI: 10.1145/3310361. Online publication date: 14-Mar-2019.
  • A programming model and foundation for lineage-based distributed computation. Journal of Functional Programming, 28, 2018. DOI: 10.1017/S0956796818000035. Online publication date: 12-Mar-2018.
  • Interactive Proof Presentations with Cobra. Electronic Proceedings in Theoretical Computer Science, 239 (43-52), 2017. DOI: 10.4204/EPTCS.239.4. Online publication date: 24-Jan-2017.
