Abstract
Apache Parquet is an efficient, structured, column-oriented (columnar), compressed binary file format. Parquet supports several compression codecs, including Snappy, GZIP, deflate, and BZIP2; Snappy is the default. Structured file formats such as RCFile, Avro, SequenceFile, and Parquet offer better performance with compression support, which reduces the size of the data on disk and, consequently, the I/O and CPU resources required to deserialize it.
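The following is a minimal sketch of the idea behind column-oriented, compressed storage, not Parquet itself. It assumes a small set of hypothetical sample records and uses DEFLATE via Python's standard `zlib` module as a stand-in codec; the helper names (`encode_row_oriented`, `encode_column_oriented`, `decode_column_oriented`) are illustrative only. It serializes the same records row by row and column by column, compresses both, and prints the sizes so they can be compared.

```python
import zlib

# Hypothetical sample records (id, country, status); not taken from the chapter.
rows = [(i, ["US", "DE", "IN"][i % 3], "active" if i % 5 else "inactive")
        for i in range(1000)]

def encode_row_oriented(records):
    """Serialize record by record: all fields of row 0, then row 1, and so on."""
    return "\n".join(",".join(str(v) for v in r) for r in records).encode("utf-8")

def encode_column_oriented(records):
    """Serialize column by column: all ids, then all countries, then all statuses."""
    columns = zip(*records)
    return "\n".join(",".join(str(v) for v in col) for col in columns).encode("utf-8")

def decode_column_oriented(blob):
    """Rebuild the original records from the column-oriented bytes."""
    ids, countries, statuses = (
        line.split(",") for line in blob.decode("utf-8").split("\n")
    )
    return [(int(i), c, s) for i, c, s in zip(ids, countries, statuses)]

row_bytes = encode_row_oriented(rows)
col_bytes = encode_column_oriented(rows)

# zlib (DEFLATE) is used here only as a stand-in for a Parquet codec.
row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)

print("row-oriented:   ", len(row_bytes), "->", len(row_compressed), "bytes")
print("column-oriented:", len(col_bytes), "->", len(col_compressed), "bytes")

# Compression is lossless: the records survive a full round trip.
restored = decode_column_oriented(zlib.decompress(col_compressed))
```

The sizes printed depend on the data and the codec, so this should be read as an illustration of the layout difference rather than a benchmark. Parquet also applies its own column encodings before compression, which this sketch does not attempt to model.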
Copyright information
© 2016 Deepak Vohra
About this chapter
Cite this chapter
Vohra, D. (2016). Apache Parquet. In: Practical Hadoop Ecosystem. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-2199-0_8
DOI: https://doi.org/10.1007/978-1-4842-2199-0_8
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-2198-3
Online ISBN: 978-1-4842-2199-0
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)