If you are completely new to Parquet, we recommend starting with these documents:
- The striping and assembly algorithms from the Dremel paper (what Parquet is based on)
- Dremel made simple with Parquet, for a better understanding of Parquet, especially of what repetition and definition levels are
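
As a rough illustration of what a definition level is, here is a minimal Python sketch. It is not taken from any of the documents above or from any Parquet library. It assumes a hypothetical schema in which `a`, `b` and `c` are all optional and nothing is repeated, so repetition levels would all be 0 and are left out. Under that assumption, the definition level of a value is simply the number of fields along the path `a.b.c` that are actually present.

```python
# Hypothetical column path: every field on it is assumed to be optional,
# and none is repeated (so repetition levels are not modelled here).
PATH = ["a", "b", "c"]


def definition_level(record, path):
    """Return (level, value): how many fields along `path` are present,
    and the leaf value if the whole path is defined (otherwise None)."""
    level = 0
    node = record
    for name in path:
        if name not in node or node[name] is None:
            break  # this field is missing, so the path stops being defined here
        node = node[name]
        level += 1
    return level, (node if level == len(path) else None)


def stripe(records, path):
    """Turn records into a column: one definition level per record,
    plus only the values that are fully defined."""
    levels, values = [], []
    for record in records:
        level, value = definition_level(record, path)
        levels.append(level)
        if level == len(path):
            values.append(value)
    return levels, values


records = [
    {"a": {"b": {"c": 7}}},  # a, b and c are all present
    {"a": {"b": {}}},        # c is missing
    {"a": {}},               # b is missing
    {},                      # a is missing
]

levels, values = stripe(records, PATH)
print(levels)  # [3, 2, 1, 0]
print(values)  # [7]
```

Only one value is stored for four records; the levels alone are enough to tell at which depth each of the other three records became null. Repeated fields add repetition levels on top of this, which the documents above cover in detail.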
If you are looking for a description of Parquet encodings, please follow this link.
To understand how Parquet represents rich logical types, read this.
There are already working implementations in other languages, which we find useful for checking that we are doing things right, or when we are stuck trying to understand how a particular feature is supposed to work.
parquet-format is the official specification repository, containing the Thrift definitions for the data structures within a Parquet file. This spec is referenced by any library that implements Parquet.
fastparquet is probably the best implementation for Python, and it's extremely easy to follow. It was also our library of choice for working with the Parquet format (before parquet-dotnet was created, of course :) ).
parquet-mr is the official Java implementation, somewhat overengineered, but the most stable.
parquet-cpp is an awful implementation in C++, struggling with both code quality and compatibility, and we wouldn't recommend looking at it if you're new to Parquet.
Snappy Sharp is used to compress and decompress data with the Snappy algorithm.