parquet-dotnet v3.1 (https://github.com/elastacloud/parquet-dotnet) is about to be released in the next few days. Since v3.0 was released publicly, it has attracted a lot of interest and praise for its incredible performance boost; however, there were problems as well.

To reiterate, v3.0 was a complete rewrite of v2.0 and allowed you to get deeper into parquet internals, especially through the API for creating *row groups*, writing columns directly, controlling row group sizes and so on. Although this was a big improvement in the library's core, it made the library harder to use for a general audience, because v2.0 had a handy *row-based interface* for accessing data, which v3.0 dropped.

Although working with rows slows the library down, you will eventually run into a situation where you need to work with rows anyway: for instance, when writing utilities for viewing parquet data or converting between parquet and row-based formats like CSV.

Therefore, v3.1 *resurrects row-based access* and makes it faster and better. The way you work with rows has changed slightly, but mostly you shouldn't notice any difference. The changes come into play when working with complex data structures like maps, lists and structures.

Preview documentation for this feature is located here: https://github.com/elastacloud/parquet-dotnet/blob/features/rows/doc/rows.md so feel free to browse it and leave feedback, either on this page or by raising an issue on GitHub.

PARQ

We'd also like to announce that this version introduces a .NET Core Global Tool called parq. The full description is located here: https://github.com/elastacloud/parquet-dotnet/blob/features/rows/doc/parq.md. Essentially, it's a hassle-free way to work with parquet files locally, and the number of supported commands will continue to grow.