Add options to skip decoding Statistics and SizeStatistics in Parquet metadata
#9008
base: main
Sorry I didn't see this one before. I'll try and review it shortly.
run benchmark encoding metadata
alamb left a comment
Makes sense to me -- thank you @etseidl
I kicked off a few more benchmarks just to be sure this doesn't have some weird impact, but I don't expect it to.
🤖: Benchmark completed
🤖: Benchmark completed
scovich left a comment
LGTM
Which issue does this PR close?
Rationale for this change
Add the ability to skip decoding more types of statistics contained in the Parquet column metadata. While this currently doesn't have a huge impact on decode time, it can reduce the amount of memory used by the `ParquetMetaData`.
What changes are included in this PR?
Adds more options and tests for those options. Also adds size statistics to the metadata bench.
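To illustrate why skipping these fields saves memory, here is a minimal, self-contained sketch. It is NOT the parquet crate's API: `DecodeOptions`, `RawColumnChunk`, `ColumnChunkMeta`, `decode_column`, and `retained_bytes` are hypothetical names, and the byte vectors are stand-ins for encoded statistics. The real reader skips these fields while parsing the Thrift-encoded footer; this sketch only models the effect on what the in-memory metadata retains.

```rust
// Hypothetical sketch: these types are NOT the parquet crate API.

/// Options controlling which optional metadata fields are materialized.
#[derive(Debug, Default, Clone, Copy)]
struct DecodeOptions {
    skip_statistics: bool,
    skip_size_statistics: bool,
}

/// Stand-in for an encoded column chunk in the file footer.
#[derive(Debug, Clone)]
struct RawColumnChunk {
    statistics: Vec<u8>,      // e.g. encoded min/max/null counts
    size_statistics: Vec<u8>, // e.g. encoded repetition/definition histograms
}

/// Decoded metadata: skipped fields are simply left as `None`.
#[derive(Debug)]
struct ColumnChunkMeta {
    statistics: Option<Vec<u8>>,
    size_statistics: Option<Vec<u8>>,
}

fn decode_column(raw: &RawColumnChunk, opts: &DecodeOptions) -> ColumnChunkMeta {
    ColumnChunkMeta {
        statistics: if opts.skip_statistics {
            None
        } else {
            Some(raw.statistics.clone())
        },
        size_statistics: if opts.skip_size_statistics {
            None
        } else {
            Some(raw.size_statistics.clone())
        },
    }
}

/// Bytes of statistics payload retained across all decoded columns.
fn retained_bytes(cols: &[ColumnChunkMeta]) -> usize {
    cols.iter()
        .map(|c| {
            c.statistics.as_ref().map_or(0, |s| s.len())
                + c.size_statistics.as_ref().map_or(0, |s| s.len())
        })
        .sum()
}

fn main() {
    // Four columns, each with 32 bytes of statistics and 64 bytes of size statistics.
    let raw: Vec<RawColumnChunk> = (0..4)
        .map(|_| RawColumnChunk {
            statistics: vec![0u8; 32],
            size_statistics: vec![0u8; 64],
        })
        .collect();

    let full: Vec<ColumnChunkMeta> = raw
        .iter()
        .map(|r| decode_column(r, &DecodeOptions::default()))
        .collect();

    let lean_opts = DecodeOptions {
        skip_statistics: true,
        skip_size_statistics: true,
    };
    let lean: Vec<ColumnChunkMeta> = raw.iter().map(|r| decode_column(r, &lean_opts)).collect();

    // Prints "full: 384 bytes, lean: 0 bytes"
    println!(
        "full: {} bytes, lean: {} bytes",
        retained_bytes(&full),
        retained_bytes(&lean)
    );
}
```

Leaving skipped fields as `None` keeps the change additive: callers that never set the options see the same metadata as before, which matches the PR's note that there are no breaking changes.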
Are these changes tested?
Yes
Are there any user-facing changes?
Only adds new options, no breaking changes.