-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add benchmarks for Utf8View scalars for zip #8988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
93a5b60 to
6d5422a
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mkleen
arrow/benches/zip_kernels.rs
Outdated
|
|
||
| let null_scalar = input_generator_1.generate_scalar_with_null_value(); | ||
|
|
||
| let [non_null_scalar_1]: [_; 1] = input_generator_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this code:
- It seems to be making an array (not a scalar)
generate_non_null_scalarsreturns a vector but then the callsite only ever seems to use a single return value -- why not just return a single array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, generate_non_null_scalars is also used at https://github.com/apache/arrow-rs/blob/main/arrow/benches/zip_kernels.rs#L156-L157 to retrieve two values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you prefer to introduce a generate_non_null_scalar function returning a single Array for all InputGenerators ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a simplified version. I hope this makes more sense to you.
fc37028 to
5b6d229
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mkleen
arrow/benches/zip_kernels.rs
Outdated
|
|
||
| let null_scalar = input_generator_1.generate_null(); | ||
|
|
||
| let non_null_scalar_1 = input_generator_1.generate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry -- I still don't get this benchmark -- calling input_generator_1.generate() results in a array (not a scalar):
impl GenerateStringView {
fn name(&self) -> &str {
self.description.as_str()
}
fn generate_null(&self) -> ArrayRef {
new_null_array(&DataType::Utf8View, 1)
}
fn generate(&self) -> ArrayRef {
Arc::new(create_string_view_array_with_fixed_len(
1,
0.0,
self.str_len,
))
}
}I would have expected a test something like
let null_scalar = input_generator.generate_null_scalar();
let non_null_scalar = input_generator.generate_scalar(); // make a 1 row array
let array = input_generator.generate_array(); // make a 8192 array
...
And then you could run the combinations (array, non_null_scalar),
(non_null_scalar, array),
(array, null_scalar),
(null_scalar, array)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to create the benchmark for the combination where both sides are scalars, with the following combinations:
(non_null_scalar, non_null_scalar)
(null_scalar, non_null_scalar)
(non_null_scalar, null_scalar)
and different string-view sizes on each side with different distributions of true/false etc. Each value is an array with a single value, which is then wrapped in a Scalar to match this condition:
https://github.com/apache/arrow-rs/blob/main/arrow-select/src/zip.rs#L104
and dispatch to the implementation to zip two scalars.
So you are saying this assumption is not correct and it should be instead (array, non_null_scalar) with an array of the length 8192 ?
Then i need to rework the benchmarks completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this code from the given zip benchmark as foundation:
https://github.com/apache/arrow-rs/blob/main/arrow/benches/zip_kernels.rs#L149-L183
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just double-checked the benchmarks from #8653 (the original optimization) see #8653 (comment). The optimisation only kicks in when both truthy and falsy are scalars, meaning arrays with single rows. The array vs scalar case is not improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the generator returns now scalars instead of arrays directly. Regarding the combinations: I still think these are valid. The optimisation is based on scalar vs scalar values.
53577d7 to
f2c7015
Compare
|
Dear @alamb, I just pushed another version. This just implements the existing This does the job to show the improvements of #8963 for the first three combinations in the spirit of #8653 If you prefer to also refactor the benchmarks further with your suggestion regarding the return types of the |
f2c7015 to
2c558de
Compare
Which issue does this PR close?
N/A
Rationale for this change
I have a PR to improve zip perf for Utf8View/BinaryView scalars and I need benchmarks for that.
What changes are included in this PR?
This extends the zip benchmarks by one new Input Generator for StringViews and two more functions to test scalar combinations of different StringViews combinations.
Are these changes tested?
N/A
Are there any user-facing changes?
No