Allocate buffers before work in boolean_kernels benchmark
#9035
+21
−16
Which issue does this PR close?
`not` kernel by 50%, add `BooleanBuffer::from_bitwise_unary` #8996
Rationale for this change
When working on improving the boolean kernels, I have seen significant and unexplained noise from run to run. For example, just adding a fast path for `u64`-aligned data resulted in a reported 30% regression in the speed of slice24 (code that is not affected by the change at all). For an example, see #9022.
I also can't reproduce this effect locally or when I run the benchmarks individually.
Given the above, and the tiny amount of time spent in each benchmark (hundreds of nanoseconds), I believe the cause is the allocation pattern during the benchmark runs. Each kernel allocates its output, so when that pattern changes, data for subsequent iterations is allocated subtly differently (e.g. the exact alignment or some other factor is different). This results in different performance characteristics even when the code has not changed.
What changes are included in this PR?
To reduce this noise, I want to change the benchmarks to pre-allocate the input.
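As a rough illustration of the idea (not the actual benchmark code in this PR), the sketch below builds the input once, before the timed region, so that only the kernel call and its own output allocation happen inside the loop. The names `not_words` and `time_iters` are hypothetical stand-ins that use only the standard library; the real benchmark uses the arrow boolean kernels and its own harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a boolean kernel: bitwise NOT over packed
/// u64 words. Like the real kernels, it allocates its own output.
fn not_words(input: &[u64]) -> Vec<u64> {
    input.iter().map(|w| !w).collect()
}

/// Hypothetical timing helper: runs `f` `iters` times and returns the
/// total elapsed time. Anything the caller sets up before calling this
/// function stays outside the timed region.
fn time_iters<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}

/// Allocates the input up front, then times only the kernel calls.
fn bench_not(iters: u32) -> Duration {
    // Input is allocated and filled once, before any timing starts.
    let input: Vec<u64> = (0..1024u64).collect();

    time_iters(iters, || {
        // black_box keeps the compiler from optimizing the call away.
        black_box(not_words(black_box(input.as_slice())));
    })
}
```

Calling `bench_not(1_000)` returns the total time for 1,000 kernel invocations over the same pre-allocated input.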
Are these changes tested?
I ran them manually
Are there any user-facing changes?
No, this is just a benchmark change