@maxburke
Contributor

Store the ViewArray buffers as the suggested Arc<[Buffer]> instead of an owned Vec<Buffer>, so that cloning the array when slicing does not allocate.
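
To illustrate the motivation, here is a minimal sketch that uses only the standard library. `Buf` is a hypothetical stand-in for arrow's `Buffer` (the real type is not modelled), and the function names are made up for this example. It shows that cloning a non-empty `Vec<Buf>` produces new backing storage, whereas cloning an `Arc<[Buf]>` yields another handle to the same allocation.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `Buffer`; the real type is not modelled here.
type Buf = Arc<[u8]>;

/// Clones a `Vec` and reports whether the clone reuses the original backing storage.
fn vec_clone_shares_storage(v: &Vec<Buf>) -> bool {
    let copy = v.clone();
    std::ptr::eq(v.as_ptr(), copy.as_ptr())
}

/// Clones an `Arc<[T]>` and reports whether both handles point at the same allocation.
fn arc_clone_shares_storage(a: &Arc<[Buf]>) -> bool {
    let copy = Arc::clone(a);
    Arc::ptr_eq(a, &copy)
}

fn main() {
    let bufs: Vec<Buf> = vec![Arc::from(&b"hello"[..]), Arc::from(&b"world"[..])];
    let shared: Arc<[Buf]> = Arc::from(bufs.clone());

    println!("Vec clone shares storage: {}", vec_clone_shares_storage(&bufs));
    println!("Arc clone shares storage: {}", arc_clone_shares_storage(&shared));
}
```

The `Arc` clone is only a reference-count increment, which is why slicing no longer needs an allocation for the buffer list.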

Which issue does this PR close?

@github-actions bot added the arrow (Changes to the arrow crate) label on Dec 18, 2025
nulls.shrink_to_fit();
}
*/
todo!()
Contributor

Did you mean to leave this todo here? It seems to me that it might be important to finish this part. If it isn't necessary, could you add a comment explaining why?

Contributor

@alamb, Dec 19, 2025

It is also bad that we have no test coverage for this. We should make a PR to add some.

Contributor Author

Nope! I didn't mean to do that. I've pushed a fix. But at least it caught this test coverage miss? 😅

Contributor

> Nope! I didn't mean to do that. I've pushed a fix. But at least it caught this test coverage miss? 😅

Yes indeed -- it is bad we don't have a test for this

Contributor

@alamb left a comment

Thank you @maxburke and @mhilton -- this is a really cool PR. I like it a lot

I think once we sort out the "shrink_to_fit" bit it will be ready to go

cc @XiangpengHao and @Dandandan

}

fn shrink_to_fit(&mut self) {
/*
Contributor

I think you can use Arc::make_mut here to modify (or clone) the buffers as needed:

    fn shrink_to_fit(&mut self) {
        self.views.shrink_to_fit();
        Arc::make_mut(&mut self.buffers).iter_mut().for_each(|b| b.shrink_to_fit());
        if let Some(nulls) = &mut self.nulls {
            nulls.shrink_to_fit();
        }
    }

I think the call to

        self.buffers.shrink_to_fit();

which would shrink the Vec today, doesn't really have an analog when the view holds an Arc of the slice -- we could potentially call shrink_to_fit prior to creating the Arc, but I think that might be unnecessary.

I would personally suggest not messing with the actual self.buffers call

Contributor Author

I think I just committed a fix before you posted this 😅

Is shrink_to_fit a best-effort operation? If so, it's probably not necessary to try to shrink self.buffers...?

Contributor

@alamb, Dec 19, 2025

yeah my review overlapped with your push 🏃

Contributor

> Is shrink_to_fit a best-effort operation? If so, it's probably not necessary to try to shrink self.buffers...?

I think it is reasonable behavior. Maybe we can leave a comment explaining the rationale ('best effort' or something) -- the memory saved by trimming a few elements from a Vec is unlikely to be significant, especially as slicing a generic view array doesn't adjust the buffers anyway.

let len = array.len();
array.buffers.insert(0, array.views.into_inner());

let mut buffers = array.buffers.iter().cloned().collect::<Vec<_>>();
Contributor

I think you could call to_vec here

Suggested change
- let mut buffers = array.buffers.iter().cloned().collect::<Vec<_>>();
+ let mut buffers = array.buffers.to_vec();

Contributor Author

Done
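
As a rough sketch of the suggestion above: `to_vec` is a slice method, so it is reachable through an `Arc<[T]>` by deref and clones each element into a new `Vec`. The element type `String` and the helper `with_views_first` are hypothetical stand-ins; the `insert(0, ...)` mirrors the diff context in this thread.

```rust
use std::sync::Arc;

/// Copies the shared buffers into an owned `Vec` and puts `views` in front.
/// `String` is a hypothetical stand-in for arrow's `Buffer`.
fn with_views_first(buffers: &Arc<[String]>, views: String) -> Vec<String> {
    // Same result as `buffers.iter().cloned().collect::<Vec<_>>()`.
    let mut owned = buffers.to_vec();
    owned.insert(0, views);
    owned
}

fn main() {
    let shared: Arc<[String]> = Arc::from(vec!["b0".to_string(), "b1".to_string()]);
    let all = with_views_first(&shared, "views".to_string());
    println!("{:?}", all);
    // The shared slice itself is left untouched.
    println!("{:?}", shared);
}
```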

@alamb added the api-change (Changes to the arrow API) label on Dec 19, 2025
@alamb
Contributor

alamb commented Dec 19, 2025

Since buffers is exposed I think this is an API change that will have to wait for the next major release (in Feb)

Contributor

@alamb left a comment

Thanks @maxburke -- this looks great to me. I do think some small finagling of shrink_to_fit would be useful too

///
/// Panics if [`GenericByteViewArray::try_new`] returns an error
- pub fn new(views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>) -> Self {
+ pub fn new<U>(views: ScalarBuffer<u128>, buffers: U, nulls: Option<NullBuffer>) -> Self
Contributor

This is very elegant. It is nice to keep this API backwards compatible (anything that used to compile still compiles).

Contributor Author

🎉
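
For readers wondering how a generic `buffers: U` keeps old call sites compiling, here is a minimal sketch. The bound `U: Into<Arc<[...]>>` is an assumption, since the `where` clause is not visible in this thread. `ViewArraySketch` and the `Vec<u8>` element type are hypothetical stand-ins, not arrow's types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the view array; only the buffers field is modelled.
struct ViewArraySketch {
    buffers: Arc<[Vec<u8>]>,
}

impl ViewArraySketch {
    /// Assumed bound: anything convertible into `Arc<[Vec<u8>]>` is accepted.
    fn new<U>(buffers: U) -> Self
    where
        U: Into<Arc<[Vec<u8>]>>,
    {
        Self { buffers: buffers.into() }
    }
}

fn main() {
    // Old-style call site: passing an owned Vec still compiles.
    let from_vec = ViewArraySketch::new(vec![vec![1u8, 2], vec![3]]);

    // New-style call site: passing an existing Arc reuses its allocation.
    let shared: Arc<[Vec<u8>]> = Arc::from(vec![vec![4u8]]);
    let from_arc = ViewArraySketch::new(Arc::clone(&shared));

    println!("from_vec holds {} buffers", from_vec.buffers.len());
    println!("from_arc reuses allocation: {}", Arc::ptr_eq(&from_arc.buffers, &shared));
}
```

With such a bound, the `Vec` path pays for one conversion at construction time, while the `Arc` path is a plain move.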


/// Deconstruct this array into its constituent parts
- pub fn into_parts(self) -> (ScalarBuffer<u128>, Vec<Buffer>, Option<NullBuffer>) {
+ pub fn into_parts(self) -> (ScalarBuffer<u128>, Arc<[Buffer]>, Option<NullBuffer>) {
Contributor

this I think is a breaking API change

Contributor Author

I've tested it with DataFusion and it dropped right in (both the mainline version and the hacked-up, patched version we're using).

But, yeah, I can't speak for other users of the package.
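
A small sketch of what the return-type change could mean for a caller, under the assumption that the caller needs an owned `Vec`. `ViewArraySketch`, `total_len`, and the `Vec<u8>` element type are hypothetical. Code that only iterates or indexes the buffers keeps compiling, which may be why it "dropped right in"; code that names `Vec` in a type annotation needs an explicit conversion.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the view array; only the buffers are modelled.
struct ViewArraySketch {
    buffers: Arc<[Vec<u8>]>,
}

impl ViewArraySketch {
    /// Mirrors the new signature: the buffers come back as `Arc<[T]>`, not `Vec<T>`.
    fn into_parts(self) -> Arc<[Vec<u8>]> {
        self.buffers
    }
}

/// Hypothetical downstream helper that insists on an owned `Vec`.
fn total_len(buffers: Vec<Vec<u8>>) -> usize {
    buffers.iter().map(Vec::len).sum()
}

fn main() {
    let array = ViewArraySketch {
        buffers: Arc::from(vec![vec![1u8, 2], vec![3]]),
    };

    // let buffers: Vec<Vec<u8>> = array.into_parts(); // would fail: mismatched types
    let buffers: Vec<Vec<u8>> = array.into_parts().to_vec();
    println!("total bytes: {}", total_len(buffers));
}
```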

self.buffers.iter_mut().for_each(|b| b.shrink_to_fit());
self.buffers.shrink_to_fit();

if let Some(buffers) = Arc::get_mut(&mut self.buffers) {
Contributor

I think this will only shrink the buffers when there are no other outstanding references to them. I think it would be better to call Arc::make_mut here to ensure that the buffers get shrunk.

Contributor Author

@maxburke, Dec 19, 2025

So the issue I have with Arc::make_mut is that if I slice the array, I now have two references to the underlying buffers. If I call shrink_to_fit on one of them, Arc::make_mut will clone the buffers and shrink_to_fit will then be called on the clone. But because the underlying buffers end up being cloned, the original allocation doesn't shrink, and in the end it'll use more memory until the other reference is dropped.

Additionally, it'll create more allocator pressure because the cloning will duplicate each buffer at its pre-shrunk size before it's shrunk.
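
The difference between the two calls can be sketched with the standard library alone. `Vec<u8>` is a hypothetical stand-in for arrow's `Buffer` (whose own sharing semantics are not modelled here), and `shrink_if_unique` is a made-up name for the best-effort approach discussed above.

```rust
use std::sync::Arc;

/// Best-effort shrink: only touches the buffers when this handle is the sole owner.
/// `Vec<u8>` is a hypothetical stand-in for arrow's `Buffer`.
fn shrink_if_unique(buffers: &mut Arc<[Vec<u8>]>) -> bool {
    match Arc::get_mut(buffers) {
        Some(bufs) => {
            bufs.iter_mut().for_each(|b| b.shrink_to_fit());
            true
        }
        // Another handle (e.g. a slice of the array) still points at the buffers.
        None => false,
    }
}

fn main() {
    let mut big = Vec::with_capacity(1024);
    big.extend_from_slice(b"abc");
    let mut buffers: Arc<[Vec<u8>]> = Arc::from(vec![big]);

    let other = Arc::clone(&buffers); // simulates a slice sharing the buffers
    assert!(!shrink_if_unique(&mut buffers)); // shared: nothing is shrunk or cloned
    drop(other);
    assert!(shrink_if_unique(&mut buffers)); // unique: shrinks in place
    println!("capacity after shrink: {}", buffers[0].capacity());

    // By contrast, `Arc::make_mut` on a shared handle clones the value first,
    // so the two handles stop pointing at the same allocation.
    let mut a = Arc::new(vec![0u8; 3]);
    let b = Arc::clone(&a);
    Arc::make_mut(&mut a).shrink_to_fit();
    println!("still same allocation: {}", Arc::ptr_eq(&a, &b));
}
```

In the shared case `Arc::get_mut` does no work at all, which matches the "best effort" reading of shrink_to_fit.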

Contributor

That makes sense -- let's keep it this way then. I do think it is worth a comment explaining the rationale, though, for future readers that may wonder the same thing

Contributor Author

done 👍

@alamb added the next-major-release (the PR has API changes and is waiting on the next major version) label on Dec 19, 2025
@alamb
Contributor

alamb commented Dec 19, 2025

run benchmark view_types

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (3691d2d) to 116ae12 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                             arrow-6408                              main
-----                                             ----------                              ----
gc view types all without nulls[100000]           1.00  1701.4±150.16µs        ? ?/sec    1.26      2.1±0.13ms        ? ?/sec
gc view types all without nulls[8000]             1.00     65.0±2.01µs        ? ?/sec     1.01     65.4±3.28µs        ? ?/sec
gc view types all[100000]                         1.08    290.2±7.91µs        ? ?/sec     1.00    268.1±7.14µs        ? ?/sec
gc view types all[8000]                           1.16     22.0±0.18µs        ? ?/sec     1.00     18.9±0.14µs        ? ?/sec
gc view types slice half without nulls[100000]    1.00   573.6±37.89µs        ? ?/sec     1.09   625.6±49.20µs        ? ?/sec
gc view types slice half without nulls[8000]      1.05     28.7±1.47µs        ? ?/sec     1.00     27.4±0.52µs        ? ?/sec
gc view types slice half[100000]                  1.10    141.3±2.93µs        ? ?/sec     1.00    128.1±2.54µs        ? ?/sec
gc view types slice half[8000]                    1.19     11.2±0.28µs        ? ?/sec     1.00      9.4±0.05µs        ? ?/sec
view types slice                                  1.00   610.4±11.65ns        ? ?/sec     1.12    681.4±9.32ns        ? ?/sec

Use the suggested Arc<[Buffer]> storage for ViewArray storage instead of
an owned Vec<Buffer> so that the slice clone does not allocate.
@alamb
Contributor

alamb commented Dec 19, 2025

> view types slice 1.00 610.4±11.65ns ? ?/sec 1.12 681.4±9.32ns ? ?/sec

Not bad 😎
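
For anyone reading the table for the first time: the 1.00 and 1.12 columns appear to be each mean time divided by the fastest mean in that row. That reading is an assumption about the benchmark tool's output, but it matches the numbers quoted above. A small sketch (the function name is made up):

```rust
/// Relative factor of each timing against the fastest one in the row.
fn relative_factors(branch_ns: f64, main_ns: f64) -> (f64, f64) {
    let fastest = branch_ns.min(main_ns);
    (branch_ns / fastest, main_ns / fastest)
}

fn main() {
    // Mean times from the "view types slice" row quoted above.
    let (branch, main_branch) = relative_factors(610.4, 681.4);
    // prints "arrow-6408: 1.00, main: 1.12"
    println!("arrow-6408: {:.2}, main: {:.2}", branch, main_branch);
}
```

So on this benchmark, main takes roughly 12% longer per slice than the branch.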

@alamb
Contributor

alamb commented Dec 19, 2025

run benchmark view_types

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                             arrow-6408                             main
-----                                             ----------                             ----
gc view types all without nulls[100000]           1.00  1633.5±52.88µs        ? ?/sec    1.03  1686.3±52.95µs        ? ?/sec
gc view types all without nulls[8000]             1.03     65.5±2.96µs        ? ?/sec    1.00     63.5±3.07µs        ? ?/sec
gc view types all[100000]                         1.12    291.5±7.88µs        ? ?/sec    1.00    259.4±6.79µs        ? ?/sec
gc view types all[8000]                           1.15     21.9±0.20µs        ? ?/sec    1.00     19.0±0.18µs        ? ?/sec
gc view types slice half without nulls[100000]    1.03   545.3±19.09µs        ? ?/sec    1.00   530.6±18.02µs        ? ?/sec
gc view types slice half without nulls[8000]      1.00     27.5±0.49µs        ? ?/sec    1.00     27.6±0.49µs        ? ?/sec
gc view types slice half[100000]                  1.08    138.7±3.05µs        ? ?/sec    1.00    128.0±3.65µs        ? ?/sec
gc view types slice half[8000]                    1.16     11.2±0.22µs        ? ?/sec    1.00      9.6±0.06µs        ? ?/sec
view types slice                                  1.00    611.8±9.83ns        ? ?/sec    1.11   681.0±14.72ns        ? ?/sec

@alamb
Contributor

alamb commented Dec 20, 2025

🤔 it actually looks like GC'ing is slowing down -- probably from the need to do an extra allocation for new buffers

@alamb
Contributor

alamb commented Dec 20, 2025

> 🤔 it actually looks like GC'ing is slowing down -- probably from the need to do an extra allocation for new buffers

Upon some more thought I think an extra allocation in GC is a better tradeoff, especially since a lot of GC'ing in downstream systems like DataFusion actually happens as part of concat and coalesce kernels

@XiangpengHao and @ctsk do you have any thoughts on this tradeoff?
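
One possible source of the extra allocation, sketched with the standard library only: converting a freshly built `Vec<T>` into an `Arc<[T]>` moves the elements into a new allocation that also holds the reference counts. This is a property of the std conversion, not a statement about how the GC code in this PR is written. `freeze` is a made-up name and `Vec<u8>` a hypothetical stand-in for `Buffer`.

```rust
use std::sync::Arc;

/// Builds the buffer list in a `Vec` (as a GC pass might) and then freezes it
/// into an `Arc<[T]>`. Returns the frozen list plus whether the elements stayed
/// at the same address after the conversion.
fn freeze(buffers: Vec<Vec<u8>>) -> (Arc<[Vec<u8>]>, bool) {
    let before = buffers.as_ptr(); // only compared, never dereferenced
    let frozen: Arc<[Vec<u8>]> = Arc::from(buffers);
    let same_address = std::ptr::eq(before, frozen.as_ptr());
    (frozen, same_address)
}

fn main() {
    let (frozen, same_address) = freeze(vec![vec![1u8, 2, 3], vec![4u8]]);
    println!("buffers: {}, same address: {}", frozen.len(), same_address);
}
```

Only the list of buffer handles is moved; the bytes owned by each element stay where they are.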

@alamb
Contributor

alamb commented Dec 20, 2025

run benchmark coalesce_kernels concatenate_kernel

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@apache deleted a comment from alamb-ghbot on Dec 20, 2025
@alamb-ghbot

🤖: Benchmark completed

group                                                                                arrow-6408                             main
-----                                                                                ----------                             ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.01    261.8±3.68ms        ? ?/sec    1.00    258.1±2.69ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.01      8.6±0.15ms        ? ?/sec    1.00      8.5±0.32ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.0±0.10ms        ? ?/sec    1.03      4.1±0.14ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.01      3.5±0.04ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.00    251.8±2.55ms        ? ?/sec    1.00    250.7±2.69ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.02      9.6±0.15ms        ? ?/sec    1.00      9.4±0.11ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.01      4.6±0.09ms        ? ?/sec    1.00      4.6±0.12ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.6±0.03ms        ? ?/sec    1.02      4.7±0.17ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.03     59.3±0.51ms        ? ?/sec    1.00     57.4±1.01ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.02     11.4±0.23ms        ? ?/sec    1.00     11.2±0.12ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.00      9.1±0.25ms        ? ?/sec    1.01      9.2±0.25ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.07      8.7±0.34ms        ? ?/sec    1.00      8.1±0.18ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.01     68.9±0.54ms        ? ?/sec    1.00     68.0±1.49ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     12.7±0.39ms        ? ?/sec    1.00     12.7±0.33ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.03      9.8±0.30ms        ? ?/sec    1.00      9.6±0.15ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00     10.2±0.27ms        ? ?/sec    1.02     10.5±0.30ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.02     48.6±0.42ms        ? ?/sec    1.00     47.5±0.74ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.01      6.0±0.06ms        ? ?/sec    1.00      5.9±0.11ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.00      4.4±0.14ms        ? ?/sec    1.05      4.6±0.29ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.00      3.2±0.04ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.02     59.0±2.20ms        ? ?/sec    1.00     57.7±0.40ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.02      8.0±0.07ms        ? ?/sec    1.00      7.8±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.02      5.6±0.21ms        ? ?/sec    1.00      5.5±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.09ms        ? ?/sec    1.01      3.8±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.04     43.0±0.44ms        ? ?/sec    1.00     41.4±1.18ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.6±0.04ms        ? ?/sec    1.00      4.6±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.2±0.13ms        ? ?/sec    1.07      2.4±0.18ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00  1509.9±13.35µs        ? ?/sec    1.02  1535.8±23.70µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.03     52.4±0.55ms        ? ?/sec    1.00     51.0±1.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.03      7.1±0.06ms        ? ?/sec    1.00      6.9±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.05      3.7±0.23ms        ? ?/sec    1.00      3.5±0.08ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      3.8±0.02ms        ? ?/sec    1.02      3.9±0.03ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.03     97.1±1.30ms        ? ?/sec    1.00     94.3±0.76ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.01      9.1±0.22ms        ? ?/sec    1.00      9.0±0.04ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.08      4.1±0.29ms        ? ?/sec    1.00      3.8±0.42ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.02      3.1±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.02    125.8±2.13ms        ? ?/sec    1.00    123.0±2.27ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.02     14.8±0.21ms        ? ?/sec    1.00     14.6±0.11ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.00      7.0±0.34ms        ? ?/sec    1.06      7.4±0.45ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      8.8±0.08ms        ? ?/sec    1.02      9.0±0.05ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.07     69.3±1.32ms        ? ?/sec    1.00     64.5±0.33ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.10      8.0±0.11ms        ? ?/sec    1.00      7.3±0.35ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.08      4.6±0.27ms        ? ?/sec    1.00      4.3±0.41ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.00  1422.9±11.40µs        ? ?/sec    1.00  1420.2±15.88µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.06     93.5±1.01ms        ? ?/sec    1.00     88.3±0.39ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.03     11.4±0.09ms        ? ?/sec    1.00     11.1±0.07ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.01      5.2±0.32ms        ? ?/sec    1.00      5.2±0.39ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.8±0.04ms        ? ?/sec    1.04      3.9±0.09ms        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                          arrow-6408                             main
-----                                                          ----------                             ----
concat 1024 arrays boolean 4                                   1.00     21.4±0.09µs        ? ?/sec    1.04     22.3±0.19µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.8±0.16µs        ? ?/sec    1.07     14.9±0.23µs        ? ?/sec
concat 1024 arrays str 4                                       1.02     37.0±0.43µs        ? ?/sec    1.00     36.3±0.32µs        ? ?/sec
concat boolean 1024                                            1.00    308.1±1.60ns        ? ?/sec    1.00    308.2±3.53ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.01      5.1±0.02µs        ? ?/sec    1.00      5.1±0.03µs        ? ?/sec
concat boolean nulls 1024                                      1.02    582.9±3.29ns        ? ?/sec    1.00   572.7±14.53ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.2±0.67µs        ? ?/sec    1.01     18.3±1.05µs        ? ?/sec
concat fixed size lists                                        1.00   726.4±28.95µs        ? ?/sec    1.01   735.4±22.92µs        ? ?/sec
concat i32 1024                                                1.01   391.9±17.46ns        ? ?/sec    1.00    386.2±1.64ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.02    208.6±6.13µs        ? ?/sec    1.00    205.2±6.90µs        ? ?/sec
concat i32 nulls 1024                                          1.00    603.8±6.72ns        ? ?/sec    1.03    621.9±4.39ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.01    238.9±9.09µs        ? ?/sec    1.00    237.4±8.95µs        ? ?/sec
concat str 1024                                                1.00     13.9±1.19µs        ? ?/sec    1.03     14.3±1.27µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    105.1±1.11ms        ? ?/sec    1.02    106.7±0.92ms        ? ?/sec
concat str nulls 1024                                          1.06      6.3±0.83µs        ? ?/sec    1.00      6.0±0.59µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     53.4±0.51ms        ? ?/sec    1.01     53.7±0.47ms        ? ?/sec
concat str_dict 1024                                           1.00      3.0±0.03µs        ? ?/sec    1.03      3.1±0.08µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.04µs        ? ?/sec    1.00      7.0±0.06µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      7.1±0.05µs        ? ?/sec    1.06      7.5±0.06µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     78.3±0.62µs        ? ?/sec    1.00     78.1±1.44µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.01     80.5±5.16µs        ? ?/sec    1.00     79.8±1.29µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     88.9±0.43µs        ? ?/sec    1.00     88.7±0.35µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.01     90.7±0.37µs        ? ?/sec    1.00     89.6±0.98µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.01     49.2±2.77µs        ? ?/sec    1.00     48.7±2.79µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.04     50.4±2.53µs        ? ?/sec    1.00     48.7±2.81µs        ? ?/sec

@alamb
Contributor

alamb commented Dec 20, 2025

run benchmark coalesce_kernels concatenate_kernel view_types

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

group                                                                                arrow-6408                             main
-----                                                                                ----------                             ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.00    260.9±3.27ms        ? ?/sec    1.00    260.9±2.94ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.00      8.5±0.08ms        ? ?/sec    1.00      8.6±0.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.0±0.09ms        ? ?/sec    1.02      4.1±0.12ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.00      3.4±0.05ms        ? ?/sec    1.01      3.5±0.06ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.01    247.6±2.57ms        ? ?/sec    1.00    245.0±2.93ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.03      9.6±0.17ms        ? ?/sec    1.00      9.3±0.11ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.00      4.5±0.10ms        ? ?/sec    1.00      4.6±0.11ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.6±0.07ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.01     58.1±0.77ms        ? ?/sec    1.00     57.6±0.61ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.00     11.3±0.14ms        ? ?/sec    1.01     11.3±0.32ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.02      9.3±0.28ms        ? ?/sec    1.00      9.1±0.23ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.01      8.4±0.24ms        ? ?/sec    1.00      8.3±0.31ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.01     68.0±1.19ms        ? ?/sec    1.00     67.3±0.30ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     12.6±0.15ms        ? ?/sec    1.01     12.7±0.27ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.00      9.7±0.28ms        ? ?/sec    1.03     10.0±0.37ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00      9.8±0.20ms        ? ?/sec    1.01      9.9±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.01     47.8±0.40ms        ? ?/sec    1.00     47.3±0.40ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.00      5.9±0.05ms        ? ?/sec    1.00      5.9±0.23ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.02      4.7±0.23ms        ? ?/sec    1.00      4.6±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.00      3.1±0.05ms        ? ?/sec    1.01      3.2±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.01     57.6±0.63ms        ? ?/sec    1.00     56.8±0.27ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.00      7.9±0.09ms        ? ?/sec    1.00      7.9±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.01      5.6±0.20ms        ? ?/sec    1.00      5.6±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.12ms        ? ?/sec    1.01      3.8±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.03     42.4±0.90ms        ? ?/sec    1.00     41.4±0.17ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.6±0.19ms        ? ?/sec    1.00      4.6±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.3±0.20ms        ? ?/sec    1.07      2.4±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00  1510.0±23.30µs        ? ?/sec    1.01  1532.2±28.63µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.01     51.6±0.32ms        ? ?/sec    1.00     50.8±0.29ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.03      7.1±0.25ms        ? ?/sec    1.00      6.9±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.01      3.6±0.13ms        ? ?/sec    1.00      3.6±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      3.8±0.05ms        ? ?/sec    1.01      3.9±0.10ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.02     96.5±0.76ms        ? ?/sec    1.00     94.8±0.39ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.00      9.0±0.07ms        ? ?/sec    1.00      9.0±0.05ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.7±0.24ms        ? ?/sec    1.04      3.8±0.37ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.06      3.2±0.06ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.01    123.9±1.45ms        ? ?/sec    1.00    123.0±1.51ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.01     14.7±0.11ms        ? ?/sec    1.00     14.6±0.10ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.00      7.0±0.34ms        ? ?/sec    1.01      7.1±0.35ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      8.8±0.10ms        ? ?/sec    1.02      9.0±0.15ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.07     68.7±0.49ms        ? ?/sec    1.00     64.3±0.30ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.08      8.0±0.05ms        ? ?/sec    1.00      7.4±0.15ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.03      4.2±0.32ms        ? ?/sec    1.00      4.1±0.37ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.08  1418.1±10.95µs        ? ?/sec    1.00  1308.3±31.98µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.05     92.6±0.58ms        ? ?/sec    1.00     87.9±0.74ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.02     11.3±0.14ms        ? ?/sec    1.00     11.1±0.08ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.02      5.6±0.31ms        ? ?/sec    1.00      5.5±0.40ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.6±0.03ms        ? ?/sec    1.07      3.9±0.02ms        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed


group                                                          arrow-6408                             main
-----                                                          ----------                             ----
concat 1024 arrays boolean 4                                   1.00     21.4±0.07µs        ? ?/sec    1.04     22.3±0.37µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.8±0.05µs        ? ?/sec    1.06     14.6±0.11µs        ? ?/sec
concat 1024 arrays str 4                                       1.09     39.2±1.15µs        ? ?/sec    1.00     36.1±0.51µs        ? ?/sec
concat boolean 1024                                            1.00    308.6±4.06ns        ? ?/sec    1.00    309.1±7.59ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.02      5.2±0.09µs        ? ?/sec    1.00      5.1±0.03µs        ? ?/sec
concat boolean nulls 1024                                      1.03    585.0±5.15ns        ? ?/sec    1.00    567.8±4.50ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.2±0.33µs        ? ?/sec    1.00     18.3±0.20µs        ? ?/sec
concat fixed size lists                                        1.00   713.1±26.38µs        ? ?/sec    1.00   713.0±21.61µs        ? ?/sec
concat i32 1024                                                1.00    390.7±7.25ns        ? ?/sec    1.00    389.5±3.61ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.04    216.1±7.23µs        ? ?/sec    1.00    207.0±7.78µs        ? ?/sec
concat i32 nulls 1024                                          1.00   605.3±10.15ns        ? ?/sec    1.03   624.3±16.27ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    241.2±7.90µs        ? ?/sec    1.02    245.1±7.83µs        ? ?/sec
concat str 1024                                                1.00     13.6±0.82µs        ? ?/sec    1.07     14.6±1.15µs        ? ?/sec
concat str 8192 over 100 arrays                                1.01    106.2±1.03ms        ? ?/sec    1.00    105.3±1.59ms        ? ?/sec
concat str nulls 1024                                          1.00      6.0±0.76µs        ? ?/sec    1.09      6.6±0.93µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.04     53.5±0.68ms        ? ?/sec    1.00     51.2±0.85ms        ? ?/sec
concat str_dict 1024                                           1.00      2.9±0.02µs        ? ?/sec    1.02      3.0±0.07µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.08µs        ? ?/sec    1.01      7.0±0.09µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.01      7.1±0.07µs        ? ?/sec    1.00      7.0±0.05µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     77.8±0.44µs        ? ?/sec    1.00     77.9±1.54µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.6±0.83µs        ? ?/sec    1.00     79.9±0.66µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     88.8±0.69µs        ? ?/sec    1.00     88.7±0.77µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.01     90.9±1.28µs        ? ?/sec    1.00     90.1±0.54µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.03     48.6±3.01µs        ? ?/sec    1.00     47.3±3.58µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.04     51.7±3.10µs        ? ?/sec    1.00     49.8±3.30µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed


group                                             arrow-6408                             main
-----                                             ----------                             ----
gc view types all without nulls[100000]           1.00  1527.1±44.91µs        ? ?/sec    1.01  1542.1±46.20µs        ? ?/sec
gc view types all without nulls[8000]             1.02     65.4±3.26µs        ? ?/sec    1.00     63.8±2.65µs        ? ?/sec
gc view types all[100000]                         1.11    284.6±7.60µs        ? ?/sec    1.00    257.5±4.69µs        ? ?/sec
gc view types all[8000]                           1.15     21.9±0.14µs        ? ?/sec    1.00     19.0±0.24µs        ? ?/sec
gc view types slice half without nulls[100000]    1.00    506.3±9.20µs        ? ?/sec    1.02   515.3±22.57µs        ? ?/sec
gc view types slice half without nulls[8000]      1.00     27.7±0.32µs        ? ?/sec    1.00     27.7±0.63µs        ? ?/sec
gc view types slice half[100000]                  1.09    140.9±3.52µs        ? ?/sec    1.00    129.7±3.36µs        ? ?/sec
gc view types slice half[8000]                    1.19     11.2±0.19µs        ? ?/sec    1.00      9.4±0.05µs        ? ?/sec
view types slice                                  1.00    615.2±8.47ns        ? ?/sec    1.11    682.3±4.06ns        ? ?/sec
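
For context on the `view types slice` row: the PR description says the view buffers are now stored as `Arc<[Buffer]>` instead of an owned `Vec`, so that the slice clone does not allocate. A minimal, std-only sketch of that difference follows. `Buffer` and `share` here are hypothetical stand-ins for illustration, not the arrow crate's actual types or functions.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a data buffer; NOT arrow's `Buffer` type.
#[derive(Clone, Debug, PartialEq)]
struct Buffer(Vec<u8>);

// Hypothetical helper: move an owned list of buffers into shared storage.
fn share(buffers: Vec<Buffer>) -> Arc<[Buffer]> {
    Arc::from(buffers)
}

fn main() {
    let owned = vec![Buffer(vec![1, 2, 3]), Buffer(vec![4, 5])];

    // Cloning a Vec allocates new backing storage and clones every element.
    let owned_copy = owned.clone();
    assert_ne!(owned.as_ptr(), owned_copy.as_ptr());

    // Cloning an Arc<[Buffer]> only bumps a reference count;
    // both handles point at the same allocation.
    let shared = share(owned);
    let shared_copy = Arc::clone(&shared);
    assert!(Arc::ptr_eq(&shared, &shared_copy));
    assert_eq!(Arc::strong_count(&shared), 2);
}
```

The trade-off in this sketch is that the shared slice is immutable, so any code path that needs to change the buffer list has to build a new one. That may be related to the slower `gc view types` rows above, though the thread does not confirm the cause.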


Labels

api-change: Changes to the arrow API
arrow: Changes to the arrow crate
next-major-release: the PR has API changes and is waiting on the next major version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Utf8View / BinaryView / StringViewArray::slice() and BinaryViewArray::slice() are slow (they allocate)

4 participants