[arrow] Minimize allocation in GenericViewArray::slice() #9016

maxburke · 2025-12-18T16:14:21Z

Use the suggested Arc<[Buffer]> storage for ViewArray storage instead of an owned Vec<Buffer> so that the slice clone does not allocate.

Which issue does this PR close?

Closes Utf8View / BinaryView / StringViewArray::slice() and BinaryViewArray::slice() are slow (they allocate) #6408 .
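
To illustrate why the storage change removes the allocation, here is a minimal std-only sketch. `Buffer` below is a hypothetical stand-in for arrow's `Buffer`, not the real type, and the helper names are made up for this example. The point is only that cloning an `Arc<[T]>` bumps a reference count and shares the same slice, whereas cloning a `Vec<T>` allocates a new backing array.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `Buffer` (cheaply clonable bytes).
#[derive(Clone, Debug, PartialEq)]
pub struct Buffer(pub Arc<[u8]>);

/// Build a small list of buffers for the demonstration.
pub fn sample_buffers() -> Vec<Buffer> {
    vec![
        Buffer(Arc::from(&b"hello"[..])),
        Buffer(Arc::from(&b"world"[..])),
    ]
}

/// Cloning a `Vec<Buffer>` allocates a fresh backing array, so the
/// clone's data pointer differs from the original's (returns `false`).
pub fn vec_clone_shares_storage(buffers: &Vec<Buffer>) -> bool {
    let cloned = buffers.clone();
    std::ptr::eq(buffers.as_ptr(), cloned.as_ptr())
}

/// Cloning an `Arc<[Buffer]>` only increments the reference count, so
/// both handles point at the same slice (returns `true`).
pub fn arc_clone_shares_storage(buffers: &Arc<[Buffer]>) -> bool {
    let cloned = Arc::clone(buffers);
    Arc::ptr_eq(buffers, &cloned)
}
```

With the old layout, every `slice()` had to clone the `Vec` of buffers; with the shared slice, the same step is a reference-count increment.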

mhilton · 2025-12-19T10:28:21Z

arrow-array/src/array/byte_view_array.rs

            nulls.shrink_to_fit();
        }
+        */
+        todo!()


Did you mean to leave this todo here? It seems to me that it might be important to finish this part. If it isn't necessary could you add a comment explaining why.

it is also bad we have no test coverage for this. we should make a PR to add some

Nope! I didn't mean to do that. I've pushed a fix. But at least it caught this test coverage miss? 😅

Yes indeed -- it is bad we don't have a test for this

alamb

Thank you @maxburke and @mhilton -- this is a really cool PR. I like it a lot

I think once we sort out the "shrink_to_fit" bit it will be ready to go

cc @XiangpengHao and @Dandandan

alamb · 2025-12-19T18:27:04Z

arrow-array/src/array/byte_view_array.rs

    }

    fn shrink_to_fit(&mut self) {
+        /*


I think you can use Arc::make_mut here to modify (or clone) the buffers as needed

    fn shrink_to_fit(&mut self) {
        self.views.shrink_to_fit();
        Arc::make_mut(&mut self.buffers)
            .iter_mut()
            .for_each(|b| b.shrink_to_fit());
        if let Some(nulls) = &mut self.nulls {
            nulls.shrink_to_fit();
        }
    }

I think the call to

self.buffers.shrink_to_fit();

which would shrink the Vec today doesn't really have an analog when the view holds an Arc of the slice -- we could potentially call shrink_to_fit prior to creating the Arc, but I think that might be unnecessary.

I would personally suggest not messing with the actual self.buffers call

I think I just committed a fix before you posted this 😅

Is shrink_to_fit a best-effort operation? If so, it's probably not necessary to try to shrink self.buffers...?

yeah my review overlapped with your push 🏃

I think it is reasonable behavior. Maybe we can leave a comment explaining the rationale ('best effort' or something) -- the memory that will be saved from a few elements in a Vec is not likely high -- especially as slicing a generic view array doesn't adjust the buffers anyways

alamb · 2025-12-19T18:28:53Z

arrow-array/src/array/byte_view_array.rs

        let len = array.len();
-        array.buffers.insert(0, array.views.into_inner());
+
+        let mut buffers = array.buffers.iter().cloned().collect::<Vec<_>>();


I think you could call to_vec here

Suggested change

-        let mut buffers = array.buffers.iter().cloned().collect::<Vec<_>>();
+        let mut buffers = array.buffers.to_vec();

alamb · 2025-12-19T18:29:43Z

Since buffers is exposed I think this is an API change that will have to wait for the next major release (in Feb)

alamb

Thanks @maxburke -- this looks great to me. I do think some small finagling of shrink_to_fit would be useful too

alamb · 2025-12-19T18:30:52Z

arrow-array/src/array/byte_view_array.rs

    ///
    /// Panics if [`GenericByteViewArray::try_new`] returns an error
-    pub fn new(views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>) -> Self {
+    pub fn new<U>(views: ScalarBuffer<u128>, buffers: U, nulls: Option<NullBuffer>) -> Self


This is very elegant. It is nice to keep this API backwards compatible (anything that used to compile still compiles).
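
A minimal sketch of how a generic `buffers` parameter can keep old call sites compiling. The diff above does not show the bound on `U`; this sketch assumes something like `U: Into<Arc<[Buffer]>>`, and `ViewArrayLike`, `Buffer`, and the helper functions are hypothetical stand-ins rather than arrow's actual types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `Buffer`.
#[derive(Clone, Debug, PartialEq)]
pub struct Buffer(pub Vec<u8>);

/// Hypothetical stand-in for the array; only the buffer list is modeled.
pub struct ViewArrayLike {
    buffers: Arc<[Buffer]>,
}

impl ViewArrayLike {
    /// Assumed bound: anything convertible into `Arc<[Buffer]>`.
    /// `Vec<Buffer>` qualifies through std's `From<Vec<T>> for Arc<[T]>`.
    pub fn new<U>(buffers: U) -> Self
    where
        U: Into<Arc<[Buffer]>>,
    {
        Self { buffers: buffers.into() }
    }

    /// Number of data buffers held by this array.
    pub fn num_buffers(&self) -> usize {
        self.buffers.len()
    }
}

/// Old-style caller: passes an owned `Vec<Buffer>`, as before the change.
pub fn from_vec() -> ViewArrayLike {
    ViewArrayLike::new(vec![Buffer(vec![1, 2, 3])])
}

/// New-style caller: passes an already shared `Arc<[Buffer]>` with no copy.
pub fn from_arc(shared: Arc<[Buffer]>) -> ViewArrayLike {
    ViewArrayLike::new(shared)
}
```

Under this assumption, callers holding a `Vec<Buffer>` pay one conversion at construction time, while callers that already hold the shared slice pass it through unchanged.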

alamb · 2025-12-19T18:31:13Z

arrow-array/src/array/byte_view_array.rs


    /// Deconstruct this array into its constituent parts
-    pub fn into_parts(self) -> (ScalarBuffer<u128>, Vec<Buffer>, Option<NullBuffer>) {
+    pub fn into_parts(self) -> (ScalarBuffer<u128>, Arc<[Buffer]>, Option<NullBuffer>) {


this I think is a breaking API change

I've tested it with DataFusion and it dropped right in (both the mainline version and the hacked up and patched version we're using).

But, yeah, I can't speak for other users of the package.
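
For downstream code that relied on the old `Vec<Buffer>` return type, one possible adaptation is sketched below. `into_parts_like` is a hypothetical stand-in that models only the buffer component of the tuple, and `Buffer` is again a stand-in type. `to_vec` clones each element into a newly allocated `Vec`; callers that only read can borrow the slice instead.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `Buffer` (cheaply clonable bytes).
#[derive(Clone, Debug, PartialEq)]
pub struct Buffer(pub Arc<[u8]>);

/// Stand-in for the new return shape: buffers come back as `Arc<[Buffer]>`.
pub fn into_parts_like() -> Arc<[Buffer]> {
    let owned = vec![
        Buffer(Arc::from(&b"hello"[..])),
        Buffer(Arc::from(&b"world"[..])),
    ];
    Arc::from(owned)
}

/// A caller that still needs an owned, growable list copies the slice out.
pub fn buffers_as_vec(parts: Arc<[Buffer]>) -> Vec<Buffer> {
    parts.to_vec()
}

/// A caller that only reads can borrow the slice directly, with no copy.
pub fn total_len(parts: &[Buffer]) -> usize {
    parts.iter().map(|b| b.0.len()).sum()
}
```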

alamb · 2025-12-19T18:34:14Z

arrow-array/src/array/byte_view_array.rs

-        self.buffers.iter_mut().for_each(|b| b.shrink_to_fit());
-        self.buffers.shrink_to_fit();
+
+        if let Some(buffers) = Arc::get_mut(&mut self.buffers) {


I think this will only shrink the buffers when there are no other outstanding references to them. I think it would be better to call Arc::make_mut here to ensure that the buffers get shrunk

So the issue I have with Arc::make_mut is that if I slice an array I now have two references to the underlying buffers. If I call shrink_to_fit on one of them, Arc::make_mut will clone the buffers and then shrink_to_fit will be called on the clone. But because the underlying buffers end up being cloned, the original allocation doesn't shrink, and in the end it'll use more memory until the other reference is dropped.

Additionally it'll create more allocator pressure because the buffer cloning will duplicate the buffer at its pre-shrunken size before it's shrunk.

That makes sense -- let's keep it this way then. I do think it is worth a comment explaining the rationale, though, for future readers that may wonder the same thing
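
The tradeoff discussed in this thread can be reproduced with a std-only sketch. `Buffer` here wraps a `Vec<u8>` as a hypothetical stand-in (arrow's real `Buffer` differs), and both function names are made up for illustration. The first function models the `Arc::get_mut` approach, which shrinks only when the handle is unique. The second spells out by hand what a clone-on-shared approach such as `Arc::make_mut` implies: when another handle exists, every buffer is copied first.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `Buffer`; owns its bytes in a `Vec`.
#[derive(Clone, Debug)]
pub struct Buffer(pub Vec<u8>);

impl Buffer {
    /// A buffer holding `len` zero bytes with at least `capacity` reserved.
    pub fn with_slack(len: usize, capacity: usize) -> Self {
        let mut bytes = Vec::with_capacity(capacity);
        bytes.resize(len, 0u8);
        Buffer(bytes)
    }
}

/// Best-effort variant: shrink only when this handle is the sole owner.
/// Returns `true` if the buffers were shrunk.
pub fn shrink_best_effort(buffers: &mut Arc<[Buffer]>) -> bool {
    match Arc::get_mut(buffers) {
        Some(unique) => {
            unique.iter_mut().for_each(|b| b.0.shrink_to_fit());
            true
        }
        // Shared with a slice or clone: leave the allocation alone.
        None => false,
    }
}

/// Clone-on-shared variant, written out by hand so the copy is visible.
/// Returns `true` if a copy had to be made.
pub fn shrink_clone_if_shared(buffers: &mut Arc<[Buffer]>) -> bool {
    let copied = Arc::get_mut(buffers).is_none();
    if copied {
        // Every buffer is duplicated while the other handle keeps the
        // originals alive, so total memory grows until that handle drops.
        let fresh: Arc<[Buffer]> = buffers.iter().cloned().collect();
        *buffers = fresh;
    }
    let unique = Arc::get_mut(buffers).expect("handle is unique after copying");
    unique.iter_mut().for_each(|b| b.0.shrink_to_fit());
    copied
}
```

In the shared case the best-effort variant does nothing and the two handles keep pointing at the same slice, while the clone-on-shared variant leaves two independent copies alive.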

alamb · 2025-12-19T20:12:12Z

run benchmark view_types

alamb-ghbot · 2025-12-19T20:12:22Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (3691d2d) to 116ae12 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-19T20:15:56Z

🤖: Benchmark completed

group                                             arrow-6408                              main
-----                                             ----------                              ----
gc view types all without nulls[100000]           1.00  1701.4±150.16µs        ? ?/sec    1.26      2.1±0.13ms        ? ?/sec
gc view types all without nulls[8000]             1.00     65.0±2.01µs        ? ?/sec     1.01     65.4±3.28µs        ? ?/sec
gc view types all[100000]                         1.08    290.2±7.91µs        ? ?/sec     1.00    268.1±7.14µs        ? ?/sec
gc view types all[8000]                           1.16     22.0±0.18µs        ? ?/sec     1.00     18.9±0.14µs        ? ?/sec
gc view types slice half without nulls[100000]    1.00   573.6±37.89µs        ? ?/sec     1.09   625.6±49.20µs        ? ?/sec
gc view types slice half without nulls[8000]      1.05     28.7±1.47µs        ? ?/sec     1.00     27.4±0.52µs        ? ?/sec
gc view types slice half[100000]                  1.10    141.3±2.93µs        ? ?/sec     1.00    128.1±2.54µs        ? ?/sec
gc view types slice half[8000]                    1.19     11.2±0.28µs        ? ?/sec     1.00      9.4±0.05µs        ? ?/sec
view types slice                                  1.00   610.4±11.65ns        ? ?/sec     1.12    681.4±9.32ns        ? ?/sec

alamb · 2025-12-19T22:57:12Z

view types slice 1.00 610.4±11.65ns ? ?/sec 1.12 681.4±9.32ns ? ?/sec

Not bad 😎

alamb · 2025-12-19T22:57:19Z

run benchmark view_types

alamb-ghbot · 2025-12-19T23:02:53Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-19T23:06:21Z

🤖: Benchmark completed

group                                             arrow-6408                             main
-----                                             ----------                             ----
gc view types all without nulls[100000]           1.00  1633.5±52.88µs        ? ?/sec    1.03  1686.3±52.95µs        ? ?/sec
gc view types all without nulls[8000]             1.03     65.5±2.96µs        ? ?/sec    1.00     63.5±3.07µs        ? ?/sec
gc view types all[100000]                         1.12    291.5±7.88µs        ? ?/sec    1.00    259.4±6.79µs        ? ?/sec
gc view types all[8000]                           1.15     21.9±0.20µs        ? ?/sec    1.00     19.0±0.18µs        ? ?/sec
gc view types slice half without nulls[100000]    1.03   545.3±19.09µs        ? ?/sec    1.00   530.6±18.02µs        ? ?/sec
gc view types slice half without nulls[8000]      1.00     27.5±0.49µs        ? ?/sec    1.00     27.6±0.49µs        ? ?/sec
gc view types slice half[100000]                  1.08    138.7±3.05µs        ? ?/sec    1.00    128.0±3.65µs        ? ?/sec
gc view types slice half[8000]                    1.16     11.2±0.22µs        ? ?/sec    1.00      9.6±0.06µs        ? ?/sec
view types slice                                  1.00    611.8±9.83ns        ? ?/sec    1.11   681.0±14.72ns        ? ?/sec

alamb · 2025-12-20T11:51:15Z

🤔 it actually looks like GC'ing is slowing down -- probably from the need to do an extra allocation for new buffers

alamb · 2025-12-20T12:12:04Z

Upon some more thought I think an extra allocation in GC is a better tradeoff, especially since a lot of GC'ing in downstream systems like DataFusion actually happens as part of concat and coalesce kernels

@XiangpengHao and @ctsk do you have any thoughts on this tradeoff?
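
One plausible source of the extra allocation mentioned above, sketched with std types only: an `Arc<[T]>` stores its reference counts next to the data, so converting a freshly built `Vec<T>` into it allocates a new block and moves the elements over. The function name is hypothetical and `u64` stands in for the element type; whether this is what the GC benchmarks are measuring is an assumption, not something the numbers confirm.

```rust
use std::sync::Arc;

/// Returns (address of the Vec's backing array, address of the Arc's slice data).
/// The two differ because `Arc::from(Vec<T>)` allocates new storage that has
/// room for the reference counts, then moves the elements into it.
pub fn addresses_before_and_after(items: Vec<u64>) -> (usize, usize) {
    let vec_addr = items.as_ptr() as usize;
    let shared: Arc<[u64]> = Arc::from(items);
    let arc_addr = shared.as_ptr() as usize;
    (vec_addr, arc_addr)
}
```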

alamb · 2025-12-20T12:12:26Z

run benchmark coalesce_kernels concatenate_kernel

alamb-ghbot · 2025-12-20T12:12:52Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-20T12:33:10Z

🤖: Benchmark completed

group                                                                                arrow-6408                             main
-----                                                                                ----------                             ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.01    261.8±3.68ms        ? ?/sec    1.00    258.1±2.69ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.01      8.6±0.15ms        ? ?/sec    1.00      8.5±0.32ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.0±0.10ms        ? ?/sec    1.03      4.1±0.14ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.01      3.5±0.04ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.00    251.8±2.55ms        ? ?/sec    1.00    250.7±2.69ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.02      9.6±0.15ms        ? ?/sec    1.00      9.4±0.11ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.01      4.6±0.09ms        ? ?/sec    1.00      4.6±0.12ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.6±0.03ms        ? ?/sec    1.02      4.7±0.17ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.03     59.3±0.51ms        ? ?/sec    1.00     57.4±1.01ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.02     11.4±0.23ms        ? ?/sec    1.00     11.2±0.12ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.00      9.1±0.25ms        ? ?/sec    1.01      9.2±0.25ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.07      8.7±0.34ms        ? ?/sec    1.00      8.1±0.18ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.01     68.9±0.54ms        ? ?/sec    1.00     68.0±1.49ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     12.7±0.39ms        ? ?/sec    1.00     12.7±0.33ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.03      9.8±0.30ms        ? ?/sec    1.00      9.6±0.15ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00     10.2±0.27ms        ? ?/sec    1.02     10.5±0.30ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.02     48.6±0.42ms        ? ?/sec    1.00     47.5±0.74ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.01      6.0±0.06ms        ? ?/sec    1.00      5.9±0.11ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.00      4.4±0.14ms        ? ?/sec    1.05      4.6±0.29ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.00      3.2±0.04ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.02     59.0±2.20ms        ? ?/sec    1.00     57.7±0.40ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.02      8.0±0.07ms        ? ?/sec    1.00      7.8±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.02      5.6±0.21ms        ? ?/sec    1.00      5.5±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.09ms        ? ?/sec    1.01      3.8±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.04     43.0±0.44ms        ? ?/sec    1.00     41.4±1.18ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.6±0.04ms        ? ?/sec    1.00      4.6±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.2±0.13ms        ? ?/sec    1.07      2.4±0.18ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00  1509.9±13.35µs        ? ?/sec    1.02  1535.8±23.70µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.03     52.4±0.55ms        ? ?/sec    1.00     51.0±1.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.03      7.1±0.06ms        ? ?/sec    1.00      6.9±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.05      3.7±0.23ms        ? ?/sec    1.00      3.5±0.08ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      3.8±0.02ms        ? ?/sec    1.02      3.9±0.03ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.03     97.1±1.30ms        ? ?/sec    1.00     94.3±0.76ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.01      9.1±0.22ms        ? ?/sec    1.00      9.0±0.04ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.08      4.1±0.29ms        ? ?/sec    1.00      3.8±0.42ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.02      3.1±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.02    125.8±2.13ms        ? ?/sec    1.00    123.0±2.27ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.02     14.8±0.21ms        ? ?/sec    1.00     14.6±0.11ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.00      7.0±0.34ms        ? ?/sec    1.06      7.4±0.45ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      8.8±0.08ms        ? ?/sec    1.02      9.0±0.05ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.07     69.3±1.32ms        ? ?/sec    1.00     64.5±0.33ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.10      8.0±0.11ms        ? ?/sec    1.00      7.3±0.35ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.08      4.6±0.27ms        ? ?/sec    1.00      4.3±0.41ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.00  1422.9±11.40µs        ? ?/sec    1.00  1420.2±15.88µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.06     93.5±1.01ms        ? ?/sec    1.00     88.3±0.39ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.03     11.4±0.09ms        ? ?/sec    1.00     11.1±0.07ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.01      5.2±0.32ms        ? ?/sec    1.00      5.2±0.39ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.8±0.04ms        ? ?/sec    1.04      3.9±0.09ms        ? ?/sec

alamb-ghbot · 2025-12-20T12:33:14Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-20T12:43:36Z

🤖: Benchmark completed

group                                                          arrow-6408                             main
-----                                                          ----------                             ----
concat 1024 arrays boolean 4                                   1.00     21.4±0.09µs        ? ?/sec    1.04     22.3±0.19µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.8±0.16µs        ? ?/sec    1.07     14.9±0.23µs        ? ?/sec
concat 1024 arrays str 4                                       1.02     37.0±0.43µs        ? ?/sec    1.00     36.3±0.32µs        ? ?/sec
concat boolean 1024                                            1.00    308.1±1.60ns        ? ?/sec    1.00    308.2±3.53ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.01      5.1±0.02µs        ? ?/sec    1.00      5.1±0.03µs        ? ?/sec
concat boolean nulls 1024                                      1.02    582.9±3.29ns        ? ?/sec    1.00   572.7±14.53ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.2±0.67µs        ? ?/sec    1.01     18.3±1.05µs        ? ?/sec
concat fixed size lists                                        1.00   726.4±28.95µs        ? ?/sec    1.01   735.4±22.92µs        ? ?/sec
concat i32 1024                                                1.01   391.9±17.46ns        ? ?/sec    1.00    386.2±1.64ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.02    208.6±6.13µs        ? ?/sec    1.00    205.2±6.90µs        ? ?/sec
concat i32 nulls 1024                                          1.00    603.8±6.72ns        ? ?/sec    1.03    621.9±4.39ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.01    238.9±9.09µs        ? ?/sec    1.00    237.4±8.95µs        ? ?/sec
concat str 1024                                                1.00     13.9±1.19µs        ? ?/sec    1.03     14.3±1.27µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    105.1±1.11ms        ? ?/sec    1.02    106.7±0.92ms        ? ?/sec
concat str nulls 1024                                          1.06      6.3±0.83µs        ? ?/sec    1.00      6.0±0.59µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     53.4±0.51ms        ? ?/sec    1.01     53.7±0.47ms        ? ?/sec
concat str_dict 1024                                           1.00      3.0±0.03µs        ? ?/sec    1.03      3.1±0.08µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.04µs        ? ?/sec    1.00      7.0±0.06µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      7.1±0.05µs        ? ?/sec    1.06      7.5±0.06µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     78.3±0.62µs        ? ?/sec    1.00     78.1±1.44µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.01     80.5±5.16µs        ? ?/sec    1.00     79.8±1.29µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     88.9±0.43µs        ? ?/sec    1.00     88.7±0.35µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.01     90.7±0.37µs        ? ?/sec    1.00     89.6±0.98µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.01     49.2±2.77µs        ? ?/sec    1.00     48.7±2.79µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.04     50.4±2.53µs        ? ?/sec    1.00     48.7±2.81µs        ? ?/sec

alamb · 2025-12-20T13:13:54Z

run benchmark coalesce_kernels concatenate_kernel view_types

alamb-ghbot · 2025-12-20T13:13:59Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-20T13:34:08Z

🤖: Benchmark completed

group                                                                                arrow-6408                             main
-----                                                                                ----------                             ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.00    260.9±3.27ms        ? ?/sec    1.00    260.9±2.94ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.00      8.5±0.08ms        ? ?/sec    1.00      8.6±0.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.0±0.09ms        ? ?/sec    1.02      4.1±0.12ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.00      3.4±0.05ms        ? ?/sec    1.01      3.5±0.06ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.01    247.6±2.57ms        ? ?/sec    1.00    245.0±2.93ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.03      9.6±0.17ms        ? ?/sec    1.00      9.3±0.11ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.00      4.5±0.10ms        ? ?/sec    1.00      4.6±0.11ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.6±0.07ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.01     58.1±0.77ms        ? ?/sec    1.00     57.6±0.61ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.00     11.3±0.14ms        ? ?/sec    1.01     11.3±0.32ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.02      9.3±0.28ms        ? ?/sec    1.00      9.1±0.23ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.01      8.4±0.24ms        ? ?/sec    1.00      8.3±0.31ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.01     68.0±1.19ms        ? ?/sec    1.00     67.3±0.30ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     12.6±0.15ms        ? ?/sec    1.01     12.7±0.27ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.00      9.7±0.28ms        ? ?/sec    1.03     10.0±0.37ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00      9.8±0.20ms        ? ?/sec    1.01      9.9±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.01     47.8±0.40ms        ? ?/sec    1.00     47.3±0.40ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.00      5.9±0.05ms        ? ?/sec    1.00      5.9±0.23ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.02      4.7±0.23ms        ? ?/sec    1.00      4.6±0.19ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.00      3.1±0.05ms        ? ?/sec    1.01      3.2±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.01     57.6±0.63ms        ? ?/sec    1.00     56.8±0.27ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.00      7.9±0.09ms        ? ?/sec    1.00      7.9±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.01      5.6±0.20ms        ? ?/sec    1.00      5.6±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.12ms        ? ?/sec    1.01      3.8±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.03     42.4±0.90ms        ? ?/sec    1.00     41.4±0.17ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.6±0.19ms        ? ?/sec    1.00      4.6±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.3±0.20ms        ? ?/sec    1.07      2.4±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00  1510.0±23.30µs        ? ?/sec    1.01  1532.2±28.63µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.01     51.6±0.32ms        ? ?/sec    1.00     50.8±0.29ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.03      7.1±0.25ms        ? ?/sec    1.00      6.9±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.01      3.6±0.13ms        ? ?/sec    1.00      3.6±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      3.8±0.05ms        ? ?/sec    1.01      3.9±0.10ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.02     96.5±0.76ms        ? ?/sec    1.00     94.8±0.39ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.00      9.0±0.07ms        ? ?/sec    1.00      9.0±0.05ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.7±0.24ms        ? ?/sec    1.04      3.8±0.37ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.06      3.2±0.06ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.01    123.9±1.45ms        ? ?/sec    1.00    123.0±1.51ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.01     14.7±0.11ms        ? ?/sec    1.00     14.6±0.10ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.00      7.0±0.34ms        ? ?/sec    1.01      7.1±0.35ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      8.8±0.10ms        ? ?/sec    1.02      9.0±0.15ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.07     68.7±0.49ms        ? ?/sec    1.00     64.3±0.30ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.08      8.0±0.05ms        ? ?/sec    1.00      7.4±0.15ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.03      4.2±0.32ms        ? ?/sec    1.00      4.1±0.37ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.08  1418.1±10.95µs        ? ?/sec    1.00  1308.3±31.98µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.05     92.6±0.58ms        ? ?/sec    1.00     87.9±0.74ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.02     11.3±0.14ms        ? ?/sec    1.00     11.1±0.08ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.02      5.6±0.31ms        ? ?/sec    1.00      5.5±0.40ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.6±0.03ms        ? ?/sec    1.07      3.9±0.02ms        ? ?/sec

alamb-ghbot · 2025-12-20T13:34:12Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-20T13:44:33Z

🤖: Benchmark completed

group                                                          arrow-6408                             main
-----                                                          ----------                             ----
concat 1024 arrays boolean 4                                   1.00     21.4±0.07µs        ? ?/sec    1.04     22.3±0.37µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.8±0.05µs        ? ?/sec    1.06     14.6±0.11µs        ? ?/sec
concat 1024 arrays str 4                                       1.09     39.2±1.15µs        ? ?/sec    1.00     36.1±0.51µs        ? ?/sec
concat boolean 1024                                            1.00    308.6±4.06ns        ? ?/sec    1.00    309.1±7.59ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.02      5.2±0.09µs        ? ?/sec    1.00      5.1±0.03µs        ? ?/sec
concat boolean nulls 1024                                      1.03    585.0±5.15ns        ? ?/sec    1.00    567.8±4.50ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.2±0.33µs        ? ?/sec    1.00     18.3±0.20µs        ? ?/sec
concat fixed size lists                                        1.00   713.1±26.38µs        ? ?/sec    1.00   713.0±21.61µs        ? ?/sec
concat i32 1024                                                1.00    390.7±7.25ns        ? ?/sec    1.00    389.5±3.61ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.04    216.1±7.23µs        ? ?/sec    1.00    207.0±7.78µs        ? ?/sec
concat i32 nulls 1024                                          1.00   605.3±10.15ns        ? ?/sec    1.03   624.3±16.27ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    241.2±7.90µs        ? ?/sec    1.02    245.1±7.83µs        ? ?/sec
concat str 1024                                                1.00     13.6±0.82µs        ? ?/sec    1.07     14.6±1.15µs        ? ?/sec
concat str 8192 over 100 arrays                                1.01    106.2±1.03ms        ? ?/sec    1.00    105.3±1.59ms        ? ?/sec
concat str nulls 1024                                          1.00      6.0±0.76µs        ? ?/sec    1.09      6.6±0.93µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.04     53.5±0.68ms        ? ?/sec    1.00     51.2±0.85ms        ? ?/sec
concat str_dict 1024                                           1.00      2.9±0.02µs        ? ?/sec    1.02      3.0±0.07µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.08µs        ? ?/sec    1.01      7.0±0.09µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.01      7.1±0.07µs        ? ?/sec    1.00      7.0±0.05µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     77.8±0.44µs        ? ?/sec    1.00     77.9±1.54µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.6±0.83µs        ? ?/sec    1.00     79.9±0.66µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     88.8±0.69µs        ? ?/sec    1.00     88.7±0.77µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.01     90.9±1.28µs        ? ?/sec    1.00     90.1±0.54µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.03     48.6±3.01µs        ? ?/sec    1.00     47.3±3.58µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.04     51.7±3.10µs        ? ?/sec    1.00     49.8±3.30µs        ? ?/sec

alamb-ghbot · 2025-12-20T13:44:37Z

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arrow-6408 (5841826) to 240cbf4 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=arrow-6408
Results will be posted here when complete

alamb-ghbot · 2025-12-20T13:47:57Z

🤖: Benchmark completed

Details

group                                             arrow-6408                             main
-----                                             ----------                             ----
gc view types all without nulls[100000]           1.00  1527.1±44.91µs        ? ?/sec    1.01  1542.1±46.20µs        ? ?/sec
gc view types all without nulls[8000]             1.02     65.4±3.26µs        ? ?/sec    1.00     63.8±2.65µs        ? ?/sec
gc view types all[100000]                         1.11    284.6±7.60µs        ? ?/sec    1.00    257.5±4.69µs        ? ?/sec
gc view types all[8000]                           1.15     21.9±0.14µs        ? ?/sec    1.00     19.0±0.24µs        ? ?/sec
gc view types slice half without nulls[100000]    1.00    506.3±9.20µs        ? ?/sec    1.02   515.3±22.57µs        ? ?/sec
gc view types slice half without nulls[8000]      1.00     27.7±0.32µs        ? ?/sec    1.00     27.7±0.63µs        ? ?/sec
gc view types slice half[100000]                  1.09    140.9±3.52µs        ? ?/sec    1.00    129.7±3.36µs        ? ?/sec
gc view types slice half[8000]                    1.19     11.2±0.19µs        ? ?/sec    1.00      9.4±0.05µs        ? ?/sec
view types slice                                  1.00    615.2±8.47ns        ? ?/sec    1.11    682.3±4.06ns        ? ?/sec

github-actions bot added the arrow label (Changes to the arrow crate) Dec 18, 2025

mhilton reviewed Dec 19, 2025

maxburke force-pushed the arrow-6408 branch from f92385d to 934c195 Compare December 19, 2025 18:25

alamb reviewed Dec 19, 2025

alamb added the api-change label (Changes to the arrow API) Dec 19, 2025

alamb approved these changes Dec 19, 2025

alamb added the next-major-release label (the PR has API changes and is waiting on the next major version) Dec 19, 2025

alamb mentioned this pull request Dec 19, 2025

Continued: Use Arc<[Buffer]> instead of raw Vec<Buffer> in GenericByteViewArray for faster slice #7773

Closed

[arrow] Minimize allocation in GenericViewArray::slice()

5800dc1

Use the suggested Arc<[Buffer]> storage for ViewArray storage instead of an owned Vec<Buffer> so that the slice clone does not allocate.
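
For readers who want to see why the storage change avoids the allocation, here is a minimal sketch using only the standard library. The `Buffer` struct and the `clone_vec` / `clone_shared` helpers are hypothetical stand-ins for illustration, not the arrow-rs types or the code in this PR.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's `Buffer`, so the sketch has no dependencies.
#[derive(Clone, Debug, PartialEq)]
struct Buffer(Arc<Vec<u8>>);

/// Cloning owned `Vec` storage allocates a new backing array
/// and clones every element into it.
fn clone_vec(buffers: &[Buffer]) -> Vec<Buffer> {
    buffers.to_vec()
}

/// Cloning `Arc<[Buffer]>` storage only increments a reference count;
/// both handles point at the same allocation.
fn clone_shared(buffers: &Arc<[Buffer]>) -> Arc<[Buffer]> {
    Arc::clone(buffers)
}

fn main() {
    let owned: Vec<Buffer> = vec![
        Buffer(Arc::new(vec![1, 2, 3])),
        Buffer(Arc::new(vec![4])),
    ];

    // Vec clone: a second, distinct allocation for the list of buffers.
    let copy = clone_vec(&owned);
    assert_ne!(owned.as_ptr(), copy.as_ptr());

    // Arc<[T]> clone: same allocation, reference count goes from 1 to 2.
    let shared: Arc<[Buffer]> = Arc::from(owned);
    let alias = clone_shared(&shared);
    assert!(Arc::ptr_eq(&shared, &alias));
    assert_eq!(Arc::strong_count(&shared), 2);
}
```

The trade-off is that shared storage can no longer be mutated in place without first checking for, or copying out of, the shared allocation.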

maxburke force-pushed the arrow-6408 branch from 3691d2d to 5800dc1 Compare December 19, 2025 20:40

Merge branch 'main' into arrow-6408

5841826

apache deleted a comment from alamb-ghbot Dec 20, 2025

maxburke added 2 commits December 20, 2025 12:26

Merge branch 'main' into arrow-6408

4281e96

Merge branch 'main' into arrow-6408

094fa09

	let mut buffers = array.buffers.iter().cloned().collect::<Vec<_>>();
	let mut buffers = array.buffers.into_vec();
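
The two lines above differ in how they obtain an owned `Vec` of buffers. As a rough sketch, assuming the storage is an `Arc<[T]>`: elements of a shared slice have to be cloned out, whereas `into_vec` is a standard-library method on `Box<[T]>` that moves the elements without copying. The `to_owned_vec` helper is hypothetical and uses `String` in place of the real buffer type.

```rust
use std::sync::Arc;

/// Copy the elements of a shared slice into an owned, mutable `Vec`.
/// The shared slice itself is left untouched.
fn to_owned_vec<T: Clone>(shared: &Arc<[T]>) -> Vec<T> {
    shared.iter().cloned().collect::<Vec<_>>()
}

fn main() {
    let shared: Arc<[String]> = Arc::from(vec!["a".to_string(), "b".to_string()]);

    // Cloning out of the Arc gives a Vec that can be modified freely.
    let mut owned = to_owned_vec(&shared);
    owned.push("c".to_string());
    assert_eq!(owned.len(), 3);
    assert_eq!(shared.len(), 2);

    // By contrast, a uniquely owned boxed slice can be turned into a Vec by move.
    let boxed: Box<[u8]> = vec![1, 2].into_boxed_slice();
    let moved: Vec<u8> = boxed.into_vec();
    assert_eq!(moved, vec![1, 2]);
}
```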
