Fix row slice bug in Union column decoding with many columns #9000
Conversation
Force-pushed from c64321b to ba7698c
@Jefffrey would you have time to review this?
I'll try to take a look when I have time
@friendlymatthew when I check this PR out locally, it seems to fail to compile in the tests; I assume it's because it depends on #8891? Funny that the CI didn't fail, presumably because it merged in from main? 🤔
Jefffrey left a comment
This is a nice pickup, and the fix makes sense to me.
One minor note:
This PR fixes a bug in the row-to-column conversion for Union types when multiple union columns are present in the same row converter
I think the bug would occur whenever there are multiple columns and a union column that isn't last; something like a [Union, Int64] array input would also fail, so it's not necessarily about multiple union columns.
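To see why any column after a non-advancing decoder breaks, here is a toy sketch (not arrow-rs's actual decoder; all names are hypothetical): each row is a byte slice holding two single-byte columns, and a decoder that forgets to advance the row slice shifts the read window for every column decoded after it.

```rust
// Hypothetical illustration: a column decoder that forgets to advance
// each row's byte slice shifts the read window for later columns.
fn decode_u8_buggy(rows: &mut [&[u8]]) -> Vec<u8> {
    // reads the first byte but does NOT advance the slice
    rows.iter().map(|row| row[0]).collect()
}

fn decode_u8_fixed(rows: &mut [&[u8]]) -> Vec<u8> {
    rows.iter_mut()
        .map(|row| {
            let v = row[0];
            *row = &row[1..]; // advance past the consumed byte
            v
        })
        .collect()
}

fn main() {
    // one row encoding two columns: first column = 10, second column = 20
    let data = [10u8, 20u8];
    let mut rows: Vec<&[u8]> = vec![&data];
    let col1 = decode_u8_buggy(&mut rows);
    let col2 = decode_u8_fixed(&mut rows);
    // because the first decoder never advanced, the second column
    // re-reads the first column's byte instead of 20
    assert_eq!((col1[0], col2[0]), (10, 10));
}
```

The same shift happens regardless of how many union columns there are, which is why a single union followed by an Int64 column is enough to trigger it.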
arrow-row/src/lib.rs
Outdated
// track bytes consumed for rows that belong to this field
for (row_idx, child_row) in field_rows.iter() {
    let remaining_len = sparse_data[*row_idx].len();
    bytes_consumed[*row_idx] = 1 + child_row.len() - remaining_len;
}
Suggested change:
- // track bytes consumed for rows that belong to this field
- for (row_idx, child_row) in field_rows.iter() {
-     let remaining_len = sparse_data[*row_idx].len();
-     bytes_consumed[*row_idx] = 1 + child_row.len() - remaining_len;
- }
+ // ensure we advance past consumed bytes in rows
+ for (row_idx, child_row) in field_rows.iter() {
+     let remaining_len = sparse_data[*row_idx].len();
+     let consumed_length = 1 + child_row.len() - remaining_len;
+     rows[*row_idx] = &rows[*row_idx][consumed_length..];
+ }
Thoughts on inlining it like this, which would remove the need for a separate bytes_consumed vec?
I like it!
Force-pushed from ba7698c to 9a418ec
Which issue does this PR close?
Rationale for this change
This PR fixes a bug in the row-to-column conversion for Union types when multiple union columns are present in the same row converter.
Previously, row slices were not advanced correctly after a union column's data was read. The fix tracks the bytes consumed per row across all union fields, so each row slice is properly advanced before the next column is decoded.
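The advancing pattern the fix relies on can be sketched with a toy decoder (not arrow-rs's actual code; all names are hypothetical): each row carries one variable-width value followed by a fixed-width i64, and the first decoder must advance each row's slice by exactly the bytes it consumed so the second decoder reads from the right offset.

```rust
// Toy sketch of per-row slice advancement: each row is
// [len byte][len bytes of data][8-byte little-endian i64].
fn decode_varlen(rows: &mut [&[u8]]) -> Vec<Vec<u8>> {
    rows.iter_mut()
        .map(|row| {
            // first byte is the length of the variable-width value
            let len = row[0] as usize;
            let value = row[1..1 + len].to_vec();
            // advance this row's slice past the consumed bytes
            *row = &row[1 + len..];
            value
        })
        .collect()
}

fn decode_i64(rows: &mut [&[u8]]) -> Vec<i64> {
    rows.iter_mut()
        .map(|row| {
            let (bytes, rest) = row.split_at(8);
            *row = rest; // advance past the 8 consumed bytes
            i64::from_le_bytes(bytes.try_into().unwrap())
        })
        .collect()
}

fn main() {
    // two rows: ("hi", 7) and ("x", 9)
    let r0 = [&[2u8][..], &b"hi"[..], &7i64.to_le_bytes()[..]].concat();
    let r1 = [&[1u8][..], &b"x"[..], &9i64.to_le_bytes()[..]].concat();
    let mut rows: Vec<&[u8]> = vec![&r0, &r1];
    let vars = decode_varlen(&mut rows);
    let ints = decode_i64(&mut rows);
    assert_eq!(vars, vec![b"hi".to_vec(), b"x".to_vec()]);
    assert_eq!(ints, vec![7, 9]);
}
```

Because each decoder leaves every row's slice pointing just past its own bytes, the decoders compose in any order and for any number of columns, which is the property the fix restores for union fields.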