@NickCrews NickCrews commented Dec 18, 2025

Rationale for this change

Fixes #22081

Efficiently and correctly create List arrays from numpy arrays with ndim > 1.

What changes are included in this PR?

Before, pa.array(np.arange(6).reshape(2,3)) would fail. Now it returns an array of length 2, where each element is a size-3 list.

I think this is the only intuitive behavior. But if you can think of an alternative behavior a user might want/expect from this, then please let's talk about it.

I am not familiar enough with numpy/pyarrow memory layout internals to know whether there are other cases besides the C-contiguous memory layout where we could use zero-copy. But even if there are, I'm not sure we need to bother with them; I bet C-contiguous covers 95% of usage.
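A small numpy-only sketch of why the C-contiguous case is the easy one: flattening a C-contiguous array in row-major order is a view, while a transposed (Fortran-ordered) array has to be copied to get the same element order. The helper name flat_view_or_copy is made up for illustration and is not part of the PR.

```python
import numpy as np

def flat_view_or_copy(a):
    """Return (flat, is_view): a row-major 1-D version of `a`, and whether
    it still shares memory with `a`."""
    flat = a.reshape(-1)  # a view when the layout allows it, otherwise a copy
    return flat, np.shares_memory(a, flat)

arr = np.arange(6).reshape(2, 3)  # C-contiguous by construction
transposed = arr.T                # same memory, but Fortran-ordered

print(arr.flags.c_contiguous, flat_view_or_copy(arr)[1])                # True True
print(transposed.flags.c_contiguous, flat_view_or_copy(transposed)[1])  # False False
```

Only the first case leaves a flat buffer that an Arrow values array could wrap without copying.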

I'm also not sure whether this is a good way to do this, or whether there is a more succinct way.

This was written entirely by GH Copilot. You can see my dialog with Copilot as I tweaked its directions and chose an implementation in NickCrews#3

Perhaps this logic should be pulled into its own _from_n_dim_numpy(np_arr) helper function to keep the larger control flow of the function more clear, let me know if you think so.

Are these changes tested?

Yes, I think adequately. It doesn't actually verify that the zero-copy path is used, just that the results are correct. I didn't really want to deal with messing with monkeypatching/spying on things to detect the 0-copy, but can add this if we want to verify.

We also just compare the results to the result via the .tolist() path, but perhaps we should instead write out the actual expected value as boilerplate so that it is even more obvious what the expected behavior is.

Are there any user-facing changes?

No breaking changes, only newly supported features!

Copilot AI and others added 5 commits December 17, 2025 22:24
Co-authored-by: NickCrews <10820686+NickCrews@users.noreply.github.com>
… arrays

@github-actions

⚠️ GitHub issue #22081 has been automatically assigned in GitHub to PR creator.

@NickCrews
Contributor Author

I marked this as WIP until the tests actually pass. Sigh, time to babysit AI again...

Development

Successfully merging this pull request may close these issues.

[Python] Support inferring nested ndarray with ndim > 1
