GH-22081: [Python] Support reading numpy arrays with ndims > 1 into pa arrays of Lists #48581
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rationale for this change
Fixes #22081
Efficiently/correctly creating List arrays from numpy arrays with ndims > 1.
What changes are included in this PR?
Before,
pa.array(np.arange(6).reshape(2,3))would fail. Now it returns an array of length 2, where each element is a size-3 list.I think this is the only intuitive behavior. But if you can think of an alternative behavior a user might want/expect from this, then please let's talk about it.
I am not super familiar with numpy/pyarrow memory layout internals to understand if there are other cases besides the C-continuous memory layout where we could use zero-copy. But even if there are other cases, I'm not sure if we need to bother with them, I bet the c-continuous covers 95% of usage.
I also am not sure if this is a good way to to do this, or if there is a more succinct way.
This was written entirely by GH copilot. You can see my dialog with copilot as I tweaked it's directions and chose an implementation in NickCrews#3
Perhaps this logic should be pulled into its own
_from_n_dim_numpy(np_arr)helper function to keep the larger control flow of the function more clear, let me know if you think so.Are these changes tested?
Yes, I think adequately. It doesn't actually verify that the zero-copy path is used, just that the results are correct. I didn't really want to deal with messing with monkeypatching/spying on things to detect the 0-copy, but can add this if we want to verify.
We also just compare the results to the result via the .tolist() path, but perhaps we should instead write out the actual expected value as boilerplate so that it is even more obvious what the expected behavior is.
Are there any user-facing changes?
No breaking changes, only newly supported features!