-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-47348: [C++][Python] Make S3FileSystem pickle-able with retry strategy #47350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
bd294f1 to
983021c
Compare
|
@github-actions crossbow submit -g python |
|
Revision: 983021c6c7246f468a2647f34b0064625e08ddbc Submitted crossbow builds: ursacomputing/crossbow @ actions-19f6d98dfd |
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/tests/test_fs.py
Outdated
|
|
||
| fs = S3FileSystem( | ||
| retry_strategy=AwsDefaultS3RetryStrategy(max_attempts=5)) | ||
| assert isinstance(fs, S3FileSystem) | ||
| assert pickle_module.loads(pickle_module.dumps(fs)) == fs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but should also assert that it's different from a filesystem configured with another strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I will look into it. It seems like Cython will call cpp side of things, e.g., Equals of FileSystem. I think I will need to add AwsRetryStrategy::Equals as well.
983021c to
9b2dbc9
Compare
9b2dbc9 to
436673a
Compare
436673a to
92fb660
Compare
|
@raulcd @pitrou please take a second look. I have added some code in c++ and updated the PR description. Based on one of the comments, to compare S3FileSystem with different retry strategies, I had to implement it in C++. I tried multiple things but found it's most straight-forward to use std::visit. It should be flexible enough for a new retry strategy to define its equality. I did update a few things in AwsRetryStrategy because of the use std::variant, but it's a private API so I hope it's okay. C++ is not my day-job language, any comment is appreciated. thanks. |
|
bump |
fe361b2 to
c97cc4c
Compare
|
Hi @AlenkaF , just wonder if you can give this PR a review or help me find someone to do this? I am an open-source contributor to Ray and this filesystem's retry strategy has been bugging us for a while because they were not properly pickled and sent to remote processes. Thank you. |
…ryStrategy [C++] Added Equals() to S3RetryStrategy. Now S3Options and S3FileSystem will compare their retry strategies as well. [Python] Included retry_strategy in S3FileSystem.__reduce__ method. S3FileSystem's retry_strategy should survive serialization and deserialization. Signed-off-by: Kit Lee <7000003+wingkitlee0@users.noreply.github.com>
c97cc4c to
1f42910
Compare
Rationale for this change
This change fixes the serialization/deserialization (and equality) of
S3FileSystem. Previously, theretry_strategyparameter is missed in the serialization. This affects other distributed libraries like Ray to incorrectly reconstruct aS3FileSystemwith no retry_strategy parameter in other processes.What changes are included in this PR?
The changes can be summarized in two parts, one in pyarrow and one in c++ library.
For pyarrow, this PR adds
retry_strategyinS3FileSystem.__reduce__and a property method to allow access toretry_strategy(a python object). SinceS3FileSystemis a cdef class butAWSDefaultRetryStrategyetc are not,objecttype is used for simplicity.In C++, an
Equalsmethod is added toS3RetryStrategyin C++. This allows the equality comparison in both c++ and python to consider retry strategy properly, e.g., comparingS3OptionsandS3FileSystemwith retry strategy taken into account. For now, the comparison returns true when they are the same class and parameters (implemented forAwsDefaultRetryStrategyandAwsStandardRetryStrategy)Are these changes tested?
[Python] Updated the unit tests to test for
retry_strategy.The new
retry_strategyproperty is used to verify the correct reconstruction.[C++] New tests are added for
AwsRetryStrategyEquals() method.Are there any user-facing changes?
No.