-
Notifications
You must be signed in to change notification settings - Fork 911
Opaque Object Layout #4678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Opaque Object Layout #4678
Conversation
This allows for variable sized base classes in future by using negative basicsize values.
Variable sized base classes require the use of relative offsets
Metaclasses cannot define `__new__` so `__init__` is the only alternative.
|
I didn't realise I was going to hit the issue that the minimum supported rust version of pyo3 (1.63) doesn't support generic associated types which I'm using for Rust 1.65 (the first with GAT support) was released on 2022-11-03 and python 3.12 (the first version to allow extending variable sized base classes) was released on 2023-10-02. I'm going to bump the MSRV to 1.65 but I'm open to other suggestions that avoid requiring GATs. I checked the MSRV requirements according to Debian bookworm is supported until June 2026 which seems like a long time to wait... |
4ce2db0 to
401af6c
Compare
7820766 to
de2c2a3
Compare
de2c2a3 to
426b218
Compare
You can't do that unfortunately. Our msrv policy is at https://github.com/PyO3/pyo3/blob/main/Contributing.md#python-and-rust-version-support-policy What you can do is to feature gate this and only enable the functionality on Rust 1.65 and up, or figure out a way to do it without gats. |
That sounds good to me. Do you mean introduce a feature like Would it be sufficient to introduce the feature and have it disabled by default? |
|
A feature is one way to do it, a cfg that is automatically enabled when the build script detects it's on rust 1.65+ is another method. Both have pros and cons as far as user experience goes, so it's best to have a solution that doesn't need them. The layout code is also reasonably complex, so I'd rather not mix conditional compilation into it if possible. |
|
The design currently revolves around the This way the layout is set generically in the base types like The ways I can currently see around this are:
I'm going to implement (1) for now because the steps seem clearest, so this PR can be in a state where it could hypothetically be merged, then I'm open to suggestions if someone finds a way around the limitations I'm running into. Attempt 1The problem here is that the static and variable markers aren't known to be disjoint, so they conflict. trait IsStaticType {}
trait IsVariableType {}
struct PyAny;
impl IsStaticType for PyAny {}
struct PyType;
impl IsVariableType for PyType {}
struct Layout<T> { _data: PhantomData<T> }
trait LayoutMethods {
fn access_data();
}
impl<T: IsStaticType> LayoutMethods for Layout<T> {
fn access_data() {
println!("hi from static");
}
}
impl<T: IsVariableType> LayoutMethods for Layout<T> { // ERROR: conflicting implementations
fn access_data() {
println!("hi from variable");
}
}Attempt 2The problem here is that trait LayoutMethods {
fn access_data() {
panic!("unknown layout")
}
}
impl<T: PyClassImpl> LayoutMethods for T {}
trait StaticLayout: Sized + 'static + LayoutMethods {
fn access_data() {
println!("hi from static");
}
}
impl<T, B> StaticLayout for T
where
T: PyClassImpl<BaseType = B> + 'static,
B: StaticLayout,
{ }
trait VariableLayout: Sized + 'static + LayoutMethods {
fn access_data() {
println!("hi from variable");
}
}
impl<T, B> VariableLayout for T
where
T: PyClassImpl<BaseType = B> + 'static,
B: VariableLayout,
{ }
trait PyClassImpl { type BaseType; }
struct PyAny;
impl LayoutMethods for PyAny {}
impl StaticLayout for PyAny {}
struct PyType;
impl LayoutMethods for PyType {}
impl VariableLayout for PyType {}
struct MyClass {}
impl PyClassImpl for MyClass {
type BaseType = PyAny;
} |
Before this change, classes inheriting the base object was a special case where `object.__new__` is not called, and inheriting from other base classes requires use of the unlimited API. Previously this was not very limiting, but with <PyO3#4678> it will be possible to inherit from native base classes with the limited API and possibly from dynamically imported native base classes which may require `__new__` arguments to reach them.
|
Hi @davidhewitt , just wondering if anyone has capacity to start looking at these changes, starting with #4798 . I see the project has been quiet since January so there aren't many conflicts yet. I can start breaking this PR into smaller ones if you are happy with the overall approach. I'm also happy to answer any questions if something isn't clear :) |
|
My sincerest apologies for the delay, I have had a massive backlog build up due to family commitments. I have new childcare support starting from March (so just a couple weeks away now) and the intention is to work long days Fridays to burn the backlog down. I'll aim to get to this on either the first or second Friday on March (please do ping me to remind, the backlog is long!) |
mejrs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is by no means an exhaustive review but these jumped out to me initially.
| #[repr(C)] | ||
| pub struct InvalidStaticLayout; | ||
|
|
||
| /// This is valid insofar as casting a `*mut ffi::PyObject` to `*mut InvalidStaticLayout` is valid | ||
| /// since `InvalidStaticLayout` has no fields to read. | ||
| unsafe impl<T> PyLayout<T> for InvalidStaticLayout {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just from stopping people from not declaring an opaque layout when inheriting from PyType? It appears to me like it would be better to have another trait for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. When defining a PyClassBaseType implementation for a type, a value for StaticLayout is required. For types where a static layout is not possible (such as PyType) this value can be provided which panics if it is actually used with the static layout accessors. I have added PyStaticLayout::IS_VALID to make this clearer.
Can you elaborate on the alternative you have in mind?
|
Apologies for the pings, but this is one of the things I'd like to see progress on :) @mbway are you available to address the feedback and/or resolve the merge conflicts? @davidhewitt Can you do that review? |
|
I can try to find some time to bring things up to date. Since everyone seems on board with having the feature I can start breaking this big PR down as I expect you don't want to merge it all in one go? The first part is #4798 |
|
I have brought this PR and #4798 up to date and addressed the comments. There are CI failures though so more work is needed, but they should be in a better state to review. |
593cff8 to
14513d6
Compare
14513d6 to
61bc306
Compare
|
both PRs should be ready for review now. All tests are passing, only the code coverage is still failing. That will be very tricky to overcome I think. |
|
@davidhewitt is your backlog looking any better now? If you plan to review you can ping me whenever so I can bring everything up to date (can't guarantee a quick response but I'll make sure to do it) |
|
Yes, I am moving through reviews much faster now. Sorry I missed that this was in a condition where it was waiting for me 🙈 Which PR would you like me to start with? Given the scale of the conflicts, it might make sense for me to give this a first pass before you invest time into fixing up. |
|
#4798 is still a good place to start I think as it's much smaller and self-contained. Feel free to take a look at this PR as well if you want, I think that would be useful, but I think it's probably best to break it up further to get it merged. I didn't want to break off the next part until #4798 is in. Also #4951 (not mine) would be useful because I added basic |
davidhewitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this (massive) PR, and sorry I've taken so long to come around to reading it.
Assuming there is no significant performance cost, I might be tempted to fully move towards using the opaque layout on 3.12 for simplicity. Otherwise I guess we just do it for types inheriting from bases of unknown size. Given your code here already made it optional, maybe there's no point to have the "simpler" option.
I think this PR probably can be split in three:
__init__support (possibly to be replaced by #4951 as you noted)opaquelayout- support for inheriting from
typeand the metaclass documentation
... maybe let's start by converging on a design for __init__ which meets all the use cases, and then we can circle back here?
| - `__init__(<self>, ...) -> ()` - the arguments can be defined as for | ||
| normal `pymethods`. The pyclass struct must implement `Default`. | ||
| If the class defines `__new__` and `__init__` the values set in | ||
| `__new__` are overridden by `Default` before `__init__` is called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm the fact that __new__ gets overwritten seems unfortunate. I will have a think what alternatives we have here (e.g. friendly compile error etc.)
| | `subclass` | Allows other Python classes and `#[pyclass]` to inherit from this class. Enums cannot be subclassed. | | ||
| | `unsendable` | Required if your struct is not [`Send`][params-3]. Rather than using `unsendable`, consider implementing your struct in a thread-safe way by e.g. substituting [`Rc`][params-4] with [`Arc`][params-5]. By using `unsendable`, your class will panic when accessed by another thread. Also note the Python's GC is multi-threaded and while unsendable classes will not be traversed on foreign threads to avoid UB, this can lead to memory leaks. | | ||
| | `weakref` | Allows this class to be [weakly referenceable][params-6]. | | ||
| | `opaque` | Forces the use of the 'opaque layout' ([PEP 697](https://peps.python.org/pep-0697/)) for this class and any subclasses that extend it. Primarily used for internal testing. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should consider always using this on 3.12+, rather than having it as an option? (Might depend on the performance cost of doing so.)
I think it might require us to use managed dict & weakref on 3.12+ so that we don't know the exact object layout. The benefit is it might clean up a lot of PyO3's coupling to Python internals. I may try do that in the coming days.
| /// Implement methods for obtaining the type object associated with a native type. | ||
| /// These methods are referred to in `pyobject_native_type_info` for implementing `PyTypeInfo`. | ||
| #[doc(hidden)] | ||
| #[macro_export] | ||
| macro_rules! pyobject_native_type_object_methods { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might need explaining as to the rationale of this bit.
|
It looks like #4951 is set to merge, which might unblock this PR moving forward. |
|
Thanks. I've been keeping an eye on it. I should have some time soon to revive these PRs |
This PR introduces the capability to inherit from classes where the exact layout/size is not known using the mechanism described in PEP 697. This allows inheriting from
typein order to create metaclasses. These metaclasses cannot yet be used with pyo3 classes (i.e.#[pyclass(metaclass=Foo)]) but this should be a possibility in future. It may also be possible in future for pyo3 classes to extend builtin classes using the limited API and to potentially extend imported classes (egenum.EnumType).This PR supersedes my first attempt at creating metaclasses: #4621. Previously I was using the normal class layout which relies on the exact layout of every base class to be known. For
typethe base structure isffi::PyHeapTypeObject. The contents of this structure are intended to be private (even for the 'unlimited' API) and so an alternative mechanism is required (PEP 697).Instead of nesting each base class at the beginning of the derived class in memory, objects created with PEP 697 only require knowledge of how much additional memory the derived class requires and the address of this section of memory is accessed via PyObject_GetTypeData. The PEP 697 mechanism could potentially be used to inherit from other builtins when only the 'limited' API is available however many of these builtins also store items (eg
list,set) which introduces more complications, so this PR focuses on inheriting fromPyTypeonly.This PR is a proof of concept but I am sure that some of the decisions I made aren't the best so feedback is welcome. The current design is as follows:
Before this PR the only possible layout was the 'static' layout where the size of the base class is known. This layout was described by
PyClassObject<T>(now renamed toPyStaticClassObject). This PR introduces hides the details of the static layout behind a traitInternalPyClassObjectLayoutso that other layouts are possible.Before this PR:
PyClassObject<T>was used to obtain the exact layout ofT: PyClassImplassuming that layout is the 'static' layout. This PR introducesPyClassImpl::Layoutso that a pyclass declares its layout. The pyclass usesPyTypeInfo::Layoutto setT::Layout. Each base type declares eitherPyTypeInfo::Layout = PyStaticClassObjectorPyTypeInfo::Layout = PyVariableClassObject.Potential Improvements
Some questions still remain:
__init__approach, how to initialise super classes?__init__approach, how to restrict return value to()orPyResult<()>?type -> MyMetaclass -> MySubMetaclassbut I think there are some outstanding problems. I also am not sure how to prevent users from doing this so I have left it for now.setattrworks without using#[pyclass(dict)]but the set values aren't accessible in the subclasssetattron a class extendingtypedoes not fail, unlike on a class extendingobject. I'm not sure if this meanstypehas__dict__already?Initialisation
I'm not sure what the best way is to handle initialisation.
tp_newcannot be used with metaclasses (source) however the best semantics of__init__aren't clear. For now I am requiring that metaclasses implementDefaultwhich allows the struct to be initialised before__init__is called (but this is a hack and may have issues properly initialising superclasses). There are runtime checks when a class is constructed that metaclasses do not define#[new]and do define__init__. This should be sufficient to ensure that users cannot create a struct with uninitialized memory.An alternative to using
__init__could be to repurpose#[new]to wrap it and assign it totp_initinstead in the case of a metaclass but that would be more complex to implement.