-
Notifications
You must be signed in to change notification settings - Fork 28
#235 [Coding Guideline]: Do not create values from uninitialized memory #240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for scrc-coding-guidelines ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
I've made some improvement suggestions in a PR here: manhatsu#1 |
Thank you very much. Merged to this branch |
|
Hey @manhatsu 👋 it looks like from the CI that a new tag needs to be added. Could you follow what @rcseacord did in this PR to add the |
d3b29b5 to
57f303a
Compare
57f303a to
4b8cbc7
Compare
PLeVasseur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @manhatsu -- thank you for contributing. Please see the comment I left on how to generate a template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @inkreasing says is correct. This description is insufficient to reflect the restrictions imposed by MIRI here:
Note this is not UB before line 15.
Bytes remain uninit until written. You may not read uninitialized bytes as any initialized type, period, not even if "all" bitpatterns are considered valid, because uninit is the 257th bitpattern for a byte, effectively: 0xUU. By contrast, u8 is 0x00 through 0xFF, inclusive. We use MaybeUninit<u8> to indicate the final state is possible, and it is valid to read that value (well, from any allocation that has a byte in it, at least).
src/coding-guidelines/values.rst
Outdated
| A program shall not create a value of any type from uninitialized memory, | ||
| except when accessing a field of a union type, | ||
| where such reads are explicitly defined to be permitted even if the bytes of that field are uninitialized. | ||
| It is prohibited to interpret uninitialized memory as a value of any Rust type such as a | ||
| primitive, aggregate, reference, pointer, struct, enum, array, or tuple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition does not consider composition of types:
use std::mem::MaybeUninit;
union Uninit32 {
u: u32,
i: i32,
f: f32,
void: (),
}
struct Newtype32(Uninit32);
fn main() {
let x: Newtype32 = unsafe { MaybeUninit::uninit().assume_init() };
}This passes miri because all the bytes in Newtype32 are defined by Uninit32, which is allowed to be uninitialized.
When bytes are read as a type (e.g. by using ptr.read(), *ptr, or mem::transmute), a "typed read" occurs. This asserts the bytes are valid as that type. Uninit32 and MaybeUninit are the same thing here: unions with () as a possibility, which means they must be valid to read from a blob of uninitialized bytes within a valid allocation1. Because Newtype32 is entirely defined by Uninit32, it is also valid to read from uninitialized bytes: the struct wrapper does not impose a novel validity requirement. If this is a mandatory guideline, it should be more exacting about why.
Footnotes
-
Note that this is likely a stronger requirement than the actual rules will be regarding union validity once final details of those are hashed out. I'm just giving an example that is very "in the clear". ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
| .. non_compliant_example:: | ||
| :id: non_compl_ex_Qb5GqYTP6db1 | ||
| :status: draft | ||
|
|
||
| This noncompliant example creates a value of type ``u32`` from uninitialized memory via | ||
| `assume_init <https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init>`_: | ||
|
|
||
| .. code-block:: rust | ||
|
|
||
| use std::mem::MaybeUninit; | ||
|
|
||
| let x: u32 = unsafe { MaybeUninit::uninit().assume_init() }; // UB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got this right, but the lesson needs to be extended elsewhere: assume_init and the read of a union field at its type are not really different operations in the semantics, so why would u32 be valid to read from a union field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
9b81ff4 to
0e2776c
Compare
felix91gr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this helps
| - may violate niche or discriminant validity, | ||
| - may create invalid pointer values, or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that these two would benefit of being further developed with their own examples. Going in-depth in the Rationale is fine: your goal is to make it obvious why this guideline is required. Explain as much as you find is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the text here. The examples expand on pointer and reference validity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
| - creates undefined behavior for most types, | ||
| - may violate niche or discriminant validity, | ||
| - may create invalid pointer values, or | ||
| - may produce values that violate type invariants. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the first bullet point, I believe the last one needs to point to an official reference or documentation of some kind that explains in full why this is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed these bullets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resovled
src/coding-guidelines/values.rst
Outdated
| :scope: system | ||
| :tags: undefined-behavior, unsafe | ||
|
|
||
| Do not create a typed value from uninitialized memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would perhaps divide this guideline into two.
There is a very good guideline in this PR that deals with the creation of typed values from uninitialized memory by using functions like assume_init on memory that has not been fully initialized on the relevant bytes. That transition, from MaybeUninit<T> to T, &mut T and others, is Undefined Behavior.
But there is also another guideline in this PR that deals with how and when it's valid to access fields of unions. I think it's best for the two to be separate, since one of them deals with the creation of typed values and the other deals with the reading of typed values from unions.
Sidenote:
Keep in mind that MaybeUninit<T> is itself a union, so however the guideline that deals with access to fields of unions ends up being written, you will want to make sure you consider it among the exceptions ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also want to take a look at these pages:
- The Rustonomicon on working with uninitialized memory from
unsafehttps://doc.rust-lang.org/nomicon/unchecked-uninit.html - The explanation of Initialization Invariant, examples and such, from the docs of
MaybeUninit. - The Language Reference on unions: https://doc.rust-lang.org/reference/items/unions.html
The first two contain most of what I think you'll need to complete this / these Guideline(s). I hope they help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to think on this a bit, as splitting guidelines is a lot of work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think it would be hard to separate these because a union is an exception to this rule so it would have to be mentioned. Once it is, you have introduced all the tricky union behavior as to what is and is not a violation.
I have changed the title to include reading because my understanding is that reading and calling assume_init are basically the same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the title to include reading because my understanding is that reading and calling assume_init are basically the same here.
That's the thing: they are not. But I don't think that matters much anyways.
Why they are not the same
MaybeUninit<T> is very special for the compiler. assume_init is, among other things, a statement of "from here on, assume this memory is initialized". The compiler uses this fact to deduce things about the program, hence why it's UB to violate those assumptions. That memory never needs to be read for this to happen: it happens because a contract with the compiler has been broken.
Oh the other hand, reading from a union field is either well-defined or UB, depending on if the field is properly initialized or not.
They are not the same, except for the fact that both can cause UB if used poorly (which is a property of everything that inhabits in unsafe).
Why I don't think it matters much anyway
MaybeUninit<T> is the main (almost only, in fact) avenue through which a programmer can create initialized, valid values of a type T, from uninitialized memory. It's a carefully defined API, and it has a tight integration with the compiler.
On the other hand, unions are a low-level primitive that can be used, among other things, to emulate MaybeUninit<T>. That's, in principle, how MaybeUninit<T> was constructed - you can see it's itself a union.
But unions are rather obscure in Unsafe Rust, and they have their own niche use cases. The contexts in which they are used are so different from MaybeUninit<T> that bundling them together seems like a mistake to me.
The process for safely working with them is more crude than how it is for MaybeUninit<T>, which I would also consider part of their mismatch if they were to be ruled about in the same guideline.
Remember: we could have a guideline called "Do not invoke UB", which would be more or less well-defined (we could say "MIRI under Tree Borrows", and that would be well-defined enough for our purposes), but it would be a terrible guideline because so many different things can do UB.
What I'd do
Yeah, I'd separate it into two.
One for MaybeUninit<T>
To make it more actionable, you could go straight ahead and make a guideline about how to use this type in a safe manner.
Basically, ruling on how to call the assume_ APIs of MaybeUninit<T>:
assume_initassume_init_dropassume_init_mutassume_init_readassume_init_ref
As well as:
array_assume_init
Their documentation explains their safety invariants. The guideline would basically point out how and why to satisfy them.
One for unions
Again, unions have their own niches and contexts. They are also WAY cruder to use than MaybeUninit<T>. I think the examples will show just how different they are.
I think unions merit their own rule. They are, after all, one of the few low-level primitives the language has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the code you replied you doesn't have UB.
The problem here is what does "fully initialized" mean? It doesn't include padding, so it doesn't mean all bytes. There are also zero-sized types that don't have to (can't?) be initialized. And some types are allowed to hold unitialized data (like MaybeUninit and unions with a () member). These also don't have to be initialized to be "valid". So if you have a type that only has members that can hold uninit bytes then you can do MaybeUninit::uninit().assume_init() without UB.
See also: #general > UB in Safe Rust @ 💬
This passes miri:
use std::mem::MaybeUninit;
union Uninit32 {
u: u32,
i: i32,
f: f32,
void: (),
}
struct Newtype32(Uninit32);
fn main() {
let x: Newtype32 = unsafe { MaybeUninit::uninit().assume_init() };
let y: () = unsafe { MaybeUninit::uninit().assume_init() };
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. Okay, I'm not up to properly reviewing this then. @inkreasing or @workingjubilee can I interest you guys in reviewing this guideline?
I can help with the more form-related bits, like where should explanations go and where should the actionable items go. But I'll need the help of one of you, who are more experienced with this, to give the semantics a thumbs up.
Lemme know and I'll assign you as reviewer :3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the title to include reading because my understanding is that reading and calling
assume_initare basically the same here.
This understanding is functionally correct: the important mechanical element is sometimes referred to as "transmuting" or "read-at-type". It is the low-level statement (expression?) in the Rust abstract machine that moves some bytes and thinks of them as being bytes of a specific type as it does so. Bytes "at rest" have no such identity in the Rust AM, but they acquire a type while in motion because that often has an implication on the specific bytes that actually have to move. For instance, while it is definitely valid to copy all the bytes blindly, it may also be valid to e.g. move individual fields of a struct, instead of moving the entire struct-with-padding, which may be important when considering function calls in ABIs which do precisely that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The missing detail in your reading, @felix91gr, is that the assume_init documentation references the initialization invariant documentation on the type. That documentation describes that individual types have their own initialization invariant. We usually say something like "every byte has to be in the set of valid bitpatterns for that type". This is easily logically handled by tracking certain bytes as having "uninit" as a unique state that doesn't correspond to 0x00 through 0xFF. So for a type that allows uninit for a given byte, the typed read permits a read of the uninit bytes, as "0xUU" is in the valid set of bitpatterns. This does get slightly confusing because we don't consider it a bitpattern but we kinda... do...?
Anyway, it is easy to read as being truly independent and context-free. We should probably change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've resolved all of this in the new version
workingjubilee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better.
|
Sidenote: this probably should be part of the Unsafety chapter. Anything dealing with upholding validity invariants in |
I created an issue #241 that discusses this. See what you think. Anyway, we should probably have this discussion there. |
dcb19bf to
8d4a655
Compare
|
Hi @manhatsu, @rcseacord -- please see this PR: #288 Please simply replace the current commits on your feature branch with that single commit on the above PR. That way we can keep the review history on this PR. |
|
@felix91gr I might be coming around to your view that union should be split out into a different rule. |
20d6c35 to
2ade1b3
Compare
had a slightly older version, so I replaced it with the latest.
| or related functions, is treated in the same manner as a typed read. | ||
| Calling these function when on memory that is not yet fully initialized causes immediate undefined behavior. | ||
| The memory must be properly initialized according to the requirements of the variable’s type. | ||
| For example, a variable of reference type must be aligned and non-null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you want to fully list all validity requirements here, but references also need to point to valid memory.
This code has UB:
fn main() {
let mut uninit: MaybeUninit<&u8> = MaybeUninit::uninit();
unsafe {
// write non-null and aligned address.
(&raw mut uninit).cast::<*const u8>().write(ptr::dangling());
// UB here
let init = uninit.assume_init();
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this requirement and created a new noncompliant example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
added another example and some clarification for references
rcseacord
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
spelling
Co-authored-by: increasing <dev@lucasbaumann.de>
Closes #235.