Skip to content

Conversation

@manhatsu
Copy link
Contributor

@manhatsu manhatsu commented Dec 4, 2025

Closes #235.

@netlify
Copy link

netlify bot commented Dec 4, 2025

Deploy Preview for scrc-coding-guidelines ready!

Name Link
🔨 Latest commit 50101d2
🔍 Latest deploy log https://app.netlify.com/projects/scrc-coding-guidelines/deploys/69444e5b5b545f0008c732f7
😎 Deploy Preview https://deploy-preview-240--scrc-coding-guidelines.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@rcseacord
Copy link
Collaborator

I've made some improvement suggestions in a PR here: manhatsu#1

@manhatsu
Copy link
Contributor Author

manhatsu commented Dec 4, 2025

I've made some improvement suggestions in a PR here: manhatsu#1

Thank you very much. Merged to this branch

@PLeVasseur
Copy link
Collaborator

Hey @manhatsu 👋 it looks like from the CI that a new tag needs to be added.

Could you follow what @rcseacord did in this PR to add the unsafe tag with an appropriate description? Ideally you would do that as a separate PR, as that's easy to review and merge.

Copy link
Collaborator

@PLeVasseur PLeVasseur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @manhatsu -- thank you for contributing. Please see the comment I left on how to generate a template.

Copy link

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @inkreasing says is correct. This description is insufficient to reflect the restrictions imposed by MIRI here:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=abb9da1c391902b21c03ed1d21767b58

Note this is not UB before line 15.

Bytes remain uninit until written. You may not read uninitialized bytes as any initialized type, period, not even if "all" bitpatterns are considered valid, because uninit is the 257th bitpattern for a byte, effectively: 0xUU. By contrast, u8 is 0x00 through 0xFF, inclusive. We use MaybeUninit<u8> to indicate the final state is possible, and it is valid to read that value (well, from any allocation that has a byte in it, at least).

Comment on lines 19 to 23
A program shall not create a value of any type from uninitialized memory,
except when accessing a field of a union type,
where such reads are explicitly defined to be permitted even if the bytes of that field are uninitialized.
It is prohibited to interpret uninitialized memory as a value of any Rust type such as a
primitive, aggregate, reference, pointer, struct, enum, array, or tuple.
Copy link

@workingjubilee workingjubilee Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition does not consider composition of types:

use std::mem::MaybeUninit;

union Uninit32 {
    u: u32,
    i: i32,
    f: f32,
    void: (),
}

struct Newtype32(Uninit32);

fn main() {
    let x: Newtype32 = unsafe { MaybeUninit::uninit().assume_init() };
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=feae8c987fa0b2703533ce0ebf8b23ba

This passes miri because all the bytes in Newtype32 are defined by Uninit32, which is allowed to be uninitialized.

When bytes are read as a type (e.g. by using ptr.read(), *ptr, or mem::transmute), a "typed read" occurs. This asserts the bytes are valid as that type. Uninit32 and MaybeUninit are the same thing here: unions with () as a possibility, which means they must be valid to read from a blob of uninitialized bytes within a valid allocation1. Because Newtype32 is entirely defined by Uninit32, it is also valid to read from uninitialized bytes: the struct wrapper does not impose a novel validity requirement. If this is a mandatory guideline, it should be more exacting about why.

Footnotes

  1. Note that this is likely a stronger requirement than the actual rules will be regarding union validity once final details of those are hashed out. I'm just giving an example that is very "in the clear".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

Comment on lines 50 to 45
.. non_compliant_example::
:id: non_compl_ex_Qb5GqYTP6db1
:status: draft

This noncompliant example creates a value of type ``u32`` from uninitialized memory via
`assume_init <https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init>`_:

.. code-block:: rust

use std::mem::MaybeUninit;

let x: u32 = unsafe { MaybeUninit::uninit().assume_init() }; // UB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got this right, but the lesson needs to be extended elsewhere: assume_init and the read of a union field at its type are not really different operations in the semantics, so why would u32 be valid to read from a union field?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

@manhatsu manhatsu force-pushed the doc/no-uninit-value branch from 9b81ff4 to 0e2776c Compare December 8, 2025 00:44
Copy link
Collaborator

@felix91gr felix91gr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this helps

Comment on lines 37 to 30
- may violate niche or discriminant validity,
- may create invalid pointer values, or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that these two would benefit of being further developed with their own examples. Going in-depth in the Rationale is fine: your goal is to make it obvious why this guideline is required. Explain as much as you find is needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the text here. The examples expand on pointer and reference validity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

- creates undefined behavior for most types,
- may violate niche or discriminant validity,
- may create invalid pointer values, or
- may produce values that violate type invariants.
Copy link
Collaborator

@felix91gr felix91gr Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the first bullet point, I believe the last one needs to point to an official reference or documentation of some kind that explains in full why this is the case.

Copy link
Collaborator

@rcseacord rcseacord Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed these bullets.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resovled

:scope: system
:tags: undefined-behavior, unsafe

Do not create a typed value from uninitialized memory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would perhaps divide this guideline into two.

There is a very good guideline in this PR that deals with the creation of typed values from uninitialized memory by using functions like assume_init on memory that has not been fully initialized on the relevant bytes. That transition, from MaybeUninit<T> to T, &mut T and others, is Undefined Behavior.

But there is also another guideline in this PR that deals with how and when it's valid to access fields of unions. I think it's best for the two to be separate, since one of them deals with the creation of typed values and the other deals with the reading of typed values from unions.


Sidenote:

Keep in mind that MaybeUninit<T> is itself a union, so however the guideline that deals with access to fields of unions ends up being written, you will want to make sure you consider it among the exceptions ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also want to take a look at these pages:

  1. The Rustonomicon on working with uninitialized memory from unsafe https://doc.rust-lang.org/nomicon/unchecked-uninit.html
  2. The explanation of Initialization Invariant, examples and such, from the docs of MaybeUninit.
  3. The Language Reference on unions: https://doc.rust-lang.org/reference/items/unions.html

The first two contain most of what I think you'll need to complete this / these Guideline(s). I hope they help.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to think on this a bit, as splitting guidelines is a lot of work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think it would be hard to separate these because a union is an exception to this rule so it would have to be mentioned. Once it is, you have introduced all the tricky union behavior as to what is and is not a violation.

I have changed the title to include reading because my understanding is that reading and calling assume_init are basically the same here.

Copy link
Collaborator

@felix91gr felix91gr Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the title to include reading because my understanding is that reading and calling assume_init are basically the same here.

That's the thing: they are not. But I don't think that matters much anyways.

Why they are not the same

MaybeUninit<T> is very special for the compiler. assume_init is, among other things, a statement of "from here on, assume this memory is initialized". The compiler uses this fact to deduce things about the program, hence why it's UB to violate those assumptions. That memory never needs to be read for this to happen: it happens because a contract with the compiler has been broken.

Oh the other hand, reading from a union field is either well-defined or UB, depending on if the field is properly initialized or not.

They are not the same, except for the fact that both can cause UB if used poorly (which is a property of everything that inhabits in unsafe).

Why I don't think it matters much anyway

MaybeUninit<T> is the main (almost only, in fact) avenue through which a programmer can create initialized, valid values of a type T, from uninitialized memory. It's a carefully defined API, and it has a tight integration with the compiler.

On the other hand, unions are a low-level primitive that can be used, among other things, to emulate MaybeUninit<T>. That's, in principle, how MaybeUninit<T> was constructed - you can see it's itself a union.

But unions are rather obscure in Unsafe Rust, and they have their own niche use cases. The contexts in which they are used are so different from MaybeUninit<T> that bundling them together seems like a mistake to me.

The process for safely working with them is more crude than how it is for MaybeUninit<T>, which I would also consider part of their mismatch if they were to be ruled about in the same guideline.

Remember: we could have a guideline called "Do not invoke UB", which would be more or less well-defined (we could say "MIRI under Tree Borrows", and that would be well-defined enough for our purposes), but it would be a terrible guideline because so many different things can do UB.

What I'd do

Yeah, I'd separate it into two.

One for MaybeUninit<T>

To make it more actionable, you could go straight ahead and make a guideline about how to use this type in a safe manner.

Basically, ruling on how to call the assume_ APIs of MaybeUninit<T>:

  • assume_init
  • assume_init_drop
  • assume_init_mut
  • assume_init_read
  • assume_init_ref

As well as:

  • array_assume_init

Their documentation explains their safety invariants. The guideline would basically point out how and why to satisfy them.

One for unions

Again, unions have their own niches and contexts. They are also WAY cruder to use than MaybeUninit<T>. I think the examples will show just how different they are.

I think unions merit their own rule. They are, after all, one of the few low-level primitives the language has.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the code you replied you doesn't have UB.

The problem here is what does "fully initialized" mean? It doesn't include padding, so it doesn't mean all bytes. There are also zero-sized types that don't have to (can't?) be initialized. And some types are allowed to hold unitialized data (like MaybeUninit and unions with a () member). These also don't have to be initialized to be "valid". So if you have a type that only has members that can hold uninit bytes then you can do MaybeUninit::uninit().assume_init() without UB.

See also: #general > UB in Safe Rust @ 💬

This passes miri:

use std::mem::MaybeUninit;

union Uninit32 {
    u: u32,
    i: i32,
    f: f32,
    void: (),
}

struct Newtype32(Uninit32);

fn main() {
    let x: Newtype32 = unsafe { MaybeUninit::uninit().assume_init() };
    let y: () = unsafe { MaybeUninit::uninit().assume_init() };
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=dc68e3622c93a4a7bf4dcd2d0187cef3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. Okay, I'm not up to properly reviewing this then. @inkreasing or @workingjubilee can I interest you guys in reviewing this guideline?

I can help with the more form-related bits, like where should explanations go and where should the actionable items go. But I'll need the help of one of you, who are more experienced with this, to give the semantics a thumbs up.

Lemme know and I'll assign you as reviewer :3

Copy link

@workingjubilee workingjubilee Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the title to include reading because my understanding is that reading and calling assume_init are basically the same here.

This understanding is functionally correct: the important mechanical element is sometimes referred to as "transmuting" or "read-at-type". It is the low-level statement (expression?) in the Rust abstract machine that moves some bytes and thinks of them as being bytes of a specific type as it does so. Bytes "at rest" have no such identity in the Rust AM, but they acquire a type while in motion because that often has an implication on the specific bytes that actually have to move. For instance, while it is definitely valid to copy all the bytes blindly, it may also be valid to e.g. move individual fields of a struct, instead of moving the entire struct-with-padding, which may be important when considering function calls in ABIs which do precisely that.

Copy link

@workingjubilee workingjubilee Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The missing detail in your reading, @felix91gr, is that the assume_init documentation references the initialization invariant documentation on the type. That documentation describes that individual types have their own initialization invariant. We usually say something like "every byte has to be in the set of valid bitpatterns for that type". This is easily logically handled by tracking certain bytes as having "uninit" as a unique state that doesn't correspond to 0x00 through 0xFF. So for a type that allows uninit for a given byte, the typed read permits a read of the uninit bytes, as "0xUU" is in the valid set of bitpatterns. This does get slightly confusing because we don't consider it a bitpattern but we kinda... do...?

Anyway, it is easy to read as being truly independent and context-free. We should probably change that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've resolved all of this in the new version

Copy link

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better.

@felix91gr
Copy link
Collaborator

Sidenote: this probably should be part of the Unsafety chapter. Anything dealing with upholding validity invariants in unsafe should probably go there, I believe.

@rcseacord
Copy link
Collaborator

Sidenote: this probably should be part of the Unsafety chapter. Anything dealing with upholding validity invariants in unsafe should probably go there, I believe.

I created an issue #241 that discusses this. See what you think. Anyway, we should probably have this discussion there.

@PLeVasseur
Copy link
Collaborator

Hi @manhatsu, @rcseacord -- please see this PR: #288

Please simply replace the current commits on your feature branch with that single commit on the above PR.

That way we can keep the review history on this PR.

@rcseacord
Copy link
Collaborator

@felix91gr I might be coming around to your view that union should be split out into a different rule.

@manhatsu manhatsu requested a review from rcseacord December 16, 2025 03:39
or related functions, is treated in the same manner as a typed read.
Calling these function when on memory that is not yet fully initialized causes immediate undefined behavior.
The memory must be properly initialized according to the requirements of the variable’s type.
For example, a variable of reference type must be aligned and non-null.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you want to fully list all validity requirements here, but references also need to point to valid memory.

This code has UB:

fn main() {
    let mut uninit: MaybeUninit<&u8> = MaybeUninit::uninit();
    unsafe {
        // write non-null and aligned address.
        (&raw mut uninit).cast::<*const u8>().write(ptr::dangling());
        // UB here
        let init = uninit.assume_init();
    }
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this requirement and created a new noncompliant example

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

added another example and some clarification for references
Copy link
Collaborator

@rcseacord rcseacord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Co-authored-by: increasing <dev@lucasbaumann.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chapter: values coding guideline An issue related to a suggestion for a coding guideline

Development

Successfully merging this pull request may close these issues.

[Coding Guideline]: Do not create values from uninitialized memory

6 participants