8366717: [lworld] Cleanup defensive fixing of JDK-8365996 #1824

marc-chevalier · 2025-12-18T08:26:03Z

Let's proceed piece by piece.

`G1BarrierSetAssembler::g1_write_barrier_pre` in `g1BarrierSetAssembler_aarch64.cpp`

valhalla/src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp

Lines 216 to 220 in 1077e4f

    
           // TODO 8366717 This came with 8284161: Implementation of Virtual Threads (Preview) later in May 2022 
        
           // Check if it's sufficient 
        
           //__ push_call_clobbered_registers(); 
        
           assert_different_registers(rscratch1, pre_val); // push_CPU_state trashes rscratch1 
        
           __ push_CPU_state(true);

push_call_clobbered_registers/pop_call_clobbered_registers should be enough around a runtime call, that's what clobbered registers are.

But let's dig a bit, that will be useful later!

   push_call_clobbered_registers()
=> push_call_clobbered_registers_except(exclude = empty set)
=> push(call_clobbered_gp_registers() - exclude, sp)  // with exclude = empty set

So, we save at least call_clobbered_gp_registers which is defined as:

valhalla/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

Lines 3614 to 3620 in 1077e4f

    
           RegSet MacroAssembler::call_clobbered_gp_registers() { 
        
             RegSet regs = RegSet::range(r0, r17) - RegSet::of(rscratch1, rscratch2); 
        
           #ifndef R18_RESERVED 
        
             regs += r18_tls; 
        
           #endif 
        
             return regs; 
        
           }

So, we save r0-r7 and r10-r17

`CardTableBarrierSetAssembler::oop_store_at` in `cardTableBarrierSetAssembler_aarch64.cpp`

valhalla/src/hotspot/cpu/aarch64/gc/shared/cardTableBarrierSetAssembler_aarch64.cpp

Lines 116 to 124 in 1077e4f

    
           if (tmp3 != noreg) { 
        
             // TODO 8366717 This change is from before the 'tmp3' arg was added to mainline, check if it's still needed. Same on x64. Also, this should be a __ lea 
        
             // Called by MacroAssembler::pack_inline_helper. We cannot corrupt the dst.base() register 
        
             __ mov(tmp3, dst.base()); 
        
             store_check(masm, tmp3, dst); 
        
           } else { 
        
             // It's OK to corrupt the dst.base() register. 
        
             store_check(masm, dst.base(), dst); 
        
           }

Digging the history looks like it still makes sense, it doesn't look like an accident. If we remove the if-branch, tests are totally on fire. Let's keep it.

`CardTableBarrierSetAssembler::oop_store_at` in `cardTableBarrierSetAssembler_x86.cpp`

Same as before. But it's not even guarantee that InlineTypePassFieldsAsArgs is true. We can have such a backtrace:

CardTableBarrierSetAssembler::oop_store_at (cardTableBarrierSetAssembler_x86.cpp:184)
CardTableBarrierSetAssembler::store_at (cardTableBarrierSetAssembler_x86.cpp:90)
MacroAssembler::store_heap_oop (macroAssembler_x86.cpp:5515)
do_oop_store (templateTable_x86.cpp:148)
(called, for instance, from TemplateTable::putfield_or_static_helper (templateTable_x86.cpp:2964))

where the last give r8 for tmp3. It is not quite clear to me why we don't have a problem in mainline, because it looks like corrupting address base register is a bad idea given that we use it after.

`MacroAssembler::unpack_inline_helper` in `macroAssembler_aarch64.cpp`

First part

valhalla/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

Lines 7163 to 7166 in 1077e4f

    
           // Don't use r14 as tmp because it's used for spilling (see MacroAssembler::spill_reg_for) 
        
           // TODO 8366717 We need to make sure that r14 (and potentially other long-life regs) are kept live in slowpath runtime calls in GC barriers 
        
           Register tmp1 = r10; 
        
           Register tmp2 = r11;

Yes, these registers are saved, as we saw above! I've added some asserts to make sure that they are still in the call_clobbered_gp_registers() set. But it seems a bit redundant: that's what clobber are, registers that we cannot trust to be kept. If we use a register not in this set, the native callee shouldn't touch it. Better safe than sorry?

Second Part

valhalla/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

Lines 7269 to 7271 in 1077e4f

    
           // TODO 8366717 This is probably okay but looks fishy because stream is reset in the "Set null marker to zero" case just above. Same on x64. 
        
           sig_index = stream.sig_index(); 
        
           to_index = stream.regs_index();

This is fine.

Indeed, we might reset the stream just before, but then we exhaust it in the loop.

valhalla/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

Lines 7251 to 7262 in 1077e4f

    
           stream.reset(sig_index, to_index); 
        
           while (stream.next(toReg, bt)) { 
        
             if (sig->at(stream.sig_index())._offset == -1 || 
        
                 bt == T_OBJECT || bt == T_ARRAY) { 
        
               if (toReg->is_stack()) { 
        
                 int st_off = toReg->reg2stack() * VMRegImpl::stack_slot_size; 
        
                 str(zr, Address(sp, st_off)); 
        
               } else { 
        
                 mov(toReg->as_Register(), zr); 
        
               } 
        
             } 
        
           }

This is already what we do before:

valhalla/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

Lines 7182 to 7183 in 1077e4f

    
           stream.reset(sig_index, to_index); 
        
           while (stream.next(toReg, bt)) {

(and this loop has no break).

So entering the if and running the second reset will just leave the stream in the same state.

About this piece, I also took the liberty of changing a tiny bit the ctor of ScalarizedInlineArgsStream as this class was not correct if step is not 1 or -1. We could also have fixed it by changing the line 79 in

valhalla/src/hotspot/share/runtime/signature_cc.hpp

Lines 75 to 81 in 1077e4f

    
           void reset(int sig_idx, int regs_idx) { 
        
             _sig_idx = sig_idx; 
        
             _regs_idx = regs_idx; 
        
             assert(_sig->at(_sig_idx)._bt == (_step > 0) ? T_METADATA : T_VOID, "should be at inline type delimiter"); 
        
             _depth = 1; 
        
             DEBUG_ONLY(_finished = false); 
        
           }

with _depth = abs(_step), but that looks like a recipe for a less explicit constructor, and more obscure code for a feature that is not desirable (skipping elements).

`gen_c2i_adapter` in `sharedRuntime_aarch64.cpp`

First part

valhalla/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp

Lines 623 to 633 in 1077e4f

    
           // TODO 8366717 Is the comment about r13 correct? Isn't that r19_sender_sp? 
        
           // Name some registers to be used in the following code. We can use 
        
           // anything except r0-r7 which are arguments in the Java calling 
        
           // convention, rmethod (r12), and r13 which holds the outgoing sender 
        
           // SP for the interpreter. 
        
           // TODO 8366717 We need to make sure that buf_array, buf_oop (and potentially other long-life regs) are kept live in slowpath runtime calls in GC barriers 
        
           Register buf_array = r10;   // Array of buffered inline types 
        
           Register buf_oop = r11;     // Buffered inline type oop 
        
           Register tmp1 = r15; 
        
           Register tmp2 = r16; 
        
           Register tmp3 = r17;

After some digging, the comment about r13 doesn't seem correct to me. I've changed it.

The second part is again about trusting push_call_clobbered_registers to push enough things. I've added some asserts again.

Second part

valhalla/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp

Lines 645 to 647 in 1077e4f

    
           // TODO 8366717 Do we need to save vectors here? They could be used as arg registers, right? Same on x64. 
        
           RegisterSaver reg_save(true /* save_vectors */); 
        
           OopMap* map = reg_save.save_live_registers(masm, 0, &frame_size_in_words);

Yes, but we save too much. save_live_registers calls push_CPU_state(_save_vectors, ...) defined there:

valhalla/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

Lines 3649 to 3673 in 1077e4f

    
           void MacroAssembler::push_CPU_state(bool save_vectors, bool use_sve, 
        
                                               int sve_vector_size_in_bytes, int total_predicate_in_bytes) { 
        
             push(RegSet::range(r0, r29), sp); // integer registers except lr & sp 
        
             if (save_vectors && use_sve && sve_vector_size_in_bytes > 16) { 
        
               sub(sp, sp, sve_vector_size_in_bytes * FloatRegister::number_of_registers); 
        
               for (int i = 0; i < FloatRegister::number_of_registers; i++) { 
        
                 sve_str(as_FloatRegister(i), Address(sp, i)); 
        
               } 
        
             } else { 
        
               int step = (save_vectors ? 8 : 4) * wordSize; 
        
               mov(rscratch1, -step); 
        
               sub(sp, sp, step); 
        
               for (int i = 28; i >= 4; i -= 4) { 
        
                 st1(as_FloatRegister(i), as_FloatRegister(i+1), as_FloatRegister(i+2), 
        
                     as_FloatRegister(i+3), save_vectors ? T2D : T1D, Address(post(sp, rscratch1))); 
        
               } 
        
               st1(v0, v1, v2, v3, save_vectors ? T2D : T1D, sp); 
        
             } 
        
             if (save_vectors && use_sve && total_predicate_in_bytes > 0) { 
        
               sub(sp, sp, total_predicate_in_bytes); 
        
               for (int i = 0; i < PRegister::number_of_registers; i++) { 
        
                 sve_str(as_PRegister(i), Address(sp, i)); 
        
               } 
        
             } 
        
           }

It will always save registers r0-r29, but also the floating point registers. The difference is that with save_vectors, we save the whole vectors, while without save_vectors, we save only 1 word (8 bytes) per float register (4 words for 4 registers). So, the question is whether we use more that the lower word of each register for arguments, and... I'm not sure. We can keep saving them as it consumes only a bit of stack space, but it runs the same amount of instructions as not saving them.

But I said we save too much, that's because we will save v0-v31 (or pieces of) anyway, while c_farg* and j_farg* are only the registers v0-v7. If I understand well, in this context (we are generating an adapter), around a call (for instance to allocate a buffer) we need to save only the registers that are relevant for the calling convention, like the ones that can hold arguments. If the register v25 (for instance) contained something useful for the caller, it should have been saved by the caller before doing the call, and so before entering the adapter. Since v25 is not used by the adapter or the calling convention, we don't need to save it. Is it worth refining save_live_registers?

`gen_c2i_adapter` in `sharedRuntime_x86_64.cpp`

The x86 version of the last thing:

valhalla/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp

Lines 918 to 921 in 1077e4f

    
           // There is at least an inline type argument: we're coming from 
        
           // compiled code so we have no buffers to back the inline types. 
        
           // Allocate the buffers here with a runtime call. 
        
           OopMap* map = RegisterSaver::save_live_registers(masm, 0, &frame_size_in_words, /*save_vectors*/ false);

/*save_vectors*/ is actually save_wide_vectors. save_live_registers calls push_FPU_state anyway which does fxsave, which saves XMM0-XMM15. If save_wide_vectors, then save_live_registers makes sure to save all of ZMM (the rest of ZMM0-ZMM15, as well as the others ZMM). c_farg* and j_farg* are defined as xmm0-xmm7. XMM registers are already 128 bits (that's 16 bytes), so if we are not using the rest of zmm registers for arguments, it should be fine. Unlike on Aarch64, saving setting save_wide_vectors to true would create a lot more instructions: only one fxsave is enough for the xmm0-15, but the rest must be saved a lot more manually, once piece at a time.

While we probably are good with "only" 16 bytes saved, we need to save only what is used in the calling convention and in the adapter, so my guess would be that we actually only need to save xmm0-xmm7. I'm not sure it's worth improving save_live_registers since saving xmm0-xmm15 is done in a single instruction, we would trade a little of stack space for quite a lot more instructions. Maybe it's just fine to over-save.

`G1BarrierSetAssembler::g1_write_barrier_pre` in `g1BarrierSetAssembler_x86`

valhalla/src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp

Lines 238 to 255 in 1077e4f

    
           if (EnableValhalla && InlineTypePassFieldsAsArgs) { 
        
             // Barriers might be emitted when converting between (scalarized) calling conventions for inline 
        
             // types. Save all argument registers before calling into the runtime. 
        
             // TODO 8366717: use push_set() (see JDK-8283327 push/pop_call_clobbered_registers & aarch64 ) 
        
             __ pusha(); 
        
             __ subptr(rsp, 64); 
        
             __ movdbl(Address(rsp, 0),  j_farg0); 
        
             __ movdbl(Address(rsp, 8),  j_farg1); 
        
             __ movdbl(Address(rsp, 16), j_farg2); 
        
             __ movdbl(Address(rsp, 24), j_farg3); 
        
             __ movdbl(Address(rsp, 32), j_farg4); 
        
             __ movdbl(Address(rsp, 40), j_farg5); 
        
             __ movdbl(Address(rsp, 48), j_farg6); 
        
             __ movdbl(Address(rsp, 56), j_farg7); 
        
           } else { 
        
             // Determine and save the live input values 
        
             __ push_call_clobbered_registers(); 
        
           }

This branch was introduced in 4c3231c "Merge tag 'jdk-20+8' into lworld" on 2022-08-31.

The non-Valhalla side:

was introduced in 9583e36 (8284161: Implementation of Virtual Threads (Preview)) on 2022-05-07
to replace what a push_set from eb4849e (8283327: Add methods to save/restore registers when calling into the VM from C1/interpreter barrier code) on 2022-03-21
that was itself replacing a few push from 2a0986b (8199417: Modularize interpreter GC barriers) on 2018-04-11 (which introduced the whole code)

The Valhalla side:

saving the j_farg* was added in 2455c8e (8251398: [lworld] TestCallingConvention::test36 spuriously fails due to incorrect field value) on 2020-08-11
the pusha comes from d9f3aaa (8242210: [lworld] TestCallingConvention::test36 spuriously fails) on 2020-04-09
which placed the code from 2a0986b (8199417: Modularize interpreter GC barriers) mentioned before.

So it seems the merge jdk was done in a very conservative way, keeping both sides that evolved separately. Yet, the Valhalla side is probably overkill.

   push_call_clobbered_registers(save_fpu = true)
=> push_call_clobbered_registers(exlude = empty set, save_fpu = true)
=> push_set(call_clobbered_xmm_registers(), gp_area_size) if save_fpu

call_clobbered_xmm_registers() is all the xmm on all os, except on windows where it's xmm0-xmm5 + xmm16-xmm_max. That covers the c_farg* (since on windows it only goes from 0 to 3), but it doesn't cover the j_farg* (that are from 0 to 7, even on windows). But on Windows, only xmm0-xmm5 are considered volatile (clobbers). Since we are calling a native function, it should be fine (that is call_clobbered_xmm_registers() and its comment are correct).

Thanks,
Marc

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

JDK-8366717: [lworld] Cleanup defensive fixing of JDK-8365996 (Enhancement - P4)

Reviewers

Tobias Hartmann (@TobiHartmann - Committer) ⚠️ Review applies to 3f013c81

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1824/head:pull/1824
$ git checkout pull/1824

Update a local copy of the PR:
$ git checkout pull/1824
$ git pull https://git.openjdk.org/valhalla.git pull/1824/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1824

View PR using the GUI difftool:
$ git pr show -t 1824

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1824.diff

Using Webrev

Link to Webrev Comment

…gate

bridgekeeper · 2025-12-18T08:28:26Z

👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-12-18T08:28:56Z

@marc-chevalier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8366717: [lworld] Cleanup defensive fixing of JDK-8365996

Reviewed-by: thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the lworld branch:

69399ce: 8373875: [lworld] runtime/cds/DumpSymbolAndStringTable.java fails with --enable-preview
e98a149: 8374028: [lworld] LoadableDescriptors order should be deterministic
1e6e72e: 8372686: [lworld] runtime/cds/serviceability/ReplaceCriticalClasses.java fails with --enable-preview
... and 91 more: https://git.openjdk.org/valhalla/compare/619ef43b00b1e13452a2dd8f4d8320f45e27f8e0...lworld

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the lworld branch, type /integrate in a new comment.

mlbridge · 2025-12-18T08:35:21Z

Webrevs

01: Full - Incremental (bf65d118)
00: Full (3f013c81)

TobiHartmann

Thanks for the detailed analysis Marc! Changes look good to me, I just added a few comments.

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp

TobiHartmann · 2025-12-19T08:15:29Z

src/hotspot/cpu/aarch64/gc/shared/cardTableBarrierSetAssembler_aarch64.cpp

      if (tmp3 != noreg) {
-        // TODO 8366717 This change is from before the 'tmp3' arg was added to mainline, check if it's still needed. Same on x64. Also, this should be a __ lea
-        // Called by MacroAssembler::pack_inline_helper. We cannot corrupt the dst.base() register
+        // When tmp3 is given, we cannot corrupt the dst.base() register (from MacroAssembler::pack_inline_helper or do_oop_store)


it doesn't look like an accident. If we remove the if-branch, tests are totally on fire. Let's keep it.

It's only the __ mov(tmp3, dst.base()); that could potentially be removed, right? Not the entire branch. If the mov is still needed, should it be guarded by InlineTypePassFieldsAsArgs?

I don't think we can just remove the mov part.

I don't really understand how to interpret that. We have

__ mov(tmp3, dst.base()); store_check(masm, tmp3, dst);

do you suggest we write

store_check(masm, tmp3, dst);

? But then, tmp3 is not set to anything meaningful yet. Or

store_check(masm, dst.base(), dst);

But that is exactly the else-branch. Moreover, I've tried to guard and it makes some test fails. For instance, we can come from

CardTableBarrierSetAssembler::oop_store_at (cardTableBarrierSetAssembler_x86.cpp:184) CardTableBarrierSetAssembler::store_at (cardTableBarrierSetAssembler_x86.cpp:90) MacroAssembler::store_heap_oop (macroAssembler_x86.cpp:5515) do_oop_store (templateTable_x86.cpp:148) (called, for instance, from TemplateTable::putfield_or_static_helper (templateTable_x86.cpp:2964))

that doesn't need InlineTypePassFieldsAsArgs but will give tmp3 and if we don't use it here (for instance, if I replace the condition by InlineTypePassFieldsAsArgs && tmp3 != noreg), we get various test failures.

I don't understand why it's not a problem on mainline, tho. And I agree it's weird. I've looked, but it's hard to investigate why something works rather than not (on a somewhat diverging codebase). I spent some time on that, but eventually decided it's not that critical, as long as we have a way to make it work for Valhalla too.

I was suggesting that we should do what mainline does if !InlineTypePassFieldsAsArgs, would that make sense?

I.e. do store_check(masm, dst.base(), dst);

That is the else-branch, right? Right now we have

if (tmp3 != noreg) { __ mov(tmp3, dst.base()); store_check(masm, tmp3, dst); } else { // It's OK to corrupt the dst.base() register. store_check(masm, dst.base(), dst); }

If I understand well, you're suggesting to write

if (InlineTypePassFieldsAsArgs) { if (tmp3 != noreg) { __ mov(tmp3, dst.base()); store_check(masm, tmp3, dst); } else { // It's OK to corrupt the dst.base() register. store_check(masm, dst.base(), dst); } } else { // as mainline store_check(masm, dst.base(), dst); }

but to me, it looks equivalent to

if (InlineTypePassFieldsAsArgs && tmp3 != noreg) { __ mov(tmp3, dst.base()); store_check(masm, tmp3, dst); } else { // It's OK to corrupt the dst.base() register. store_check(masm, dst.base(), dst); }

since both else-branch are identical.

And this, I experimentally found that it's failing quite a lot, with backtraces as I mentioned. Is there an obvious mistake in my logic I'm missing?

Ah, you are right. I missed the outer if (tmp3 != noreg) { which is Valhalla specific. As we discussed off-thread, it would be good to get a better understanding of why this fails in Valhalla. Thanks a lot!

TobiHartmann · 2025-12-19T08:15:50Z

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

+  RegSet clobbered_gp_regs = MacroAssembler::call_clobbered_gp_registers();
+  assert(clobbered_gp_regs.contains(tmp1), "tmp1 must be saved explicitly if it's not a clobber");
+  assert(clobbered_gp_regs.contains(tmp2), "tmp2 must be saved explicitly if it's not a clobber");
+  assert(clobbered_gp_regs.contains(r14), "r14 must be saved explicitly if it's not a clobber");


Great idea with these asserts!

marc-chevalier added 12 commits December 12, 2025 14:09

Remove push_CPU_state in g1BarrierSetAssembler_aarch64.cpp

5cdce25

Add push_call_clobbered_registers in g1BarrierSetAssembler_aarch64.cpp

e3d7c53

Asserting

ac19049

Try the thing in oop_store_at

1a53198

Actually no

e2220aa

unpack_inline_helper

7ae82a0

gen_c2i_adapter

dbe7c69

Experimenting g1_write_barrier_pre

e234616

Cleanup

c749000

assert in oop_store_at

4adfc04

oop_store_at

6696808

Merge remote-tracking branch 'origin/lworld' into JDK-8366717.investi…

3f013c8

…gate

marc-chevalier marked this pull request as ready for review December 18, 2025 08:29

openjdk bot added the rfr Pull request is ready for review label Dec 18, 2025

TobiHartmann approved these changes Dec 19, 2025

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Dec 19, 2025

Pick the right comment again

bf65d11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8366717: [lworld] Cleanup defensive fixing of JDK-8365996 #1824

8366717: [lworld] Cleanup defensive fixing of JDK-8365996 #1824

marc-chevalier commented Dec 18, 2025 •

edited by openjdk bot

Loading

Uh oh!

bridgekeeper bot commented Dec 18, 2025

Uh oh!

openjdk bot commented Dec 18, 2025 •

edited

Loading

Uh oh!

mlbridge bot commented Dec 18, 2025 •

edited

Loading

Uh oh!

TobiHartmann left a comment

Uh oh!

Uh oh!

TobiHartmann Dec 19, 2025

Uh oh!

marc-chevalier Dec 19, 2025 •

edited

Loading

Uh oh!

marc-chevalier Dec 19, 2025

Uh oh!

TobiHartmann Dec 19, 2025 •

edited

Loading

Uh oh!

marc-chevalier Dec 19, 2025 •

edited

Loading

Uh oh!

TobiHartmann Dec 19, 2025

Uh oh!

TobiHartmann Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

	// TODO 8366717 This came with 8284161: Implementation of Virtual Threads (Preview) later in May 2022
	// Check if it's sufficient
	//__ push_call_clobbered_registers();
	assert_different_registers(rscratch1, pre_val); // push_CPU_state trashes rscratch1
	__ push_CPU_state(true);

	RegSet MacroAssembler::call_clobbered_gp_registers() {
	RegSet regs = RegSet::range(r0, r17) - RegSet::of(rscratch1, rscratch2);
	#ifndef R18_RESERVED
	regs += r18_tls;
	#endif
	return regs;
	}

	if (tmp3 != noreg) {
	// TODO 8366717 This change is from before the 'tmp3' arg was added to mainline, check if it's still needed. Same on x64. Also, this should be a __ lea
	// Called by MacroAssembler::pack_inline_helper. We cannot corrupt the dst.base() register
	__ mov(tmp3, dst.base());
	store_check(masm, tmp3, dst);
	} else {
	// It's OK to corrupt the dst.base() register.
	store_check(masm, dst.base(), dst);
	}

	// Don't use r14 as tmp because it's used for spilling (see MacroAssembler::spill_reg_for)
	// TODO 8366717 We need to make sure that r14 (and potentially other long-life regs) are kept live in slowpath runtime calls in GC barriers
	Register tmp1 = r10;
	Register tmp2 = r11;

	// TODO 8366717 This is probably okay but looks fishy because stream is reset in the "Set null marker to zero" case just above. Same on x64.
	sig_index = stream.sig_index();
	to_index = stream.regs_index();

	stream.reset(sig_index, to_index);
	while (stream.next(toReg, bt)) {
	if (sig->at(stream.sig_index())._offset == -1 \|\|
	bt == T_OBJECT \|\| bt == T_ARRAY) {
	if (toReg->is_stack()) {
	int st_off = toReg->reg2stack() * VMRegImpl::stack_slot_size;
	str(zr, Address(sp, st_off));
	} else {
	mov(toReg->as_Register(), zr);
	}
	}
	}

	void reset(int sig_idx, int regs_idx) {
	_sig_idx = sig_idx;
	_regs_idx = regs_idx;
	assert(_sig->at(_sig_idx)._bt == (_step > 0) ? T_METADATA : T_VOID, "should be at inline type delimiter");
	_depth = 1;
	DEBUG_ONLY(_finished = false);
	}

	// TODO 8366717 Is the comment about r13 correct? Isn't that r19_sender_sp?
	// Name some registers to be used in the following code. We can use
	// anything except r0-r7 which are arguments in the Java calling
	// convention, rmethod (r12), and r13 which holds the outgoing sender
	// SP for the interpreter.
	// TODO 8366717 We need to make sure that buf_array, buf_oop (and potentially other long-life regs) are kept live in slowpath runtime calls in GC barriers
	Register buf_array = r10; // Array of buffered inline types
	Register buf_oop = r11; // Buffered inline type oop
	Register tmp1 = r15;
	Register tmp2 = r16;
	Register tmp3 = r17;

	// TODO 8366717 Do we need to save vectors here? They could be used as arg registers, right? Same on x64.
	RegisterSaver reg_save(true /* save_vectors */);
	OopMap* map = reg_save.save_live_registers(masm, 0, &frame_size_in_words);

	void MacroAssembler::push_CPU_state(bool save_vectors, bool use_sve,
	int sve_vector_size_in_bytes, int total_predicate_in_bytes) {
	push(RegSet::range(r0, r29), sp); // integer registers except lr & sp
	if (save_vectors && use_sve && sve_vector_size_in_bytes > 16) {
	sub(sp, sp, sve_vector_size_in_bytes * FloatRegister::number_of_registers);
	for (int i = 0; i < FloatRegister::number_of_registers; i++) {
	sve_str(as_FloatRegister(i), Address(sp, i));
	}
	} else {
	int step = (save_vectors ? 8 : 4) * wordSize;
	mov(rscratch1, -step);
	sub(sp, sp, step);
	for (int i = 28; i >= 4; i -= 4) {
	st1(as_FloatRegister(i), as_FloatRegister(i+1), as_FloatRegister(i+2),
	as_FloatRegister(i+3), save_vectors ? T2D : T1D, Address(post(sp, rscratch1)));
	}
	st1(v0, v1, v2, v3, save_vectors ? T2D : T1D, sp);
	}
	if (save_vectors && use_sve && total_predicate_in_bytes > 0) {
	sub(sp, sp, total_predicate_in_bytes);
	for (int i = 0; i < PRegister::number_of_registers; i++) {
	sve_str(as_PRegister(i), Address(sp, i));
	}
	}
	}

	// There is at least an inline type argument: we're coming from
	// compiled code so we have no buffers to back the inline types.
	// Allocate the buffers here with a runtime call.
	OopMap* map = RegisterSaver::save_live_registers(masm, 0, &frame_size_in_words, /save_vectors/ false);

	if (EnableValhalla && InlineTypePassFieldsAsArgs) {
	// Barriers might be emitted when converting between (scalarized) calling conventions for inline
	// types. Save all argument registers before calling into the runtime.
	// TODO 8366717: use push_set() (see JDK-8283327 push/pop_call_clobbered_registers & aarch64 )
	__ pusha();
	__ subptr(rsp, 64);
	__ movdbl(Address(rsp, 0), j_farg0);
	__ movdbl(Address(rsp, 8), j_farg1);
	__ movdbl(Address(rsp, 16), j_farg2);
	__ movdbl(Address(rsp, 24), j_farg3);
	__ movdbl(Address(rsp, 32), j_farg4);
	__ movdbl(Address(rsp, 40), j_farg5);
	__ movdbl(Address(rsp, 48), j_farg6);
	__ movdbl(Address(rsp, 56), j_farg7);
	} else {
	// Determine and save the live input values
	__ push_call_clobbered_registers();
	}

8366717: [lworld] Cleanup defensive fixing of JDK-8365996 #1824

Are you sure you want to change the base?

8366717: [lworld] Cleanup defensive fixing of JDK-8365996 #1824

Conversation

marc-chevalier commented Dec 18, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

G1BarrierSetAssembler::g1_write_barrier_pre in g1BarrierSetAssembler_aarch64.cpp

CardTableBarrierSetAssembler::oop_store_at in cardTableBarrierSetAssembler_aarch64.cpp

CardTableBarrierSetAssembler::oop_store_at in cardTableBarrierSetAssembler_x86.cpp

MacroAssembler::unpack_inline_helper in macroAssembler_aarch64.cpp

First part

Second Part

gen_c2i_adapter in sharedRuntime_aarch64.cpp

First part

Second part

gen_c2i_adapter in sharedRuntime_x86_64.cpp

G1BarrierSetAssembler::g1_write_barrier_pre in g1BarrierSetAssembler_x86

Progress

Issue

Reviewers

Reviewing

Uh oh!

bridgekeeper bot commented Dec 18, 2025

Uh oh!

openjdk bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mlbridge bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

TobiHartmann left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

TobiHartmann Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

marc-chevalier Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marc-chevalier Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

TobiHartmann Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marc-chevalier Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TobiHartmann Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

TobiHartmann Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

marc-chevalier commented Dec 18, 2025 •

edited by openjdk bot

Loading

`G1BarrierSetAssembler::g1_write_barrier_pre` in `g1BarrierSetAssembler_aarch64.cpp`

`CardTableBarrierSetAssembler::oop_store_at` in `cardTableBarrierSetAssembler_aarch64.cpp`

`CardTableBarrierSetAssembler::oop_store_at` in `cardTableBarrierSetAssembler_x86.cpp`

`MacroAssembler::unpack_inline_helper` in `macroAssembler_aarch64.cpp`

`gen_c2i_adapter` in `sharedRuntime_aarch64.cpp`

`gen_c2i_adapter` in `sharedRuntime_x86_64.cpp`

`G1BarrierSetAssembler::g1_write_barrier_pre` in `g1BarrierSetAssembler_x86`

openjdk bot commented Dec 18, 2025 •

edited

Loading

mlbridge bot commented Dec 18, 2025 •

edited

Loading

marc-chevalier Dec 19, 2025 •

edited

Loading

TobiHartmann Dec 19, 2025 •

edited

Loading

marc-chevalier Dec 19, 2025 •

edited

Loading