-
Notifications
You must be signed in to change notification settings - Fork 259
Implement __sync builtins for thumbv6-none-eabi #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
compiler-builtins/src/thumbv6k.rs
Outdated
| }; | ||
| } | ||
|
|
||
| atomic_rmw!(@old __sync_fetch_and_add_1, u8, |a: u8, b: u8| a.wrapping_add(b)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro calls from here to L319 are the same as in arm_linux.rs (the definitions of the called macros are different), so I think it would be preferable to extract them into a separate file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in af8d4af.
compiler-builtins/src/thumbv6k.rs
Outdated
| "mcr p15, #0, {zero}, c7, c10, #5" | ||
| }; | ||
| } | ||
| macro_rules! asm_use_dmb { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this macro is just an asm block, but with macro expansion (and an extra {zero} register which I guess is useful somewhere). But it could probably do with a comment to explain all that because it's really non-obvious. Especially as this macro doesn't have anything to do with dmb (other than it's written as a macro that needs to be expanded).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was originally a helper for writing code that required no changes other than using the Data Memory Barrier (DMB instruction or equivalent instructions).
In this PR, it is indeed unclear, so it can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed asm_use_dmb! macro and renamed dmb! macro to cp15_barrier! (to more clearly reflect what it does) in a1a5401.
|
This looks OK to me, but it's a bit beyond my Arm experience. |
tgross35
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me, thanks for the patch. This should wait until one of the compiler leads approves the target PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could you name this arm_thumb_sync_builtins or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we may as well group this by functionality like we have for other areas, something like:
src/sync/
arm_linux.rs
thumbv6k.rs
arm_thumb_shared.rs // this file
I'll move the aarch64 builtins there too.
| // FIXME(safety): preconditions review needed | ||
| let curval = unsafe { T::load_relaxed(ptr) }; | ||
| let newval = f(curval); | ||
| // FIXME(safety): preconditions review needed | ||
| if unsafe { T::cmpxchg(ptr, curval, newval) } == curval { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind filling these in? Most of them are probably just related to pointer validity.
| dst = in(reg) dst, | ||
| // Note: this cast must be a zero-extend since loaded value | ||
| // which compared to it is zero-extended. | ||
| old = in(reg) old as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u32::from(old), since this should never truncate
| concat!("ldrex", $suffix, " {out}, [{dst}]"), // atomic { out = *dst; EXCLUSIVE = dst } | ||
| "cmp {out}, {old}", // if out == old { Z = 1 } else { Z = 0 } | ||
| "bne 3f", // if Z == 0 { jump 'cmp-fail } | ||
| cp15_barrier!(), // fence | ||
| "2:", // 'retry: | ||
| concat!("strex", $suffix, " {r}, {new}, [{dst}]"), // atomic { if EXCLUSIVE == dst { *dst = new; r = 0 } else { r = 1 }; EXCLUSIVE = None } | ||
| "cmp {r}, #0", // if r == 0 { Z = 1 } else { Z = 0 } | ||
| "beq 3f", // if Z == 1 { jump 'success } | ||
| concat!("ldrex", $suffix, " {out}, [{dst}]"), // atomic { out = *dst; EXCLUSIVE = dst } | ||
| "cmp {out}, {old}", // if out == old { Z = 1 } else { Z = 0 } | ||
| "beq 2b", // if Z == 1 { jump 'retry } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own understanding: why is the barrier needed between the first ldrex/strex, but not between subsequent l/s pairs?
| // SAFETY: the caller must uphold the safety contract. | ||
| // casts are okay because $ty and $base implement the same layout. | ||
| unsafe { <$base as Atomic>::load_relaxed(src.cast::<$base>()) as Self } | ||
| } | ||
| #[inline] | ||
| unsafe fn cmpxchg(dst: *mut Self, old: Self, new: Self) -> Self { | ||
| // SAFETY: the caller must uphold the safety contract. | ||
| // casts are okay because $ty and $base implement the same layout. | ||
| unsafe { | ||
| <$base as Atomic>::cmpxchg(dst.cast::<$base>(), old as $base, new as $base) | ||
| as Self | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.cast_{un,}signed() is stable now and would make the as casts more clear
| #[rustfmt::skip] | ||
| macro_rules! atomic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think rustfmt does any changes here
If you get the chance at some point, I would be rather interested in having some of your atomic test suite here. Especially the part you mention since we don't have any no-std tests set up.
Assuming this would allow us to set |
This is a PR for thumbv6-none-eabi (bere-metal Armv6k in Thumb mode) which proposed to be added by rust-lang/rust#150138.
Armv6k supports atomic instructions, but they are unavailable in Thumb mode unless Thumb-2 instructions available (v6t2).
Using Thumb interworking (can be used via
#[instruction_set]) allows us to use these instructions even from Thumb mode without Thumb-2 instructions, but LLVM does not implement that processing (as of LLVM 21), so this PR implements it in compiler-builtins.The code around
__syncbuiltins is basically copied fromarm_linux.rswhich uses kernel_user_helpers for atomic implementation.The atomic implementation is a port of my atomic-maybe-uninit inline assembly code.
This PR has been tested on QEMU 10.2.0 using patched compiler-builtins and core that applied the changes in this PR and rust-lang/rust#150138 and the portable-atomic no-std test suite (can be run with
./tools/no-std.sh thumbv6-none-eabion that repo) which tests wrappers aroundcore::sync::atomic. (Note that the target-spec used in test sets max-atomic-width to 32 and atomic_cas to true, unlike the current rust-lang/rust#150138.) The original atomic-maybe-uninit implementation has been tested on real Arm hardware.(Note that Armv6k also supports 64-bit atomic instructions, but they are skipped here. This is because there is no corresponding code in
arm_linux.rs(since the kernel requirements increased in 1.64, it may be possible to implement 64-bit atomics there as well. see also taiki-e/portable-atomic#82), the code becomes more complex than for 32-bit and smaller atomics.)cc @thejpster (target maintainer)
I'll undraft this PR once the target maintainer approved this approach.