nspin commented Jul 9, 2024

There are cases where it would be useful to be able to specify the program image separately from the symbols that will be used to manipulate it.

seL4/rust-sel4#167 adds support for resettable protection domains. The best way that I've come up with to achieve this requires manipulating the program image after linking. Specifically, a new program image is created from the original, and the result is an ELF file with only program headers, and no section headers. The symbol and debugging information in the original ELF file still apply to the final program image. So, we are left with one ELF file specifying the program image, and one ELF file with symbol and debugging information.

This PR adds support for the optional path_for_symbols attribute on the <program_image> element, which allows for such a split program image.
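As a rough illustration of the intended semantics, here is a minimal Rust sketch. It assumes a hypothetical `ProgramImage` struct (not Microkit's actual types) and assumes that, when `path_for_symbols` is absent, symbols are resolved from `path` as before; segments are always taken from `path`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical model of a `<program_image>` element (illustrative only).
struct ProgramImage {
    /// ELF file whose program headers define the loaded image.
    path: PathBuf,
    /// Optional ELF file carrying symbols and debug info for that image.
    path_for_symbols: Option<PathBuf>,
}

impl ProgramImage {
    /// File to read segments (the program image itself) from.
    fn image_path(&self) -> &Path {
        &self.path
    }

    /// File to resolve symbols from: the split-off file if given,
    /// otherwise the image itself (the unsplit case).
    fn symbols_path(&self) -> &Path {
        self.path_for_symbols.as_deref().unwrap_or(&self.path)
    }
}

fn main() {
    let split = ProgramImage {
        path: PathBuf::from("pd.elf"),
        path_for_symbols: Some(PathBuf::from("pd.orig.elf")),
    };
    let unsplit = ProgramImage {
        path: PathBuf::from("pd.elf"),
        path_for_symbols: None,
    };
    println!("{} {}", split.image_path().display(), split.symbols_path().display());
    println!("{}", unsplit.symbols_path().display());
}
```

Keeping the attribute optional means existing system descriptions, where one ELF file carries both the image and the symbols, behave as before.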

nspin force-pushed the pr/split-program-image branch 4 times, most recently from c8ddcde to 9d5c095, July 10, 2024 07:16
nspin force-pushed the pr/split-program-image branch 2 times, most recently from caad5c4 to 2492fa6, September 18, 2024 22:29

nspin commented Sep 18, 2024

I've updated the PR. I've also moved code around to minimize the diff, to make it easier to review. If it is going to be accepted, I'll reorganize the code again in a follow-up to make it more natural to read linearly.

nspin force-pushed the pr/split-program-image branch from 2492fa6 to 4ef80cb, October 30, 2025 09:31

nspin commented Oct 30, 2025

I've updated this PR on top of #337.


Indanz commented Oct 30, 2025

> seL4/rust-sel4#167 adds support for resettable protection domains. The best way that I've come up with to achieve this requires manipulating the program image after linking.

If you copied the data section to a read-only area of the same size after linking, and have the program use that to init its data section at startup, you've ended up doing the same thing I did to implement restarting.

It's by far the cleanest solution, because it makes restarting a trivial operation. The alternative would be to basically reload (part of) the program when restarting, which has much higher complexity. If Microkit wants to have restartability support, it probably wants to do the same thing.
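A minimal Rust sketch of the startup step described above, assuming a pristine copy of the data section has already been placed in a read-only area after linking. The memory regions are modeled as plain byte slices, and `reset_writable_state` is a hypothetical name; in a real protection domain the region bounds would come from linker-defined symbols.

```rust
/// Re-initialize writable state from a pristine read-only copy.
/// Regions are plain slices here so the logic can run on a host.
fn reset_writable_state(pristine_data: &[u8], data: &mut [u8], bss: &mut [u8]) {
    assert_eq!(pristine_data.len(), data.len(), "copy must match .data size");
    // Restore initialized data to its load-time contents.
    data.copy_from_slice(pristine_data);
    // Zero-initialized data goes back to zero.
    bss.fill(0);
}

fn main() {
    // Load-time contents of .data, saved after linking in a read-only area.
    let pristine: [u8; 4] = [1, 2, 3, 4];
    let mut data = pristine;
    let mut bss = [0u8; 4];

    // The program runs and mutates its state...
    data[0] = 99;
    bss[2] = 7;

    // ...and a restart puts everything back.
    reset_writable_state(&pristine, &mut data, &mut bss);
    println!("{:?} {:?}", data, bss); // prints "[1, 2, 3, 4] [0, 0, 0, 0]"
}
```

With one contiguous region, restarting really is just a copy plus a clear, which is why this layout makes the operation trivial.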

> Specifically, a new program image is created from the original, and the result is an ELF file with only program headers, and no section headers. The symbol and debugging information in the original ELF file still apply to the final program image. So, we are left with one ELF file specifying the program image, and one ELF file with symbol and debugging information.

This isn't strictly needed to achieve your goal, but if you're modifying the binary anyway, you might as well strip it.

> This PR adds support for the optional path_for_symbols attribute on the <program_image> element, which allows for such a split program image.

This seems useful for supporting more aggressively stripped binaries in general.


nspin commented Nov 9, 2025

> If you copied the data section to a read-only area of the same size after linking, and have the program use that to init its data section at startup, you've ended up doing the same thing I did to implement restarting.

> It's by far the cleanest solution, because it makes restarting a trivial operation. The alternative would be to basically reload (part of) the program when restarting, which has much higher complexity. If Microkit wants to have restartability support, it probably wants to do the same thing.

Yes, this is exactly what I did, and it did indeed turn out to work quite well.

> This isn't strictly needed to achieve your goal, but if you're modifying the binary anyway, you might as well strip it.

Do you mean that it’s not necessary to end up with an ELF file with only program headers, no section headers? That’s true - ending up with an ELF file like this isn’t a goal of the ELF modification tool, but I’ve found that the simplest implementation of such a tool results in an ELF file like this.

Do you know of some linker script magic to make this not the case?

seL4/rust-sel4#167

Indanz commented Nov 10, 2025

> This isn't strictly needed to achieve your goal, but if you're modifying the binary anyway, you might as well strip it.

> Do you mean that it’s not necessary to end up with an ELF file with only program headers, no section headers? That’s true - ending up with an ELF file like this isn’t a goal of the ELF modification tool, but I’ve found that the simplest implementation of such a tool results in an ELF file like this.

Yes, that's what I meant.

> Do you know of some linker script magic to make this not the case?

I added a symbol with the size of the data section to the linker script, and used that to create the read-only section of the right size. Then I used objcopy --dump-section to extract the data section and wrote it to the read-only section with objcopy --update-section.

(But I vaguely remember that back then AArch64 objcopy didn't support --update-section yet, so instead I put the read-only section at the beginning or end of the file, or something like that, and used cat. I can't remember the details.)


nspin commented Nov 11, 2025

> I added a symbol with the size of the data section to the linker script, and used that to create the read-only section of the right size.

Ah, I forgot to mention: one constraint I'm working under is that I don't control the linker script. I can supply the linker with linker script fragments that are appended to the linker script with -T, but that's it. I tried to find a way to do what you've done within those constraints, but I ultimately did not find one.

Indanz commented Nov 11, 2025

> Ah, I forgot to mention: one constraint I'm working under is that I don't control the linker script. I can supply the linker with linker script fragments that are appended to the linker script with -T, but that's it. I tried to find a way to do what you've done within those constraints, but I ultimately did not find one.

That makes no sense to me. How do you assure that your init code runs first and how does it find all writeable sections and the BSS without linker script control? Are you assuming there is one data and one bss section and hope for the best?

Edit: If your code can figure all that out during runtime, you should be able to do the same at link time somehow.


nspin commented Nov 11, 2025

> How do you assure that your init code runs first

-T reset.lds

where reset.lds looks something like:

```
SECTIONS {
    .persistent : {
        *(.persistent .persistent.*)
    }
} INSERT BEFORE .data;

ASSERT(DEFINED(_reset), "_reset is not defined")

ENTRY(_reset)
```

> how does it find all writeable sections and the BSS without linker script control

A simple utility that operates on the ELF file after link time. Enabling the use of a tool like this is the purpose of this PR. The tool operates on the segment level, not the section level. It creates a new ELF file based on the old one, with all of the read-only segments of the original ELF, and all of the writeable segments deflated (same location and size, but zero-initialized). It then creates a new read-only segment that includes the information that the entry point prologue (_reset) needs to initialize those deflated writeable segments: the data that they originally contained, and information about where that data goes.

https://github.com/seL4/rust-sel4/blob/main/crates/sel4-reset/cli/src/main.rs
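To make the segment-level description concrete, here is a minimal Rust sketch of the deflate/inflate round trip. It is not the sel4-reset implementation linked above: the `Segment` and `Record` types, the `deflate`/`inflate` names, and the in-memory blob are hypothetical, and reading and writing real program headers with an ELF library is left out.

```rust
/// Simplified loadable segment (hypothetical model). Bytes past `data.len()`
/// up to `mem_size` are implicitly zero, like p_filesz < p_memsz in ELF.
#[derive(Clone, Debug, PartialEq)]
struct Segment {
    vaddr: u64,
    mem_size: u64,
    writable: bool,
    data: Vec<u8>,
}

/// Tells the reset prologue where a deflated segment's bytes belong.
#[derive(Debug, PartialEq)]
struct Record {
    vaddr: u64,
    mem_size: u64,
    blob_offset: usize,
    file_size: usize,
}

/// Keep read-only segments as-is; replace each writable segment with a
/// zero-filled one of the same location and size, saving its contents.
fn deflate(segments: &[Segment]) -> (Vec<Segment>, Vec<Record>, Vec<u8>) {
    let (mut out, mut records, mut blob) = (Vec::new(), Vec::new(), Vec::new());
    for seg in segments {
        if seg.writable {
            records.push(Record {
                vaddr: seg.vaddr,
                mem_size: seg.mem_size,
                blob_offset: blob.len(),
                file_size: seg.data.len(),
            });
            blob.extend_from_slice(&seg.data);
            out.push(Segment { data: Vec::new(), ..seg.clone() });
        } else {
            out.push(seg.clone());
        }
    }
    (out, records, blob)
}

/// What the entry prologue does at (re)start, on a simulated address
/// space `mem` that begins at virtual address `base`.
fn inflate(mem: &mut [u8], base: u64, records: &[Record], blob: &[u8]) {
    for r in records {
        let start = (r.vaddr - base) as usize;
        let end = start + r.mem_size as usize;
        let src = &blob[r.blob_offset..r.blob_offset + r.file_size];
        mem[start..start + r.file_size].copy_from_slice(src);
        mem[start + r.file_size..end].fill(0);
    }
}

fn main() {
    let segments = vec![
        Segment { vaddr: 0x1000, mem_size: 4, writable: false, data: vec![0xAA; 4] },
        // .data (2 initialized bytes) followed by .bss (2 zero bytes).
        Segment { vaddr: 0x1004, mem_size: 4, writable: true, data: vec![5, 6] },
    ];
    let (deflated, records, blob) = deflate(&segments);
    // A real tool would serialize `records` and `blob` into a new read-only
    // segment of the output ELF; that layout is not modeled here.
    println!("deflated writable segment has {} file bytes", deflated[1].data.len());

    let mut mem = vec![0xFFu8; 8]; // dirty memory from a previous run
    inflate(&mut mem, 0x1000, &records, &blob);
    println!("{:?}", &mem[4..]); // prints "[5, 6, 0, 0]"
}
```

Working on segments rather than sections is what lets this avoid any assumption about section names or linker script layout.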

Yes, all of the relevant information is in theory available at link time, but due to the constraints arising from the fact that this functionality is provided as a library (plus a command-line utility), I can't leverage it in a linker script.

Indanz commented Nov 11, 2025

Seems a reasonable approach considering your limitations. The only assumption you seem to make is that the real entry point is called _start, but you could fix that in your tool if you need to.

Overall it seems quite complicated, though; it's much simpler if you have one contiguous region to copy, with no need for stacks or other bits and pieces.


nspin commented Nov 13, 2025

> The only assumption you seem to make is that the real entry point is called _start, but you could fix that in your tool if you need to.

I suppose that since I'm already only one assumption away from no assumptions, perhaps I should remove this assumption too. As you mention, doing so would not be hard.

> Overall it seems quite complicated, though; it's much simpler if you have one contiguous region to copy, with no need for stacks or other bits and pieces.

Yes, the (almost)-no-assumptions constraint does complicate things a bit.

nspin force-pushed the pr/split-program-image branch from 4ef80cb to 97d2be0, November 18, 2025 01:23
Signed-off-by: Nick Spinale <nick@nickspinale.com>
nspin force-pushed the pr/split-program-image branch from 97d2be0 to c1e2793, November 21, 2025 06:19