Skip to content

Conversation

@adamperlin
Copy link
Contributor

@adamperlin adamperlin commented Dec 11, 2025

This PR adds a new instrDesc subtype for encoding Wasm local declarations, which have the form count: u32 type:valtype, where valtype can be encoded as a single byte for number and vector types.

The PR also adds code to emit local count and local declarations as part of the prolog for a function. Some of this may need to change based on what we decide around calling convention and register (locals) allocation.

Copilot AI review requested due to automatic review settings December 11, 2025 00:53
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 11, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces support for WebAssembly local variable declarations in the JIT compiler by adding a new instruction descriptor subtype and emission logic for encoding local declarations in the function prolog.

Key Changes

  • Adds instWasmValueType enum and mapping functions to convert between JIT types and WASM types
  • Introduces instrDescLclVarDecl instruction descriptor for encoding local declarations with count and type
  • Implements local declaration emission in the function prolog via genWasmLocals()

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/coreclr/jit/wasmtypesdef.h New header defining WASM value types and type conversion functions
src/coreclr/jit/registeropswasm.h Removes trailing blank line
src/coreclr/jit/instrswasm.h Adds local_cnt and local_decl instructions
src/coreclr/jit/emitwasm.h Declares new emission methods for local declarations
src/coreclr/jit/emitwasm.cpp Implements local declaration emission and descriptor handling
src/coreclr/jit/emitfmtswasm.h Adds IF_LOCAL_CNT and IF_LOCAL_DECL instruction formats
src/coreclr/jit/emit.h Adds instrDescLclVarDecl structure and bitfield flag
src/coreclr/jit/codegenwasm.cpp Adds INS_end instruction to epilog
src/coreclr/jit/codegencommon.cpp Implements prolog local count and declaration emission
src/coreclr/jit/codegen.h Declares genWasmLocals() method

adamperlin and others added 5 commits December 11, 2025 09:16
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
adamperlin and others added 2 commits December 11, 2025 09:22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
for (unsigned i = 0; i < compiler->info.compLocalsCount - compiler->info.compArgsCount; i++)
{
LclVarDsc* varDsc = compiler->lvaGetDesc(i);
assert(!varDsc->lvIsParam);
Copy link
Contributor Author

@adamperlin adamperlin Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: This assert probably isn't necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are trying to enumerate the IL visible locals?

The jit can generate new locals for lots of reasons, and these may also end up as Wasm locals. So this emission needs to examine all the non-param locals.

Also the wasm register number will not be the local number, typically. So if we have to emit these in increasing wasm register number then more work is needed; we need to walk all the locals and collect their (reg,type) pairs, then sort by reg, then emit these.

Or maybe we can bias the register allocation to make this emission simpler?

If not, we probably need something like:

        // set up a collection object
        jitstd::vector<...> localsig (...);

        // find enregistered locals -- collect reg and type
        for (unsigned i = 0; i < lvaCount; i++)
        {
            LclVarDsc* varDsc = compiler->lvaGetDesc(i);
            if (varDsc->IsInReg() && !varDsc->lvIsParam)
            {
                localsig.push_back(UnpackWasmReg(varDsc->GetRegNum()),  emitter::instWasmValueType type = GetEmitter()->genWasmTypeFromVarType(varDsc->TypeGet())
            }
        }
        
        // sort collection by increasing reg
        jitstd::sort(...);

        // paranoia: verify no dups or holes in the ordering

        // emit types in reg order
        for (... : localsig)
        {
               emit (type)
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I think I will try and implement something like you are suggesting re: enumerating all locals and ensuring that they emitted in sorted register order. Then maybe later, depending on the register allocation implementation, we can come back and change how this is handled?

Copy link
Contributor

@SingleAccretion SingleAccretion Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need RA data structures to communicate the full set of used locals (the algorithm above won't discover locals used as e. g. temporaries). I would suggest doing something like emitting one local of each value type for now to validate the emitter APIs and we can add the real local counts later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're suggesting we just declare locals that we never actually refer to, for now?

struct instrDescLclVarDecl : instrDesc
{
instrDescLclVarDecl() = delete;
cnsval_ssize_t lclCnt;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: is it possible to reuse the constant field inside the parent instr desc for small size constants?

case TYP_INT:
case TYP_REF:
case TYP_BYREF:
return instWasmValueType::I32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're trying to stay flexible and not hard-wire to Wasm32.

For Wasm64 REF/BYREF will be I64.

// TODO-WASM-CQ: do not emit "return" in case this is the last block

instGen(INS_return);
instGen(INS_end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this end required at the end of methods? We can have multiple epilogs (though we might want to reconsider)... and there's no guarantee even with one epilog that the epilog will be at the end of the instruction stream.

Copy link
Contributor Author

@adamperlin adamperlin Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is required. If the epilog isn't guaranteed to be the end, where would be a safe place to ensure that the end comes at the actual end of the instruction stream for a method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably at the end of genCodeForBBlist or similar.

Copy link
Contributor

@SingleAccretion SingleAccretion Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genFnEpilog is a BasicBlock* parameter. I am pretty sure you can just check block->IsLast() and switch whether you return return or end based on that. That would address the TODO above as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be the actual last block when we have funclets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. This case can be detected by checking if the next block is a funclet entry.

Copy link
Contributor Author

@adamperlin adamperlin Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this at the end of genCodeForBBList proved a little bit tricky because of instruction groups. It does look like we track the last used instruction group and could grab that and append after we've finished all method code? Will the last used group be guaranteed to be at the end of the method though? Also, what is the best way to detect if the next block is a funclet? For now I've added an IsLast check to genFnEpilog with a comment that we need to revisit if we're doing funclets.

for (unsigned i = 0; i < compiler->info.compLocalsCount - compiler->info.compArgsCount; i++)
{
LclVarDsc* varDsc = compiler->lvaGetDesc(i);
assert(!varDsc->lvIsParam);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are trying to enumerate the IL visible locals?

The jit can generate new locals for lots of reasons, and these may also end up as Wasm locals. So this emission needs to examine all the non-param locals.

Also the wasm register number will not be the local number, typically. So if we have to emit these in increasing wasm register number then more work is needed; we need to walk all the locals and collect their (reg,type) pairs, then sort by reg, then emit these.

Or maybe we can bias the register allocation to make this emission simpler?

If not, we probably need something like:

        // set up a collection object
        jitstd::vector<...> localsig (...);

        // find enregistered locals -- collect reg and type
        for (unsigned i = 0; i < lvaCount; i++)
        {
            LclVarDsc* varDsc = compiler->lvaGetDesc(i);
            if (varDsc->IsInReg() && !varDsc->lvIsParam)
            {
                localsig.push_back(UnpackWasmReg(varDsc->GetRegNum()),  emitter::instWasmValueType type = GetEmitter()->genWasmTypeFromVarType(varDsc->TypeGet())
            }
        }
        
        // sort collection by increasing reg
        jitstd::sort(...);

        // paranoia: verify no dups or holes in the ordering

        // emit types in reg order
        for (... : localsig)
        {
               emit (type)
        }

@SingleAccretion SingleAccretion self-requested a review December 12, 2025 23:08
Comment on lines 43 to +49
instGen(INS_return);
// TODO-WASM: this condition will not be sufficient if we have funclet to determine if we're at the end of
// codegen for a method. Revisit later.
if (block->IsLast())
{
instGen(INS_end);
}
Copy link
Contributor

@SingleAccretion SingleAccretion Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
instGen(INS_return);
// TODO-WASM: this condition will not be sufficient if we have funclet to determine if we're at the end of
// codegen for a method. Revisit later.
if (block->IsLast())
{
instGen(INS_end);
}
if (block->IsLast() || bbIsFuncletBeg(block->Next()))
{
instGen(INS_end);
}
else
{
instGen(INS_return);
}

(And remove the TODO above).

Notably, this is still not quite enough because the method can end up on a non-epilogue-generating basic block (e. g. an infinite loop, or an always-throw call). To handle those cases we will need to add the logical equivalent of:

if (isEndOfFunctionOrFunclet(block) && !inspiredEpilogue(block))
   instGen(INS_end);

To genEmitEndBlock. But no need to do this in this change.

// offset 21:
//
// ( 0, 0) [000040] ------------ il_offset void IL offset: 21
// ( 0, 0) [000040] ------------ / il_offset void IL offset: 21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional?

// genWasmLocals: generate wasm locals for all locals
//
// TODO-WASM: pre-declare all "register" locals
void CodeGen::genWasmLocals()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in codegenwasm.cpp.

// TODO-WASM: prolog zeroing, shadow stack maintenance
// TODO-WASM: proper local count, local declarations, and shadow stack maintenance
assert(compiler->info.compLocalsCount >= compiler->info.compArgsCount);
GetEmitter()->emitIns_I(INS_local_cnt, EA_8BYTE, (unsigned)WasmValueType::Count - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be moved into genWasmLocals. Also, no need for the assert.

#include "jitgcinfo.h"

#if defined(TARGET_WASM)
#include "wasmtypesdef.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed deleting this file (now unused).

Comment on lines +40 to +41
INST(local_cnt, "local.cnt", 0, IF_LOCAL_CNT, 0x00)
INST(local_decl, "local", 0, IF_LOCAL_DECL, 0x00)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe move these together with label, so that all our "pseudos" are together?

Comment on lines +135 to +142
instrDesc* id = this->emitNewInstrLclVarDecl(EA_8BYTE, imm, valType);
insFormat fmt = this->emitInsFormat(ins);

id->idIns(ins);
id->idInsFmt(fmt);

this->dispIns(id);
this->appendToCurIG(id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all this->?

Comment on lines +572 to +574
cnsval_ssize_t imm = emitGetLclVarDeclCount(id);
WasmValueType valType = emitGetLclVarDeclType(id);
printf(" %llu %s", (uint64_t)imm, WasmValueTypeName(valType));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to keep this disasm format the same as wasm-objdump; the wasm-objdump way to display this would be like so:

 1c08cc: 14 7f                      | local[6..25] type=i32
 1c08ce: 01 7e                      | local[26] type=i64
 1c08d0: 01 7f                      | local[27] type=i32

It is useful since it also lists the indices instead of the raw encoding. However, here we only have counts. The easiest way to fix that would be to add another field to idDebugOnlyInfo that'd store the "base" index.

"Count",
};

inline const char* WasmValueTypeName(WasmValueType type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (implementation) should be moved to registeropswasm.cpp (and WasmValueTypeNames can be made a static array local to the function). In general it is an anti-pattern to have arrays of data in headers (since they get duplicated into each object file and may even get duplicated in the binary depending on how careful are you with linkage).


inline const char* WasmValueTypeName(WasmValueType type)
{
static_assert(ArrLen(WasmValueTypeNames) == static_cast<unsigned>(WasmValueType::Count) + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should never see a WasmValueType type equal to WasmValueType::Count. You can exclude that from printing (e. g. have an assert that type < WasmValueType::Count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants