Merged
269 changes: 156 additions & 113 deletions doc/FASTALLOC.md
@@ -42,21 +42,14 @@ During allocation, it's necessary to determine which VReg is in a PReg
to generate the right move(s) for eviction.
`vreg_in_preg` is a vector that stores this information.

## Available PRegs For Use In Instruction (`available_pregs_for_regs`, `available_pregs_for_any`)
## Available PRegs For Use In Instruction (`available_pregs`)

These are a 2-tuples of `PRegSet`s, a bitset of physical registers, one for
the instruction's early phase and one for the late phase.
They are used to determine which registers are available for use in the
early/late phases of an instruction.
This is a 2-tuple of `PRegSet`s (bitsets of physical registers), one for the
instruction's early phase and one for the late phase. It is used to determine
which registers are available for use in the early/late phases of an instruction.

Prior to the beginning of any instruction's allocation, `available_pregs_for_regs`
is reset to include all allocatable physical registers, some of which may already
contain a VReg.

The two sets have the same function, except that `available_pregs_for_regs` is
used to determine which registers are available for operands with a register-only
constraint while `available_pregs_for_any` is used to determine which registers
are available for operands with no constraints.
Prior to the beginning of any instruction's allocation, both sets are reset to
include all allocatable physical registers, some of which may already contain a VReg.

## VReg Liverange Location Info (`vreg_to_live_inst_range`)

@@ -66,6 +59,35 @@ to be in throughout that liverange.
This is used to build the debug locations vector after allocation
is complete.

## Number of Available Registers (`num_available_registers`)

These are counters that keep track of the number of registers that
can be allocated to any-reg and anywhere operands for int, float and
vector registers, in the late, early and both phases of an instruction.

Prior to the beginning of any instruction, these counters are reset to
the number of allocatable physical registers.

## Number of Any-Reg Operands (`num_any_reg_operands`)

These are counters that keep track of the number of any-reg
operands that are yet to be allocated in an instruction.

It is closely associated with `num_available_registers`: the two
counters are used together to avoid allocating too many registers
to anywhere operands when any-reg operands still need them.
When register reservations are made, the corresponding count in
`num_available_registers` is decremented.
When an any-reg operand is allocated, the corresponding
`num_any_reg_operands` is decremented.
When an anywhere operand is allocated, the two are compared to check
whether the registers left in `num_available_registers` are enough to
cover the any-reg operands still pending in `num_any_reg_operands`.
This determines whether it is safe to allocate a register to
the operand instead of a spillslot.
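
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `PhaseCounters` struct, its method names, and the choice of a strict `>` comparison are all assumptions made for the example. The real counters are kept per register class and per phase.

```rust
/// Hypothetical stand-in for one (register class, phase) pair of counters.
/// The field names mirror the document; the struct itself is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PhaseCounters {
    /// Registers that can still be handed to any-reg or anywhere operands.
    pub num_available_registers: usize,
    /// Any-reg operands of the instruction that are not yet allocated.
    pub num_any_reg_operands: usize,
}

impl PhaseCounters {
    /// Called when a register reservation is made.
    pub fn reserve_register(&mut self) {
        self.num_available_registers = self.num_available_registers.saturating_sub(1);
    }

    /// Called when an any-reg operand has been allocated.
    pub fn any_reg_operand_allocated(&mut self) {
        self.num_any_reg_operands = self.num_any_reg_operands.saturating_sub(1);
    }

    /// Assumed form of the safety check: an anywhere operand may take a
    /// register only if at least one register is left over after every
    /// pending any-reg operand has been given one.
    pub fn anywhere_may_use_register(&self) -> bool {
        self.num_available_registers > self.num_any_reg_operands
    }
}
```

If the check fails, the anywhere operand would be allocated to a spillslot instead, leaving the remaining registers for the any-reg operands.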

# Allocation Process Breakdown

Allocation proceeds in reverse: from the last block to the first block,
@@ -76,11 +98,11 @@ in four phases: selection, assignment, eviction, and edit insertion.

## Allocation Phase: Selection

In this phase, a PReg is selected from `available_pregs_for_regs` or
`available_pregs_for_any` for the operand based on the operand constraints.
Depending on the operand's position, the selected PReg is removed from either
the early or late phase or both, indicating that the PReg is no longer available
for allocation by other operands in that phase.
In this phase, a PReg is selected from `available_pregs` for the operand
based on the operand constraints. Depending on the operand's position,
the selected PReg is removed from either the early or late phase or both,
indicating that the PReg is no longer available for allocation by other
operands in that phase.

## Allocation Phase: Assignment

@@ -128,114 +150,112 @@ arguments will be in their dedicated spillslots.
4. At the beginning of a block, all branch parameters and livein
virtual registers will be in their dedicated spillslots.

# Instruction Allocation

To allocate a single instruction, the first step is to reset the
`available_pregs_for_regs` sets to all allocatable PRegs.

Next, the selection phase is carried out for all operands with
fixed register constraints: the registers they are constrained to use are
marked as unavailable in the `available_pregs_for_regs` set, depending on the
phase that they are valid in. If the operand is an early use or late
def operand, then the register will be marked as unavailable in the
early set or late set, respectively. Otherwise, the PReg is marked
as unavailable in both the early and late sets, because a PReg
assigned to an early def or late use operand cannot be reused by another
operand in the same instruction.

Next, all clobbers are removed from the early and late `available_pregs_for_regs`
sets to avoid allocating a clobber to a def.

Next, registers are reserved for register-only operands and marked as
unavailable in `available_pregs_for_regs`.
Then `available_pregs_for_any` for the instruction is derived from
`available_pregs_for_regs` by marking all other registers not reserved as
available. This is to avoid a situation where operands with no
constraints take up all available registers, leaving none for operands
with register-only constraints.

After selection for register-only operands, the eviction phase is
carried out for fixed register operands. Any VReg in their selected
registers, indicated by `vreg_in_preg`, is evicted: a dedicated
spillslot is allocated for the VReg (if it doesn't have one already),
an edit is inserted to move from the slot to the PReg, which is where
the VReg expected to be after the instruction, and its current
allocation in `vreg_allocs` is set to the spillslot.
The same is then done for clobbers, then register-only operands.

Next, the selection, assignment, eviction, and edit insertion phases are
carried out for all def operands. When each def operand's allocation is
complete, the def operand is immediately freed, marking the end of the
VReg's liverange. It is removed from the `live_vregs` set, its allocation
in `vreg_allocs` is set to none, and if it was in a PReg, that PReg's
entry in `vreg_in_preg` is set to none. The selection and eviction phases
are omitted if the operand has a fixed constraint, as those phases have
already been carried out.

Next, the selection, assignment, and eviction phases are carried out for all
use operands. As with def operands, the selection and eviction phases are
omitted if the operand has a fixed constraint, as those phases have already
been carried out.
There is an exception to invariants 2 and 3: if a branch instruction defines
the VReg used as a branch arg, then there may be no opportunity for
the VReg to be placed in its spillslot.

Then the edit insertion phase is carried out for all use operands.
# Instruction Allocation

Lastly, if the instruction being processed is a branch instruction, the
parallel move resolver is used to insert edits before the instruction
to move from the branch arguments spillslots to the block parameter
spillslots.
To allocate a single instruction, the first step is to reset the
`available_pregs` sets to all allocatable PRegs.

Next, the selection phase is carried out for all operands with
fixed register constraints: the registers they are constrained
to use are marked as unavailable in the `available_pregs` set,
depending on the phase that they are valid in. If the operand
is an early use or late def operand, then the register will be
marked as unavailable in the early set or late set, respectively.
Otherwise, the PReg is marked as unavailable in both the early
and late sets, because a PReg assigned to an early def or late
use operand cannot be reused by another operand in the same instruction.
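
As a rough sketch of the rule above, the early/late sets can be modelled with two `u64` bitmasks. The `AvailablePregs`, `Kind`, and `Pos` types below are simplified stand-ins invented for this example; the actual implementation uses `PRegSet` and the crate's own operand types.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind { Use, Def }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Pos { Early, Late }

/// Hypothetical stand-in for the early/late pair of `PRegSet`s.
/// Bit `i` set means physical register `i` is still available.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AvailablePregs {
    pub early: u64,
    pub late: u64,
}

impl AvailablePregs {
    /// Reset step: every allocatable register is available in both phases.
    /// Assumes at most 64 allocatable registers for this sketch.
    pub fn all(num_allocatable: u32) -> Self {
        let mask = if num_allocatable >= 64 {
            u64::MAX
        } else {
            (1u64 << num_allocatable) - 1
        };
        Self { early: mask, late: mask }
    }

    /// Selection for a fixed-register operand (`preg` must be below 64).
    pub fn reserve_fixed(&mut self, preg: u32, kind: Kind, pos: Pos) {
        let bit = 1u64 << preg;
        match (kind, pos) {
            // An early use only occupies the register in the early phase.
            (Kind::Use, Pos::Early) => self.early &= !bit,
            // A late def only occupies the register in the late phase.
            (Kind::Def, Pos::Late) => self.late &= !bit,
            // Early defs and late uses span both phases.
            _ => {
                self.early &= !bit;
                self.late &= !bit;
            }
        }
    }
}
```

For example, after reserving register 0 for an early use and register 1 for a late def, register 0 is still free for late operands and register 1 is still free for early operands.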

After selection for fixed register operands, the eviction phase
is carried out for fixed register operands. Any VReg in their
selected registers, indicated by `vreg_in_preg`, is evicted: a
dedicated spillslot is allocated for the VReg (if it doesn't
have one already), an edit is inserted to move from the slot to
the PReg, which is where the VReg is expected to be after the instruction,
and its current allocation in `vreg_allocs` is set to the spillslot.
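
The eviction steps can be illustrated with the following sketch. It uses hash maps and small integer IDs in place of the real data structures; the `State`, `Alloc`, and `Move` types and the `next_slot` counter are hypothetical names introduced only for this example.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Alloc {
    Reg(u8),
    Stack(u32),
}

/// A move edit, from one allocation to another.
#[derive(Debug, PartialEq, Eq)]
pub struct Move {
    pub from: Alloc,
    pub to: Alloc,
}

/// Simplified allocator state; VRegs and PRegs are plain integers here.
#[derive(Default)]
pub struct State {
    pub vreg_in_preg: HashMap<u8, u32>,
    pub vreg_allocs: HashMap<u32, Alloc>,
    pub spillslots: HashMap<u32, u32>,
    pub next_slot: u32,
    pub edits: Vec<Move>,
}

impl State {
    /// Evict whatever VReg currently lives in `preg`, if any.
    pub fn evict(&mut self, preg: u8) {
        let vreg = match self.vreg_in_preg.remove(&preg) {
            Some(v) => v,
            None => return, // Nothing to evict.
        };
        // Allocate a dedicated spillslot only if the VReg has none yet.
        let slot = match self.spillslots.get(&vreg) {
            Some(&s) => s,
            None => {
                let s = self.next_slot;
                self.next_slot += 1;
                self.spillslots.insert(vreg, s);
                s
            }
        };
        // Later code expects the VReg in `preg`, so move slot -> PReg.
        self.edits.push(Move { from: Alloc::Stack(slot), to: Alloc::Reg(preg) });
        // From here on (going backwards) the VReg lives in its spillslot.
        self.vreg_allocs.insert(vreg, Alloc::Stack(slot));
    }
}
```

The move goes from the spillslot to the PReg because allocation runs in reverse: instructions already processed, which execute later, expect the VReg in the register.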

Next, all clobbers are removed from the late `available_pregs` set
to avoid allocating a clobber to a late operand.

Next, the selection, assignment, eviction, and edit insertion
phases are carried out for all late operands, both defs and uses.
Then the early operands are processed in the same manner, after the
late operands.

In both late and early processing, when a def operand's
allocation is complete, the def operand is immediately freed,
marking the end of the VReg's liverange. It is removed from the
`live_vregs` set, its allocation in `vreg_allocs` is set to none,
and if it was in a PReg, that PReg's entry in `vreg_in_preg` is
set to none. The selection and eviction phases are omitted if the
operand has a fixed constraint, as those phases have already been
carried out.

When a use operand is processed, only the selection, assignment, and
eviction phases are carried out. As with def operands, the selection and
eviction phases are omitted if the operand has a fixed constraint, as
those phases have already been carried out.

After the late and early operands have completed processing,
the edit insertion phase is carried out for all use operands.

Lastly, if the instruction being processed is a branch instruction,
the parallel move resolver is used to insert edits before the instruction
to move from the branch arguments spillslots to the block parameter spillslots.

## Operand Allocation

During the allocation of an operand, a check is first made to
see if the VReg's current allocation as indicated in
`vreg_allocs` is within the operand constraints.

If it is, the assignment phase is carried out, setting the final
allocation output's entry for that operand to the allocation.
The selection phase is carried out, marking the PReg
(if the allocation is a PReg) as unavailable in the respective
early/late sets. The state of the LRUs is also updated to reflect
the new most recently used PReg.
No eviction needs to be done since the VReg is already in the
allocation and no edit insertion needs to be done either.

On the other hand, if the VReg's current allocation is not within
constraints, the selection and eviction phases are carried out for
non-fixed operands. First, a set of PRegs that can be drawn from is
created from `available_pregs_for_regs` or `available_pregs_for_any`,
depending on whether the operand has a register-only constraint
or no constraint. For early uses and late defs,
this draw-from set is the early set or late set, respectively.
For late uses and early defs, the draw-from set is an intersection
of the available early and late sets (because a PReg used for a late
use can't be reassigned to another operand in the early phase;
likewise, a PReg used for an early def can't be reassigned to another
operand in the late phase).
The LRU for the VReg's regclass is then traversed from the end to find
the least recently used PReg in the draw-from set. Once a PReg is found,
it is marked as the most recently used in the LRU, unavailable in both
available pregs sets, and whatever VReg was in it before is evicted.

The assignment phase is carried out next. The final allocation for the
If it is, the assignment phase is carried out, setting the
final allocation output's entry for that operand to the allocation.
The selection phase is carried out, marking the PReg (if the
allocation is a PReg) as unavailable in the respective early/late
sets. The state of the LRUs is also updated to reflect the new
most recently used PReg. No eviction needs to be done since the
VReg is already in the allocation and no edit insertion needs to
be done either.

On the other hand, if the VReg's current allocation is not within
constraints, the selection and eviction phases are carried out
for non-fixed operands. First, a set of PRegs that can be drawn
from is created from `available_pregs`. For early uses and late
defs, this draw-from set is the early set or late set, respectively.
For late uses and early defs, the draw-from set is an intersection
of the available early and late sets (because a PReg used for a
late use can't be reassigned to another operand in the early phase;
likewise, a PReg used for an early def can't be reassigned to another
operand in the late phase). The LRU for the VReg's regclass is then
traversed from the end to find the least recently used PReg in the
draw-from set. Once a PReg is found, it is marked as the most recently
used in the LRU, unavailable in the `available_pregs` sets, and whatever
VReg was in it before is evicted.
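
A minimal sketch of the draw-from set and the LRU scan described above, assuming bitmask sets and an LRU stored as a `Vec` ordered from most to least recently used. Both function names and the `Vec`-based LRU are assumptions for illustration; the `Lru` type in `src/fastalloc/lru.rs` is a different structure.

```rust
/// Compute the set of PRegs an operand may draw from.
/// `early` and `late` are the two halves of `available_pregs`.
pub fn draw_from(early: u64, late: u64, is_def: bool, is_late: bool) -> u64 {
    match (is_def, is_late) {
        (false, false) => early, // early use
        (true, true) => late,    // late def
        // Late uses and early defs must be free in both phases.
        _ => early & late,
    }
}

/// Walk the LRU from its least recently used end, pick the first PReg
/// that is in `candidates`, and mark it as most recently used.
pub fn select_lru(lru: &mut Vec<u32>, candidates: u64) -> Option<u32> {
    let idx = lru
        .iter()
        .rposition(|&p| (candidates & (1u64 << p)) != 0)?;
    let preg = lru.remove(idx);
    lru.insert(0, preg);
    Some(preg)
}
```

The caller would then clear the chosen PReg from the `available_pregs` sets and evict any VReg found in it.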

The assignment phase is carried out next. The final allocation for the
operand is set to the selected register.

If the newly allocated operand has not been allocated before, that is,
this is the first use/def of the VReg encountered; the VReg is
inserted into `live_vregs` and marked as the value in the allocated
PReg in `vreg_in_preg`.
If the newly allocated operand has not been allocated before,
that is, this is the first use/def of the VReg encountered,
the VReg is inserted into `live_vregs` and marked as the value
in the allocated PReg in `vreg_in_preg`.

Otherwise, if the VReg has been allocated before, then an edit will need
to be inserted to ensure that the dataflow remains correct.
The edit insertion phase is now carried out if the operand is a def
operand: an edit is inserted after the instruction to move from the
new allocation to the allocation it's expected to be in after the
instruction.
Otherwise, if the VReg has been allocated before, then an edit
will need to be inserted to ensure that the dataflow remains correct.
The edit insertion phase is now carried out if the operand is a
def operand: an edit is inserted after the instruction to move
from the new allocation to the allocation it's expected to be
in after the instruction.

The edit insertion phase for use operands is done after all operands
have been processed. Edits are inserted to move from the current
allocations in `vreg_allocs` to the final allocated position before
the instruction. This is to account for the possibility of multiple
uses of the same operand in the instruction.
The edit insertion phase for use operands is done after all
operands have been processed. Edits are inserted to move from
the current allocations in `vreg_allocs` to the final allocated
position before the instruction. This is to account for the
possibility of multiple uses of the same operand in the instruction.

## Reuse Operands

@@ -283,6 +303,15 @@ It's after these edits have been inserted that the parallel move
resolver is then used to generate and insert edits to move from
those spillslots to the spillslots of the block parameters.

There is an exception to the invariant: it's possible that the
branch argument is defined in the same branch instruction.
If the branch argument VReg has a fixed-reg constraint, the move
will have to be done in the successor.
If it has a stack or anywhere constraint, it is allocated directly
into the block param's spillslot, so there is no need to insert moves.
The other constraints, reuse and any-reg, are not supported in this
case.

# Across Blocks

When a block completes processing, some VRegs will still be live.
@@ -297,6 +326,20 @@ to be in from the first instruction.
All block parameters are freed, just like defs, and liveins' current
allocations in `vreg_allocs` are set to their spillslots.

Any block parameter that receives a branch argument from a predecessor
where the argument VReg was defined in the branch instruction will
also need moves inserted at the block beginning, because the predecessor
couldn't have inserted the required moves.
All predecessors' branch arguments to the block are checked to see if any
are defined in the same branch instruction. For all branch arguments that
are defined in the branch instruction and have fixed-reg constraints, a
move will be inserted from the fixed-reg to the block param's spillslot
at the beginning of the block. In the case of stack and anywhere constraints,
nothing is done, because in that case, the VRegs used as the branch arguments
will be defined directly into the block param's spillslot. Reuse and any-reg
constraints are not supported and aren't handled.
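
The per-constraint handling above can be summarized as a small decision function. This is only an illustrative sketch: `Constraint` and `BlockEntryAction` are simplified, hypothetical enums, not the crate's `OperandConstraint` or edit types.

```rust
/// Simplified constraint of a branch argument that is defined by the
/// branch instruction itself.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Constraint {
    FixedReg(u8),
    Stack,
    Any,
    Reg,
    Reuse(usize),
}

/// What has to happen at the beginning of the successor block.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlockEntryAction {
    /// Insert a move from the fixed register to the block param's spillslot.
    MoveFromFixedReg { preg: u8, to_slot: u32 },
    /// The value was defined directly into the block param's spillslot.
    Nothing,
    /// Reuse and any-reg constraints are not supported in this case.
    Unsupported,
}

pub fn block_entry_action(constraint: Constraint, param_slot: u32) -> BlockEntryAction {
    match constraint {
        Constraint::FixedReg(preg) => {
            BlockEntryAction::MoveFromFixedReg { preg, to_slot: param_slot }
        }
        Constraint::Stack | Constraint::Any => BlockEntryAction::Nothing,
        Constraint::Reg | Constraint::Reuse(_) => BlockEntryAction::Unsupported,
    }
}
```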


# Edits Order

`regalloc2`'s outward interface guarantees that edits are in
10 changes: 9 additions & 1 deletion src/fastalloc/iter.rs
@@ -1,4 +1,4 @@
use crate::{Operand, OperandConstraint, OperandKind};
use crate::{Operand, OperandConstraint, OperandKind, OperandPos};

pub struct Operands<'a>(pub &'a [Operand]);

@@ -37,6 +37,14 @@ impl<'a> Operands<'a> {
pub fn any_reg(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
self.matches(|op| matches!(op.constraint(), OperandConstraint::Reg))
}

pub fn late(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
self.matches(|op| op.pos() == OperandPos::Late)
}

pub fn early(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
self.matches(|op| op.pos() == OperandPos::Early)
}
}

impl<'a> core::ops::Index<usize> for Operands<'a> {
8 changes: 8 additions & 0 deletions src/fastalloc/lru.rs
@@ -272,6 +272,8 @@ pub struct PartedByRegClass<T> {
pub items: [T; 3],
}

impl<T: Copy> Copy for PartedByRegClass<T> {}

impl<T> Index<RegClass> for PartedByRegClass<T> {
type Output = T;

@@ -286,6 +288,12 @@ impl<T> IndexMut<RegClass> for PartedByRegClass<T> {
}
}

impl<T: PartialEq> PartialEq for PartedByRegClass<T> {
fn eq(&self, other: &Self) -> bool {
self.items.eq(&other.items)
}
}

/// Least-recently-used caches for register classes Int, Float, and Vector, respectively.
pub type Lrus = PartedByRegClass<Lru>;
