Lack of unrolling and very high register usage #462

@nazar-pc

I have many places in the code that can be condensed to something like this:

const MAX_BUCKET_SIZE: usize = 512;
const WORKGROUP_SIZE: u32 = 256;

unsafe {
    core::hint::assert_unchecked(matches_count <= MAX_BUCKET_SIZE);
}
for index in (local_invocation_id..matches_count as u32).step_by(WORKGROUP_SIZE as usize) {
    // ... loop body ...
}

It can also be rewritten with a runtime clamp, which will probably produce some helpful code in SPIR-V, since I suspect the hint is lost in translation:

const MAX_BUCKET_SIZE: usize = 512;
const WORKGROUP_SIZE: u32 = 256;

for index in (local_invocation_id..matches_count.min(MAX_BUCKET_SIZE) as u32)
    .step_by(WORKGROUP_SIZE as usize)
{
    // ... loop body ...
}

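It is not difficult for me to see that there will be at most two loop iterations per invocation here: the range ends at 512 at most and the step is 256, so (with `local_invocation_id` below `WORKGROUP_SIZE`) the only possible indices are `local_invocation_id` and `local_invocation_id + 256`. However, the compiler does not see this today.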
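To make the two-iteration bound concrete, here is a minimal CPU-side sketch that counts the trips of the clamped loop for every valid input. The helper names `iterations_for` and `max_iterations` are hypothetical and exist only for this illustration, and the sketch assumes `local_invocation_id` is always below `WORKGROUP_SIZE`.

```rust
const MAX_BUCKET_SIZE: usize = 512;
const WORKGROUP_SIZE: u32 = 256;

/// Counts how many iterations the strided loop performs for one invocation.
/// (Hypothetical helper, not part of the original code.)
fn iterations_for(local_invocation_id: u32, matches_count: usize) -> usize {
    // Same clamp as in the second snippet above.
    let end = matches_count.min(MAX_BUCKET_SIZE) as u32;
    (local_invocation_id..end)
        .step_by(WORKGROUP_SIZE as usize)
        .count()
}

/// Largest per-invocation trip count over every invocation id in the
/// workgroup and every `matches_count` from 0 to `MAX_BUCKET_SIZE`.
fn max_iterations() -> usize {
    (0..WORKGROUP_SIZE)
        .flat_map(|id| (0..=MAX_BUCKET_SIZE).map(move |count| iterations_for(id, count)))
        .max()
        .unwrap_or(0)
}
```

Calling `max_iterations()` exhaustively checks the bound on the CPU. It says nothing about what the SPIR-V toolchain can prove, only that the bound holds for these constants.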

The result is much higher register usage, which impacts occupancy in a big way.

Rewriting it as an inner function that is called twice with explicit bounds checks fixes the register usage (though I hit #461 when doing so), but it is far from idiomatic Rust and is quite painful to do manually in every such case.
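A rough sketch of what such a manual rewrite might look like, next to the loop form it replaces. This is not the actual code from the project: `process` is a hypothetical stand-in for the real loop body, the `Vec` is used only so the two variants can be compared on the CPU, and the unrolled variant assumes `local_invocation_id < WORKGROUP_SIZE` and `MAX_BUCKET_SIZE == 2 * WORKGROUP_SIZE`.

```rust
const MAX_BUCKET_SIZE: usize = 512;
const WORKGROUP_SIZE: u32 = 256;

/// Hypothetical stand-in for the real loop body: records the visited index.
#[inline(always)]
fn process(index: u32, visited: &mut Vec<u32>) {
    visited.push(index);
}

/// Loop form: strided iteration with a runtime-clamped upper bound.
fn looped(local_invocation_id: u32, matches_count: usize, visited: &mut Vec<u32>) {
    let end = matches_count.min(MAX_BUCKET_SIZE) as u32;
    for index in (local_invocation_id..end).step_by(WORKGROUP_SIZE as usize) {
        process(index, visited);
    }
}

/// Manually unrolled form: the body is called at most twice, each call
/// guarded by an explicit bounds check.
/// Assumes `local_invocation_id < WORKGROUP_SIZE`.
fn unrolled(local_invocation_id: u32, matches_count: usize, visited: &mut Vec<u32>) {
    let end = matches_count.min(MAX_BUCKET_SIZE) as u32;

    let first = local_invocation_id;
    if first < end {
        process(first, visited);
    }

    let second = local_invocation_id + WORKGROUP_SIZE;
    if second < end {
        process(second, visited);
    }
}
```

Under those assumptions both functions visit the same indices in the same order. The unrolled one simply spells out the trip count that the compiler does not derive on its own.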

I wish the end-to-end compilation chain were aware of things like this; it is a very important pattern for performance.

In fact, loop unrolling is extremely bad right now: even fixed loops with 3-4 iterations and one or several simple ALU instructions in the body are not unrolled, and they balloon register usage.

Labels: enhancement