Conversation

@ngc92 ngc92 commented Jan 19, 2026

Due to the boundedness of sin/cos, we don't need the BF16 range there and can instead invest in better precision. By keeping it 16-bit, no changes to the kernels are needed.
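
To illustrate the precision argument (an editorial sketch, not part of the PR): sin/cos values never leave [-1, 1], so FP16's 10 explicit mantissa bits give a smaller rounding error than BF16's 7, and FP16's narrower exponent range costs nothing here.

    // Standalone check (illustrative only): rounding error of a bounded rotary
    // value when stored as FP16 vs BF16. Compile with nvcc.
    #include <cuda_fp16.h>
    #include <cuda_bf16.h>
    #include <cmath>
    #include <cstdio>

    int main() {
        float x = sinf(0.12345f);  // representative sin/cos value, always in [-1, 1]
        float as_fp16 = __half2float(__float2half(x));           // 10 explicit mantissa bits
        float as_bf16 = __bfloat162float(__float2bfloat16(x));   // 7 explicit mantissa bits
        printf("fp32=%.8f  fp16 err=%.2e  bf16 err=%.2e\n",
               x, fabsf(x - as_fp16), fabsf(x - as_bf16));
        return 0;
    }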

Copilot AI review requested due to automatic review settings January 19, 2026 10:59

Copilot AI left a comment

Pull request overview

This PR changes the data type used for ROPE (Rotary Position Embedding) frequency storage from BF16 to FP16 in BF16 models. The motivation is that sine/cosine values are bounded in [-1, 1], so the extended exponent range of BF16 is unnecessary, and FP16's better mantissa precision reduces rounding errors.

Changes:

  • Modified frequency tensor allocation to use FP16 instead of BF16 when the model uses BF16
  • Updated ROPE kernel templates to support mixed precision (BF16 I/O with FP16 frequencies); see the sketch after this list
  • Updated function signatures across the codebase to reflect the new type pairing
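
To make the mixed-precision point concrete, here is a minimal sketch of what such a templated kernel can look like; the names and signature are illustrative and not the repository's actual rope.cu code:

    // Illustrative sketch: separate template parameters for the I/O type and the
    // frequency type, so BF16 activations can use an FP16 cos/sin table.
    #include <cuda_fp16.h>
    #include <cuda_bf16.h>

    template <typename TIO, typename TFreq>
    __global__ void rope_apply_sketch(TIO* q, const TFreq* freqs, int pairs) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= pairs) return;
        // Arithmetic happens in fp32; only the storage types differ.
        float c  = static_cast<float>(freqs[2 * i]);      // cos component
        float s  = static_cast<float>(freqs[2 * i + 1]);  // sin component
        float x0 = static_cast<float>(q[2 * i]);
        float x1 = static_cast<float>(q[2 * i + 1]);
        q[2 * i]     = TIO(x0 * c - x1 * s);
        q[2 * i + 1] = TIO(x0 * s + x1 * c);
    }

    // For BF16 models this would be instantiated as
    // rope_apply_sketch<__nv_bfloat16, __half>.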

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Changed files:

  • src/models/llama_run_state.cpp: Allocates frequency tensors as FP16 for BF16 models; precomputes frequencies using FP16 (see the dtype-selection sketch after this list)
  • src/kernels/rope.cu: Adds FP16 precompute function; templates the kernel to support different I/O and frequency types; updates function signatures
  • src/kernels/kernels.h: Updates function declarations to use half* for frequencies with BF16 I/O
  • src/kernels/kernels.cpp: Updates tensor dispatchers to extract frequencies as half when the model is BF16
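
The dtype choice described for src/models/llama_run_state.cpp above can be summarized as follows; this is a hypothetical helper (not the actual allocation code), assuming the repository's ETensorDType enum also has a BF16 member:

    // Hypothetical helper: pick the storage dtype for the ROPE frequency tensor.
    ETensorDType rope_freq_dtype(ETensorDType model_dtype) {
        // sin/cos are bounded in [-1, 1], so FP16's smaller exponent range is enough
        // and its extra mantissa bits reduce rounding error compared to BF16.
        if (model_dtype == ETensorDType::BF16) return ETensorDType::FP16;
        return model_dtype;  // FP16 and FP32 models keep their own dtype
    }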

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/models/llama_run_state.cpp:84

  • The function should handle unsupported dtypes. If Config.DType is set to an FP8 type or another unsupported dtype, the function will silently return an uninitialized tensor. Consider adding an else clause that throws an error for unsupported dtypes, similar to the pattern used in rope_forward and rope_backward functions.
    if(dtype == ETensorDType::FP16) {
        std::vector<half> freq_cpu(Config.MaxPositionEmbeddings * 2 * Config.head_size());
        precompute_freqs_cis(freq_cpu.data(), Config.head_size(), Config.MaxPositionEmbeddings, Config.RopeTheta);
        CUDA_CHECK(cudaMemcpy(freq.Data, freq_cpu.data(), freq_cpu.size() * sizeof(half), cudaMemcpyHostToDevice));
    } else if (dtype == ETensorDType::FP32) {
        std::vector<float> freq_cpu(Config.MaxPositionEmbeddings * 2 * Config.head_size());
        precompute_freqs_cis(freq_cpu.data(), Config.head_size(), Config.MaxPositionEmbeddings, Config.RopeTheta);
        CUDA_CHECK(cudaMemcpy(freq.Data, freq_cpu.data(), freq_cpu.size() * sizeof(float), cudaMemcpyHostToDevice));
    }
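
A minimal sketch of the else branch the comment asks for (the exact error type and message are assumptions; the repository's rope_forward/rope_backward pattern may differ):

    } else {
        // Fail loudly instead of returning an uninitialized tensor; requires <stdexcept>.
        throw std::runtime_error("ROPE frequency precompute: unsupported dtype");
    }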

@ngc92 ngc92 merged commit ee606ff into dev Jan 20, 2026
35 checks passed