
Update README with token match rate using ChatML template#50

Open
sdeeptan-aws wants to merge 1 commit into aws-neuron:main from sdeeptan-aws:lfm

Conversation

@sdeeptan-aws
Contributor

Description

Updated the LFM2-2.6B contrib model README to report 100% token match accuracy using the ChatML prompt template. LFM2 is a state space model from Liquid AI: architecturally different from standard transformers, but it validates with the same NeuronX methodology. The key fix was applying the ChatML template (<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n), which the instruct-tuned model requires for accurate token matching against the HuggingFace reference. Without the template, the model generates coherent, factually correct text, but the tokens diverge from the HF output.
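The ChatML wrapping described above can be sketched as a small helper. This is an illustrative snippet, not code from the PR; the function name and prompt text are made up:

```python
def build_chatml_prompt(user_text: str) -> str:
    """Wrap a raw user prompt in the ChatML template that the
    instruct-tuned LFM2-2.6B checkpoint expects."""
    return (
        "<|im_start|>user\n"
        f"{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("What is the capital of France?")
print(prompt)
```

Feeding `prompt` (rather than the raw question) to both the Neuron model and the HF reference is what makes the generated token ids line up.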

Model Information

Model Name: LFM2-2.6B
Model Architecture: Decoder-only state space model (Llama-based registration, 30 layers, hidden_size=2048, GQA 32Q/8KV, SwiGLU)
Purpose: Text generation

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
    • Validates model generation and coherence
    • Performance benchmarks (TTFT, throughput)
    • Test can compile and run the model on Neuron
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)
    • Modeling code following NxD Inference patterns (unchanged in this PR)

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

/contrib/models/lfm2-2.6b/
  README.md
  /src
    modeling_lfm2.py
  /test
    /integration
      test_model.py

Testing

Model was compiled and tested with TP=1, batch_size=1, seq_len=2048, bfloat16. Two key validation findings:

  1. ChatML template required: the instruct-tuned model needs the <|im_start|> format for token matching; without it, outputs are valid but diverge from the HF reference
  2. SSM validates like transformers: Despite being a state space model, the same NeuronX validation methodology applies
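The token-matching comparison in finding 1 can be sketched as follows. This is an illustrative reimplementation, not the actual code in test/integration/test_model.py, and the token ids are made up:

```python
def token_match_rate(neuron_ids: list, reference_ids: list) -> float:
    """Fraction of positions where Neuron-generated token ids equal the
    HuggingFace reference ids, compared over the shorter sequence."""
    n = min(len(neuron_ids), len(reference_ids))
    if n == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(neuron_ids[:n], reference_ids[:n]))
    return matches / n

# With the ChatML template the two sequences agree exactly (100% match);
# with a raw prompt the text is coherent but the ids diverge.
print(token_match_rate([312, 7, 99, 4], [312, 7, 99, 4]))  # 1.0
print(token_match_rate([312, 7, 99, 4], [312, 7, 11, 8]))  # 0.5
```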

Test Results:

  Test             Status    Result
  Smoke Test       ✅ PASS   Model loads successfully
  Token Matching   ✅ PASS   100% match (with ChatML template)
  Throughput       ✅ PASS   4.69 tok/s

Compatibility

Tested with:

  • Instance Type(s): Trn1
  • Configuration: TP=1, batch_size=1, seq_len=2048, bfloat16

Additional Information

  • State space model: LFM2 is fundamentally different from transformer-based LLMs, but the NeuronX port follows the same validation methodology; architecture differences are handled in the model implementation, not in validation
  • ChatML prompt template: Required for accurate token matching. Format: <|im_start|>user\n{text}<|im_end|>\n<|im_start|>assistant\n
  • Raw prompt behavior: Without ChatML, model produces valid, factually correct text (e.g., correctly answers "Paris" for capital of France) but tokens don't match HF reference output
  • Architecture coverage: NeuronX now supports transformers, MoE, hybrid (Mamba+attention), and SSM architectures

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included


@aws-yishanm left a comment


Approved because Readme and test were present.


@petesraj-aws self-requested a review February 23, 2026 21:08
