Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
584a242
Initialize contrib folder
Jan 21, 2026
22d2aa8
Add cohere2 code to be used as reference
Jan 23, 2026
bfce418
Add code to be migrated
Jan 23, 2026
95fc1b5
Add Kiro steering doc
Jan 23, 2026
4fa42b3
Convert tmp/external-code from submodule to regular directory
Jan 23, 2026
2da18e5
Add Kiro migration spec
Jan 23, 2026
14e1650
Add migrated files
Jan 23, 2026
10a566a
Add migrated unit tests
Jan 23, 2026
c0beef3
Add migrated text-only model
Jan 23, 2026
57b0490
Add passing text model unit tests
Jan 23, 2026
5557f25
Fix typos
Jan 23, 2026
72a9381
Fix test_vision_model.py
Jan 23, 2026
1cd85b2
Add passing unit tests for vision encoder
Jan 23, 2026
041e3a4
Fix test_encoder_layer.py
Jan 23, 2026
b4b80e6
Fix test_encoder.py
Jan 23, 2026
e45ac00
Remove SigLIP pooling head implementation
Jan 23, 2026
9a49c4f
Fix test_siglip_vision_model.py
Jan 23, 2026
93dc733
Fix test_vision_transformer.py
Jan 23, 2026
38155f1
Fix test_attention.py
Jan 26, 2026
31a8b1d
Add offline inference script
Jan 27, 2026
3642909
Patch broken NeuronBaseForImageToText.forward
Jan 27, 2026
307c62a
Enhance docstrings in modeling_gemma3.py
Jan 27, 2026
9321639
Clean NeuronGemma3ForCausalLM get_compiler_args methods
Jan 27, 2026
407d19f
Remove unused image_sizes arg from get_required_kwargs
Jan 27, 2026
a2efd90
Set pipeline_execution=True by default for Gemma3 vision encoder
Jan 27, 2026
fb2c904
Increase readability of NeuronGemma3ForCausalLM.forward
Jan 27, 2026
4549bcc
Add multi-image support in NeuronGemma3ForCausalLM.forward
Jan 27, 2026
1f0bd60
Simplify NeuronGemma3ForCausalLM.pad_positions
Jan 27, 2026
80889d2
Remove NeuronGemma3ForCausalLM.concat_causal_lm_outputs
Jan 27, 2026
192b861
Decorate NeuronGemma3ForCausalLM static methods appropriately
Jan 27, 2026
8a5c2aa
Clean unit tests
Jan 30, 2026
ed03eb4
Add missing load_hf_model, fixed _get_constructed_outputs and _create…
Jan 30, 2026
2233992
Add integration test
Jan 30, 2026
7ae7345
Patch HuggingFaceGenerationAdapter
Jan 30, 2026
614e6d0
Add get_test_name_suffix utility function
Jan 30, 2026
d52ab21
Refactor integration test utility functions into new integration/util…
Jan 30, 2026
62ee893
Refactor test_model.py to be launched with pytest
Jan 30, 2026
593c851
Revert __main__ entrypoint in integration test (pytest bug)
Jan 30, 2026
1eaeecc
Add vLLM offline inference
Feb 3, 2026
52beeef
Add vLLM online inference
Feb 4, 2026
1b7eec4
Remove temporary helper files
Feb 4, 2026
85ae93c
Remove MIGRATION_STATUS.md
Feb 4, 2026
893ab8a
Clean unit tests
Feb 5, 2026
56a6e7b
Clean imports
Feb 5, 2026
1b8e641
Align with HF naming
Feb 5, 2026
cd1fd19
Apply pre-commit hooks
Feb 5, 2026
0042f5d
Update README.md
Feb 5, 2026
fa1a760
Remove Kiro helper files
Feb 5, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 133 additions & 0 deletions contrib/models/gemma3-vision/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# Contrib Model: Google Gemma3 VLM models

NeuronX Distributed Inference implementationn for Google Gemma3 VLM (Vision-Language Model) based on the HuggingFace Transformers Gemma3 architecture with SigLIP vision encoder.

## Model Information

- **HuggingFace IDs:**
* [`google/gemma-3-4b-it`](https://huggingface.co/google/gemma-3-4b-it)
* [`google/gemma-3-12b-it`](https://huggingface.co/google/gemma-3-12b-it)
* [`google/gemma-3-27b-it`](https://huggingface.co/google/gemma-3-27b-it)
- **Model Type:** LLaVA-style VLM with fixed-resolution SigLIP vision encode (400M) and Transformer-based LLM backbone.
- **License:** Check HuggingFace model card

## Architecture Details

LLM backbones (text models):

| Spec | Gemma 3 4B | Gemma 3 12B | Gemma 3 27B |
|---|---:|---:|---:|
| **Layers** | 34 | 48 | 62 |
| **Hidden Size** | 2560 | 3840 | 5376 |
| **Head Dim** | 256 | 256 | 128 |
| **Attention Heads** | 8 | 16 | 32 |
| **KV Heads** | 4 | 8 | 16 |
| **Intermediate Size** | 10240 | 15360 | 21504 |
| **Vocabulary size** | 32,064 | 32,064 | 32,064 |
| **Max Position Embeddings** | 131,072 | 131,072 | 131,072 |
| **Position Encoding** | RoPE | RoPE | RoPE |
| **Normalization** | RMSNorm | RMSNorm | RMSNorm |
| **Activation type** | GELU | GELU | GELU |
| **Context length** | 128K | 128K | 128K |

The 400M-parameter fixed-resolution SigLIP vision encoder is shared by all models:

| Spec | SigLIP vision tower |
|---|---:|
| **Layers** | 27 |
| **Hidden Size** | 1152 |
| **Head Dim** | 72 |
| **Attention Heads** | 16 |
| **KV Heads** | 16 |
| **Intermediate Size** | 4304 |
| **Activation type** | GELU |
| **Number of multi-modal tokens per image** | 256 |

## Validation Results

**Validated:** 2026-02-05
**Configuration:** Trn1, TP=8, batch_size=1, seq_len=1024, float16, 1 image per sample

### Test Results

| Test | Status | Result |
|------|--------|--------|
| Smoke Test | ✅ PASS | Model loads successfully |
| Token Matching | ✅ PASS | 100.0% match |
| Logits Matching | ⚠️ PARTIAL | ~56.2% match |

### Performance Metrics

| Metric | Value |
|--------|-------|
| E2E Throughput | 360.4 tokens/s |
| CTE Throughput | 49563.7 tokens/s |
| TKG Throughput | 223.8 tokens/s |

**Status:** ✅ GOOD

**Note:** Low token matching is due to sampling divergence at close probability tokens, not model incorrectness.

## Usage

```python
import torch

from gemma3_vision.modeling_gemma3 import NeuronGemma3ForConditionalGeneration
from gemma3_vision.utils import create_neuron_config

model_path = "/path/to/hf/artifacts"
compiled_model_path = "/path/to/compiled/artifacts"

# Create Neuron configuration
nrn_config = create_neuron_config(
hf_config_path=config_file_path,
text_batch_size=1,
vision_batch_size=1, # num_images_per_sample * batch_size
total_max_seq_len=1024,
torch_dtype=torch.bfloat16,
lnc=1, # Logical NC config
tp_degree=8
)

# Initialize model
nrn_model = NeuronGemma3ForConditionalGeneration(
model_path=model_path,
config=nrn_config
)

# Compile and load
nrn_model.compile(compiled_model_path.as_posix())
nrn_model.load(compiled_model_path.as_posix())

# Generate (see integration test for full example)
```

## Compatibility Matrix

| Instance/Version | 2.27 | 2.26 and earlier |
|------------------|-------|------------------|
| Trn2 | ✅ Working | Not tested |
| Trn1 | ✅ Working | Not tested |
| Inf2 | Not tested | Not tested |

## Testing

Run integration tests:

```bash
pytest contrib/models/gemma3-vision/test/integration/test_model.py --capture=tee-sys
```

Or run manually:

```bash
cd contrib/models/gemma3-vision
python3 -m test.integration.test_model
```

## Example Checkpoints

* [`google/gemma-3-4b-it`](https://huggingface.co/google/gemma-3-4b-it)
* [`google/gemma-3-12b-it`](https://huggingface.co/google/gemma-3-12b-it)
* [`google/gemma-3-27b-it`](https://huggingface.co/google/gemma-3-27b-it)
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"e2e_model": {"latency_ms_p50": 2857.349991798401, "latency_ms_p90": 2910.9119415283203, "latency_ms_p95": 2926.601207256317, "latency_ms_p99": 2933.258969783783, "latency_ms_p100": 2934.9234104156494, "latency_ms_avg": 2841.0605669021606, "throughput": 360.4287820996898}, "context_encoding_model": {"latency_ms_p50": 20.636916160583496, "latency_ms_p90": 20.769023895263672, "latency_ms_p95": 20.816147327423096, "latency_ms_p99": 20.827925205230713, "latency_ms_p100": 20.830869674682617, "latency_ms_avg": 20.66025733947754, "throughput": 49563.758242417665}, "token_generation_model": {"latency_ms_p50": 4.449129104614258, "latency_ms_p90": 4.684209823608398, "latency_ms_p95": 4.733574390411377, "latency_ms_p99": 4.841475486755371, "latency_ms_p100": 6.889104843139648, "latency_ms_avg": 4.476439462949152, "throughput": 223.8289952216445}, "vision_encoder_model": null}
Loading