@dai-yamashita

Summary

This PR adds support for Mistral3 multimodal models (Ministral series):

  • Mistral text model extensions: attention_head_size, use_interleaved_attention, tie_word_embeddings
  • Pixtral vision encoder: Bumblebee.Vision.Pixtral
  • Mistral3 text decoder: Bumblebee.Text.Mistral3
  • Multimodal model: Bumblebee.Multimodal.Mistral3
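
If these options follow Bumblebee's usual configuration pattern, they would be set through `Bumblebee.configure/2`. A hedged sketch: the option names are the ones listed above, but the values are illustrative and `spec` is assumed to be a previously loaded Mistral spec:

```elixir
# Illustrative values only; `spec` is assumed to come from a prior
# Bumblebee.load_spec/1 call for a Mistral checkpoint.
spec =
  Bumblebee.configure(spec,
    attention_head_size: 128,
    use_interleaved_attention: true,
    tie_word_embeddings: true
  )
```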

Supported architectures

  • PixtralVisionModel
  • Mistral3Model, Mistral3ForCausalLM, Mistral3ForSequenceClassification
  • Mistral3ForConditionalGeneration (multimodal)

Key features

  1. Function-based attention_window_size in transformer.ex for per-layer attention configuration
  2. Interleaved attention pattern: even layers use global attention, odd layers use sliding-window attention
  3. Multimodal projector: a patch merger plus linear layers that project vision features into the text embedding space
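
The interleaved pattern in feature 2 can be sketched as a plain function (an illustrative sketch, not the actual `transformer.ex` implementation; `InterleavedAttention` and `window_size/2` are hypothetical names):

```elixir
defmodule InterleavedAttention do
  # Returns the attention window for a given decoder layer.
  # Even-indexed layers attend globally (no window); odd-indexed
  # layers restrict attention to a sliding window of `window` tokens.
  def window_size(layer_index, window) do
    if rem(layer_index, 2) == 0 do
      # global attention
      nil
    else
      # sliding-window attention
      window
    end
  end
end
```

This mirrors what a function-based `attention_window_size` enables: the transformer asks a function for each layer's window instead of applying one fixed value to every layer.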

Test plan

  • All existing tests pass (264 tests, 0 failures)
  • New tests for Pixtral, Mistral3, and Multimodal added

In addition to the summary above:

- `Bumblebee.Vision.Pixtral` includes RoPE support in the vision encoder
- Ministral/Ministral3 variants are supported via the interleaved attention
  pattern, including Devstral 2 (Ministral3) models
- `Ministral3ForCausalLM` is an additional supported architecture
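
The shape effect of the multimodal projector's patch merger can be sketched with simple arithmetic (the hidden sizes and merge factor below are illustrative, not the real config values, and `ProjectorShapes` is a hypothetical module):

```elixir
defmodule ProjectorShapes do
  # The patch merger concatenates merge_factor^2 neighboring patch
  # features (reducing the patch count by the same factor), then a
  # linear layer maps the concatenated features to the text hidden size.
  def project({num_patches, vision_hidden}, merge_factor, text_hidden) do
    merged_patches = div(num_patches, merge_factor * merge_factor)
    concat_size = vision_hidden * merge_factor * merge_factor

    %{
      after_merge: {merged_patches, concat_size},
      after_projection: {merged_patches, text_hidden}
    }
  end
end
```

For example, 1024 patches with a vision hidden size of 1024 and a merge factor of 2 become 256 merged patches of size 4096, which the linear layers then project to the text embedding size.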
@dai-yamashita force-pushed the add-mistral3-multimodal-support branch from dd11e25 to 911cc2d on December 20, 2025 at 11:21