@dai-yamashita

Summary

This PR adds support for Mistral3 multimodal models (Ministral series):

  • Mistral text model extensions: attention_head_size, use_interleaved_attention, tie_word_embeddings
  • Pixtral vision encoder: Bumblebee.Vision.Pixtral
  • Mistral3 text decoder: Bumblebee.Text.Mistral3
  • Multimodal model: Bumblebee.Multimodal.Mistral3
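
If these options follow Bumblebee's usual configuration pattern, they would be set through `Bumblebee.configure/2`. A hedged sketch: the option names are the ones listed above, but the values are illustrative and `spec` is assumed to be a previously loaded Mistral spec:

```elixir
# Illustrative values only; `spec` is assumed to come from a prior
# Bumblebee.load_spec/1 call for a Mistral checkpoint.
spec =
  Bumblebee.configure(spec,
    attention_head_size: 128,
    use_interleaved_attention: true,
    tie_word_embeddings: true
  )
```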

Supported architectures

  • PixtralVisionModel
  • Mistral3Model, Mistral3ForCausalLM, Mistral3ForSequenceClassification
  • Mistral3ForConditionalGeneration (multimodal)

Key features

  1. Function-based attention_window_size in transformer.ex for per-layer attention configuration
  2. Interleaved attention pattern: even layers use global attention, odd layers use sliding-window attention
  3. Multimodal projector: a patch merger plus linear layers that project vision features into the text embedding space
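
The interleaved pattern in feature 2 can be sketched as a plain function (an illustrative sketch, not the actual `transformer.ex` implementation; `InterleavedAttention` and `window_size/2` are hypothetical names):

```elixir
defmodule InterleavedAttention do
  # Returns the attention window for a given decoder layer.
  # Even-indexed layers attend globally (no window); odd-indexed
  # layers restrict attention to a sliding window of `window` tokens.
  def window_size(layer_index, window) do
    if rem(layer_index, 2) == 0 do
      # global attention
      nil
    else
      # sliding-window attention
      window
    end
  end
end
```

This mirrors what a function-based `attention_window_size` enables: the transformer asks a function for each layer's window instead of applying one fixed value to every layer.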

Test plan

  • All existing tests pass (264 tests, 0 failures)
  • New tests for Pixtral, Mistral3, and Multimodal added

In addition to the summary above:

- `Bumblebee.Vision.Pixtral` includes RoPE support in the vision encoder
- Ministral/Ministral3 variants are supported via the interleaved attention
  pattern, including Devstral 2 (Ministral3) models
- `Ministral3ForCausalLM` is an additional supported architecture
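
The shape effect of the multimodal projector's patch merger can be sketched with simple arithmetic (the hidden sizes and merge factor below are illustrative, not the real config values, and `ProjectorShapes` is a hypothetical module):

```elixir
defmodule ProjectorShapes do
  # The patch merger concatenates merge_factor^2 neighboring patch
  # features (reducing the patch count by the same factor), then a
  # linear layer maps the concatenated features to the text hidden size.
  def project({num_patches, vision_hidden}, merge_factor, text_hidden) do
    merged_patches = div(num_patches, merge_factor * merge_factor)
    concat_size = vision_hidden * merge_factor * merge_factor

    %{
      after_merge: {merged_patches, concat_size},
      after_projection: {merged_patches, text_hidden}
    }
  end
end
```

For example, 1024 patches with a vision hidden size of 1024 and a merge factor of 2 become 256 merged patches of size 4096, which the linear layers then project to the text embedding size.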
@dai-yamashita force-pushed the add-mistral3-multimodal-support branch from dd11e25 to 911cc2d on December 20, 2025 at 11:21