Releases: Blaizzy/mlx-vlm
v0.3.9
v0.3.8
v0.3.7
v0.3.6
What's Changed
- Fix: Input cast error by @Blaizzy in #559
- Fix qwen-vl position ids (2.5 & 3) by @Blaizzy in #564
- Add Evals by @Blaizzy in #563
- Fix Qwen3-VL Attention by @Blaizzy in #566
- Update evals init by @Blaizzy in #567
- Fix Qwen3-VL multi image reshape by @Blaizzy in #569
- Add processor args to DeepSeekOCR by @Blaizzy in #570
- host and port params for server by @mguella in #568
- FastVLM by @pcuenca in #502
- Add support for InternVL3 by @iRonJ in #540
- Add Z.ai GLM-4.1v by @Blaizzy in #572
- Make rope deltas private by @Blaizzy in #573
- Add example notebook for interleaving text and images in prompts by @Copilot in #574
- [Bugfix] fix mrope in qwen2vl and qwen2.5vl by @JJJYmmm in #576
- Add lighton-ocr by @Blaizzy in #550
- changed image parameter instead of files in stream_generate by @Manikandan-t in #521
- Remove auto config loading by @Blaizzy in #577
- [BugFix][Qwen3VL] fix deepstack and multi-image inference by @JJJYmmm in #581
- openai compatible endpoints by @mguella in #580
- Bump version to 0.3.6 by @Blaizzy in #582
New Contributors
- @mguella made their first contribution in #568
- @iRonJ made their first contribution in #540
- @Copilot made their first contribution in #574
- @JJJYmmm made their first contribution in #576
- @Manikandan-t made their first contribution in #521
Full Changelog: v0.3.5...v0.3.6
v0.3.5
v0.3.4
What's Changed
- Add n_kv_heads property used in LM Studio to glm4v_moe.LanguageModel by @hehua2008 in #472
- Fix: rope rotation (GLM-4.5v) by @Blaizzy in #481
- Remove scipy dep by @Blaizzy in #482
- Add CUDA and CPU as optional deps by @Blaizzy in #483
- Fix Deepseek vl2 chat template by @Blaizzy in #488
- Fix deepseek-vl default chat template by @Blaizzy in #490
- Fix smolvlm video generate by @Blaizzy in #491
- Fix base64 encoded images by @Blaizzy in #493
- Map Apriel configs to pixtral and fix prompt formatting by @ivanfioravanti in #518
- Fix video understanding NB by @Blaizzy in #519
- Fix fine-tuning bug in trainer.py by @avishekjana in #473
- Bump minimum required Python version to 3.10 by @dokterbob in #485
- Add Qwen3-VL (Dense & MoE) by @Blaizzy & @vincentamato in #528
- Fix video Qwen3-VL by @Blaizzy in #529
- Fix Qwen3 VL (Dense) Sanitize by @Blaizzy in #531
- Bump version to 0.3.4 by @Blaizzy in #535
New Contributors
- @hehua2008 made their first contribution in #472
- @ivanfioravanti made their first contribution in #518
- @avishekjana made their first contribution in #473
- @dokterbob made their first contribution in #485
- @vincentamato made their first contribution in #529
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's Changed
- fix changelog task by @Blaizzy in #441
- Add LFM2-VL by @Blaizzy in #460
- [llava_next] Fix config inheritance by @neilmehta24 in #448
- Kimi_VL: Fix activation args by @Blaizzy in #465
- Add GLM-4-5V by @Blaizzy in #458
- External access to LFM2-VL merge-input-IDs method by @christian-lms in #466
- Add Command-A-Vision by @Blaizzy in #467
- [kernels] Use a header for bicubic_interpolate for compatibility with macOS < 15 by @mattjcly in #469
New Contributors
- @christian-lms made their first contribution in #466
Full Changelog: v0.3.2...v0.3.3
v0.3.2
What's Changed
- Fix energy calc in omni.py by @Blaizzy in #427
- Feat(build): migrate to pyproject.toml by @SauravMaheshkar in #282
- Fix quant predicate by @Blaizzy in #430
- Load dependencies from requirements.txt by @Blaizzy in #431
- Fix broken wheel builds caused by ambiguous package spec by @neilmehta24 in #432
- Make UI and audio dependencies optional by @zhnext in #433
- Cleanup: Refactor config by @Blaizzy in #437
- Add cuda support by @Blaizzy in #438
- Fix/server module exposure by @zhnext in #434
- Support text only training by @Goekdeniz-Guelmez in #424
- Fix phi3_v and molmo mask by @Blaizzy in #440
New Contributors
- @zhnext made their first contribution in #433
- @Goekdeniz-Guelmez made their first contribution in #424
Full Changelog: v0.3.1...v0.3.2
v0.3.1
What's Changed
- fix(chat-ui): Fix imports, blocking Chat-ui server start by @zenyr in #420
- Fix server empty image for Gemma3n by @Blaizzy in #422
- [Gemma3n] Add hooks for image embedding computation by @will-lms in #407
- [gemma3n] Fix OCR after weight re-upload by @neilmehta24 in #425
- Add gemma3n omni example by @Blaizzy in #426
New Contributors
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's Changed
- [gemma3n] Correctly scale text embeddings for quantized gemma3n conversions by @neilmehta24 in #397
- smolvlm_video example: fix typo in system prompt by @pcuenca in #389
- Fix gemma3n pixel casting by @Blaizzy in #398
- Fix audio model check and prompt utils by @Blaizzy in #395
- Add KV Quantization by @Blaizzy in #401
- Fix Gemma3n multi-task merging and update LM by @Blaizzy in #405
- [gemma3n] Fix vision encoder implementation of EdgeResidual and UniversalInvertedResidual by @neilmehta24 in #410
- fix: Remove unnecessary unicode_escape decoding for Chinese text input by @nicekate in #403
- Add support for Mixed Quant by @Blaizzy in #413
- Fix gemma3n Vision OCR + LM only responses by @Blaizzy in #414
- Fix generate signature by @Blaizzy in #416
- Add support for audio modality in server by @Blaizzy in #417
- Update server, readme and misc by @Blaizzy in #418
New Contributors
Full Changelog: v0.2.0...v0.3.0