Specializing multimodal vision-language models for quantum computing with Qiskit through synthetic data generation, efficient fine-tuning, and evaluation.
This project addresses a fundamental limitation in quantum computing code assistants: they process only text, ignoring the visual representations that are central to the field—quantum circuit diagrams, Bloch spheres, and measurement histograms. We present:
- Synthetic Dataset Pipeline: Automated extraction and generation of multimodal training data from open-source Qiskit documentation, with executable code verification
- Quantum Assistant Dataset: 8,366 samples (45% multimodal) across 7 quantum computing categories, publicly available under Apache 2.0
- VLM Specialization: Fine-tuning with Parameter-Efficient techniques (LoRA and variants) using ms-swift framework
- Evaluation: Assessment on Qiskit HumanEval benchmarks and multimodal tasks
End-to-end pipeline for generating multimodal quantum computing training data
| Model | QHE (Pass@1) | QHE Hard (Pass@1) | Syn. FC (Pass@1) | Syn. CG (Pass@1) | Syn. QA (ROUGE-L) | Text (Pass@1) | MM (Pass@1) |
|---|---|---|---|---|---|---|---|
| **Fine-tuned** | | | | | | | |
| Qwen3-VL-FT (r32, 2ep) | 43.71 | 28.48 | 56.96 | 44.36 | 38.02 | 45.45 | 63.39 |
| Qwen3-VL-FT (r32, 1ep) | 40.40 | 29.14 | 51.55 | 41.91 | 37.31 | 42.49 | 57.14 |
| Qwen3-VL-FT (r64, 1ep) | 38.41 | 22.52 | 52.84 | 42.89 | 38.24 | 42.66 | 60.71 |
| **Specialized** | | | | | | | |
| Qwen2.5-Coder-14B-Qiskit† | 49.01 | 25.17 | 47.48 | 25.51 | 19.46 | 36.19 | — |
| **Baseline** | | | | | | | |
| Qwen3-VL-8B-Instruct | 32.45 | 11.92 | 38.92 | 25.98 | 20.66 | 30.24 | 37.50 |
| InternVL3.5-8B-MPO | 20.53 | 9.27 | 32.47 | 19.61 | 25.81 | 21.85 | 36.16 |
| Ministral-3-8B-Instruct-2512 | 17.88 | 11.26 | 29.12 | 21.81 | 20.50 | 20.98 | 36.61 |
QHE: Qiskit HumanEval (function completion) · QHE Hard: Qiskit HumanEval Hard (code generation) · Syn. FC/CG/QA: synthetic Function Completion / Code Generation / Question Answering splits · Text: text-only samples · MM: multimodal samples · †Evaluated on text-only samples (55% of the synthetic dataset).
Best configuration: rsLoRA with rank=32 on Qwen3-VL-8B-Instruct.
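For context on this choice: rsLoRA (rank-stabilized LoRA) replaces LoRA's `alpha/r` update scaling with `alpha/sqrt(r)`, so higher ranks are not penalized by a shrinking scale. A quick comparison for the settings used here (a minimal sketch; alpha = 64 as in the training command below):

```python
import math

# LoRA scales the adapter update by alpha/r; rsLoRA uses alpha/sqrt(r)
# (Kalajdzievski, 2023), keeping the update magnitude stable as rank grows.
alpha = 64
for r in (32, 64):
    print(f"r={r}: LoRA scale = {alpha / r:.2f}, rsLoRA scale = {alpha / math.sqrt(r):.2f}")
# r=32: LoRA scale = 2.00, rsLoRA scale = 11.31
# r=64: LoRA scale = 1.00, rsLoRA scale = 8.00
```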
Evaluation across benchmarks showing fine-tuning gains and multimodal advantage
```
quantum-assistant/
├── src/
│   ├── synthetic_data/   # Dataset generation pipeline
│   ├── finetune/         # Fine-tuning data preparation
│   ├── evaluate/         # Evaluation framework
│   └── models/           # LLM/VLM client utilities
├── data/                 # Source documents (documentation, papers)
├── outputs/              # Generated datasets, models, results
├── docs/                 # Detailed documentation
│   ├── synthetic.md      # Pipeline architecture
│   ├── models.md         # Model serving and benchmarking
│   ├── finetune.md       # Fine-tuning experiments
│   └── evaluate.md       # Evaluation methodology
└── tests/                # Unit tests
```
```bash
# Clone repository
git clone https://github.com/samuellimabraz/quantum-assistant.git
cd quantum-assistant

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .
```

Optional GPU dependencies for fine-tuning:

```bash
uv sync --package finetune --extra gpu
```

Create a `.env` file in the project root:
```
# For synthetic data generation (VLM/LLM endpoints)
VISION_MODEL_BASE_URL=https://api.openai.com/v1
VISION_MODEL_API_KEY=sk-...
VISION_MODEL_NAME=gpt-4o
QUESTION_MODEL_BASE_URL=http://localhost:8000/v1
QUESTION_MODEL_API_KEY=your-key
QUESTION_MODEL_NAME=gpt-oss-120b

# For evaluation
MODEL_BASE_URL=http://localhost:8000/v1
API_KEY=your-key
MODEL_NAME=qwen2.5-coder-14b

# For HuggingFace uploads
HF_TOKEN=hf_...
```
```bash
# Run complete pipeline
synthetic-data pipeline --config src/synthetic_data/yaml/config.yaml

# Or individual stages
synthetic-data parse --config src/synthetic_data/yaml/config.yaml
synthetic-data transcribe --config src/synthetic_data/yaml/config.yaml
synthetic-data chunk --config src/synthetic_data/yaml/config.yaml
synthetic-data generate --config src/synthetic_data/yaml/config.yaml
synthetic-data build --config src/synthetic_data/yaml/config.yaml
synthetic-data export --config src/synthetic_data/yaml/config.yaml --hub-id username/dataset
```

See synthetic_data/README.md for detailed usage and docs/synthetic.md for architecture.
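The executable-code verification mentioned in the overview can be pictured as running each generated snippet in a fresh interpreter and keeping only samples that exit cleanly. A minimal sketch of that idea (not the project's actual implementation, which lives in `src/synthetic_data/`):

```python
import subprocess
import sys
import tempfile

def runs_cleanly(code: str, timeout: int = 30) -> bool:
    """Return True if a generated snippet executes without error.

    Sketch of executable-code verification: write the snippet to a
    temporary file and run it in a subprocess with a timeout.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

snippet = "from qiskit import QuantumCircuit\nqc = QuantumCircuit(2)\nqc.h(0); qc.cx(0, 1)"
print(runs_cleanly(snippet))
```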
```bash
# From HuggingFace Hub (recommended for Colab)
finetune prepare --hub-id samuellimabraz/quantum-assistant --output-dir outputs/finetune

# From local dataset
finetune prepare --dataset-path outputs/final --output-dir outputs/finetune
```

The output is ms-swift-compatible JSONL. See finetune/README.md for details.
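For reference, a multimodal row in that JSONL pairs a `messages` list with an `images` list, with an `<image>` tag marking where the image is injected. A minimal sketch of writing one such row (the field values are made up for illustration):

```python
import json

# Illustrative ms-swift sample (hypothetical content): multimodal rows pair
# "messages" with an "images" list; "<image>" marks the image position.
sample = {
    "messages": [
        {"role": "user", "content": "<image>Which gates appear in this circuit?"},
        {"role": "assistant", "content": "A Hadamard on qubit 0 followed by a CNOT."},
    ],
    "images": ["outputs/finetune/images/circuit_0001.png"],
}
with open("outputs/finetune/train.jsonl", "a") as f:
    f.write(json.dumps(sample) + "\n")
```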
```bash
swift sft \
    --model_type qwen3_vl-8b-instruct \
    --dataset outputs/finetune/train.jsonl \
    --val_dataset outputs/finetune/validation.jsonl \
    --train_type lora \
    --lora_rank 32 \
    --lora_alpha 64 \
    --use_rslora true \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 2 \
    --learning_rate 2e-4 \
    --weight_decay 0.05 \
    --lora_dropout 0.10 \
    --output_dir outputs/models/qwen3-vl-quantum
```

Configuration based on the experiments detailed in docs/finetune.md.
```bash
# Qiskit HumanEval benchmark
evaluate qiskit-humaneval \
    --dataset path/to/qiskit_humaneval.json \
    --model-url http://localhost:8000/v1 \
    --model-name qwen3-vl-quantum \
    --num-samples 10 \
    --k-values "1,5,10"
```
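With `--num-samples 10` and `--k-values "1,5,10"`, pass@k is normally computed with the unbiased estimator from the original HumanEval paper (Chen et al., 2021). A minimal sketch, assuming the framework follows that convention:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): 1 - C(n-c, k) / C(n, k),
    where n samples were drawn per task and c of them passed the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. 10 samples per task, 4 of which pass:
print(pass_at_k(10, 4, 1))  # 0.4
print(pass_at_k(10, 4, 5))  # ≈ 0.976
```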
```bash
# Synthetic dataset
evaluate synthetic \
    --dataset outputs/final \
    --model-url http://localhost:8000/v1 \
    --model-name qwen3-vl-quantum \
    --vlm \
    --split test
```

See evaluate/README.md for the evaluation framework and docs/evaluate.md for methodology.
The Quantum Assistant Dataset is publicly available on HuggingFace:
🤗 samuellimabraz/quantum-assistant
Dataset composition: distribution by task type, category, modality, and test coverage
| Metric | Value |
|---|---|
| Total Samples | 8,366 |
| Multimodal Samples | 3,774 (45%) |
| Train / Val / Test | 5,837 / 1,239 / 1,290 |
| Code Samples with Tests | 5,173 (62%) |
- Function Completion (30%): Complete function body from signature + docstring
- Code Generation (32%): Generate full code from natural language
- Question Answering (38%): Conceptual explanations
Category distribution: `circuits_and_gates` (34%), `quantum_info_and_operators` (20%), `algorithms_and_applications` (17%), `hardware_and_providers` (10%), `transpilation_and_compilation` (8%), `primitives_and_execution` (6%), `noise_and_error_mitigation` (5%).
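To reproduce these distributions locally, something like the following works (a sketch; `category` is an assumed column name and may differ in the released schema):

```python
from collections import Counter
from datasets import load_dataset

ds = load_dataset("samuellimabraz/quantum-assistant", split="train")
# NOTE: "type" is confirmed by the filtering examples below; "category" is an
# assumption about the schema and may need adjusting.
print(Counter(ds["type"]).most_common())
print(Counter(ds["category"]).most_common())
```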
```python
from datasets import load_dataset

dataset = load_dataset("samuellimabraz/quantum-assistant")

# Filter by type
code_gen = dataset["train"].filter(lambda x: x["type"] == "code_generation")

# Filter multimodal only
multimodal = dataset["train"].filter(lambda x: x["image"] is not None)
```

Fine-tuned models are available in the HuggingFace collection:
| Model | Configuration | HumanEval Pass@1 | Synthetic Pass@1 |
|---|---|---|---|
| Qwen3-VL-FT (r32, 1ep) | rsLoRA r=32, 1 epoch | 40.40% | 46.61% |
| Qwen3-VL-FT (r32, 2ep) | rsLoRA r=32, 2 epochs | 43.71% | 50.50% |
| Qwen3-VL-FT (r64, 1ep) | rsLoRA r=64, 1 epoch | 38.41% | 47.74% |
Baseline: Qwen3-VL-8B-Instruct (32.45% / 32.29%)
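Once a checkpoint is served behind an OpenAI-compatible endpoint (as in the evaluation commands above), a multimodal query might look like this (a sketch; the model name, API key, and image path are placeholders):

```python
import base64
from openai import OpenAI

# Assumes the fine-tuned model is served at an OpenAI-compatible endpoint,
# as in the evaluation examples above; names and paths are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-key")

with open("circuit.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen3-vl-quantum",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Write Qiskit code that builds this circuit."},
        ],
    }],
)
print(response.choices[0].message.content)
```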
Qiskit HumanEval benchmark results: fine-tuned models outperform baselines by +11-17 pp
Performance heatmap by category showing fine-tuned models vs baselines (Pass@1 %)
- Synthetic Data Module - Dataset generation pipeline
- Fine-tune Module - Data preparation for training
- Evaluate Module - Evaluation framework
- Synthetic Data Pipeline - Architecture, stages, algorithms
- Fine-tuning Experiments - PEFT techniques, hyperparameters
- Evaluation Methodology - Metrics, benchmarks, analysis
- Model Utilities - Serving, batching, benchmarking
If you use this work in your research, please cite:
@misc{braz2025quantumassistant,
title={Quantum Assistant: Especialização de Modelos Multimodais para Computação Quântica},
author={Braz, Samuel Lima and Leite, João Paulo Reus Rodrigues},
year={2025},
institution={Universidade Federal de Itajubá (UNIFEI)},
url={https://github.com/samuellimabraz/quantum-assistant}
}This project is released under the Apache 2.0 License.
- IBM Quantum and Qiskit team for open-source documentation
- UNIFEI (Universidade Federal de Itajubá) for academic support
- Advisor: Prof. João Paulo Reus Rodrigues Leite
- The quantum computing community for educational materials
- Qiskit HumanEval - Evaluation benchmark for quantum code generation
- Qiskit Code Assistant - IBM's text-only quantum code assistant
- Qiskit - Open-source quantum computing framework