Gemma3 to HF mapping error #2867

@jedcheng

Description

Bug report

After fine-tuning the model, I used the following command to convert my MaxText Gemma3 checkpoint to Hugging Face format:

python3 -m MaxText.utils.ckpt_conversion.to_huggingface src/MaxText/configs/base.yml \
    model_name='gemma3-4b' \
    hf_access_token={hf token} \
    load_parameters_path={my checkpoint path} \
    base_output_directory=/tmp/gemma3-4B-cpt-hf \
    use_multimodal=false \
    scan_layers=false
# use_multimodal: tried both false and true
# scan_layers=false because creating a scanned checkpoint fails (separate bug)

For testing purposes, I set up a vLLM server on the TPU:

pip3.11 install vllm-tpu

vllm serve /tmp/gemma3-4B-cpt-hf \
    --disable-log-requests \
    --tensor_parallel_size=4 \
    --api-key {my api key}

Error message from vLLM:

ValueError: Gemma3 uses `gelu_pytorch_tanh` as the hidden activation function. Please set `hidden_act` and `hidden_activation` to `gelu_pytorch_tanh`.

The MaxText conversion config uses "hidden_activation": "gelu" for the text model.

The official Hugging Face Gemma3 text config uses gelu_pytorch_tanh.

vLLM explicitly checks for gelu_pytorch_tanh, which results in the error above.
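As a temporary workaround, the exported config can be patched in place so vLLM accepts it. This is a minimal sketch, assuming the conversion wrote a top-level config.json under /tmp/gemma3-4B-cpt-hf; the exact path and key layout are assumptions:

# patch_gemma3_config.py - workaround sketch; path and key layout are assumptions
import json

CONFIG_PATH = "/tmp/gemma3-4B-cpt-hf/config.json"

with open(CONFIG_PATH) as f:
    cfg = json.load(f)

# The activation fields may be top-level (text-only export) or nested under
# "text_config" (multimodal export); patch whichever dict is present.
target = cfg.get("text_config", cfg)
print("before:", target.get("hidden_act"), target.get("hidden_activation"))

target["hidden_act"] = "gelu_pytorch_tanh"
target["hidden_activation"] = "gelu_pytorch_tanh"

with open(CONFIG_PATH, "w") as f:
    json.dump(cfg, f, indent=2)

print("after:", target["hidden_act"], target["hidden_activation"])

The proper fix would be for the MaxText conversion mapping to emit gelu_pytorch_tanh directly, matching the official Hugging Face Gemma3 text config.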

Logs/Output

See the vLLM error message above.

Environment Information

TPU creation command

export TPU_COUNT=4
gcloud alpha compute tpus tpu-vm create $TPU_ID \
    --zone="${ZONE}" \
    --accelerator-type="v6e-${TPU_COUNT}" \
    --version=v2-alpha-tpuv6e \
    --spot \
    --service-account=${service_account}

I tried installing MaxText both from source and with uv, as suggested in the MaxText documentation.

Additional Context

No response
