
[Bug]: Many UNEXPECTED and MISSING parameters when loading model weights #250

@ZhangzrJerry

Description


Bug Description

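When loading the pretrained model InternVLA-N1-DualVLN, the log reports a large number of parameters with status UNEXPECTED or MISSING. Some UNEXPECTED parameters can be ignored when the architectures differ, but MISSING parameters are newly initialized, so those modules do not carry pretrained weights.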
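The two statuses are plain set differences between the parameter names the model defines and the tensor names stored in the checkpoint. PyTorch's `nn.Module.load_state_dict(strict=False)` reports the same two lists as `missing_keys` and `unexpected_keys`. The minimal sketch below uses hypothetical key names (not taken from InternVLA) to show that when only a name prefix differs, one tensor shows up twice: once as MISSING and once as UNEXPECTED.

```python
from typing import Iterable, Set, Tuple


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) in the sense used by the load report.

    missing:    names the model defines but the checkpoint lacks
                (these parameters end up freshly initialized)
    unexpected: names stored in the checkpoint that no model parameter
                claims (these tensors are dropped)
    """
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_keys)
    missing = model_set - ckpt_set
    unexpected = ckpt_set - model_set
    return missing, unexpected


# Hypothetical keys: the checkpoint stores the backbone under an extra
# "wrapper." prefix, so the same tensor is both missing and unexpected.
model_keys = ["backbone.layer.weight", "head.weight"]
ckpt_keys = ["wrapper.backbone.layer.weight", "head.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print("MISSING:   ", sorted(missing))
print("UNEXPECTED:", sorted(unexpected))
```

A prefix mismatch of this kind is only one possible cause. Keys that are genuinely absent from the checkpoint produce MISSING entries with no UNEXPECTED counterpart.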

Steps to Reproduce

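After setting up a fresh environment with a torch build that supports the sm_120 architecture, and making only API-compatibility changes to the code, I ran http_internvla_server.py. The console printed the following (the same happens with both https://huggingface.co/InternRobotics/InternVLA-N1-wo-dagger and https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN):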

Loading weights: 100%|█████████████████████████████████████████████████████████████| 729/729 [00:06<00:00, 120.42it/s, Materializing param=model.visual.patch_embed.proj.weight]
InternVLAN1ForCausalLM LOAD REPORT from: checkpoints/InternVLA-N1-DualVLN
Key                                                                                                     | Status     |
--------------------------------------------------------------------------------------------------------+------------+-
model.language_model.traj_dit.model.language_model.layers.{0...11}.attn1.norm_q.bias                    | UNEXPECTED |
model.language_model.traj_dit.model.language_model.layers.{0...11}.attn2.to_k.weight                    | UNEXPECTED |
.....
model.action_decoder.weight                                                                             | MISSING    |
model.action_encoder.weight                                                                             | MISSING    |

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING       :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
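With 729 entries, the report is easier to read when grouped by module. The sketch below is a hypothetical helper, not part of transformers or InternVLA. It counts (prefix, status) pairs. The sample rows are the visible lines of the log above, with each `{0...11}` range reduced to a single layer index for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_by_prefix(rows: Iterable[Tuple[str, str]], depth: int = 3) -> Counter:
    """Count (module prefix, status) pairs from a load report.

    rows:  (parameter key, status) pairs copied from the LOAD REPORT
    depth: number of dot-separated components kept as the prefix
    """
    counts: Counter = Counter()
    for key, status in rows:
        prefix = ".".join(key.split(".")[:depth])
        counts[(prefix, status)] += 1
    return counts


# Visible rows from the log; layer ranges collapsed to index 0.
report = [
    ("model.language_model.traj_dit.model.language_model.layers.0.attn1.norm_q.bias", "UNEXPECTED"),
    ("model.language_model.traj_dit.model.language_model.layers.0.attn2.to_k.weight", "UNEXPECTED"),
    ("model.action_decoder.weight", "MISSING"),
    ("model.action_encoder.weight", "MISSING"),
]

for (prefix, status), n in sorted(summarize_by_prefix(report).items()):
    print(f"{status:<10} {n:>4}  {prefix}")
```

In the visible rows, every UNEXPECTED key sits under `model.language_model.traj_dit` and contains a second `model.language_model` segment. The log alone does not establish whether that nesting comes from the checkpoint itself or from a key-renaming step during loading.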

Expected Behavior

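All model parameters should load correctly: there should be no MISSING parameters, and as few UNEXPECTED parameters as possible (ideally none), so that the model behaves the same as the pretrained checkpoint.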
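This expectation can be turned into an automatic check. The minimal sketch below assumes that the missing and unexpected key lists are already available, for example from the return value of PyTorch's `load_state_dict(strict=False)`. It treats any MISSING key as a failure and tolerates only UNEXPECTED keys that match an explicit allowlist. The allowlist pattern used here is hypothetical.

```python
import fnmatch
from typing import Iterable, List


def check_load(
    missing: Iterable[str],
    unexpected: Iterable[str],
    allowed_unexpected: Iterable[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the load is acceptable.

    Every MISSING key is a problem. An UNEXPECTED key is a problem unless it
    matches one of the shell-style patterns in allowed_unexpected
    (note that fnmatch's "*" also matches dots).
    """
    problems = [f"MISSING: {k}" for k in sorted(missing)]
    patterns = list(allowed_unexpected)
    for key in sorted(unexpected):
        if not any(fnmatch.fnmatchcase(key, p) for p in patterns):
            problems.append(f"UNEXPECTED: {key}")
    return problems


# "model.some_removed_head" is a hypothetical module that is deliberately
# absent from the model, so its leftover checkpoint tensors are allowed.
problems = check_load(
    missing=["model.action_decoder.weight", "model.action_encoder.weight"],
    unexpected=["model.some_removed_head.weight"],
    allowed_unexpected=["model.some_removed_head.*"],
)
for p in problems:
    print(p)
```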

Screenshots/Videos

No response

Environment

  • OS: Windows 10
  • GPU: RTX 5060 Ti
  • GPU-driver version: 591.74
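The sm_120 target mentioned in the reproduction steps corresponds to CUDA compute capability 12.0. The sketch below confirms that the installed torch build lists a native kernel target for the GPU. It is a minimal check built on `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper names are hypothetical, and PTX entries (`compute_XY`), which can be JIT-compiled for newer GPUs, are deliberately not counted.

```python
from typing import Sequence, Tuple


def arch_tag(capability: Tuple[int, int]) -> str:
    """Format a CUDA compute capability such as (12, 0) as 'sm_120'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports_device(capability: Tuple[int, int], arch_list: Sequence[str]) -> bool:
    """True if the build lists a native sm_XY target for this capability.

    PTX entries ('compute_XY') are ignored, even though the driver may
    JIT-compile them for a newer GPU.
    """
    return arch_tag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("torch", torch.__version__, "CUDA", torch.version.cuda)
        print("device", arch_tag(cap), "native kernels:", build_supports_device(cap, archs))
```

Note that a correct sm_120 build only rules out kernel-compatibility problems. It does not by itself explain MISSING or UNEXPECTED keys, which depend on how checkpoint names are mapped to model parameters.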

Release version or Commit ID

#108

Additional Context

#97
