

@yiliu30
Contributor

yiliu30 commented Dec 9, 2025

User description

Signed-off-by: yiliu30 <yi4.liu@intel.com>


PR Type

Enhancement


Description

  • Added support for NVFP4 quantization scheme

  • Updated usage instructions and validation checks

  • Modified environment variable settings for NVFP4


Diagram Walkthrough

flowchart LR
  A["Add NVFP4 config"] -- "Update quantize.py" --> B["Modify run_evaluation.sh"]
  B -- "Update usage and validation" --> C["Adjust run_generate.sh"]
  C -- "Set NVFP4 env vars" --> D["Update README.md"]

File Walkthrough

Relevant files
Enhancement
quantize.py
Add NVFP4 configuration

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/quantize.py

  • Added NVFP4 configuration to config_dict
  • Set enable_torch_compile to True
  • Added low_gpu_mem_usage parameter
+7/-1
run_evaluation.sh
Update evaluation script for NVFP4

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_evaluation.sh

  • Updated usage message to include NVFP4
  • Added NVFP4 condition to set environment variables
  • Updated error message to include NVFP4
+9/-2
run_generate.sh
Update generation script for NVFP4

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_generate.sh

  • Updated quantization type validation to include NVFP4
  • Added NVFP4 condition to set environment variables
  • Moved common environment variable setting
+10/-3

@PRAgent4INC
Collaborator

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Default Values

The new nvfp4 configuration reuses the same iters and fp_layers values as the other schemes. Ensure these defaults are appropriate for nvfp4.

"nvfp4": {
    "scheme": "NVFP4",
    "fp_layers": "lm_head,self_attn",
    "iters": 0,
},
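To make the review point concrete, here is a minimal sketch of how a per-scheme entry like this might be looked up, with its comma-separated fp_layers string split into layer names. Only the "nvfp4" entry comes from the PR; `resolve_scheme` is a hypothetical helper, not code from quantize.py. For context, AutoRound documents iters=0 as its RTN (round-to-nearest) mode, so this entry skips iterative tuning, and fp_layers presumably lists layers left in their original precision (as with AutoRound's `--fp_layers` option).

```python
# Hypothetical sketch: only the "nvfp4" entry is taken from the PR diff.
# resolve_scheme() is illustrative and is not part of quantize.py.
config_dict = {
    "nvfp4": {
        "scheme": "NVFP4",
        "fp_layers": "lm_head,self_attn",
        "iters": 0,  # iters=0 selects AutoRound's RTN mode (no tuning)
    },
}


def resolve_scheme(name: str) -> dict:
    """Return a copy of a scheme entry with fp_layers split into a list."""
    if name not in config_dict:
        raise ValueError(
            f"Unknown scheme {name!r}; expected one of {sorted(config_dict)}"
        )
    # Copy so the shared config_dict entry is never mutated.
    entry = dict(config_dict[name])
    # Split the comma-separated layer patterns, dropping empty pieces.
    entry["fp_layers"] = [
        part.strip() for part in entry["fp_layers"].split(",") if part.strip()
    ]
    return entry


print(resolve_scheme("nvfp4"))
```

Comparing the resolved nvfp4 entry against the mxfp4/mxfp8 entries in the same way is one quick check on whether the shared defaults were intentional.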
Environment Variables

The environment variables set for nvfp4 are different from those for mxfp4 and mxfp8. Verify that these settings are correct and necessary for nvfp4.

elif [[ "$SCHEME" == "nvfp4" ]]; then
    VLLM_AR_MXFP4_MODULAR_MOE=0
    VLLM_MXFP4_PRE_UNPACK_TO_FP8=0
    VLLM_MXFP4_PRE_UNPACK_WEIGHTS=0
    VLLM_ENABLE_STATIC_MOE=0
    VLLM_USE_DEEP_GEMM=0
    VLLM_ENABLE_AR_EXT=0
Error Message

The error message now includes nvfp4 as a valid option. Ensure that all parts of the script correctly handle nvfp4 as a valid input.

echo "Error: Invalid quantization scheme (-s). Must be 'mxfp4', 'nvfp4' or 'mxfp8'."

@PRAgent4INC
Collaborator

PR Code Suggestions ✨

Explore these optional code suggestions:

Category    Suggestion    Impact
General
Make torch compile configurable

Consider making enable_torch_compile configurable via command-line arguments instead of hardcoding it.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/quantize.py [71]

-enable_torch_compile=True,
+enable_torch_compile=args.enable_torch_compile,
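A minimal sketch of this suggestion, assuming quantize.py builds its options with argparse (its parser is not shown in this PR page). The flag name `--enable_torch_compile` is an assumption chosen to match `args.enable_torch_compile` in the diff, and `build_parser` is a hypothetical helper. `argparse.BooleanOptionalAction` requires Python 3.9+.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser fragment; the real quantize.py parser may differ."""
    parser = argparse.ArgumentParser(description="AutoRound DeepSeek quantization")
    # BooleanOptionalAction registers both --enable_torch_compile and
    # --no-enable_torch_compile. default=True preserves the behavior of the
    # currently hardcoded enable_torch_compile=True.
    parser.add_argument(
        "--enable_torch_compile",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Enable torch.compile during quantization.",
    )
    return parser


args = build_parser().parse_args(["--no-enable_torch_compile"])
print(args.enable_torch_compile)
```

Keeping the default at True means existing invocations behave exactly as before, while users who hit torch.compile issues can opt out with `--no-enable_torch_compile`.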
Suggestion importance[1-10]: 7


Why: Making enable_torch_compile configurable via command-line arguments improves flexibility but does not address a critical issue.

Medium
Add NVFP4 support

Update the command to include the new NVFP4 option.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/README.md [43]

-bash run_generate.sh -s [mxfp4|mxfp8] -tp [tensor_parallel_size] -m [model_path]
+bash run_generate.sh -s [mxfp4|mxfp8|nvfp4] -tp [tensor_parallel_size] -m [model_path]
Suggestion importance[1-10]: 7


Why: The suggestion correctly updates the command to include the new NVFP4 option, improving the documentation's accuracy.

Medium
Add NVFP4 examples

Add examples for NVFP4 evaluation.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/README.md [71-72]

 bash run_evaluation.sh -s mxfp4 -t piqa,hellaswag,mmlu -tp 8 -b 512 -m /path/to/ds_mxfp4
 bash run_evaluation.sh -s mxfp4 -t gsm8k -tp 8 -b 256 -m /path/to/ds_mxfp4
+bash run_evaluation.sh -s nvfp4 -t piqa,hellaswag,mmlu -tp 8 -b 512 -m /path/to/ds_nvfp4
+bash run_evaluation.sh -s nvfp4 -t gsm8k -tp 8 -b 256 -m /path/to/ds_nvfp4
Suggestion importance[1-10]: 7


Why: The suggestion correctly adds examples for NVFP4 evaluation, enhancing the documentation's completeness.

Medium
Remove duplicate comment

Remove the duplicate comment.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_generate.sh [85-86]

-# Set environment variables based on quantization type
 # Set environment variables based on quantization type
Suggestion importance[1-10]: 5


Why: Removing the duplicate comment enhances code readability but offers a minor improvement.

Low



3 participants