[fix] resume partial tests for inference and serving #1054
Conversation
@zihugithub Looks like GitHub cannot determine your identity from your Git PR. Can you double-check the email used for the PR?

@zihugithub There are conflicts to be resolved.
Force-pushed 5d7b7d5 to 06f14b0
{ echo 'runs_on<<EOFRUNSON'; echo "$RUNNER_LABELS"; echo 'EOFRUNSON'; } >> $GITHUB_OUTPUT
{ echo 'container_volumes<<EOFVOLUMES'; echo "$VOLUMES"; echo 'EOFVOLUMES'; } >> $GITHUB_OUTPUT
{ echo 'unit_subsets<<EOFUNITSUBSETS'; echo "$UNIT_SUBSETS"; echo 'EOFUNITSUBSETS'; } >> $GITHUB_OUTPUT
{ echo 'unit_train_subsets<<EOFUNITSUBSETS'; echo "$UNIT_TRAIN_SUBSETS"; echo 'EOFUNITSUBSETS'; } >> $GITHUB_OUTPUT
Why not use the simpler syntax, like we did on line 74?
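For reference, the simpler form likely meant here is the plain key=value output syntax; a minimal sketch, where the runner-labels value is an assumption and the real $GITHUB_OUTPUT file is simulated with a temp file:

```shell
# The heredoc form is only required for values that may span multiple lines;
# a single-line value can use the plain key=value form.
RUNNER_LABELS='["self-hosted","gpu"]'   # assumed single-line value
GITHUB_OUTPUT="$(mktemp)"               # stand-in for the real CI output file
echo "runs_on=${RUNNER_LABELS}" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```

If any of these values can contain newlines, the heredoc form in the diff above is the safe choice.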
"cd ${GITHUB_WORKSPACE}/Megatron-LM-FL"
"pip install --no-build-isolation . -vvv"
would this change work?
# install FlagGems
INSTALL_FLAGGEMS=(
  "pip install scikit-build scikit-build-core"
  "pip install git+https://github.com/FlagOpen/FlagGems.git@v3.0"
Suggested change:
- "pip install git+https://github.com/FlagOpen/FlagGems.git@v3.0"
+ "pip install git+https://github.com/flagos-ai/FlagGems.git@v3.0"
  "pip install gymnasium"
  "pip install dm-tree"
Can/shall we pin the package versions?
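A pinned variant of those two install lines could look like this; the exact version numbers below are illustrative assumptions, not versions validated against this repo:

```shell
# Same install list, but with versions pinned so CI runs are reproducible.
# Versions are illustrative; use the ones actually validated in CI.
INSTALL_DEPS=(
  "pip install gymnasium==1.0.0"
  "pip install dm-tree==0.1.8"
)
printf '%s\n' "${INSTALL_DEPS[@]}"
```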
Shall/can we expose these dependencies in a more formal way?
I mean ... these packages are not just required by the workflow; they are required for the software to run. These requirements should therefore be declared by the software, not the workflow.
Force-pushed 6cf4f3b to b1db41c
experiment:
  exp_name: deepseek_r1_distill_qwen-flaggems
- exp_dir: tests/functional_tests/test_cases/inference/deepseek_r1_distill_qwen-flaggems/results_test/7b-tp2
+ exp_dir: tests/functional_tests/inference/deepseek_r1_distill_qwen-flaggems/results_test/7b-tp2
Rename tests/functional_tests/ to tests/functional/.
Similarly, rename tests/test_utils/ to tests/utils/.
You get simplicity without losing anything.
@tengqm It must align with FlagScale's test directory structure, which uses the functional_tests naming. We will stick to this convention unless a change is deemed necessary.
# Compare actual output and golden reference output line by line (ignoring newline character differences)
for result_line, gold_value_line in zip(result_lines, gold_value_lines):
    print(result_line, gold_value_line)
maybe remove line 323?
# Concatenate the VLLM OpenAI-compatible interface URL
url = f"http://localhost:{deploy_config.port}/v1/completions"

# Set request headers (JSON format)
You don't have to write comments for lines that speak for themselves.
if key_value.startswith("val-core/openai/gsm8k/reward/mean"):
    # Split by colon and extract the numeric part after the colon
    value = key_value.split(":")[-1]
    # Convert string value to float and add to the result list
There are simply too many unnecessary comments like this in the code.
jobs:
  functional_test:
    name: functional-${{ inputs.device }}-${{ inputs.task }}-${{ inputs.model }}--${{ inputs.case }}
This name is too redundant.
Force-pushed c76e1e0 to 32a1a06
  fi
else
  echo "ℹ️ Running tests with system Python"
fi
I'm not sure it makes much sense to use conda in the test workflow.
pip uninstall -y flash_attn
pip install flash-attn==2.6.3 --no-build-isolation
python -c "import flash_attn; import flash_attn.layers; print('FlashAttention loaded successfully, version:', flash_attn.__version__)"
Please don't install individual packages in the workflow.
All requirements should be declared in the pyproject.toml file.
Even if some packages are only needed for functional tests, we can declare them as optional dependencies, e.g.

[project.optional-dependencies]
functional_test = [
    'foo==1.3.4',
]

We can then install them with a single command: uv pip install .[functional_test]
The command is reusable anywhere, while the dependencies are centrally managed.
This is a special use case that is incompatible with the environments of the other use cases; it will be reworked later.
That is why pyproject.toml has an optional-dependencies section.
tests/functional_tests/inference/deepseek_r1_distill_qwen-flaggems/results_gold/7b-tp2
log_info "Installing vllm-FL"

# Clone repository
retry_git_clone "$vllm_url" "$vllm_dir" "$RETRY_COUNT"
This mechanism is somewhat fragile.
We are tying the stability of FlagScale to that of vllm-FL.
We have to anticipate a scenario where FlagScale works fine, but there is a bug at the head of vllm-FL.
To isolate errors like this, I'd suggest we pin a release tag of the plugin.
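A minimal sketch of what pinning could look like; the tag name and repository URL below are hypothetical placeholders, and the command is built as a string rather than executed:

```shell
# Clone the plugin at a fixed tag instead of the branch head, so a broken
# head commit in vllm-FL cannot break FlagScale's CI.
VLLM_FL_TAG="20250129"                              # hypothetical tag name
vllm_url="https://github.com/example/vllm-FL.git"   # placeholder URL
# git clone --branch accepts tags as well as branches; --depth 1 keeps it small.
clone_cmd="git clone --branch ${VLLM_FL_TAG} --depth 1 ${vllm_url} vllm-FL"
echo "${clone_cmd}"
```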
@tengqm We will do it later when the release version of vllm-fl is ready.
It doesn't have to be a formal release. It can be as simple as a tag, such as 20250129 or something.
Later on, when things do not work as expected, we will know where to start.
log_info "Installing vllm-FL"

# Clone repository
retry_git_clone "$vllm_url" "$vllm_dir" "$RETRY_COUNT"
Well ... again, why do we replicate the same script?
For now, serve and inference run in the same environment. The script is duplicated to keep serve and inference distinct; the common source code will be installed and stored in a unified way in the future.
My guess is that serve is a superset of inference.
There are no conflicts between these two use cases besides this.
Am I understanding this correctly?
# support 0.5b_multiple_instance ci test
ray==2.49.1
gymnasium
where is this referenced?
# support 0.5b_multiple_instance ci test
ray==2.49.1
gymnasium
dm-tree
where is this referenced?
run: |
  git config --global --add safe.directory $PROJECT_ROOT
- name: Check sccache installation and get path
why are we installing sccache?
It is mainly used for compiling vLLM.
I think I need some more context on this to understand it.
@tengqm vLLM will be compiled multiple times in different jobs repeatedly; introducing sccache is to reduce this compilation time.
@Darryl233 Thanks for the clarification. Is the 'compilation' a build-time thing or a run-time thing?
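For context, wiring sccache into a vLLM source build is a build-time concern and might look roughly like this; the cache path is an assumption, and the actual build command is shown as a comment rather than executed:

```shell
# Sketch: sccache caches compiled C++/CUDA objects, so repeated vLLM source
# builds across CI jobs can reuse earlier compilation results. This only
# affects build time; it has no effect at run time.
export CMAKE_C_COMPILER_LAUNCHER=sccache
export CMAKE_CXX_COMPILER_LAUNCHER=sccache
export SCCACHE_DIR="${SCCACHE_DIR:-/cache/sccache}"   # assumed shared volume
# The build itself would then run as usual, e.g.:
#   pip install --no-build-isolation -v .
# and `sccache --show-stats` reports cache hit rates afterwards.
echo "launcher=${CMAKE_CXX_COMPILER_LAUNCHER}"
```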
Darryl233 left a comment:
LGTM
PR Category: CICD
PR Types: Others
PR Description