
SimAI_analytical Produces an Empty End-to-End Result File #190

@horser1


Issue Description

Hello SimAI Team,

I am currently using the SimAI toolkit to simulate the training performance of a GPT model. My process involves two steps:

Generating a workload file using workload_generator.SimAI_training_workload_generator.

Running the analytical simulator SimAI_analytical with the generated workload file as input.

The workload generation step appears to work correctly. However, when I run the SimAI_analytical command, it prints some messages to the console and exits normally, but the final EndToEnd result file is empty.

I would like to know whether this is expected behavior under certain conditions or whether I am misusing the tool. Could you help explain why the output file is empty?

Steps to Reproduce

Generate the workload file with the following command:


python3 -m workload_generator.SimAI_training_workload_generator \
  --frame=Megatron \
  --world_size=16 \
  --tensor_model_parallel_size=2 \
  --pipeline_model_parallel=1 \
  --global_batch=1024 \
  --micro_batch=2 \
  --epoch_num=1 \
  --model_name=gpt_175B \
  --hidden_size=12288 \
  --num_layers=96 \
  --seq_length=4096 \
  --num_attention_heads=96 \
  --vocab_size=50257 \
  --max_position_embeddings=4096 \
  --ffn_hidden_size=11008 \
  --dtype=bfloat16 \
  --swiglu \
  --make_vocab_size_divisible_by=128 \
  --workload_only \
  --output_filename=gpt_175B-t1
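As a quick sanity check on the generator arguments, the sketch below derives the data-parallel size and the number of micro-batches per iteration. It assumes the usual Megatron-LM relations, DP = world_size / (TP × PP) and accumulation steps = global_batch / (micro_batch × DP); these formulas are not taken from the SimAI/AICB source, and `derive_parallelism` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: derive DP size and micro-batches per iteration,
# ASSUMING standard Megatron-LM conventions (not read from SimAI/AICB).
derive_parallelism() {
  local world=$1 tp=$2 pp=$3 gbs=$4 mbs=$5 dp
  # The model-parallel group size (tp * pp) must divide the world size.
  if [ $((world % (tp * pp))) -ne 0 ]; then
    echo "error: world_size not divisible by tp*pp" >&2
    return 1
  fi
  dp=$((world / (tp * pp)))
  # Each iteration's global batch is split into micro-batches across DP ranks.
  if [ $((gbs % (mbs * dp))) -ne 0 ]; then
    echo "error: global_batch not divisible by micro_batch*dp" >&2
    return 1
  fi
  echo "dp=${dp} grad_accum=$((gbs / (mbs * dp)))"
}

# world_size, tensor_model_parallel_size, pipeline_model_parallel,
# global_batch, micro_batch from the command above.
derive_parallelism 16 2 1 1024 2
```

With the arguments above this prints `dp=8 grad_accum=64`.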

Run the analytical simulator with the generated workload:


./SimAI_analytical \
  -w /root/simai/SimAI/aicb/results/workload/gpt_175B-t1.txt \
  -g 16 \
  -g_p_s 8 \
  -n_p_s 8 \
  -r gpt_175B-t1 \
  -g_type H100 \
  -nic 35.0 \
  -dp_o 0.5 \
  -tp_o 0.7 \
  -ep_o 0.8 \
  -pp_o 0.5
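The cluster shape implied by these flags can be checked the same way. This sketch assumes `-g` is the total GPU count and `-g_p_s` the number of GPUs per server (my reading of the SimAI_analytical options, not verified against the source), and it reports whether a tensor-parallel group fits inside one server, which is a common placement convention rather than a stated SimAI requirement. `check_topology` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: derive the server count from -g and -g_p_s (ASSUMED
# meanings: total GPUs, GPUs per server) and check TP placement.
check_topology() {
  local gpus=$1 gpus_per_server=$2 tp=$3 servers fits=no
  if [ $((gpus % gpus_per_server)) -ne 0 ]; then
    echo "error: -g is not a multiple of -g_p_s" >&2
    return 1
  fi
  servers=$((gpus / gpus_per_server))
  # A TP group larger than one server would have to communicate across nodes.
  if [ "$tp" -le "$gpus_per_server" ]; then
    fits=yes
  fi
  echo "servers=${servers} tp_within_server=${fits}"
}

# -g 16, -g_p_s 8, and tensor_model_parallel_size=2 from the workload.
check_topology 16 8 2
```

For this run it prints `servers=2 tp_within_server=yes`.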

The SimAI_analytical command runs and prints the following output to the console (the beginning is truncated):

[... many earlier "id: ..." lines omitted ...]
id: optimizer3 , depen: -1 , wg_comp_time: 0
id: optimizer4 , depen: -1 , wg_comp_time: 0
type: HYBRID_TRANSFORMER_FWD_IN_BCKWD ,num passes: 1 ,lines: 12427 compute scale: 1 ,comm scale: 1
stat path: ./results/mo_bo_gpt_175B_costrate0.1/gpt_175B-t1 ,total rows: 1 ,stat row: 0
CSV path and filename: ./results/mo_bo_gpt_175B_costrate0.1/gpt_175B-t1EndToEnd.csv
SimAI begin run Analytical
pass: 0 finished at time: 74540
workload stats for the job scheduled at NPU offset: 0
{"retcode":0, "info":"Success!", "node_count":1, "nic_type":"cx7", "gpus_pernode":2, "nics_pernode":8.0, "coll_type":"allgather", "cross_nic":0, "nccl_algo":"Ring", "theoretical_bus_bw_GBps":370.800}
warning! a callable is removed before call
SimAI-Analytical finished.

The corresponding EndToEnd result file (gpt_175B-t1EndToEnd.csv) is created in the results directory, but it is empty (0 bytes).
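To confirm the symptom without opening the file, the console output can be saved and the CSV path it reports checked with `test -s`, which succeeds only for a file that exists and is non-empty. This is a minimal sketch; the log name `analytical.log` and the helper `check_endtoend` are hypothetical, and the reported path is relative, so the check must run from the directory where SimAI_analytical was started.

```shell
#!/bin/sh
# Hypothetical helper: read the "CSV path and filename:" line from saved
# SimAI_analytical console output and report whether that file has content.
check_endtoend() {
  local log=$1 csv
  # Keep only the path after the prefix; use the last match if several.
  csv=$(sed -n 's/^CSV path and filename: //p' "$log" | tail -n 1)
  if [ -z "$csv" ]; then
    echo "no CSV path found in $log" >&2
    return 2
  fi
  if [ -s "$csv" ]; then
    echo "non-empty: $csv"
  else
    echo "empty or missing: $csv"
    return 1
  fi
}

# Usage (assumed workflow):
#   ./SimAI_analytical ... 2>&1 | tee analytical.log
#   check_endtoend analytical.log
```

In my run this would report `empty or missing:` followed by the path printed in the log above.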

Additional Information

The generated workload file is attached: gpt_175B-t1.txt

Environment:

SimAI Version: latest master branch, commit e5d125144ea864419d92fc1f15f36e378ee0e2a7 (Sep 25 13:46:45 2025 +0800)

Operating System: Ubuntu 22.04.4 LTS

Python Version: Python 3.11.12

Thank you for your time and assistance!
