[WIP] Initial DeepSeek reference implementation #861
denys-fridman wants to merge 16 commits into mlcommons:master
7e03fe1 to 18410e3
44a6723 to 38e318c
export GBS=1024
# Dataloader: Micro batch size
export MBS=1
export MAX_LR="2e-4"
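For context, the way `GBS` and `MBS` feed the `grad_accumulation_steps` line in this diff can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it assumes `mini_batch_size` is the per-rank share of the global batch, and `data_parallel_size=64` is a hypothetical value chosen only to make the arithmetic concrete.

```python
def grad_accumulation_steps(gbs: int, mbs: int, data_parallel_size: int) -> int:
    """Micro-batches each data-parallel rank accumulates per optimizer step.

    Assumption (not confirmed by the PR): mini_batch_size = gbs // data_parallel_size.
    """
    if gbs % data_parallel_size != 0:
        raise ValueError("GBS must be divisible by the data-parallel size")
    mini_batch_size = gbs // data_parallel_size  # samples per rank per step
    if mini_batch_size % mbs != 0:
        raise ValueError("per-rank mini batch must be divisible by MBS")
    return mini_batch_size // mbs


# GBS=1024 and MBS=1 come from the config above; 64 ranks is hypothetical.
steps = grad_accumulation_steps(gbs=1024, mbs=1, data_parallel_size=64)
print(steps)  # prints 16
```

The divisibility checks are there because integer division would otherwise silently drop samples and change the effective global batch size.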
grad_accumulation_steps = mini_batch_size // args.mbs

logging_configs = {
mllogger.constants.SEED: args.seed,
We need more mllog events, for example:
self.mllogger.event(
    key=constants.SUBMISSION_BENCHMARK,
    value=self.submission_info["submission_benchmark"],
)
For the detailed list, see https://github.com/mlcommons/training/blob/master/llama2_70b_lora/scripts/mlperf_logging_utils.py
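One way to emit several submission events in one place is sketched below. It is an assumption-laden illustration rather than the linked script's implementation: the constant names other than `SUBMISSION_BENCHMARK` should be checked against the logging package, the `submission_info` keys and values are placeholders, and `RecordingLogger` is a hypothetical stand-in so the sketch runs without the real logger installed.

```python
from types import SimpleNamespace

# Assumed constant names; verify each one against the logging package's constants.
SUBMISSION_KEYS = (
    "SUBMISSION_BENCHMARK",
    "SUBMISSION_ORG",
    "SUBMISSION_DIVISION",
    "SUBMISSION_STATUS",
    "SUBMISSION_PLATFORM",
)


def log_submission_info(mllogger, constants, submission_info):
    """Emit one event per submission field through any logger with .event(key=, value=)."""
    for name in SUBMISSION_KEYS:
        mllogger.event(key=getattr(constants, name), value=submission_info[name.lower()])


class RecordingLogger:
    """Hypothetical stand-in for the real logger; it only records the calls."""

    def __init__(self):
        self.events = []

    def event(self, key, value):
        self.events.append((key, value))


# Stand-in constants namespace and placeholder values, for demonstration only.
fake_constants = SimpleNamespace(**{name: name.lower() for name in SUBMISSION_KEYS})
info = {name.lower(): f"example_{name.lower()}" for name in SUBMISSION_KEYS}

logger = RecordingLogger()
log_submission_info(logger, fake_constants, info)
print(len(logger.events))  # prints 5
```

In the real script, `mllogger` and `constants` would be the objects already used in the snippet above, passed in place of the stand-ins.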
@@ -0,0 +1,16 @@
git+https://github.com/denys-fridman/logging.git@dfridman/deepseek-v3 # TODO(dfridman): revert to main repo once merged
I think we need this reverted before merging. There is another TODO in the PR.
pip install -e .

## 2. Megatron-bridge and megatron-core
ARG MBRIDGE_REVISION=main
Can we pin this like NEMORUN_REVISION?
No description provided.