[WIP] Initial DeepSeek reference implementation #861

Open
denys-fridman wants to merge 16 commits into mlcommons:master from denys-fridman:dfridman/deepseek-reference-implementation

@denys-fridman
No description provided.

@denys-fridman denys-fridman requested a review from a team as a code owner January 14, 2026 11:21
@github-actions
MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form](https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
@denys-fridman
You can retrigger this bot by commenting recheck in this Pull Request

@denys-fridman denys-fridman force-pushed the dfridman/deepseek-reference-implementation branch from 7e03fe1 to 18410e3 Compare January 14, 2026 14:16
@denys-fridman denys-fridman force-pushed the dfridman/deepseek-reference-implementation branch from 44a6723 to 38e318c Compare January 21, 2026 09:50
export GBS=1024
# Dataloader: Micro batch size
export MBS=1
export MAX_LR="2e-4"

Should these be updated to the final tuned hyperparameters?
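For reference when settling these values, here is a minimal sketch of how GBS and MBS typically determine the number of gradient accumulation steps. It assumes the global batch is split evenly across data-parallel ranks; `dp_size` and the function name are hypothetical and not taken from this PR, and the value 64 is only an illustration.

```python
def grad_accumulation_steps(gbs: int, mbs: int, dp_size: int) -> int:
    """Return the number of micro-batches each rank accumulates per optimizer step.

    gbs:     global batch size (samples per optimizer step, across all ranks)
    mbs:     micro batch size (samples per forward/backward pass on one rank)
    dp_size: number of data-parallel ranks (hypothetical parameter)
    """
    if gbs % dp_size != 0:
        raise ValueError("GBS must be divisible by the data-parallel size")
    # Samples that a single data-parallel rank handles per optimizer step.
    mini_batch_size = gbs // dp_size
    if mini_batch_size % mbs != 0:
        raise ValueError("Per-rank batch must be divisible by MBS")
    return mini_batch_size // mbs


# Illustrative values: GBS and MBS as in the config above, dp_size assumed.
steps = grad_accumulation_steps(gbs=1024, mbs=1, dp_size=64)
print(steps)
```

If the tuned GBS or MBS changes, the accumulation step count changes with it, so the divisibility checks are worth keeping close to where the values are read.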

grad_accumulation_steps = mini_batch_size // args.mbs

logging_configs = {
mllogger.constants.SEED: args.seed,

We need more mllog events, for example:
self.mllogger.event(
    key=constants.SUBMISSION_BENCHMARK,
    value=self.submission_info["submission_benchmark"],
)
Detailed list: https://github.com/mlcommons/training/blob/master/llama2_70b_lora/scripts/mlperf_logging_utils.py
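A rough sketch of what emitting the submission events requested above could look like. To stay self-contained it uses a stand-in logger (`RecordingLogger`, hypothetical) that exposes the same `event(key=..., value=...)` shape as the call quoted in the comment. The key strings are assumed to mirror the mllog constants, and every value in `submission_info` is a placeholder, not something from this PR.

```python
# Key names assumed to match the corresponding mllog constants
# (e.g. constants.SUBMISSION_BENCHMARK); verify against the logging package.
SUBMISSION_KEYS = (
    "submission_benchmark",
    "submission_org",
    "submission_division",
    "submission_status",
    "submission_platform",
)


class RecordingLogger:
    """Hypothetical stand-in for an mllogger; stores events in a list."""

    def __init__(self):
        self.events = []

    def event(self, key, value):
        self.events.append((key, value))


def log_submission_info(mllogger, submission_info):
    """Emit one event per submission key that is present in submission_info."""
    for key in SUBMISSION_KEYS:
        if key in submission_info:
            mllogger.event(key=key, value=submission_info[key])


# Placeholder values for illustration only.
submission_info = {
    "submission_benchmark": "example_benchmark",
    "submission_org": "example_org",
    "submission_division": "closed",
    "submission_status": "onprem",
    "submission_platform": "example_platform",
}

recorder = RecordingLogger()
log_submission_info(recorder, submission_info)
for key, value in recorder.events:
    print(key, value)
```

In the actual implementation the real mllogger instance would be passed in place of the stand-in, and the keys would come from the constants module rather than string literals.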

@@ -0,0 +1,16 @@
git+https://github.com/denys-fridman/logging.git@dfridman/deepseek-v3 # TODO(dfridman): revert to main repo once merged

I think this needs to be reverted before merging. There is another TODO elsewhere in the PR.

pip install -e .

## 2. Megatron-bridge and megatron-core
ARG MBRIDGE_REVISION=main

Can we pin this to a specific revision, as is done for NEMORUN_REVISION?

3 participants