@xrsrke commented Jan 20, 2026

No description provided.

Phuc Nguyen and others added 5 commits January 19, 2026 13:04
- Add activation checkpoint offload module
- Add memory defragmentation utilities
- Add deep memory profiler script
- Add various Kimi 1T training configs (EP64, EP96, EP128, CP2, etc.)
- Add Qwen3 activation offload test configs
- Add slurm launch scripts
- Update DeepSeek V3 model with MoE improvements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove kimi_1t, debug, exp1a training configs, qwen3 test configs,
and root-level debug scripts from git tracking.
@xrsrke force-pushed the phuc/kimi1t_training branch from 9cfe11d to e04c0f6 on January 20, 2026 19:51

3 participants