Add a mini-transformer example #5
Closed
Adds a mini-transformer example demonstrating the framework's capabilities for building and training transformer models with JAX sharding support.
Changes
- `s["trainer"]["step"]` to `s["step"]` at experiment level
- `SkipConnection` - residual connections with configurable combiner function
- `Repeated` - sequential repetition of a layer with independent parameters per instance
- `Unembedding` - projects from hidden dimension to vocabulary logits
- `RoPE` - rotary position embeddings
- `param_dtype`, `param_sharding`, and `out_sharding` for mixed precision and distributed training
- `__eq__` and `__len__` methods for proper dict-like behavior

Example usage
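For orientation, here is a minimal plain-JAX sketch of what the layers named above compute. It does not use this framework's API: `skip_connection`, `repeated`, `rope`, and `unembed` are hypothetical stand-in functions, the shapes are illustrative, and the "rotate-half" RoPE variant with base 10000 is an assumption rather than something the PR states.

```python
import jax
import jax.numpy as jnp


def skip_connection(layer, x, combine=jnp.add):
    """Residual connection: combine the input with the layer's output."""
    return combine(x, layer(x))


def repeated(layer_fn, params_list, x):
    """Apply layer_fn sequentially, once per entry in params_list.

    Each entry holds an independent set of parameters for one instance.
    """
    for params in params_list:
        x = layer_fn(params, x)
    return x


def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-jnp.arange(half) / half)               # (half,)
    angles = jnp.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1[i], x2[i]) pair by a position-dependent angle.
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def unembed(h, w_out):
    """Project hidden states (seq_len, dim) to logits (seq_len, vocab)."""
    return h @ w_out


# Illustrative sizes (assumptions): 4 tokens, hidden dim 8, vocab 16, 3 blocks.
k_x, k_out, k_blocks = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k_x, (4, 8))
w_out = jax.random.normal(k_out, (8, 16))
block_params = [0.1 * jax.random.normal(k, (8, 8))
                for k in jax.random.split(k_blocks, 3)]


def block(w, v):
    # One residual block: v + tanh(v @ w).
    return skip_connection(lambda u: jnp.tanh(u @ w), v)


h = repeated(block, block_params, rope(x))
logits = unembed(h, w_out)
```

Because RoPE only rotates pairs of features, it leaves position 0 unchanged and preserves each token's vector norm, which makes for a quick sanity check on any implementation.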