* Enhance audit pages with textbook-style layout - Add left sidebar with chapter navigation - Add right sidebar for table of contents - Use AuditLayout component to match textbook styling - Improves visual consistency across the site * Add FAQ section to Paper Audit assignment Integrate comprehensive FAQ covering: - Scope and technical focus (primary paper, interface auditing) - Format and objectives (argue vs summarize, Amazon Principle) - Presentation logistics (audit vs presentation, time allocation) Students now have clear guidance on what the audit should accomplish and how it differs from a traditional paper summary. * Fix staging deployment SSH timeout issue - Add stdin redirect (< /dev/null) to properly detach nohup - Add sleep before starting new server to ensure old process is killed - Remove PID echo that could cause hanging This should prevent the SSH action from being terminated with SIGTERM. * Fix PR preview paths for different content types - Detect whether PR contains audit or contributor profile - Generate correct preview URL based on content type - Show appropriate checklist (audit vs contributor) - Extract actual filename for accurate path Fixes incorrect preview URLs for contributor profile PRs like #25. * Fix content type detection in PR preview workflow - Fetch base branch explicitly for git diff - Store FILES output to avoid repeated git diff calls - Use fetch-depth: 0 to ensure full git history available This should properly detect contributor vs audit files. * Add contributor profile: Gyanig Kumar (#25) Co-authored-by: Christoffer Heckman <christoffer.heckman@colorado.edu> * Fix SSH timeout with setsid and explicit exit - Use setsid instead of nohup for better process daemonization - Add explicit exit 0 after starting background process - Move success message before starting background job - Increase sleep to 2s to ensure pkill completes This should prevent the SSH session from hanging and getting terminated. 
* Add debug output to staging deployment script * Split deployment into two SSH sessions Separate static site deployment from API server startup to avoid SSH timeout. Each SSH session is now shorter and focused on a single task. * Remove unused API server from staging deployment The API server was for a comments/review system that is not currently implemented. Removing it fixes the SSH timeout issue. * Remove unused API server infrastructure The comments/review system was never fully implemented. The frontend components (CommentSidebar, API routes) don't exist and the system is incompatible with static export mode. * Support audits in both staging and production directories Some students may mistakenly place audits in content/textbook/audits/ instead of content/textbook/audits/staging/. Detect both locations. * Update Scratch-1 due date to February 1 * Add contributor profile: Thanushraam Suresh Kumar (#28) --------- Co-authored-by: Gyanig Kumar <gyanig.kumar@gmail.com> Co-authored-by: Thanushraam <45840572+Tr0612@users.noreply.github.com>
- Complete testing infrastructure (public & internal tests) - Solution management scripts (inject/reset) - Sanitization pipeline for public sync - GitHub Actions workflow for automated sync - Complete documentation (setup guides, references) - Example solution for scratch-1 assignment See PRIVATE_REPO_SETUP.md for complete architecture details.
BREAKING CHANGES: - Reorganized scripts into CI-critical (scripts/) vs dev helpers (scripts/dev/) - Enhanced sanitization pipeline with fail-safe validation - Added frontmatter validation for audit MDX files - Implemented Review Mode banner for PR previews Phase 1: Script Consolidation - Created scripts/dev/ for local development helpers - Moved 6 helper scripts to scripts/dev/ - Added README.md to both scripts/ and scripts/dev/ - Clear separation: CI-critical scripts stay in scripts/ Phase 2: Sanitization Pipeline Hardening - scripts/_sanitize_todos.py now fail-safe with exit codes - Added 3-step validation (pre, sanitize, post) - Enhanced error messages with line numbers - sync-to-public.yml now includes pre/post validation - Zero-tolerance for [SOLUTION] leaks Phase 3: Unified Linting for Audits - audit_linter.py validates required frontmatter fields - Checks: title, author, topic, paper (no empty/placeholder values) - Updated vla-audit.yml error messages Phase 4: Next.js Routing Cleanup - Dynamic staging prefix handling with STAGING_PR_NUMBER - Added Review Mode banner to AuditLayout.tsx - Shows PR number in preview banner - Better visual distinction for review vs production Phase 5: Repository Cleanup - Removed vercel.json (GitHub Actions/Pages deployment) - Added SOURCE OF TRUTH section to README.md - Clear repository ownership map - sanitize.sh removes scripts/dev/ Security Improvements: - Fail-safe sanitization (cannot sync if markers remain) - Pre-flight validation before sanitization - Post-flight verification after sanitization - Proper exit codes for all scripts - Repository clarity prevents accidental public pushes Documentation: - Scripts categorized and documented - README.md warns against pushing to public - Audit requirements clearly specified - REFACTOR_SUMMARY.md with complete changelog See REFACTOR_SUMMARY.md for detailed breakdown.
Technical Hardening - Tasks 1, 2, 3, & 4: Documentation (Task 1): - Create PRIVATE_OPERATIONS.md merging all instructor docs - Paper Audit Review Workflow - Server Configuration (Apache, systemd, API) - Deployment Procedures - Troubleshooting Guide - Update README.md with instructor-facing PST guide - Document Shadow CI, solution management, assignment lifecycle Hardened Sanitization (Task 2): - Enhance sanitize.sh: - Add draft block removal for MDX files - Add README.md overwrite step during sync - Renumber steps for clarity - Enhance _sanitize_todos.py: - Add multi-line [SOLUTION] comment detection - Add triple-quoted docstring [SOLUTION] detection - Improve regex patterns with proper ordering - Enhanced fail-safe verification Shadow CI (Task 3): - Create shadow-tester.yml workflow - Triggered by repository_dispatch from public repo - Fetches student code from public PR - Runs internal rigorous tests - Comments Pass/Fail status on public PR Pre-Commit Guard (Task 4): - Add executable pre-commit hook - Scans staged files for [SOLUTION] markers - Blocks commits if leaks detected in public-facing files - Provides clear fix instructions Documentation: - Create HARDENING_COMPLETE.md with full implementation report - Create BEFORE_AFTER.md documenting changes
Task 1: Prune Documentation - Consolidate PRIVATE_OPERATIONS.md, REVIEW_SYSTEM.md, PRIVATE_REPO_SETUP.md into comprehensive INSTRUCTOR.md - Delete 11 obsolete documentation files: - APACHE_CONFIG.md - BEFORE_AFTER.md - DEPLOYMENT_SUCCESS.md - HARDENING_COMPLETE.md - PRIVATE_OPERATIONS.md (merged) - PRIVATE_REPO_SETUP.md (merged) - QUICK_REFERENCE.md - REFACTOR_SUMMARY.md - REVIEW_SYSTEM.md (merged) - SETUP_COMPLETE.md - SYSTEM_COMPLETE.md Task 2: Refactor Solution Management - Rename scripts/manage_solutions.py → scripts/dev_utils.py - Add --verify-clean command: - Scans src/assignments/ for solution code leaks - Compares files against private/solutions/ using difflib - Exits with error if similarity > 80% (prevents accidental commits) - Normalizes code (removes comments/whitespace) for accurate comparison Task 3: Harden Sync Workflow - Update .github/workflows/sync-to-public.yml to use Orphan Push strategy: - git checkout --orphan temp-public-branch - git add -A && git commit - git push public temp-public-branch:main --force - Breaks ALL git history links between private and public repos - Public repo has completely independent history - Update leak detection to check for dev_utils.py instead of manage_solutions.py - Update sanitize.sh to delete dev_utils.py Task 4: Cleanup Public README - Public README already student-centric from previous hardening - No changes needed Benefits: - Single comprehensive instructor guide (INSTRUCTOR.md) - Enhanced solution leak prevention (--verify-clean) - Maximum security via orphan push (no history exposure) - Cleaner repository structure
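The `--verify-clean` check described in Task 2 can be sketched as follows. This is a minimal illustration, not the code in `scripts/dev_utils.py`: the function names and the naive comment stripping are assumptions, and only the use of `difflib`, the normalization step, and the 80% threshold come from the commit message.

```python
import difflib
import re

SIMILARITY_THRESHOLD = 0.80  # from the commit message: error if similarity > 80%


def normalize(source: str) -> str:
    """Remove comments, blank lines, and redundant whitespace.

    Assumption: comments are stripped naively at the first '#', which would
    also truncate a string literal that contains '#'.
    """
    lines = []
    for line in source.splitlines():
        line = line.split("#", 1)[0]
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


def similarity(candidate: str, solution: str) -> float:
    """Return a 0..1 similarity ratio between the normalized sources."""
    matcher = difflib.SequenceMatcher(None, normalize(candidate), normalize(solution))
    return matcher.ratio()


def looks_like_leak(candidate: str, solution: str,
                    threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Hypothetical helper: True when the candidate is too close to the solution."""
    return similarity(candidate, solution) > threshold
```

Normalizing before comparing means a solution that was copied and then re-commented or re-indented still scores as a match.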
Create Claude Code Project Skill for solution leak prevention: Task 1: Initialize Skill Structure - Create .claude/skills/vla-guard/ directory - Create .claude/skills/vla-guard/SKILL.md Task 2: Define SKILL.md Logic - Add frontmatter: - name: vla-guard - description: Final audit to prevent solution/internal test leaks - user-invocable: true - Implement 5-step audit process: 1. Identify all solutions (python3 scripts/dev_utils.py --list) 2. Scan for solution content leaks ([SOLUTION] markers) 3. Verify private/ and tests/internal/ not staged 4. Check git log for accidental solution commits 5. Check for sensitive file leaks Task 3: Create Custom Slash Command - Create .claude/commands/pre-flight.md - Invokes /vla-guard skill first - Runs scripts/sanitize.sh only if guard passes - Provides comprehensive pre-flight check before push/PR Features: - Color-coded audit reports (✅/❌/⚠️ ) - Integration with dev_utils.py --verify-clean - Fail-safe: blocks sanitization if audit fails - Clear remediation instructions on failure Usage: /vla-guard - Run security audit only /pre-flight - Run audit + sanitization pipeline Also added: - REFACTOR_COMPLETE.md - Documentation of consolidation work
Create 6 new Claude Code skills for VLA Foundations workflow automation: 1. /test-rigor - Internal grading test runner - Auto-injects solutions before testing - Runs pytest with rigor markers - Generates test reports - Auto-resets to starter code 2. /generate-fixtures - Gold standard fixture generator - Creates reference data for fidelity tests - Uses fixed random seeds (seed=42) - Generates model outputs and checkpoints - Verifies no NaNs in fixtures 3. /grade - Automated student PR grading - Fetches student code from GitHub - Runs public and internal tests - Generates detailed feedback reports - Posts comments on PRs - Updates PR labels 4. /release - Safe assignment publishing workflow - Runs VLA Guard audit (fail-fast) - Executes sanitization pipeline - Creates release tags - Monitors GitHub Actions - Verifies public repo - Checks deployment status 5. /new-assignment - Assignment scaffolding generator - Creates complete directory structure - Generates starter code with TODOs - Generates solution templates - Generates test templates - Creates MDX assignment spec 6. /sync-check - Post-release verification - Clones public repo (read-only) - Scans for solution leaks - Verifies orphan push strategy - Checks deployment status - Generates verification reports Additional changes: - Add command shortcuts in .claude/commands/ - Create directories for reports and releases - Update .gitignore to ignore generated reports - Add comprehensive README.md for skills Benefits: - Automates instructor workflows - Fail-safe protection against leaks - Comprehensive audit trails - Reduces manual errors - Speeds up grading and releases
Create comprehensive guide for AI SWE agents working with student code: - Student workflow (branch, implement, test, submit) - Public testing philosophy - Assignment structure and TODOs - Git hygiene (rebase-only workflow) - Semantic line breaks in MDX - Common issues and solutions - Engineering standards and grading rubric Key sections: - Complete assignment workflow example - Commands useful for students - Testing with public tests - PR submission process - Resources and documentation links Student-focused: - No references to private repo or solutions - Only public tests documented - Clear submission guidelines - Common troubleshooting tips - Resources for help Fixes from template: - Removed manage_solutions.py references (private only) - Removed audit_linter.py references (doesn't exist) - Fixed Google search link placeholders - Added actual file paths - Clarified student permissions (can't merge own PRs)
Create detailed guide for AI SWE agents working in private repo: - Dual-repository architecture explanation - Complete Claude Code skills documentation - Solution management workflow (dev_utils.py) - Testing philosophy (public vs internal) - Sanitization pipeline details - Security boundaries and leak prevention - Typical workflows for instructors - Shadow CI explanation - Orphan push strategy Key sections: - 7 Claude Code skills with usage examples - Commands useful in development - Pre-release checklist - Git hygiene and rebase-only workflow - File map with actual paths (not Google search links) Fixes: - Corrected manage_solutions.py → dev_utils.py - Added missing Claude Code skills section - Included Shadow CI documentation - Added security boundaries section - Removed Google search link placeholders
- Replace backbone_solution.py with corrected version matching student template
  - Uses combined qkv_proj (not separate q/k/v projections)
  - Uses F.silu() as the gate activation (SwiGLU)
  - Implements all 4 TODOs correctly
  - Fixes tensor contiguity issue in loss computation
- Add gold standard test fixtures
  - private/fixtures/scratch1_gold_output.pt
  - private/fixtures/scratch1_attention_fixture.pt
  - private/fixtures/scratch1_rmsnorm_fixture.pt
  - Generator script for reproducible fixtures
- Update test infrastructure
  - Add load_gold_standard and sample_batch fixtures
  - Mark DINOv2 tests as mastery (skipped for core assignment)
  - Fix subprocess calls to use sys.executable
  - Add mastery marker to pytest.ini
- Add uv package management
  - pyproject.toml with torch, numpy, pytest dependencies
  - uv.lock for reproducible environments
  - Update claude.md with uv usage instructions

Test results: 7 passed, 2 skipped (mastery tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The orphan push strategy was destroying all student work on the public repo. The sync now uses a merge strategy that:
- Fetches current public repo state
- Merges sanitized content into main
- Preserves all student branches and work
Files that document the sanitization process (README, INSTRUCTOR.md, etc.) mention [SOLUTION] markers as part of explaining how the system works. These are not actual solution code and should be excluded from the check. Also excludes: - .claude/ directory (skill definitions) - .github/ directory (workflow definitions)
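A marker scan with these exclusions might look like the sketch below. It is an assumption-laden illustration rather than the workflow's actual check: the function names and the exact file list are hypothetical (the commit only says "README, INSTRUCTOR.md, etc."), while the `[SOLUTION]` marker and the `.claude/` and `.github/` exclusions come from the commit message.

```python
from pathlib import PurePosixPath

MARKER = "[SOLUTION]"
EXCLUDED_DIRS = {".claude", ".github"}           # from the commit message
EXCLUDED_FILES = {"README.md", "INSTRUCTOR.md"}  # assumption: illustrative list only


def should_scan(path: str) -> bool:
    """Skip files that merely document the sanitization process."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] in EXCLUDED_DIRS:
        return False
    return p.name not in EXCLUDED_FILES


def find_leaks(files: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line number) for every marker found in a scanned file."""
    hits = []
    for path, text in sorted(files.items()):
        if not should_scan(path):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MARKER in line:
                hits.append((path, lineno))
    return hits
```

Passing file contents as a dictionary keeps the sketch independent of the file system; a real check would read the staged or synced files instead.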
YAML syntax error on line 224 caused by unquoted multi-line string. Using heredoc pattern for proper bash multi-line string handling.
Previous issue: Averaging 7 joint deltas into single action caused massive information loss. Model achieved only ~4.6 loss (barely better than random ~5.5).

New action encoding:
- Direction: 8 octants (±X, ±Y, ±Z) → 3 bits
- Magnitude: Distance to target, 32 bins → 5 bits
- Total: 8 * 32 = 256 discrete actions

This is LEARNABLE because:
- Model sees state (joint angles + end-effector position)
- Can compute error vector toward target
- Can predict corresponding action deterministically

Additional improvements:
- Reduced noise (0.05 → 0.02) for clearer patterns
- Stronger gradient component toward target (0.01 → 0.05)
- More deterministic motion generation

Expected result: Loss should converge significantly below previous 4.6.
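The octant-plus-magnitude encoding can be illustrated with a short sketch. The bit order, the uniform bin edges, and `MAX_DISTANCE` are assumptions made for the example; only the 8 octants, 32 magnitude bins, and 256 total actions come from the commit message.

```python
import math

NUM_OCTANTS = 8     # sign of each of the three axes: 3 bits
NUM_MAG_BINS = 32   # distance to target: 5 bits
MAX_DISTANCE = 1.0  # hypothetical workspace scale, not from the source


def encode_action(error, max_distance=MAX_DISTANCE):
    """Map a 3D error vector (target minus end-effector) to an action id in [0, 255]."""
    ex, ey, ez = error
    # Assumed bit order: X sign is the high bit, Z sign the low bit.
    octant = (int(ex >= 0) << 2) | (int(ey >= 0) << 1) | int(ez >= 0)
    distance = math.sqrt(ex * ex + ey * ey + ez * ez)
    fraction = min(distance / max_distance, 1.0)
    # Assumed uniform bins; distances at or beyond max_distance use the last bin.
    mag_bin = min(int(fraction * NUM_MAG_BINS), NUM_MAG_BINS - 1)
    return octant * NUM_MAG_BINS + mag_bin


def decode_action(action):
    """Recover the axis signs and the magnitude bin from an action id."""
    octant, mag_bin = divmod(action, NUM_MAG_BINS)
    signs = tuple(1 if octant & bit else -1 for bit in (4, 2, 1))
    return signs, mag_bin
```

Because the action id is a deterministic function of the error vector, a model that can infer that vector from the state has a well-defined target to learn, which is the point the commit message makes.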
## Changes

### PyTorch CUDA 11.8 Support
- Updated pyproject.toml to use CUDA 11.8 index
- Constrained Python to 3.10-3.13 (CUDA wheels limitation)
- Now supports P6000 GPUs (compute capability 6.1)

### Training Improvements
- Training completes in 41s on GPU (vs 7+ hours on CPU)
- Updated training_run.py to use GPU when available
- Removed overly restrictive compute capability check

### Data Generation Fix (Previously Committed)
- Structured action encoding: direction (8 octants) + magnitude (32 bins)
- Actions now learnable from state (error vector)
- Loss improved from ~4.6 to ~1.97

## Results

**Training Performance:**
- Loss: 3.27 → 1.96 (consistent convergence)
- Time: 41 seconds on P6000
- All 7 internal rigor tests passing

**vs Previous (averaged actions):**
- Loss: ~4.6 (barely better than random)
- Model couldn't learn meaningful patterns

## Test Results

```
7 passed, 2 deselected (mastery features)
- test_training_convergence ✓
- test_attention_gradient_flow ✓
- test_causal_mask_prevents_future_leakage ✓
- test_rmsnorm_numerics ✓
- test_model_output_distribution ✓
- test_loss_computation_correctness ✓
- test_overfitting_on_single_batch ✓
```

Solution ready for grading student submissions.
**Problem**: Assignment required loss < 1.0, but a correct implementation actually achieves ~1.9-2.0.

**Changes**:
- Updated Pass Level to require "clear convergence" rather than an arbitrary threshold
- Added expected loss ranges: Initial ~3-4, Final ~1.9-2.2
- Added FAQ explaining what loss to expect
- Emphasize learning trajectory over absolute value

**Rationale**:
- Action encoding (direction + magnitude) is learnable but not trivial
- Random guessing: ~5.5 (ln 256 ≈ 5.55)
- Structured learning: ~1.9-2.0 (significant improvement)
- Internal tests only verify that loss decreases, not an absolute threshold

This aligns assignment expectations with reality.
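The random-guessing baseline follows from the cross-entropy of a uniform prediction over 256 actions. A small sketch, with hypothetical function names, that also converts a loss into perplexity for comparison:

```python
import math


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) of predicting every class with equal probability."""
    return math.log(num_classes)


def perplexity(loss: float) -> float:
    """Effective number of equally likely choices implied by a loss in nats."""
    return math.exp(loss)


baseline = uniform_cross_entropy(256)  # about 5.55
trained = perplexity(1.96)             # about 7.1
```

Read this way, a final loss near 1.96 means the model has narrowed 256 possible actions down to roughly 7 effective choices per step, which is why convergence to ~1.9-2.0 counts as real learning even though it is well above 1.0.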
Changed 'EOF' to EOF (without quotes) to allow ${TAG_NAME} and
${RELEASE_DATE} to be interpolated in the commit message.
Heredoc inside YAML run block caused parsing errors. Using simple multiline string assignment instead.
Previous approach with multiline string caused YAML parsing errors. Using git commit's multiple -m flag feature instead - each -m creates a new paragraph in the commit message.
Changes: - Verify PUBLIC_REPO_TOKEN is set before proceeding - Use git credential helper instead of embedding token in URL - More secure credential handling - Better error messages if token is missing
The credential helper approach wasn't working in GitHub Actions. Reverting to the simpler, proven approach of embedding the token directly in the remote URL.
Updated to use PUBLIC_REPO_TOKEN_2 as the original token was deleted.
Added: - Show token prefix to verify it's set - Test remote access with ls-remote before pushing - Better error messages for auth failures
The actions/checkout step was configuring git with the default GITHUB_TOKEN, which was overriding our PUBLIC_REPO_TOKEN_2 during push operations.

Changes:
- Set persist-credentials: false on checkout action
- Explicitly unset credential helpers before using our token
- Set credential.useHttpPath to prevent token reuse across repos

This ensures only PUBLIC_REPO_TOKEN_2 is used for pushing to public repo.
…c sync Added removal of: - INSTRUCTOR.md, INSTRUCTOR_GUIDE.md, API_SETUP.md - SETUP_WITH_GH_CLI.md, QUICK_START_SSH.md - .claude/ directory (instructor workflow automation) - .github/workflows/sync-to-public.yml (the sync workflow itself) Added validation checks to ensure these files are removed before push.
Sanitized content from private repository. This release includes updated assignment materials. Changes: - Updated assignment templates - Fixed bugs and improvements - Documentation updates
GitHub Actions doesn't allow both 'paths' and 'paths-ignore' for the same event. Removed 'paths' and kept only 'paths-ignore', which is simpler and more maintainable.

Added more ignore patterns:
- data/** (dataset files)
- pyproject.toml, pytest.ini, uv.lock (Python config)

This prevents unnecessary deployments when only assignment code changes.
The merge was failing on rename/delete conflicts where the public repo had moved files (e.g., scripts/dev/complete-setup.sh → complete-setup.sh) but our sanitization deleted them.

Improved conflict resolution to:
- DU conflicts: Accept deletion (release branch deleted the file)
- UU conflicts: Accept release branch version
- UA conflicts: Accept deletion

This ensures the release branch state (sanitized) is always preferred.
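The resolution rules above amount to a lookup from conflict code to action. The sketch below assumes the codes are read from `git status --porcelain` output (two status characters, a space, then the path); the action labels and function name are hypothetical, and the actual workflow may implement this in shell.

```python
# Conflict code -> resolution, as listed in the commit message.
RESOLUTIONS = {
    "DU": "delete",        # release branch deleted the file
    "UU": "take_release",  # both sides modified: prefer the release branch version
    "UA": "delete",        # per the commit message: accept deletion
}


def plan_resolutions(porcelain: str) -> list[tuple[str, str]]:
    """Return (path, action) pairs for the conflict codes handled above."""
    plan = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        code, path = line[:2], line[3:]
        action = RESOLUTIONS.get(code)
        if action is not None:
            plan.append((path, action))
    return plan
```

Entries with any other status code are left out of the plan, so an unexpected conflict type would still surface as an unresolved merge rather than being silently resolved.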
Sanitized content from private repository. This release includes updated assignment materials. Changes: - Updated assignment templates - Fixed bugs and improvements - Documentation updates
Fixed 3 authentication issues:

1. Line 17: Changed from GITHUB_TOKEN to PRIVATE_REPO_TOKEN
   - Workflow runs in PUBLIC repo, needs PAT to access PRIVATE repo
   - PRIVATE_REPO_TOKEN must be added to public repo secrets
2. Lines 90 & 114: Changed from PUBLIC_REPO_TOKEN to GITHUB_TOKEN
   - Commenting on same repo where workflow runs
   - Default GITHUB_TOKEN has permission to comment on PRs
3. Added pull-requests: write permission
   - Required for commenting on PRs

ACTION REQUIRED: Add PRIVATE_REPO_TOKEN secret to public repo (arpg/vla-foundations):
- Go to: https://github.com/arpg/vla-foundations/settings/secrets/actions
- Add secret: PRIVATE_REPO_TOKEN
- Value: Same PAT as PUBLIC_REPO_TOKEN_2 (with repo access)
Sanitized content from private repository. This release includes updated assignment materials. Changes: - Updated assignment templates - Fixed bugs and improvements - Documentation updates
🚀 Preview Deployed

Your preview is ready for review!

🔗 Preview URL: https://arpg.github.io/vla-foundations/staging/pulls/35//

This preview will be removed when the PR is closed.
This PR completes the implementation of the decoder-only Transformer backbone for Scratch-1. It includes RMSNorm, RoPE, and a fully functional causal self-attention mechanism with proper masking. An end-to-end training loop was built with logging of training loss, validation loss, and perplexity for both. KV cache is also implemented. I also experimented with sinusoidal positional encoding for comparison against RoPE. The accompanying report contains training loss curve visualizations, attention map visualizations, analysis of causal masking behavior, a comparison between RoPE and sinusoidal positional encoding, and an evaluation of KV cache inference speed.
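For readers unfamiliar with causal masking, the idea can be shown in a few lines of plain Python. This is an illustrative sketch only, not the PR's PyTorch implementation: position i may attend to positions 0..i, and every later position receives zero weight.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with future positions masked.

    `scores[i][j]` is the raw attention score of query i against key j.
    """
    n = len(scores)
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]           # keys at positions <= i
        peak = max(visible)              # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights
```

The resulting matrix is lower triangular, which is the pattern an attention map visualization of a correctly masked model should show.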