feat(ai-runner): implement core module with TDD (Task 2 partial) #394
base: master
Conversation
Implement AI test runner foundation following TDD process:
- readTestFile(): Read test file contents (any extension)
- calculateRequiredPasses(): Ceiling math for threshold calculation

Architecture decisions documented:
- Agent-agnostic design via configurable agentConfig
- Default to Claude Code CLI: `claude -p --output-format json`
- Subprocess per run = automatic context isolation
- Support for OpenCode and Cursor CLI alternatives

Files added:
- source/ai-runner.js (core module)
- source/ai-runner.test.js (4 passing tests)

Next steps documented in epic:
- executeAgent() - spawn CLI subprocess
- aggregateResults() - aggregate pass/fail
- runAITests() - orchestrate parallel runs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
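As a quick illustration of the ceiling math (usage sketch; the import path follows the files listed above):

```js
import { calculateRequiredPasses } from './source/ai-runner.js';

calculateRequiredPasses();                           // defaults: Math.ceil(4 * 75 / 100) => 3
calculateRequiredPasses({ runs: 5, threshold: 75 }); // Math.ceil(5 * 75 / 100) => 4
```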
- Remove unused imports (vi, aggregateResults, runAITests)
- Add threshold validation (0-100 range check)
- Fix test race condition with unique directory names
- Fix resource leak by moving file ops into try block
- Add tests for threshold validation edge cases

Resolves all bug bot comments from PR #394
Implement core AI testing framework modules:
- Add executeAgent() with 5-minute default timeout and enhanced error messages
- Add aggregateResults() for multi-run pass/fail calculation
- Add runAITests() orchestrating parallel test execution
- Add test output recording with TAP v13 format
- Add browser auto-open for test results
- Add slug generation via cuid2 for unique output files
- Include comprehensive test coverage (31 tests)

Enhanced error handling includes command context, stderr, and stdout previews for debugging.

Task 2 and Task 3 complete from epic 2026-01-22-riteway-ai-testing-framework.md
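A minimal sketch of what an agent subprocess call with a timeout and enriched error messages might look like. Only the function name, the 5-minute default timeout, and the error-detail items come from the description above; the option names and error format here are assumptions, not the PR's actual implementation.

```js
import { execFile } from 'node:child_process';

// Sketch: run the configured agent CLI once per test run, pass the prompt on stdin,
// and surface command context plus stderr/stdout previews when the run fails.
export const executeAgent = ({ agentConfig, prompt, timeout = 300000 }) =>
  new Promise((resolve, reject) => {
    const child = execFile(
      agentConfig.command,
      agentConfig.args,
      { timeout },
      (error, stdout, stderr) => {
        if (error) {
          const command = [agentConfig.command, ...agentConfig.args].join(' ');
          reject(new Error(
            `Agent run failed: ${command}\n${error.message}\n` +
            `stderr: ${stderr.slice(0, 200)}\nstdout: ${stdout.slice(0, 200)}`
          ));
          return;
        }
        resolve(stdout);
      }
    );
    child.stdin.write(prompt);
    child.stdin.end();
  });
```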
- Implement error-causes library pattern for structured error handling
- Add --agent flag to support claude, opencode, and cursor agents
- Add getAgentConfig() function with agent name validation
- Consolidate path imports into single statement
- Expand test coverage from 40 to 49 TAP tests
- Document code quality improvements in Task 6
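A sketch of how agent selection with name validation could look. The claude defaults come from the commits above; the opencode and cursor commands/args shown here are placeholders, and the PR's error-causes wiring is not reproduced.

```js
// Hypothetical agent table; only the claude entry is documented in this PR.
const agentConfigs = {
  claude: { command: 'claude', args: ['-p', '--output-format', 'json'] },
  opencode: { command: 'opencode', args: ['run'] },       // placeholder args
  cursor: { command: 'cursor-agent', args: ['--print'] }  // placeholder args
};

export const getAgentConfig = (agent = 'claude') => {
  const config = agentConfigs[agent];
  if (!config) {
    throw new Error(
      `Unknown agent "${agent}". Supported agents: ${Object.keys(agentConfigs).join(', ')}`
    );
  }
  return config;
};
```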
Complete AI testing framework implementation with:
- Comprehensive E2E test suite (13 assertions)
- Full workflow testing with mock agent
- TAP output format verification
- AI testing documentation in README
- CLI usage examples and agent configuration docs
- ESLint configuration update (ES2022 support)
- Linter fixes (unused imports, catch parameters)
- Vitest exclusion for Riteway/TAP tests

All 62 TAP tests + 37 Vitest tests passing

Epic: tasks/archive/2026-01-22-riteway-ai-testing-framework.md
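For reference, TAP version 13 output for a thresholded run could look roughly like this (illustrative only; the exact assertion names and diagnostics the PR produces are not shown here):

```
TAP version 13
1..2
ok 1 - prompt satisfies requirement: responds with valid JSON (4/4 runs passed, threshold 75%)
not ok 2 - prompt satisfies requirement: refuses out-of-scope requests (2/4 runs passed, threshold 75%)
  ---
  passed: 2
  totalRuns: 4
  requiredPasses: 3
  ...
```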
🔬 Code Review

🎯 Restatement

Reviewing commit 889429e, which implements the AI test runner foundation with:

💡 Initial Analysis

Strengths:

Areas for Improvement:

🪞 Critical Reflection

1. Code Structure and Organization

Positive:
Issues:

2. Adherence to Coding Standards

JavaScript Guide Violations:

```js
export const calculateRequiredPasses = ({ runs = 4, threshold = 75 } = {}) => {
  return Math.ceil((runs * threshold) / 100);
};
```

Issue: No validation on threshold values outside the 0-100 range.

```js
export const calculateRequiredPasses = ({ runs = 4, threshold = 75 } = {}) => {
  if (threshold < 0 || threshold > 100) {
    throw new Error('threshold must be between 0 and 100');
  }
  return Math.ceil((runs * threshold) / 100);
};
```

Status: ✅ Fixed - Good validation added with comprehensive tests.

3. Test Coverage and Quality

TDD Compliance: The tests answer the 5 key questions:
Test Isolation Issues:

```js
const testDir = join(tmpdir(), 'riteway-test-' + Date.now() + '-' + Math.random().toString(36).slice(2));
```

Status: ✅ Fixed - The addition of the random suffix to the directory name resolves the race condition.

Resource Leak Issues:

```js
test('reads file contents from path', async () => {
  const testDir = join(tmpdir(), 'riteway-test-' + Date.now() + '-' + Math.random().toString(36).slice(2));
  let testFile;
  try {
    mkdirSync(testDir, { recursive: true });
    testFile = join(testDir, 'test.sudo');
    const contents = 'describe("test", { requirements: ["should work"] })';
    writeFileSync(testFile, contents);
```

Issue: Directory creation and file write happen outside the try block. If setup fails partway through, cleanup never runs and the temporary directory leaks.

Recommended Fix:

```js
test('reads file contents from path', async () => {
  const testDir = join(tmpdir(), 'riteway-test-' + Date.now() + '-' + Math.random().toString(36).slice(2));
  try {
    mkdirSync(testDir, { recursive: true });
    const testFile = join(testDir, 'test.sudo');
    const contents = 'describe("test", { requirements: ["should work"] })';
    writeFileSync(testFile, contents);
    // ... rest of test
  } finally {
    rmSync(testDir, { recursive: true, force: true });
  }
});
```

Move ALL file operations inside the try block to ensure cleanup on any failure.

4. Performance Considerations

Good:
Concerns:

5. Security Review (OWASP Top 10)

A01:2021 – Broken Access Control:
A02:2021 – Cryptographic Failures:
A03:2021 – Injection:
A04:2021 – Insecure Design:
A05:2021 – Security Misconfiguration:
A06:2021 – Vulnerable and Outdated Components:
A07:2021 – Identification and Authentication Failures:
A08:2021 – Software and Data Integrity Failures:
A09:2021 – Security Logging and Monitoring Failures:
A10:2021 – Server-Side Request Forgery:
6. Architectural Patterns

Excellent Decisions:
Adherence to Epic Requirements:
7. Documentation Quality

JSDoc Comments:

```js
/**
 * Read the contents of a test file.
 * @param {string} filePath - Path to the test file
 * @returns {Promise<string>} File contents
 */
export const readTestFile = (filePath) => readFile(filePath, 'utf-8');
```

Good: Clear, minimal documentation following JavaScript guide principles.

```js
/**
 * Calculate the number of passes required to meet the threshold.
 * Uses ceiling to ensure threshold is met or exceeded.
 * @param {Object} options
 * @param {number} [options.runs=4] - Total number of test runs
 * @param {number} [options.threshold=75] - Required pass percentage (0-100)
 * @returns {number} Number of passes required
 * @throws {Error} If threshold is not between 0 and 100
 */
```

Excellent: Documents parameters, defaults, return values, and error conditions.

🔭 Broader Context

Integration Readiness:
Comparison to Epic:
⚖️ Severity Assessment

Critical Issues: None

High Severity:
Medium Severity:
Low Severity:
💬 Actionable Recommendations

Must Fix Before Merge:
Should Address Soon:
Nice to Have:
Summary

This is a high-quality TDD implementation that follows project standards well. The core functions are solid, well-tested, and properly documented. The architectural decisions align with the epic vision.

Key Strengths:
Required Fixes:
Recommended Enhancements:
Overall Assessment: ✅ APPROVE with minor fixes required

The implementation is on the right track and ready to proceed once the resource leak is addressed. The threshold validation issue was already fixed, and the race condition was resolved with the random suffix addition.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```js
    throw new Error('threshold must be between 0 and 100');
  }
  return Math.ceil((runs * threshold) / 100);
};
```
Zero runs parameter causes false-positive test results
Medium Severity
The calculateRequiredPasses() function validates threshold but not runs. When runs is 0 or negative, Array.from({ length: runs }) creates an empty array, requiredPasses becomes 0, and passed: 0 >= 0 evaluates to true. This causes the test suite to report success without executing any tests. Additionally, Math.round(results.passCount / results.totalRuns * 100) in runAICommand produces NaN due to division by zero.
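One way to close this hole, sketched against the calculateRequiredPasses shown earlier (where exactly the guard belongs in the PR is an open question):

```js
export const calculateRequiredPasses = ({ runs = 4, threshold = 75 } = {}) => {
  // Reject zero/negative/non-integer run counts before any math happens.
  if (!Number.isInteger(runs) || runs < 1) {
    throw new Error('runs must be a positive integer');
  }
  if (threshold < 0 || threshold > 100) {
    throw new Error('threshold must be between 0 and 100');
  }
  return Math.ceil((runs * threshold) / 100);
};
```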
```js
OutputError: {
  code: 'OUTPUT_ERROR',
  message: 'Test output recording failed'
}
```
Unused OutputError type and handler are dead code
Low Severity
The OutputError error type is defined in errorCauses and destructured, and a handler is implemented in handleAIRunnerErrors, but it is never thrown anywhere in the codebase. All errors from recordTestOutput are caught by the generic catch block in runAICommand and wrapped as AITestError instead. This leaves dead code that adds maintenance burden and could confuse future developers about the error handling design.
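If the type is kept rather than deleted, the fix is to actually raise it from recordTestOutput before the generic wrapper runs. A sketch, using a plain ES2022 Error with a cause instead of reproducing the PR's error-causes helpers (whose exact API is not shown here); the recordTestOutput signature is assumed:

```js
import { writeFile } from 'node:fs/promises';

// Sketch: raise OUTPUT_ERROR from the output-recording path so the dedicated
// handler in handleAIRunnerErrors becomes reachable instead of dead code.
export const recordTestOutput = async ({ outputPath, tapOutput }) => {
  try {
    await writeFile(outputPath, tapOutput, 'utf-8');
  } catch (cause) {
    throw new Error('Test output recording failed', {
      cause: { code: 'OUTPUT_ERROR', cause }
    });
  }
};
```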
For filename disambiguation, we should use a 5-digit unique slug generated by cuid2
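cuid2 exposes an init function that accepts a length option, so a short slug factory might look like this (the helper and filename pattern below are illustrative, not taken from the PR):

```js
import { init } from '@paralleldrive/cuid2';

// 5-character slug used purely to disambiguate output filenames.
const createSlug = init({ length: 5 });

// Illustrative naming scheme, e.g. ai-results-2026-01-22-k3x9d.tap
export const outputFileName = (date) => `ai-results-${date}-${createSlug()}.tap`;
```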
CRITICAL

Unit tests need to run in isolation, and the AI agent needs to judge the results. We need the agent to intelligently extract each individual test and run it in a subagent in isolation from the others; otherwise the AI attention mechanism creates shared mutable state between tests, and the tests don't run in proper isolation.

Proposal: we write the test scripts in SudoLang and have a pre-processing agent extract each test in isolation from the rest of the tests, inserting the prompt under test at the top and appending JUST ONE assertion per call. The appended prompt gets sent to the sub-agent.

Sample code thinking: Ideal test script looks like this:

Ideal Transformation

We'll need the LLM to call a sub-agent to extract tests, which would produce this shape of ideal transformation on a per-test basis: the output would be the AI agent's response to the user, which we can then make assertions against.

SudoLang Assert Type

Isolation

In order to invoke each assertion in isolation, we need to extract these into smaller prompts which only contain the context needed for the assertion we care about in the moment:

```js
export const runAITests = async ({
  filePath,
  runs = 4,
  threshold = 75,
  timeout = 300000,
  agentConfig = {
    command: 'claude',
    args: ['-p', '--output-format', 'json', '--no-session-persistence']
  }
}) => {
  const tests = await asyncPipe(
    readTestFile,
    extractTests // split the context so it only contains what we need for a single assertion
  )(filePath);
  const runResults = await Promise.all(tests.map((prompt) =>
    Promise.all(Array.from({ length: runs }, () => executeAgent({ agentConfig, prompt, timeout })))
  ));
  const scores = await Promise.all(runResults.map((responseSet) => {
    /* process and score all the responses for prompt coherence */
  }));
  return aggregateResults({ runResults, scores, threshold, runs });
};
```
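A sketch of the pre-processing step this implies. Only the idea of delegating extraction to a sub-agent comes from the comment above; the extraction prompt wording, the JSON-array response contract, and the envelope unwrapping are assumptions for illustration.

```js
// Hypothetical: ask an extraction agent to split a SudoLang test script into
// standalone prompts, each carrying the prompt under test plus exactly one assertion.
export const extractTests = async (testScript) => {
  const extractionPrompt = [
    'Split the following SudoLang test script into standalone prompts.',
    'Each prompt must start with the prompt under test and contain exactly one assertion.',
    'Respond with a JSON array of prompt strings and nothing else.',
    '',
    testScript
  ].join('\n');
  const response = await executeAgent({
    agentConfig: getAgentConfig('claude'),
    prompt: extractionPrompt
  });
  // Unwrapping the agent's JSON output envelope is elided; assume `response`
  // resolves to the raw JSON array text here.
  return JSON.parse(response);
};
```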
Note
Introduces an AI prompt testing workflow and CLI, plus supporting docs and agent rules.
- riteway ai CLI: parallel, thresholded AI prompt runs with agent selection (claude|opencode|cursor), structured errors, and help output
- source/ai-runner.js (file read, agent exec, aggregation) and source/test-output.js (TAP output, date/slug naming, browser open)
- bin/riteway with ai subcommand, argument parsing, getAgentConfig(), and error-causes handling; adds comprehensive unit and E2E tests
- README docs for riteway ai, new AI agent guidelines/rules under ai/, plan.md epic completion, vitest.config.js update
- New dependencies: @paralleldrive/cuid2 and open

Written by Cursor Bugbot for commit 3dc4912. This will update automatically on new commits.