feat: add test evidence checker for PR submissions (#61) #89

snomiao · 2025-10-20T18:15:59Z

Summary

Implements automated test evidence checking for PRs in desktop and ComfyUI repos, solving issue #61 in a smarter way.

Changes

✅ Created app/tasks/gh-test-evidence/gh-test-evidence.ts task
✅ Scans open PRs in Comfy-Org/desktop and comfyanonymous/ComfyUI
✅ Uses GPT-4o-mini (cheaper, faster) to analyze PR bodies
✅ Detects test explanations, screenshots, and videos
✅ Posts warning comments when evidence is missing
✅ Auto-updates comments when PR changes
✅ Deletes comments when all evidence is present
✅ Follows ComfyUI_frontend workflow message format
✅ Added comprehensive test file

Smart Improvements

Efficient AI model: Uses GPT-4o-mini instead of GPT-4o for faster, cheaper analysis
Clean architecture: Follows existing task patterns (coreping, gh-bounty)
Idempotent: Re-analyzes only when PR updates
Smart comment management: Updates existing comments instead of spamming
Database tracking: Uses MongoDB to track task state
Type-safe: Full TypeScript with Zod validation
Bot marker: Uses HTML comment marker for identifying bot comments

Testing

Added gh-test-evidence.spec.ts with test structure
Follows project test patterns
Tests cover: draft PRs, missing evidence, complete evidence, comment updates

Workflow Integration

Added to app/tasks/run-gh-tasks.ts to run on schedule with other GitHub tasks.

Closes #61

🤖 Generated with Claude Code

Implements automated test evidence checking for PRs in desktop and ComfyUI repos. - Creates gh-test-evidence task to scan open PRs - Uses GPT-4o-mini to analyze PR bodies for test evidence - Posts warning comments when test explanations or visual proof are missing - Auto-updates or deletes comments based on PR changes - Follows the same comment pattern as ComfyUI_frontend workflow Resolves #61 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

vercel · 2025-10-20T18:16:06Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
comfy-pr	Ready	Preview, Comment	Jan 22, 2026 3:36am

Copilot

Pull Request Overview

This PR implements automated test evidence checking for pull requests in the Comfy-Org/desktop and comfyanonymous/ComfyUI repositories. The solution uses GPT-4o-mini to analyze PR descriptions for test explanations, screenshots, and videos, then posts/updates/deletes warning comments based on what evidence is present.

Key changes:

New automated task that scans open PRs and validates test evidence using AI
Smart comment management system that updates existing comments instead of creating duplicates
Database-backed state tracking to avoid redundant analysis

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File	Description
app/tasks/run-gh-tasks.ts	Registers the new test evidence task and reformats existing task entries for consistency
app/tasks/gh-test-evidence/gh-test-evidence.ts	Core implementation of the test evidence checker with OpenAI integration, comment management, and database persistence
app/tasks/gh-test-evidence/gh-test-evidence.spec.ts	Test suite structure with mocked dependencies for validating the task behavior

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

app/tasks/gh-test-evidence/gh-test-evidence.ts

app/tasks/gh-test-evidence/gh-test-evidence.spec.ts

…dd CI cleanup - Extract main logic into runCorePingTask() function for better testability - Add isCI check to properly close DB and exit in CI environments - Add todo comment about deprecating custom webhook types in favor of @octokit/webhooks-types - Add llm-api, @keyv/mongo, and @octokit/webhooks-types dependencies - Remove trailing whitespace 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Resolve conflicts in coreping.ts by keeping new refactored structure - Resolve conflicts in run-gh-tasks.ts by including all task imports - Resolve conflicts in package.json by keeping both new dependencies - Accept incoming bun.lock changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Switch from gpt-4o-mini to gpt-5-mini for analyzing PR test evidence. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed typo 'Explaination' -> 'Explanation' throughout the codebase: - Updated schema field name in TestEvidenceSchema - Updated all references in code and tests - Updated OpenAI prompt and JSON schema - Updated warning message generation Addresses review comments from Copilot. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Copilot

Pull Request Overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

package.json

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

snomiao

Thanks @copilot! All spelling corrections from 'Explaination' to 'Explanation' have been addressed in commit 317d1ce. The fixes include:

TestEvidenceSchema field name
All code references
OpenAI prompt and JSON schema
Warning message generation
Test files

Corrects zod version that was accidentally changed during merge from ^4.0.5 to ^4.0.0. This should resolve the Vercel build failure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…ubRepoUrl The function parseUrlRepoOwner does not exist in @/src/parseOwnerRepo. The correct function name is parseGithubRepoUrl. This fixes the TypeScript compilation error that was causing the Vercel build to fail. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…or code formatting

socket-security · 2025-10-30T05:51:19Z

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

Replace bun:mock with MSW (Mock Service Worker) for more realistic HTTP mocking: - Mock GitHub API endpoints (pulls, comments) and OpenAI API - Add proper MSW server lifecycle (beforeAll, afterEach, afterAll) - Mock database module to avoid MongoDB connection in tests - All tests passing (4/4) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

app/tasks/gh-test-evidence/gh-test-evidence.spec.ts

…nation' Address Copilot review feedback to use singular form 'explanation' for consistency. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Use the centralized MSW setup from @/src/test/msw-setup instead of duplicating server configuration. This addresses the review comment to use the unified MSW setup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

snomiao · 2025-10-30T06:09:03Z

Fixed spelling from 'explanations' to 'explanation' in b963f43

snomiao · 2025-10-30T06:09:08Z

Refactored to use unified MSW setup from @/src/test/msw-setup in 0800956

snomiao · 2025-10-30T08:06:30Z

run/github-webhook-event-type/index.ts

 type S = GithubApiComponents["schemas"];
+// todo(sno): deprecate this and use @octokit/webhooks-types
 export type WEBHOOK_EVENTS = {
  branch_protection_configuration: S[`webhook-branch-protection-configuration${string}` & keyof S];


lets' remove this file

and update usages

Updated OpenAI model from invalid 'gpt-5-mini' to correct 'gpt-4o-mini' for test evidence analysis. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

… assertions - Add comprehensive GitHub client mocking (ghc.pulls.list, gh.issues, etc.) - Add database mocking with proper findOneAndUpdate implementation - Add actual test assertions using expect() for all test cases - Verify comment creation, deletion, and update behavior - Verify draft PR skipping logic - Verify warning message format and content - All tests now properly validate behavior instead of just structure Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…ine type - Delete run/github-webhook-event-type/index.ts - Replace import with inline WEBHOOK_EVENT type definition in run/index.ts - Add TODO comment to consider migrating to @octokit/webhooks-types Addresses review comment to remove unused file and update usages. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Updated by Bun to include configVersion in lockfile format. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add Biome configuration with settings for: - Formatting (2-space indent, 120 line width, LF line endings) - Linting (recommended rules with React and Next.js support) - JavaScript formatting (single quotes, trailing commas, etc.) - HTML formatting - Auto-organize imports and sort attributes Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Resolved conflicts: - Accepted main's @octokit/webhooks-types migration in run/index.ts - Accepted main's CLAUDE.md documentation updates - Accepted main's coreping.ts refactoring - Merged gh-test-evidence task import with new task imports - Accepted main's bun.lock and package.json updates Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Update imports to match the refactored GitHub client location: - @/src/gh → @/lib/github - @/src/ghc → @/lib/github/githubCached Fixes CI error: Cannot find module '@/src/gh' Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings October 20, 2025 18:15

Copilot AI reviewed Oct 20, 2025

View reviewed changes

vercel bot had a problem deploying to Preview October 20, 2025 18:18 Failure

snomiao and others added 2 commits October 21, 2025 04:12

vercel bot had a problem deploying to Preview October 21, 2025 04:16 Failure

snomiao and others added 2 commits October 22, 2025 13:50

chore: update test evidence checker to use gpt-5-mini model

5dbebd0

Switch from gpt-4o-mini to gpt-5-mini for analyzing PR test evidence. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

vercel bot had a problem deploying to Preview October 22, 2025 13:59 Failure

snomiao requested a review from Copilot October 22, 2025 14:49

Copilot AI reviewed Oct 22, 2025

View reviewed changes

package.json Outdated Show resolved Hide resolved

Update package.json

8dc989d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

vercel bot had a problem deploying to Preview October 22, 2025 14:52 Failure

Merge branch 'main' into sno-test-evidence

382e95c

vercel bot had a problem deploying to Preview October 25, 2025 13:09 Failure

snomiao commented Oct 25, 2025

View reviewed changes

vercel bot had a problem deploying to Preview October 25, 2025 13:20 Failure

vercel bot deployed to Preview October 25, 2025 13:27 View deployment

resolve merge conflict: keep gh-test-evidence import

cc8a778

vercel bot deployed to Preview October 30, 2025 05:42 View deployment

chore(package.json): add biome as a dev dependency and a fmt script f…

b789ade

…or code formatting

vercel bot deployed to Preview October 30, 2025 05:49 View deployment

vercel bot deployed to Preview October 30, 2025 05:53 View deployment

snomiao commented Oct 30, 2025

View reviewed changes

app/tasks/gh-test-evidence/gh-test-evidence.spec.ts Outdated Show resolved Hide resolved

snomiao and others added 3 commits October 30, 2025 06:07

Merge remote-tracking branch 'origin/main' into sno-test-evidence

09c3d52

fix(gh-test-evidence): correct spelling from 'explanations' to 'expla…

b963f43

…nation' Address Copilot review feedback to use singular form 'explanation' for consistency. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

vercel bot deployed to Preview October 30, 2025 06:11 View deployment

snomiao commented Oct 30, 2025

View reviewed changes

fix(gh-test-evidence): correct model from gpt-5-mini to gpt-4o-mini

c9a3e11

Updated OpenAI model from invalid 'gpt-5-mini' to correct 'gpt-4o-mini' for test evidence analysis. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

vercel bot deployed to Preview November 4, 2025 23:24 View deployment

snomiao and others added 5 commits January 22, 2026 03:23

chore: update bun.lock with configVersion field

afa8bae

Updated by Bun to include configVersion in lockfile format. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel bot deployed to Preview January 22, 2026 03:33 View deployment

vercel bot deployed to Preview January 22, 2026 03:36 View deployment

feat: add test evidence checker for PR submissions (#61) #89

Are you sure you want to change the base?

feat: add test evidence checker for PR submissions (#61) #89

Uh oh!

Conversation

snomiao commented Oct 20, 2025

Summary

Changes

Smart Improvements

Testing

Workflow Integration

Uh oh!

vercel bot commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

snomiao left a comment

Choose a reason for hiding this comment

Uh oh!

socket-security bot commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

snomiao commented Oct 30, 2025

Uh oh!

snomiao commented Oct 30, 2025

Uh oh!

snomiao Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

snomiao Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Oct 20, 2025 •

edited

Loading

socket-security bot commented Oct 30, 2025 •

edited

Loading