fix: Improve agent accuracy to decrease hallucinations #1110
justinegeffen merged 4 commits into master
Conversation
Implement comprehensive solution to prevent editorial agents from reporting hallucinated findings to users.

## Problem

Editorial agents (voice-tone, terminology, punctuation) were hallucinating findings with 67-100% hallucination rates:

- Fabricating quotes that don't exist in files
- Citing wrong line numbers
- Reporting issues in non-existent files

## Solution

### 1. Enhanced agent prompt template

- Two-step process: extract quotes, then analyze
- Strict format requiring exact quotes with context
- Self-verification checklist (6 questions)
- Prohibition on using training data/memory
- Confidence scoring (HIGH only)
- Common hallucination patterns to avoid

### 2. Verification script (verify-agent-findings.py)

- Validates quotes exist at claimed line numbers
- Filters out 100% of hallucinations in testing
- Supports markdown and JSON input formats
- Fuzzy matching for minor formatting differences
- Checks nearby lines (±2) for off-by-one errors
- Outputs only verified findings with statistics

### 3. Updated voice-tone agent

- Applied new anti-hallucination template
- Now requires exact quotes for every finding
- Added mandatory two-step extraction/analysis

## Testing

Tested on hallucinated findings from voice-tone agent:

- 3 fake findings submitted
- 3 hallucinations detected (100%)
- 0 findings output to user (correct)

## Files

- .claude/agents/AGENT-PROMPT-TEMPLATE.md (new)
- .claude/agents/voice-tone.md (updated)
- .github/scripts/verify-agent-findings.py (new, 270 lines)
- .github/scripts/verify-agent-findings.sh (placeholder)
- .github/scripts/README.md (updated with verification docs)

## Next steps

- Update remaining agents (terminology, punctuation)
- Integrate verification into docs-review.yml workflow
- Add JSON output to agents for easier parsing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
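For context, here is a minimal Python sketch of the quote-verification idea described above. The names (`Finding`, `verify_finding`, `SIMILARITY_THRESHOLD`) and the exact fuzzy-matching approach are assumptions for illustration, not the actual contents of `verify-agent-findings.py`:

```python
# Minimal sketch of the quote-verification idea (illustrative, not the real script).
from dataclasses import dataclass
from difflib import SequenceMatcher
from pathlib import Path

SIMILARITY_THRESHOLD = 0.9  # assumed cutoff for a "fuzzy" match
LINE_TOLERANCE = 2          # check +/-2 lines for off-by-one errors

@dataclass
class Finding:
    file: str    # path the agent claims the issue is in
    line: int    # 1-indexed line number the agent claims
    quote: str   # exact text the agent says appears on that line

def _similar(a: str, b: str) -> float:
    return SequenceMatcher(None, a.strip(), b.strip()).ratio()

def verify_finding(finding: Finding) -> bool:
    """Return True only if the quoted text really appears at (or near) the claimed line."""
    path = Path(finding.file)
    if not path.is_file():
        return False  # hallucinated file
    lines = path.read_text(encoding="utf-8").splitlines()
    lo = max(0, finding.line - 1 - LINE_TOLERANCE)
    hi = min(len(lines), finding.line + LINE_TOLERANCE)
    return any(
        finding.quote.strip() in line
        or _similar(finding.quote, line) >= SIMILARITY_THRESHOLD
        for line in lines[lo:hi]
    )

def filter_verified(findings: list[Finding]) -> list[Finding]:
    """Keep only verifiable findings and report simple statistics."""
    verified = [f for f in findings if verify_finding(f)]
    print(f"{len(verified)}/{len(findings)} findings verified; "
          f"{len(findings) - len(verified)} dropped as unverifiable")
    return verified
```

The key design choice is that an unverifiable finding is dropped rather than surfaced, so a fabricated quote, wrong line number, or non-existent file never reaches the user.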
Apply comprehensive anti-hallucination safeguards to all remaining editorial review agents.

## Agents updated

### 1. terminology.md

- Added critical anti-hallucination rules
- Required exact quotes with context for all findings
- Added two-step extraction/analysis process
- Added 6-point self-verification checklist

### 2. punctuation.md

- Added critical anti-hallucination rules
- Updated output format to require exact quotes
- Added mandatory verification before submission
- Focus on verifiable issues only

### 3. clarity.md

- Added critical anti-hallucination rules
- Updated output format with exact quotes and context
- Required HIGH confidence for all findings
- Added self-verification checklist

### 4. docs-fix.md

- Added anti-hallucination rules for fix application
- Required verification that issues exist before fixing
- Mandatory read-before-fix process
- Prevents fixing hallucinated issues

## Consistency

All agents now share:

- Identical anti-hallucination rule structure
- Consistent two-step extraction/analysis process
- Same self-verification checklist
- Training data prohibition
- HIGH confidence requirement
- Exact quote + context format

## Impact

These updates ensure all editorial agents:

- Only report issues that actually exist in files
- Can be verified against source content
- Maintain user trust
- Work with the verification script pipeline

## Testing plan

Next steps:

1. Test each agent on real files
2. Verify findings with verify-agent-findings.py
3. Monitor hallucination rates
4. Integrate into CI workflow

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
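As a rough illustration of the shared "exact quote + context" format the agents are now required to produce, a finding might carry fields like the ones below. The field names, file path, and values are hypothetical, not the agents' actual schema:

```python
# Hypothetical example of a single agent finding in the shared format.
# Field names, the file path, and the quoted text are illustrative only.
example_finding = {
    "file": "docs/example-page.md",   # hypothetical path
    "line": 42,                       # 1-indexed line the quote was taken from
    "quote": "exact text copied verbatim from line 42",
    "context": "the surrounding lines, copied verbatim",
    "issue": "description of the terminology/punctuation/clarity problem",
    "confidence": "HIGH",             # only HIGH-confidence findings may be reported
}
```

Because every finding carries a verbatim quote and a line number, the verification script can confirm it against the source file before it is shown to a reviewer.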
✅ Deploy Preview for seqera-docs ready!
gavinelder
left a comment
I just ran this on an upgrade guidance doc I'm writing and the results were significantly better.
Prior to this change it was attempting to add prerequisite sections that weren't relevant (the doc is a planning guide, not a technical how-to), adding content that was irrelevant or factually incorrect, and trying to disagree with the upstream release notes.
Brilliant, thanks for the validation and checking. It's been working far better locally, so I think it's good to merge and then iterate on. :)
Solution deployed:
Proven effectiveness:
This can be run locally or triggered in a PR using the slash command `/editorial-review`.