
Conversation

@tcdent tcdent commented Jan 19, 2026

- Add generated_worldview_content field to EvalResult to capture CLI output
- Update report to display both expected (predefined) and actual (CLI-generated)
  worldview content when CLI mode is used
- Include generated content in JSON results for analysis
- Update agent TASK_INSTRUCTIONS to encode only explicitly stated facts,
  avoiding supplementary knowledge from training data, for token efficiency
- Create evals/common/ for shared code (config, llm_clients)
- Create evals/read_eval/ for read evaluation (tests LLM response to context)
- Keep evals/write_eval/ for write evaluation (tests document generation)
- Update all imports to use new module paths
- Remove expected worldview content from report (only show CLI-generated)
- Add proper __init__.py exports for all submodules

This structure prepares for adding additional evaluation types in the future.
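A rough sketch of what the reorganized layout and result model might look like after this PR. The directory names (`evals/common/`, `evals/read_eval/`, `evals/write_eval/`, `config`, `llm_clients`) and the `generated_worldview_content` field are taken from the description above; everything else (the other `EvalResult` fields, the `write_results` helper) is a hypothetical illustration, not the actual implementation.

```python
# Assumed layout after this PR (names from the description; contents illustrative):
#
#   evals/
#     common/        # shared code: config, llm_clients
#       __init__.py
#       config.py
#       llm_clients.py
#     read_eval/     # read evaluation: tests LLM response to context
#       __init__.py
#     write_eval/    # write evaluation: tests document generation
#       __init__.py

from dataclasses import dataclass, asdict
import json


@dataclass
class EvalResult:
    # Placeholder fields; only the new field below is named in the PR description.
    case_id: str
    passed: bool
    # New: captures the worldview content produced by the CLI so the report
    # can show what was actually generated (vs. the predefined expectation).
    generated_worldview_content: str | None = None


def write_results(results: list[EvalResult], path: str) -> None:
    """Include generated content in the JSON results for later analysis."""
    with open(path, "w") as f:
        json.dump([asdict(r) for r in results], f, indent=2)
```

Under this split, call sites would import from the new paths (e.g. `from evals.common import config` rather than a flat `evals` module), which is what the "Update all imports to use new module paths" item refers to.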
@tcdent tcdent merged commit 10dff82 into main Jan 19, 2026
1 of 2 checks passed