
Conversation

@tcdent tcdent commented Jan 19, 2026

- Add generated_worldview_content field to EvalResult to capture CLI output
- Update report to display both expected (predefined) and actual (CLI-generated)
  worldview content when CLI mode is used
- Include generated content in JSON results for analysis
- Update agent TASK_INSTRUCTIONS to encode only explicitly stated facts,
  avoiding supplementary knowledge from training data, for token efficiency
- Create evals/common/ for shared code (config, llm_clients)
- Create evals/read_eval/ for read evaluation (tests LLM response to context)
- Keep evals/write_eval/ for write evaluation (tests document generation)
- Update all imports to use new module paths
- Remove expected worldview content from report (only show CLI-generated)
- Add proper __init__.py exports for all submodules

This structure prepares for adding additional evaluation types in the future.
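A rough sketch of what the reorganized layout and result model might look like after this PR. The directory names (`evals/common/`, `evals/read_eval/`, `evals/write_eval/`, `config`, `llm_clients`) and the `generated_worldview_content` field are taken from the description above; everything else (the other `EvalResult` fields, the `write_results` helper) is a hypothetical illustration, not the actual implementation.

```python
# Assumed layout after this PR (names from the description; contents illustrative):
#
#   evals/
#     common/        # shared code: config, llm_clients
#       __init__.py
#       config.py
#       llm_clients.py
#     read_eval/     # read evaluation: tests LLM response to context
#       __init__.py
#     write_eval/    # write evaluation: tests document generation
#       __init__.py

from dataclasses import dataclass, asdict
import json


@dataclass
class EvalResult:
    # Placeholder fields; only the new field below is named in the PR description.
    case_id: str
    passed: bool
    # New: captures the worldview content produced by the CLI so the report
    # can show what was actually generated (vs. the predefined expectation).
    generated_worldview_content: str | None = None


def write_results(results: list[EvalResult], path: str) -> None:
    """Include generated content in the JSON results for later analysis."""
    with open(path, "w") as f:
        json.dump([asdict(r) for r in results], f, indent=2)
```

Under this split, call sites would import from the new paths (e.g. `from evals.common import config` rather than a flat `evals` module), which is what the "Update all imports to use new module paths" item refers to.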
@tcdent tcdent merged commit 10dff82 into main Jan 19, 2026
1 of 2 checks passed