feat(ci): add initial eval suites #238

Merged
Hunter Lovell (hntrl) merged 4 commits into main from hunter/evals
Feb 19, 2026

@hntrl

Summary

Add a LangSmith-backed eval framework for deepagents using the langsmith/vitest integration. This provides a reusable harness (runAgent, custom matchers, AgentTrajectory types) and an initial suite of evals covering file operations, subagent delegation, and system prompt customization.

Port of https://github.com/langchain-ai/deepagents/pull/1400/changes

Changes

New: Eval harness (libs/deepagents/src/evals/index.ts)

  • runAgent() helper that invokes a deepagent with a user query + optional pre-seeded files and returns a structured AgentTrajectory
  • Custom vitest matchers: toHaveAgentSteps, toHaveToolCallRequests, toHaveToolCallInStep, toHaveFinalTextContaining
  • All matchers log feedback to LangSmith automatically for experiment tracking
  • Shared default agent instance using anthropic:claude-sonnet-4-5-20250929
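
To make the trajectory-based checks concrete, here is a minimal sketch of what a trajectory and the predicates behind such matchers might look like. The PR names `AgentTrajectory` and the matchers but does not show their definitions, so every field name and helper below (`steps`, `toolCalls`, `countToolCallRequests`, `hasToolCallInStep`, `finalTextContains`) is a hypothetical stand-in, not the harness's actual API. The `read_file` tool name is only illustrative.

```typescript
// Hypothetical shapes: the PR does not show the real AgentTrajectory fields.
interface ToolCallRequest {
  name: string;
  args: Record<string, unknown>;
}

interface AgentStep {
  // One model turn: the text it produced and any tool calls it requested.
  text: string;
  toolCalls: ToolCallRequest[];
}

interface AgentTrajectory {
  steps: AgentStep[];
  // Files present after the run, keyed by path.
  files: Record<string, string>;
}

// Total number of tool call requests across all steps.
function countToolCallRequests(trajectory: AgentTrajectory): number {
  return trajectory.steps.reduce((n, step) => n + step.toolCalls.length, 0);
}

// True if the given 1-indexed step requested a tool with this name.
function hasToolCallInStep(
  trajectory: AgentTrajectory,
  stepNumber: number,
  toolName: string,
): boolean {
  const step = trajectory.steps[stepNumber - 1];
  return step !== undefined && step.toolCalls.some((c) => c.name === toolName);
}

// True if the last step's text contains the substring, ignoring case.
function finalTextContains(trajectory: AgentTrajectory, needle: string): boolean {
  const last = trajectory.steps[trajectory.steps.length - 1];
  return (
    last !== undefined &&
    last.text.toLowerCase().includes(needle.toLowerCase())
  );
}

// Illustrative trajectory: one tool call, then a final answer.
const example: AgentTrajectory = {
  steps: [
    { text: "", toolCalls: [{ name: "read_file", args: { path: "/notes.txt" } }] },
    { text: "The file says Hello.", toolCalls: [] },
  ],
  files: { "/notes.txt": "Hello" },
};
```

In a real suite, predicates like these would presumably sit behind the custom vitest matchers, which additionally log their result as LangSmith feedback.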

New: Eval suites

  • File operations (11 tests) — read, write, parallel read/write, edit/replace, ls directory, grep, glob, derived output, and avoiding unnecessary tool calls
  • Subagents (2 tests) — delegating to a named custom subagent and to the general-purpose subagent
  • System prompt (1 test) — verifying custom system prompt behavior

Config changes

  • vitest.config.ts — new eval mode with 120s timeout, langsmith/vitest/reporter, and *.eval.test.ts include pattern; default mode now excludes *.eval.test.ts
  • package.json — new test:eval script; added @langchain/anthropic and langsmith dev dependencies
  • eslint.config.ts — disabled @typescript-eslint/no-empty-object-type globally and for **/evals/** files
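
The mode split described for vitest.config.ts can be sketched as a plain function that returns the `test` options for a given mode. This is an assumption-laden illustration rather than the file from the PR: `buildTestConfig`, the `TestConfig` interface, the default include pattern, and the non-eval timeout and reporter are hypothetical. Only the 120s timeout, the `langsmith/vitest/reporter` reporter, and the `*.eval.test.ts` pattern come from the description above.

```typescript
// Subset of vitest's `test` options used here (hypothetical local type).
interface TestConfig {
  include: string[];
  exclude: string[];
  testTimeout: number; // milliseconds
  reporters: string[];
}

const EVAL_PATTERN = "**/*.eval.test.ts";

// Returns the test options for a vitest mode.
function buildTestConfig(mode: string): TestConfig {
  if (mode === "eval") {
    return {
      // Eval mode runs only eval files, with a long timeout for model calls.
      include: [EVAL_PATTERN],
      exclude: ["**/node_modules/**"],
      testTimeout: 120_000,
      reporters: ["langsmith/vitest/reporter"],
    };
  }
  return {
    // Default mode: ordinary tests only, eval files excluded.
    // The include pattern, timeout, and reporter here are assumptions.
    include: ["**/*.test.ts"],
    exclude: ["**/node_modules/**", EVAL_PATTERN],
    testTimeout: 5_000,
    reporters: ["default"],
  };
}
```

In an actual vitest.config.ts the result would typically be returned from `defineConfig(({ mode }) => ({ test: buildTestConfig(mode) }))`, with `defineConfig` imported from `vitest/config`. The `test:eval` script could then select the mode with something like `vitest run --mode eval`, though the exact command is not shown in the description.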

@changeset-bot

changeset-bot bot commented Feb 18, 2026

⚠️ No Changeset found

Latest commit: 6f2c5c9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@hntrl Hunter Lovell (hntrl) enabled auto-merge (squash) February 19, 2026 00:06
@hntrl Hunter Lovell (hntrl) merged commit 073d206 into main Feb 19, 2026
13 of 14 checks passed
@hntrl Hunter Lovell (hntrl) deleted the hunter/evals branch February 19, 2026 02:56