Agent Evaluation Field Notes

Agent Evaluation Field Notes is a compact public collection of scorecard templates, trajectory replay checklists, RAG guardrail smoke tests, small datasets, and demo launch pages for evaluating tool-using AI agents.

The goal is practical repeatability. These artifacts are intentionally small enough to copy into a real engineering workflow and concrete enough to support review during agent rollout decisions.

What Is Included

  • Operational scorecard patterns for tool-using agents.

  • Trajectory replay notes for debugging agent regressions.

  • RAG guardrail smoke tests for prompt injection and vector poisoning checks.

  • Starter artifacts for cloud, notebook, data, docs, and developer platforms.

  • Small public datasets that can be reused in demos and documentation.
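As one illustration of the trajectory replay item above, the sketch below compares a recorded baseline run against a candidate run and reports the first step where they diverge. The ToolCall shape, the call helper, and the tool names are hypothetical and are not taken from the repository; a real replay checklist would likely also compare tool outputs and timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One step in an agent trajectory (hypothetical shape, for illustration)."""
    tool: str
    args: tuple  # sorted (key, value) pairs so calls compare by value


def call(tool, **kwargs):
    """Build a ToolCall with arguments in a stable order."""
    return ToolCall(tool, tuple(sorted(kwargs.items())))


def first_divergence(baseline, candidate):
    """Return the index of the first differing step, or None if the runs match."""
    for i, (a, b) in enumerate(zip(baseline, candidate)):
        if a != b:
            return i
    # One run is a strict prefix of the other: they diverge where the shorter ends.
    if len(baseline) != len(candidate):
        return min(len(baseline), len(candidate))
    return None


# Made-up trajectories: the candidate fetches a different document at step 1.
baseline = [call("search", query="refund policy"), call("fetch", doc_id=12), call("answer")]
candidate = [call("search", query="refund policy"), call("fetch", doc_id=99), call("answer")]

print(first_divergence(baseline, candidate))
```

Reporting only the first divergence keeps the review focused: later steps usually differ as a consequence of the first one, so they add noise when debugging a regression.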

Repository Layout

starter-artifacts/scorecard-api

FastAPI scorecard endpoint that returns structured evaluation examples.
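The endpoint's actual response schema is not shown here, so the following is only a sketch of what a structured scorecard payload could look like. The field names (case_id, task_success, tool_calls, guardrail_violations) and the build_scorecard helper are assumptions made for illustration. The sketch uses only the standard library; in a FastAPI app, a route handler could return a dictionary like this one and let the framework serialize it to JSON.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ScorecardEntry:
    """One evaluated case (hypothetical fields, not the repository's schema)."""
    case_id: str
    task_success: bool
    tool_calls: int
    guardrail_violations: int


def build_scorecard(entries):
    """Aggregate entries into a JSON-serializable scorecard dictionary."""
    total = len(entries)
    # A case passes only if the task succeeded with no guardrail violations.
    passed = sum(1 for e in entries if e.task_success and e.guardrail_violations == 0)
    return {
        "total_cases": total,
        "passed_cases": passed,
        "pass_rate": passed / total if total else 0.0,
        "entries": [asdict(e) for e in entries],
    }


# Made-up example cases for illustration only.
examples = [
    ScorecardEntry("case-001", True, 3, 0),
    ScorecardEntry("case-002", True, 5, 1),
    ScorecardEntry("case-003", False, 2, 0),
    ScorecardEntry("case-004", True, 4, 0),
]

payload = build_scorecard(examples)
print(json.dumps(payload, indent=2))
```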

starter-artifacts/static-site

Static public landing page for the field notes.

starter-artifacts/notebook

Notebook-style markdown walkthrough for scorecard analysis.

starter-artifacts/dataset

Small CSV and metadata files for platform submissions.
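To show how a small CSV of this kind might be consumed in a demo, the sketch below computes a pass rate per scenario. The column names (case_id, scenario, passed) and the sample rows are invented for illustration and do not reflect the actual files in this directory.

```python
import csv
import io

# Made-up sample rows; the real dataset's columns may differ.
SAMPLE_CSV = """case_id,scenario,passed
case-001,prompt_injection,true
case-002,prompt_injection,false
case-003,vector_poisoning,true
"""


def pass_rate_by_scenario(csv_text):
    """Return {scenario: fraction of rows with passed == 'true'}."""
    totals = {}  # scenario -> (passed_count, row_count)
    for row in csv.DictReader(io.StringIO(csv_text)):
        passed, seen = totals.get(row["scenario"], (0, 0))
        is_pass = row["passed"].strip().lower() == "true"
        totals[row["scenario"]] = (passed + int(is_pass), seen + 1)
    return {scenario: p / n for scenario, (p, n) in totals.items()}


rates = pass_rate_by_scenario(SAMPLE_CSV)
print(rates)
```

Reading from a string keeps the example self-contained; with a file on disk, the same function body would work after swapping io.StringIO for an opened file handle.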

starter-artifacts/docs

Reusable documentation notes and submission copy.