promptfoo/promptfoo
Promptfoo is a CLI and library that takes the trial-and-error out of LLM application development. Instead of manually testing prompts and hoping they work, you define test cases with expected outputs, run them against multiple models simultaneously, and get a pass/fail report with a visual dashboard.

The tool supports every major provider, including OpenAI, Anthropic, Azure, Bedrock, Ollama, and dozens more, so you can compare model performance side by side without rewriting code. Define your evaluations in YAML, run them from the terminal, and view results in a browser-based comparison UI.

What sets promptfoo apart from other eval frameworks is its red-teaming capability. Beyond functional testing, it scans your LLM apps for security vulnerabilities: prompt injection, jailbreaks, PII leakage, and harmful content generation. This makes it both a quality assurance tool and a security scanner in one package.

The developer experience is polished. Local-first execution means your data never leaves your machine. Built-in caching speeds up repeated runs. CI/CD integration lets you block deployments when prompt quality drops. And the PR review feature automatically flags LLM security issues in pull requests.

With 16.4K GitHub stars, 398 releases, and active maintenance (latest release March 12, 2026), promptfoo has become the de facto standard for teams that take LLM output quality seriously.
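To make the YAML-based workflow concrete, here is a minimal sketch of an evaluation config. The structure (`prompts`, `providers`, `tests` with `vars` and `assert`) follows promptfoo's documented format; the specific model IDs, prompt text, and test values are illustrative placeholders.

```yaml
# promptfooconfig.yaml — minimal sketch; model IDs and test data are illustrative
description: "Summarization quality check"

prompts:
  - "Summarize the following in one sentence: {{article}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-20241022

tests:
  - vars:
      article: "Promptfoo is a CLI and library for systematically testing LLM prompts."
    assert:
      # case-insensitive substring check on the model output
      - type: icontains
        value: "promptfoo"
      # fail if the response takes longer than 5 seconds
      - type: latency
        threshold: 5000
```

Running `npx promptfoo@latest eval` executes every test against every provider and prints the pass/fail matrix; `npx promptfoo@latest view` opens the browser-based comparison UI described above.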
Why It Matters
Most teams ship LLM features with zero systematic testing: they eyeball outputs and pray. Promptfoo turns prompt engineering from guesswork into a measurable engineering discipline with automated evaluations, regression testing, and security scanning. It's the missing QA layer for the AI-native stack.