Zoonk.AI.Evals (Zoonk v0.1.0-dev)
Local evaluation system for AI prompts.
This module evaluates AI prompts across multiple models to detect regressions and to test new models as they become available.
Summary
Functions
evaluate_model(prompt, model)
Evaluate a model for a specific prompt.
Use this to test a model's capabilities for a given prompt against a small set of test cases (typically 3–5).
Examples
iex> evaluate_model(:recommend_courses, "openai/gpt-4.1")
:ok
evaluate_prompt(prompt, model)
Evaluate a prompt.
Use this to test a prompt's performance across a larger set of test cases (typically 20–50).
Examples
iex> evaluate_prompt(:recommend_courses, "openai/gpt-4.1")
:ok
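Taken together, the two functions suggest a two-stage workflow: screen candidate models cheaply with evaluate_model/2, then run the heavier evaluate_prompt/2 pass only for models that look viable. A minimal sketch under that assumption — the candidate list is hypothetical, and only the function names, argument order, and :ok return come from the docs above:

```elixir
alias Zoonk.AI.Evals

# Hypothetical candidate list; swap in whichever models you want to screen.
candidates = ["openai/gpt-4.1", "openai/gpt-4.1-mini"]

# Stage 1: quick capability check per model (small test set, ~3-5 cases).
passing =
  Enum.filter(candidates, fn model ->
    Evals.evaluate_model(:recommend_courses, model) == :ok
  end)

# Stage 2: full regression run (~20-50 cases) for the models that passed.
Enum.each(passing, fn model ->
  :ok = Evals.evaluate_prompt(:recommend_courses, model)
end)
```

This keeps the expensive evaluation bounded: a model that fails the small capability check never consumes the larger test suite.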