Zoonk.AI.Evals (Zoonk v0.1.0-dev)

Local evaluation system for AI prompts.

This module evaluates AI prompts across multiple models to detect regressions and to test new models as they become available.
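For example, to see how several models handle the same prompt, you can call evaluate_model/2 once per model. A minimal sketch (the second model string is a placeholder, not shipped with this module):

    # Second entry is illustrative only, not a real model ID
    models = ["openai/gpt-4.1", "some-provider/some-model"]

    Enum.each(models, fn model ->
      Zoonk.AI.Evals.evaluate_model(:recommend_courses, model)
    end)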

Summary

Functions

evaluate_model(prompt, model)

Evaluate a model for a specific prompt.

evaluate_prompt(prompt, model)

Evaluate a prompt.

Functions

evaluate_model(prompt, model)

@spec evaluate_model(atom(), String.t()) :: :ok

Evaluate a model for a specific prompt.

It's meant to test a model's capabilities on a specific prompt, usually against a small set of test cases (e.g. 3-5).

Examples

iex> evaluate_model(:recommend_courses, "openai/gpt-4.1")
:ok

evaluate_prompt(prompt, model)

@spec evaluate_prompt(atom(), String.t()) :: :ok

Evaluate a prompt.

It's meant to test a prompt's performance across a larger set of test cases (e.g. 20-50).

Examples

iex> evaluate_prompt(:recommend_courses, "openai/gpt-4.1")
:ok
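
One way to combine the two functions (a sketch, not a workflow prescribed by the module): vet a candidate model with the small evaluate_model/2 suite first, then run the larger evaluate_prompt/2 suite to catch regressions. Both return :ok per their specs.

    # Quick capability check (small suite, e.g. 3-5 cases)
    :ok = Zoonk.AI.Evals.evaluate_model(:recommend_courses, "openai/gpt-4.1")

    # Full pass across the larger test set (e.g. 20-50 cases)
    :ok = Zoonk.AI.Evals.evaluate_prompt(:recommend_courses, "openai/gpt-4.1")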