Zoonk.AI.Evals.EvalCase behaviour (Zoonk v0.1.0-dev)

Behaviour for AI evaluation cases.

Callbacks

model()

Test cases for evaluating models.

prompt()

Test cases for evaluating prompts.

@callback model() :: [map()]

Test cases for evaluating models.

Create a list of 3-5 test cases that can be used to evaluate a prompt across different models.

@callback prompt() :: [map()]

Test cases for evaluating prompts.

Create an extensive list of test cases that can be used to evaluate a prompt in different languages and contexts.

We run these cases whenever a prompt changes to check for regressions in the prompt's performance.
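
As a rough illustration of how a module might implement both callbacks, here is a minimal sketch. It makes several assumptions: `EvalCaseSketch` is a local stand-in that mirrors the two documented callbacks so the snippet compiles on its own (inside Zoonk you would presumably use `@behaviour Zoonk.AI.Evals.EvalCase` instead), `RecommendCoursesCases` is a hypothetical module name, and the `:language` and `:input` map keys are invented for the example, since the behaviour only requires a list of maps.

```elixir
# Stand-in for Zoonk.AI.Evals.EvalCase so this sketch compiles on its own.
# It mirrors only the two callbacks documented above.
defmodule EvalCaseSketch do
  @callback model() :: [map()]
  @callback prompt() :: [map()]
end

# Hypothetical eval-case module. The map keys (:language, :input) are
# assumptions; the behaviour itself only asks for a list of maps.
defmodule RecommendCoursesCases do
  @behaviour EvalCaseSketch

  # A small, fixed set (3-5 cases) for comparing one prompt across models.
  @impl EvalCaseSketch
  def model do
    [
      %{language: "en", input: "I want to learn to code"},
      %{language: "pt", input: "Quero aprender física"},
      %{language: "es", input: "Quiero aprender historia"}
    ]
  end

  # A broader set covering several languages and contexts, meant to be
  # re-run after a prompt change to look for regressions.
  @impl EvalCaseSketch
  def prompt do
    languages = ["en", "pt", "es", "fr"]
    inputs = ["black holes", "  ", "how to cook rice", "DNA replication"]

    # Build one case per language/input combination.
    for language <- languages, input <- inputs do
      %{language: language, input: input}
    end
  end
end

IO.inspect(length(RecommendCoursesCases.model()), label: "model cases")
IO.inspect(length(RecommendCoursesCases.prompt()), label: "prompt cases")
```

Keeping `model/0` short keeps cross-model comparisons cheap, while generating `prompt/0` from combinations is one way to get broad coverage without listing every case by hand.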