# Benchmark Policy

Confluence should benchmark open models before they are integrated into production research workflows.

## Required Evaluation Axes

- retrieval relevance
- reranking accuracy
- extraction precision
- hallucination rate
- citation faithfulness
- latency
- memory footprint
- governance fit
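One way to make sure every candidate is scored on all eight axes is to record results in a fixed schema. The sketch below is hypothetical: the field names, units, and metric choices (for example, p95 latency in milliseconds, or governance fit as a pass/fail review outcome) are assumptions, since this policy does not define scales for the axes.

```python
from dataclasses import dataclass, fields


# Hypothetical record of one model's scores on the required axes.
# Units and scales are illustrative assumptions, not defined by the policy.
@dataclass(frozen=True)
class BenchmarkResult:
    model_id: str
    retrieval_relevance: float    # assumed ranking metric, higher is better
    reranking_accuracy: float     # assumed fraction in [0, 1], higher is better
    extraction_precision: float   # assumed fraction in [0, 1], higher is better
    hallucination_rate: float     # assumed fraction in [0, 1], lower is better
    citation_faithfulness: float  # assumed fraction in [0, 1], higher is better
    latency_ms: float             # assumed p95 latency, lower is better
    memory_gb: float              # assumed peak memory, lower is better
    governance_fit: bool          # assumed pass/fail outcome of a governance review


# Every field except the identifier is a required evaluation axis.
REQUIRED_AXES = [f.name for f in fields(BenchmarkResult) if f.name != "model_id"]


def missing_axes(scores: dict) -> list:
    """Return the required axes that are absent from a raw score dictionary."""
    return [axis for axis in REQUIRED_AXES if axis not in scores]
```

For example, `missing_axes({"latency_ms": 120.0})` lists the seven axes that still need results, which makes incomplete benchmark runs easy to reject before review.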

## Safety Rule

No model in this registry is a treatment authority. Models are support tools for evidence retrieval, organization, and non-clinical reasoning.

## Promotion Rule

A candidate may move from `benchmark` to `approved` only if it:

- improves a defined workflow metric
- stays within acceptable cost and latency bounds
- does not degrade citation faithfulness
- does not encourage unsupported clinical claims
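The promotion rule can be expressed as a gate that compares a candidate against the currently approved baseline and reports every failed criterion. This is a minimal sketch under stated assumptions: the workflow metric is assumed to be higher-is-better, the cost unit and bound values are hypothetical, and "does not encourage unsupported clinical claims" is approximated as zero flagged outputs on a clinical-claims review set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    workflow_metric: float            # the defined workflow metric; assumed higher is better
    cost_per_1k_queries: float        # hypothetical cost unit
    p95_latency_ms: float
    citation_faithfulness: float      # fraction of citations that support their claims
    unsupported_clinical_claims: int  # assumed count of flagged outputs in a review set


@dataclass(frozen=True)
class Bounds:
    max_cost_per_1k_queries: float    # hypothetical acceptable-cost bound
    max_p95_latency_ms: float         # hypothetical acceptable-latency bound


def promotion_failures(candidate: Evaluation, baseline: Evaluation, bounds: Bounds) -> list:
    """List every promotion criterion the candidate fails (empty means eligible)."""
    failures = []
    if candidate.workflow_metric <= baseline.workflow_metric:
        failures.append("no improvement on the workflow metric")
    if candidate.cost_per_1k_queries > bounds.max_cost_per_1k_queries:
        failures.append("cost above bound")
    if candidate.p95_latency_ms > bounds.max_p95_latency_ms:
        failures.append("latency above bound")
    if candidate.citation_faithfulness < baseline.citation_faithfulness:
        failures.append("citation faithfulness degraded")
    if candidate.unsupported_clinical_claims > 0:
        failures.append("unsupported clinical claims flagged")
    return failures


def may_promote(candidate: Evaluation, baseline: Evaluation, bounds: Bounds) -> bool:
    """A candidate moves from `benchmark` to `approved` only if no criterion fails."""
    return not promotion_failures(candidate, baseline, bounds)
```

Returning the full list of failures, rather than stopping at the first one, gives reviewers a complete record of why a candidate stayed in `benchmark`. The strict comparisons here ignore measurement noise; a production gate might instead use a tolerance or a significance test.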