# Benchmark Policy

Confluence should benchmark open models before they are integrated into production research workflows.

## Required Evaluation Axes

- retrieval relevance
- reranking accuracy
- extraction precision
- hallucination rate
- citation faithfulness
- latency
- memory footprint
- governance fit

## Safety Rule

No model in this registry is a treatment authority. Models are support tools for evidence retrieval, organization, and non-clinical reasoning.

## Promotion Rule

A candidate may move from `benchmark` to `approved` only if it:

- improves a defined workflow metric
- stays within acceptable cost and latency bounds
- does not degrade citation faithfulness
- does not encourage unsupported clinical claims
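
The promotion rule above could be encoded as an automated gate. The following is a minimal sketch, not the registry's actual implementation: the `BenchmarkResult` and `Bounds` types, their field names, and the comparison against a baseline run are all assumptions made for illustration. In particular, treating "does not encourage unsupported clinical claims" as "zero reviewer-flagged claims" is one possible reading of the policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical summary of one benchmark run (field names are assumptions)."""
    workflow_metric: float            # the defined workflow metric; higher is better
    citation_faithfulness: float      # higher is better
    latency_ms: float
    cost_per_1k_requests: float
    unsupported_clinical_claims: int  # assumed: count flagged during review


@dataclass(frozen=True)
class Bounds:
    """Hypothetical acceptable cost and latency limits."""
    max_latency_ms: float
    max_cost_per_1k_requests: float


def promotion_failures(
    candidate: BenchmarkResult, baseline: BenchmarkResult, bounds: Bounds
) -> List[str]:
    """Return the promotion criteria the candidate fails, in policy order."""
    failures = []
    if candidate.workflow_metric <= baseline.workflow_metric:
        failures.append("does not improve the workflow metric")
    if candidate.latency_ms > bounds.max_latency_ms:
        failures.append("latency out of bounds")
    if candidate.cost_per_1k_requests > bounds.max_cost_per_1k_requests:
        failures.append("cost out of bounds")
    if candidate.citation_faithfulness < baseline.citation_faithfulness:
        failures.append("degrades citation faithfulness")
    if candidate.unsupported_clinical_claims > 0:
        failures.append("produced unsupported clinical claims")
    return failures


def can_promote(
    candidate: BenchmarkResult, baseline: BenchmarkResult, bounds: Bounds
) -> bool:
    """A candidate may move from `benchmark` to `approved` only with no failures."""
    return not promotion_failures(candidate, baseline, bounds)


if __name__ == "__main__":
    # Illustrative numbers only; they are not measured results.
    baseline = BenchmarkResult(0.70, 0.90, 400.0, 2.0, 0)
    candidate = BenchmarkResult(0.75, 0.90, 450.0, 2.5, 0)
    bounds = Bounds(max_latency_ms=500.0, max_cost_per_1k_requests=3.0)
    print(can_promote(candidate, baseline, bounds))
```

Returning the list of failed criteria, rather than a bare boolean, keeps the reason for a rejection visible in review notes. The remaining evaluation axes (for example hallucination rate or memory footprint) are not part of this gate because the promotion rule as written does not name them.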