| --- |
| license: apple-amlr |
| base_model: |
| - mistralai/Mistral-7B-Instruct-v0.2 |
| tags: |
| - rag |
| - compression |
| - retrieval |
| - generation |
| --- |
| |
| # CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning |
|
|
| <div align="center"> |
| <img src="clara_logo.jpg" width="300"/> |
| </div> |
|
|
| <div align="center"> |
| <a href="https://arxiv.org/abs/2511.18659"><img src="https://img.shields.io/badge/arXiv-2511.18659-b31b1b.svg" alt="arXiv"></a> |
| <a href="https://arxiv.org/abs/2511.18659"><img src="https://img.shields.io/badge/Paper-PDF-red.svg" alt="Paper"></a> |
| <a href="https://github.com/apple/ml-clara"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg" alt="GitHub"></a> |
| </div> |
|
|
|
|
| # CLaRa-7B-Base (Compression-16 & 128) |
|
|
| The CLaRa-7B-Base model is our foundational unified RAG model with built-in semantic document compression (16× and 128×). |
| It provides a base compressor and generator that can produce answers directly from compressed document representations. |
|
|
| **Training recipe:** QA-guided semantic compression and paraphrase consistency objectives. |
| **Benchmarks:** Strong baseline performance across multi-hop QA tasks under a 16× compression ratio. |
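
To give a rough sense of what the 16× and 128× ratios mean for context size, the sketch below estimates how many compressed vectors would stand in for a document of a given token length. The helper `compressed_length` and its ceiling-division rule are illustrative assumptions, not part of the CLaRa API; the number of vectors the model actually produces depends on its configuration.

```python
import math


def compressed_length(num_doc_tokens: int, compression_ratio: int) -> int:
    """Estimate how many compressed vectors represent a document.

    Assumption (not taken from the CLaRa code): a ratio of r maps roughly
    every r input tokens to one compressed vector, rounding up.
    """
    if num_doc_tokens < 0:
        raise ValueError("num_doc_tokens must be non-negative")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return math.ceil(num_doc_tokens / compression_ratio)


if __name__ == "__main__":
    doc_tokens = 256  # hypothetical document length in tokens
    for ratio in (16, 128):
        vectors = compressed_length(doc_tokens, ratio)
        print(f"{ratio}x compression: {doc_tokens} tokens -> about {vectors} vectors")
```

Under this assumption, a higher ratio leaves the generator with fewer vectors per document, which is the trade-off between context cost and retained detail that the two checkpoints expose.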
|
|
| --- |
|
|
| ## More details and usage examples |
|
|
| Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659) |
| GitHub: https://github.com/apple/ml-clara |
|
|
| --- |
|
|
| ## Example Usage |
|
|
| ```python |
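| from transformers import AutoModel |
| |
| # Path to a local copy of the compression-16 checkpoint; replace with your own location. |
| unirag = AutoModel.from_pretrained( |
|     "/mnt/ceph_rbd/model/CLaRa-7B-Base/compression-16", |
|     trust_remote_code=True,  # allow the custom model code shipped with the checkpoint to run |
| ).to("cuda")  # requires a CUDA-capable GPU |
| |
| # One inner list of documents per entry in `questions`. |
| documents = [ |
|     [ |
|         "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...", |
|         "Hagsatera is a genus of orchids native to Mexico and Guatemala...", |
|         "Alsobia is a genus of flowering plants native to Mexico and Central America...", |
|     ] |
| ] |
| |
| # One entry per document set; the question is left empty in this example. |
| questions = [""] |
| |
| # Generate text from the compressed representations of the documents. |
| out = unirag.generate_from_paraphrase( |
|     questions=questions, |
|     documents=documents, |
|     max_new_tokens=64, |
| ) |
| |
| print("Generated answer:", out) |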