---
license: apple-amlr
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- rag
- compression
- retrieval
- instruction-tuned
- generation
library_name: transformers
---

# CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

<div align="center">
<img src="clara_logo.jpg" width="300"/>
</div>

<div align="center">
<a href="https://arxiv.org/abs/2511.18659"><img src="https://img.shields.io/badge/arXiv-2511.18659-b31b1b.svg" alt="arXiv"></a>
<a href="https://arxiv.org/abs/2511.18659"><img src="https://img.shields.io/badge/Paper-PDF-red.svg" alt="Paper"></a>
<a href="https://github.com/apple/ml-clara"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg" alt="GitHub"></a>
</div>

# CLaRa-7B-Instruct (Compression-16 & Compression-128)

The **CLaRa-7B-Instruct** model is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×).
It supports instruction-following QA directly from compressed document representations.

**Training recipe:** Instruction tuning on QA-style tasks, built on top of the base semantic compression model.

**Benchmarks:** Strong instruction-following performance under 16× compression.
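
The arithmetic below makes the compression ratios concrete by counting latent vectors for a small retrieval context. It is only a back-of-the-envelope sketch. It assumes that a ratio of r means roughly one continuous vector per r input tokens, rounded up per passage. The function name `compressed_length` and the 256-token passage lengths are illustrative assumptions, not part of the CLaRa API. The paper describes how the compressor actually sizes its outputs.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Hypothetical helper: latent vectors for one passage, assuming one
    vector per `ratio` input tokens, rounded up."""
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return math.ceil(num_tokens / ratio)


# Illustrative context: three retrieved passages of 256 tokens each.
passage_lengths = [256, 256, 256]
for ratio in (16, 128):
    latents = sum(compressed_length(n, ratio) for n in passage_lengths)
    print(f"{ratio}x: {sum(passage_lengths)} tokens -> {latents} latent vectors")
# 16x: 768 tokens -> 48 latent vectors
# 128x: 768 tokens -> 6 latent vectors
```

Under this assumption, the generator attends over 48 or 6 positions for this context instead of 768 raw tokens.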

---

## More details and usage examples

- Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)
- GitHub: https://github.com/apple/ml-clara
- Video (from @Fahd Mirza): https://youtu.be/al2VoAKn8GU?si=Q8bq7QNMaTvcArwa

---

## Example Usage (Instruction-Tuned Inference)

```python
from transformers import AutoModel

# Load the 16x-compression checkpoint. The path below is a local checkout;
# point it at your own copy of the CLaRa-7B-Instruct compression-16 weights.
# trust_remote_code=True is needed because generate_from_text is custom model code.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True,
).to("cuda")

# One inner list of candidate passages per question.
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae...",
    ]
]

questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# Instruction-tuned usage: the answer is generated from the
# compressed representations of the passages.
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)
print("Generated answer:", out)
```
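
In the example, `documents` is a list of lists. The outer list lines up with `questions`, and each inner list holds the passages for that question. Treating this as the batching convention is an inference from the example's shape, not a documented contract. The helper below is hypothetical (it is not part of the model's remote code). It checks that alignment before you call `generate_from_text` and then unpacks the pairs back into the two arguments.

```python
from typing import List, Sequence, Tuple


def pair_questions_with_documents(
    questions: Sequence[str],
    documents: Sequence[Sequence[str]],
) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: check that every question has its own list of passages."""
    if len(questions) != len(documents):
        raise ValueError(
            f"got {len(questions)} questions but {len(documents)} document lists"
        )
    pairs = []
    for i, (question, docs) in enumerate(zip(questions, documents)):
        # A bare string here usually means the inner list was forgotten.
        if isinstance(docs, str):
            raise TypeError(f"documents[{i}] must be a list of passages, not a string")
        if len(docs) == 0:
            raise ValueError(f"documents[{i}] has no passages")
        pairs.append((question, list(docs)))
    return pairs


# Two questions in one batch, each with its own passages (toy data).
batch = pair_questions_with_documents(
    ["Which genus grows in Mexico?", "Which family includes Alsobia?"],
    [["Weldenia is a genus ..."], ["Alsobia is a genus ...", "Hagsatera is a genus ..."]],
)
questions, documents = map(list, zip(*batch))
print(len(questions), [len(d) for d in documents])  # 2 [1, 2]
```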