---
license: unknown
base_model:
  - mistralai/Mistral-7B-Instruct-v0.2
tags:
  - rag
  - compression
  - retrieval
  - instruction-tuned
  - generation
library_name: transformers
---

# CLaRa-7B-Instruct (Compression-16 & 128)

**CLaRa-7B-Instruct** is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×). It supports instruction-following QA directly from compressed document representations.

**Training recipe:** Instruction tuning on QA-style tasks, built on top of the base semantic compression model.

**Benchmarks:** Strong instruction-following performance under 16× compression.

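To give a feel for what the 16× and 128× figures mean in practice, the sketch below estimates how many compressed vectors a document would occupy. It assumes the ratio is the number of input document tokens divided by the number of compressed vectors, rounded up; this card does not define the ratio, so treat that reading as an assumption and check the paper. `compressed_length` is a hypothetical helper, not part of the CLaRa code.

```python
import math


def compressed_length(num_doc_tokens: int, ratio: int) -> int:
    """Hypothetical helper: estimate the compressed size of one document.

    Assumes ratio = document tokens / compressed vectors, rounded up.
    This is an illustrative reading of "16x" and "128x", not the model's API.
    """
    if num_doc_tokens < 0:
        raise ValueError("num_doc_tokens must be non-negative")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(num_doc_tokens / ratio)


# Compare the two compression settings named in this card.
for ratio in (16, 128):
    vectors = compressed_length(256, ratio)
    print(f"{ratio}x: 256 document tokens -> {vectors} compressed vectors")
```

Under this assumption, a 256-token passage shrinks to 16 vectors at 16× and to 2 vectors at 128×, which is why the higher setting trades more context budget for a coarser representation.
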

---

## More details and usage examples

Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)

GitHub: https://github.com/apple/ml-clara

---

## Example Usage (Instruction-Tuned Inference)

```python
from transformers import AutoModel

# Local path to the compression-16 checkpoint; point this at your own copy of the weights.
# trust_remote_code=True lets Transformers run the custom model code that ships with the checkpoint.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True
).to("cuda")  # requires a CUDA-capable GPU

# One inner list of documents per question (here: one question, three documents).
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae..."
    ]
]

questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# Instruction-tuned usage: answer each question from its documents.
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)

print("Generated answer:", out)
```
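The example above passes `documents` as a list of lists, with one inner list of passages for each entry in `questions`. A small check before calling `generate_from_text` can catch mismatched batches early. This is a minimal sketch: `validate_batch` is a hypothetical helper that is not part of the CLaRa code, and the one-list-per-question pairing is inferred from the example rather than documented in this card.

```python
from typing import List


def validate_batch(questions: List[str], documents: List[List[str]]) -> List[int]:
    """Hypothetical pre-flight check for the questions/documents pairing.

    Assumes one inner list of documents per question, as in the example.
    Returns the number of documents supplied for each question.
    """
    if len(questions) != len(documents):
        raise ValueError(
            f"got {len(questions)} questions but {len(documents)} document lists"
        )
    for i, docs in enumerate(documents):
        if not docs:
            raise ValueError(f"question {i} has no documents")
        for doc in docs:
            if not isinstance(doc, str) or not doc.strip():
                raise ValueError(f"question {i} has an empty or non-string document")
    return [len(docs) for docs in documents]


# Placeholder inputs with the same shape as the example: one question, three documents.
questions = ["Which genus grows originally in Mexico and Guatemala?"]
documents = [["passage about Weldenia", "passage about Hagsatera", "passage about Alsobia"]]
print(validate_batch(questions, documents))  # prints [3]
```

Running the check on the CPU before loading the model keeps shape errors from surfacing only after the checkpoint has been moved to the GPU.
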