---
license: apple-amlr
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- rag
- compression
- retrieval
- generation
---

# CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

<div align="center">
<img src="clara_logo.jpg" width="300"/>
</div>

<div align="center">
<a href="https://arxiv.org/abs/2511.18659"><img src="https://img.shields.io/badge/arXiv-2511.18659-b31b1b.svg" alt="arXiv"></a>
<a href="https://arxiv.org/abs/2511.18659"><img src="https://img.shields.io/badge/Paper-PDF-red.svg" alt="Paper"></a>
<a href="https://github.com/apple/ml-clara"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg" alt="GitHub"></a>
</div>


# CLaRa-7B-Base (Compression-16 & 128)

The CLaRa-7B-Base model is our foundational unified RAG model with built-in semantic document compression (16× and 128×).
It provides a base compressor and generator that can produce answers directly from compressed document representations.

Training recipe: Trained using QA-guided semantic compression and paraphrase consistency objectives.

Benchmarks: Strong baseline performance across multi-hop QA tasks under a 16× compression ratio.
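To make the two compression ratios concrete, the sketch below estimates how many compressed vectors a document would occupy. It assumes that an N× ratio means roughly one latent vector per N input tokens, rounded up. The helper name `compressed_length` and the 512-token document are illustrative and are not part of the CLaRa codebase.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Estimate the number of latent vectors for a document (hypothetical helper).

    Assumption: an N-times ratio means about one latent vector per N input
    tokens, rounded up so that a non-empty document gets at least one vector.
    """
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return math.ceil(num_tokens / ratio)


# Illustrative 512-token document under the two ratios named above.
for ratio in (16, 128):
    print(f"{ratio}x: {compressed_length(512, ratio)} vectors")
```

Under this assumption, a 512-token document shrinks to 32 vectors at 16× and to 4 vectors at 128×.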

---

## More details and usage examples

Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)

GitHub: https://github.com/apple/ml-clara

---

## Example Usage

```python
from transformers import AutoModel

# Local path to the compression-16 checkpoint; adjust it to your own setup.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Base/compression-16",
    trust_remote_code=True,  # lets transformers run the custom model code shipped with the checkpoint
).to("cuda")

# A batch with one entry: a list of three documents.
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of orchids native to Mexico and Guatemala...",
        "Alsobia is a genus of flowering plants native to Mexico and Central America...",
    ]
]

# The question is left empty in this example.
questions = [""]

out = unirag.generate_from_paraphrase(
    questions=questions,
    documents=documents,
    max_new_tokens=64,
)

print("Generated answer:", out)
```