FreeChunker: A Cross-Granularity Chunking Framework
Paper • 2510.20356 • Published
How to use XiaSheng/FreeChunk-nomic with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("XiaSheng/FreeChunk-nomic", trust_remote_code=True)
model = AutoModel.from_pretrained("XiaSheng/FreeChunk-nomic", trust_remote_code=True)
How to use XiaSheng/FreeChunk-nomic with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("XiaSheng/FreeChunk-nomic", trust_remote_code=True)
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

FreeChunker is a training-free embedding optimization method that dynamically chunks text to improve retrieval performance. This repository contains the FreeChunker model initialized with nomic-ai/nomic-embed-text-v1.5 sentence embeddings.

pip install torch transformers sentence-transformers numpy
from transformers import AutoModel
import torch
# 1. Load Model (UnifiedEncoder)
model = AutoModel.from_pretrained("XiaSheng/FreeChunk-nomic", trust_remote_code=True)
# 2. Build Vector Store from Text
text = "Your text..."
model.build_vector_store(text)
# 3. Query with Post-Aggregation (Default)
query = "Your query..."
results = model.query(query, top_k=1, aggregation_mode='post')
print(f"Query: {query}")
print(f"Result: {results}")
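
The quick-start above builds a vector store from the text and then retrieves the top_k results for a query. As a rough illustration of that retrieval step only, here is a minimal sketch of top-k cosine-similarity search over chunk embeddings. It is not the FreeChunker implementation: the helper name top_k_cosine and the toy 3-dimensional vectors are assumptions made up for this example, and the real model's chunking and aggregation logic is not reproduced here.

```python
import numpy as np

def top_k_cosine(query_vec, chunk_vecs, k=1):
    """Hypothetical helper: return (indices, scores) of the k chunks
    whose embeddings are most cosine-similar to the query embedding."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the first k positions.
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()

# Toy embeddings invented for illustration; real embeddings would come
# from the backbone model.
chunk_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_vec = np.array([0.9, 0.1, 0.0])

indices, scores = top_k_cosine(query_vec, chunk_vecs, k=2)
print(indices)  # [0, 2]
```

In this toy setup, chunk 0 points almost in the same direction as the query, so it ranks first, followed by chunk 2.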
model.safetensors: The FreeChunker model weights.
encoder.py: High-level interface (UnifiedEncoder) for end-to-end usage.
sentenizer.py: Helper for text splitting and backbone embedding.
aggregator.py: Helper for aggregating retrieved results.
configuration_freechunker.py & modeling_freechunker.py: Model definition.

If you use this model in your research, please cite:
@article{zhang2025freechunker,
title={FreeChunker: A Cross-Granularity Chunking Framework},
author={Zhang, Wenxuan and Jiang, Yuan-Hao and Wu, Yonghe},
journal={arXiv preprint arXiv:2510.20356},
year={2025}
}
Base model
nomic-ai/nomic-embed-text-v1.5