---
license: apache-2.0
language: en
tags:
- sentence-transformers
- citation-recommendation
- academic
- feature-extraction
library_name: sentence-transformers
pipeline_tag: sentence-similarity
datasets:
- custom
---

# MiniLM Citation v4

A sentence-transformer model fine-tuned for academic citation recommendation. Given a passage of academic writing, this model finds the most relevant papers to cite.

## Model Details

- **Base model**: [microsoft/MiniLM-L6-v2](https://huggingface.co/microsoft/MiniLM-L6-v2) (via all-MiniLM-L6-v2)
- **Dimensions**: 384 (Matryoshka: 128/256/384)
- **Training**: CachedMultipleNegativesRankingLoss with hard negatives
- **Training data**: 64K citation context → cited paper pairs from academic papers

## Performance

Evaluated on a benchmark of 3,420 citation contexts across 100 source papers:

| Method | MRR | R@1 | R@10 |
|--------|-----|-----|------|
| MiniLM-FT v4 (neural) | 0.400 | 27.5% | 72.0% |
| MiniLM-FT v4 (hybrid) | 0.428 | 30.2% | 74.8% |
| Cloud model (hybrid) | 0.550 | 38.5% | 88.5% |

For best results, use with [inCite](https://github.com/galenphall/incite) in hybrid mode (neural + BM25 fusion).

## Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("galenphall/minilm-citation-v4")

# Embed a query (your writing)
query_embedding = model.encode("The relationship between CO2 emissions and temperature...")

# Embed papers as: "title. authors. year. journal. abstract"
paper_embedding = model.encode("Global Warming Effects. Smith and Jones. 2023. Nature. We study...")
```

Or use with inCite directly:

```bash
pip install incite
incite setup
incite recommend "your text here" -k 10
```

## Citation

```bibtex
@software{incite2025,
  author = {Hall, Galen},
  title = {inCite: Local-First Citation Recommendation},
  year = {2025},
  url = {https://github.com/galenphall/incite},
  license = {Apache-2.0}
}
```
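
## Computing MRR and R@k

The MRR, R@1, and R@10 columns in the Performance table are standard retrieval metrics. The sketch below shows one way they can be computed. It is not the benchmark's actual evaluation code: it assumes each citation context has exactly one correct cited paper, and the function names, context IDs, and paper IDs are all hypothetical.

```python
from typing import Dict, List, Sequence


def reciprocal_rank(ranked_ids: List[str], correct_id: str) -> float:
    """Return 1/rank of the correct paper, or 0.0 if it was not retrieved."""
    for rank, paper_id in enumerate(ranked_ids, start=1):
        if paper_id == correct_id:
            return 1.0 / rank
    return 0.0


def evaluate(
    rankings: Dict[str, List[str]],
    gold: Dict[str, str],
    ks: Sequence[int] = (1, 10),
) -> Dict[str, float]:
    """Compute MRR and Recall@k, averaged over every context in `gold`."""
    n = len(gold)
    metrics = {
        "MRR": sum(reciprocal_rank(rankings[q], gold[q]) for q in gold) / n
    }
    for k in ks:
        # A context counts as a hit if its cited paper is in the top k results.
        hits = sum(1 for q in gold if gold[q] in rankings[q][:k])
        metrics[f"R@{k}"] = hits / n
    return metrics


# Toy data (hypothetical IDs): three citation contexts, one cited paper each.
rankings = {
    "ctx1": ["p3", "p1", "p7"],  # cited paper at rank 2
    "ctx2": ["p5", "p2", "p9"],  # cited paper at rank 1
    "ctx3": ["p4", "p6", "p8"],  # cited paper not retrieved
}
gold = {"ctx1": "p1", "ctx2": "p5", "ctx3": "p0"}

metrics = evaluate(rankings, gold, ks=(1, 2))
print(metrics)
```

In this toy run the reciprocal ranks are 1/2, 1, and 0, so MRR is 0.5. One of three contexts has its cited paper at rank 1, and two of three have it within the top 2.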