---
language:
- en
license: cc-by-nc-4.0
tags:
- mteb
- sentence-transformers
- embedding
- text-embedding
- ogma
- axiotic
- matryoshka
- small-model
model-index:
- name: ogma-large
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/AmazonCounterfactualClassification
      name: MTEB AmazonCounterfactualClassification
      config: default
      split: test
      revision: 1f7e6a9d6fa6e64c53d146e428565640410c0df1
    metrics:
    - type: accuracy
      value: 72.85
  - task:
      type: Classification
    dataset:
      type: mteb/AmazonPolarityClassification
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 83.51
  - task:
      type: Classification
    dataset:
      type: mteb/AmazonReviewsClassification
      name: MTEB AmazonReviewsClassification
      config: default
      split: test
      revision: 6b5d328eaae8ef408dd7d775040245cf86f92e9d
    metrics:
    - type: accuracy
      value: 39.85
  - task:
      type: Clustering
    dataset:
      type: mteb/BiorxivClusteringP2P
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 34.84
  - task:
      type: Clustering
    dataset:
      type: mteb/BiorxivClusteringS2S
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 27.02
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackAndroidRetrieval
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: 9be4c0e46342e8e3aff577a89b9a1ec9bc6b4af3
    metrics:
    - type: ndcg_at_10
      value: 38.98
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackEnglishRetrieval
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: ad9991cb51e31e31e430383c75ffb2885547b5f0
    metrics:
    - type: ndcg_at_10
      value: 39.78
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackGamingRetrieval
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: 4885aa143210c98657558c04aaf3dc47cfb54340
    metrics:
    - type: ndcg_at_10
      value: 48.24
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackGisRetrieval
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: 5003b3064772da1887988e05400cf3806fe491f2
    metrics:
    - type: ndcg_at_10
      value: 33.09
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackMathematicaRetrieval
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: 90fceea13679c63fe563ded68f3b6f06e50061de
    metrics:
    - type: ndcg_at_10
      value: 25.36
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackPhysicsRetrieval
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4
    metrics:
    - type: ndcg_at_10
      value: 38.02
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackProgrammersRetrieval
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: 6184bc1440d2dbc7612be22b50686b8826d22b32
    metrics:
    - type: ndcg_at_10
      value: 36.42
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackRetrieval
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: '1'
    metrics:
    - type: ndcg_at_10
      value: 33.61
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackStatsRetrieval
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a
    metrics:
    - type: ndcg_at_10
      value: 28.07
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackTexRetrieval
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: 46989137a86843e03a6195de44b09deda022eec7
    metrics:
    - type: ndcg_at_10
      value: 23.29
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackUnixRetrieval
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53
    metrics:
    - type: ndcg_at_10
      value: 32.78
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackWebmastersRetrieval
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: 160c094312a0e1facb97e55eeddb698c0abe3571
    metrics:
    - type: ndcg_at_10
      value: 32.9
  - task:
      type: Retrieval
    dataset:
      type: mteb/CQADupstackWordpressRetrieval
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
    metrics:
    - type: ndcg_at_10
      value: 26.42
  - task:
      type: Retrieval
    dataset:
      type: mteb/ClimateFEVER
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380
    metrics:
    - type: ndcg_at_10
      value: 24.91
  - task:
      type: Retrieval
    dataset:
      type: mteb/DBPedia
      name: MTEB DBPedia
      config: default
      split: test
      revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659
    metrics:
    - type: ndcg_at_10
      value: 37.55
  - task:
      type: Classification
    dataset:
      type: mteb/EmotionClassification
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 48.29
  - task:
      type: Retrieval
    dataset:
      type: mteb/FEVER
      name: MTEB FEVER
      config: default
      split: test
      revision: bea83ef9e8fb933d90a2f1d5515737465d613e12
    metrics:
    - type: ndcg_at_10
      value: 59.78
  - task:
      type: Retrieval
    dataset:
      type: mteb/HotpotQA
      name: MTEB HotpotQA
      config: default
      split: test
      revision: ab518f4d6fcca38d87c25209f94beba119d02014
    metrics:
    - type: ndcg_at_10
      value: 55.46
  - task:
      type: Retrieval
    dataset:
      type: mteb/MSMARCO
      name: MTEB MSMARCO
      config: default
      split: test
      revision: c5a29a104738b98a9e76336939199e264163d4a0
    metrics:
    - type: ndcg_at_10
      value: 0
  - task:
      type: Classification
    dataset:
      type: mteb/MTOPIntentClassification
      name: MTEB MTOPIntentClassification
      config: default
      split: test
      revision: 2992d820f31312593c49a4890430aadadb0f0039
    metrics:
    - type: accuracy
      value: 64.35
  - task:
      type: Clustering
    dataset:
      type: mteb/MedrxivClusteringP2P
      name: MTEB MedrxivClusteringP2P
      config: default
      split: test
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
    metrics:
    - type: v_measure
      value: 32.32
  - task:
      type: Clustering
    dataset:
      type: mteb/MedrxivClusteringS2S
      name: MTEB MedrxivClusteringS2S
      config: default
      split: test
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
    metrics:
    - type: v_measure
      value: 29.07
  - task:
      type: Reranking
    dataset:
      type: mteb/MindSmallReranking
      name: MTEB MindSmallReranking
      config: default
      split: test
      revision: 227478e3235572039f4f7661840e059f31ef6eb1
    metrics:
    - type: map
      value: 30.61
  - task:
      type: Retrieval
    dataset:
      type: mteb/NFCorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
    metrics:
    - type: ndcg_at_10
      value: 31.98
  - task:
      type: Retrieval
    dataset:
      type: mteb/NQ
      name: MTEB NQ
      config: default
      split: test
      revision: b774495ed302d8c44a3a7ea25c90dbce03968f31
    metrics:
    - type: ndcg_at_10
      value: 54.65
  - task:
      type: Retrieval
    dataset:
      type: mteb/QuoraRetrieval
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: e4e08e0b7dbe3c8700f0daef558ff32256715259
    metrics:
    - type: ndcg_at_10
      value: 61.89
  - task:
      type: Clustering
    dataset:
      type: mteb/RedditClustering
      name: MTEB RedditClustering
      config: default
      split: test
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
    metrics:
    - type: v_measure
      value: 44.56
  - task:
      type: Clustering
    dataset:
      type: mteb/RedditClusteringP2P
      name: MTEB RedditClusteringP2P
      config: default
      split: test
      revision: 385e3cb46b4cfa89021f56c4380204149d0efe33
    metrics:
    - type: v_measure
      value: 54.14
  - task:
      type: Retrieval
    dataset:
      type: mteb/SCIDOCS
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88
    metrics:
    - type: ndcg_at_10
      value: 17.07
  - task:
      type: STS
    dataset:
      type: mteb/SICK-R
      name: MTEB SICK-R
      config: default
      split: test
      revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
    metrics:
    - type: cosine_spearman
      value: 82.07
  - task:
      type: STS
    dataset:
      type: mteb/STS12
      name: MTEB STS12
      config: default
      split: test
      revision: a0d554a64d88156834ff5ae9920b964011b16384
    metrics:
    - type: cosine_spearman
      value: 78.29
  - task:
      type: STS
    dataset:
      type: mteb/STS13
      name: MTEB STS13
      config: default
      split: test
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
    metrics:
    - type: cosine_spearman
      value: 85.41
  - task:
      type: STS
    dataset:
      type: mteb/STS14
      name: MTEB STS14
      config: default
      split: test
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
    metrics:
    - type: cosine_spearman
      value: 82.62
  - task:
      type: STS
    dataset:
      type: mteb/STS15
      name: MTEB STS15
      config: default
      split: test
      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
    metrics:
    - type: cosine_spearman
      value: 86.73
  - task:
      type: STS
    dataset:
      type: mteb/STS16
      name: MTEB STS16
      config: default
      split: test
      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
    metrics:
    - type: cosine_spearman
      value: 83.84
  - task:
      type: STS
    dataset:
      type: mteb/STSBenchmark
      name: MTEB STSBenchmark
      config: default
      split: test
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
    metrics:
    - type: cosine_spearman
      value: 87.32
  - task:
      type: Reranking
    dataset:
      type: mteb/SciDocsRR
      name: MTEB SciDocsRR
      config: default
      split: test
      revision: 39b8377811871075eed9de3b8a7e21aaa6acb3d8
    metrics:
    - type: map
      value: 75.52
  - task:
      type: Retrieval
    dataset:
      type: mteb/SciFact
      name: MTEB SciFact
      config: default
      split: test
      revision: d56462d0e63a25450459c4f213e49ffdb866f7f9
    metrics:
    - type: ndcg_at_10
      value: 63.03
  - task:
      type: PairClassification
    dataset:
      type: mteb/SprintDuplicateQuestions
      name: MTEB SprintDuplicateQuestions
      config: default
      split: test
      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
    metrics:
    - type: cosine_ap
      value: 94.59
  - task:
      type: Clustering
    dataset:
      type: mteb/StackExchangeClustering
      name: MTEB StackExchangeClustering
      config: default
      split: test
      revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
    metrics:
    - type: v_measure
      value: 51.77
  - task:
      type: Clustering
    dataset:
      type: mteb/StackExchangeClusteringP2P
      name: MTEB StackExchangeClusteringP2P
      config: default
      split: test
      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
    metrics:
    - type: v_measure
      value: 34.23
  - task:
      type: Reranking
    dataset:
      type: mteb/StackOverflowDupQuestions
      name: MTEB StackOverflowDupQuestions
      config: default
      split: test
      revision: 5debda000fe8e27ebb5c123d38081f92e1847a59
    metrics:
    - type: map
      value: 45.15
  - task:
      type: Summarization
    dataset:
      type: mteb/SummEval
      name: MTEB SummEval
      config: default
      split: test
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
    metrics:
    - type: cosine_spearman
      value: 30.93
  - task:
      type: Retrieval
    dataset:
      type: mteb/TRECCOVID
      name: MTEB TRECCOVID
      config: default
      split: test
      revision: bb9466bac8153a0349341eb1b22e06409e78ef4e
    metrics:
    - type: ndcg_at_10
      value: 72.99
  - task:
      type: Retrieval
    dataset:
      type: mteb/Touche2020
      name: MTEB Touche2020
      config: default
      split: test
      revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f
    metrics:
    - type: ndcg_at_10
      value: 28.12
  - task:
      type: Classification
    dataset:
      type: mteb/ToxicConversationsClassification
      name: MTEB ToxicConversationsClassification
      config: default
      split: test
      revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de
    metrics:
    - type: accuracy
      value: 65.79
  - task:
      type: Classification
    dataset:
      type: mteb/TweetSentimentExtractionClassification
      name: MTEB TweetSentimentExtractionClassification
      config: default
      split: test
      revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
    metrics:
    - type: accuracy
      value: 62.34
  - task:
      type: Clustering
    dataset:
      type: mteb/TwentyNewsgroupsClustering
      name: MTEB TwentyNewsgroupsClustering
      config: default
      split: test
      revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
    metrics:
    - type: v_measure
      value: 41.53
  - task:
      type: PairClassification
    dataset:
      type: mteb/TwitterSemEval2015
      name: MTEB TwitterSemEval2015
      config: default
      split: test
      revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
    metrics:
    - type: cosine_ap
      value: 71.88
  - task:
      type: PairClassification
    dataset:
      type: mteb/TwitterURLCorpus
      name: MTEB TwitterURLCorpus
      config: default
      split: test
      revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
    metrics:
    - type: cosine_ap
      value: 85.53
---

# ogma-large

**32.37M parameter text embedding
model** by [Axiotic AI](https://axiotic.ai), achieving **57.38 average** on MTEB English (66/66 tasks). It is a 9-layer transformer with a 512-dimensional hidden state and mean pooling, and it is the highest-scoring model in the Ogma family.

## Highlights

- **57.38 MTEB average** on the standard 66-task MTEB English benchmark
- **Matryoshka embeddings**: dimensions [32, 64, 128, 256] for flexible storage/compute tradeoffs
- **Symmetric routing**: task tokens `[QRY]`, `[DOC]`, `[SYM]`; **recommended: `[QRY]`/`[QRY]`** (highest MTEB), with `[SYM]` everywhere as the next-best alternative. `[DOC]` is exposed for downstream fine-tuning and is **not recommended at inference**.
- **1024-token context**: handles longer passages than typical small models
- **HuggingFace Hub**: load directly from the Hub, no local package installation needed

## Quick Start

```python
import sys

import torch
import yaml
from huggingface_hub import snapshot_download

# Download the model snapshot from the HuggingFace Hub
model_path = snapshot_download("axiotic/ogma-large")

# The model code ships inside the snapshot, so make it importable
sys.path.insert(0, model_path)
from ogma_model import OgmaModel
from config import OgmaConfig, TaskToken
from tokenizer import OgmaTokenizer

# Load the config and weights
with open(f"{model_path}/config.yaml") as f:
    cfg = yaml.safe_load(f)
config = OgmaConfig.from_dict(cfg)
model = OgmaModel(config)
state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Load tokenizer
tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")

# Encode text
sentences = ["The quick brown fox", "A fast auburn canine"]
enc = tokenizer.batch_encode(sentences, max_length=1024)
ids = torch.tensor(enc["input_ids"])
mask = torch.tensor(enc["attention_mask"])

with torch.no_grad():
    embs = model.encode(ids, mask, task=TaskToken.SYM)

# Cosine similarity
sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
print(f"Similarity: {sim.item():.4f}")
print(f"Shape: {embs.shape}")  # (2, 256)
```

## Retrieval (Symmetric Routing)

Ogma is trained for **symmetric routing**: encode queries and
documents with the **same** task token. **The recommended route is `[QRY]`/`[QRY]`** (both sides use `TaskToken.QRY`); this benchmarked highest on MTEB. `[SYM]` everywhere is the next-best symmetric alternative; try it on your data if you want to compare. **`[DOC]` is not recommended at inference**: it is exposed for downstream fine-tuning, not as an asymmetric query/document route.

```python
# Reuses `model`, `tokenizer`, and `TaskToken` from Quick Start
queries = ["What is machine learning?"]
documents = ["ML is a subset of AI...", "The weather is sunny today"]

q_enc = tokenizer.batch_encode(queries, max_length=1024)
d_enc = tokenizer.batch_encode(documents, max_length=1024)

with torch.no_grad():
    # Symmetric: both queries and documents use TaskToken.QRY (not a typo).
    # Swap TaskToken.QRY for TaskToken.SYM on both sides to try the SYM route instead.
    q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
                          torch.tensor(q_enc["attention_mask"]),
                          task=TaskToken.QRY)
    d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
                          torch.tensor(d_enc["attention_mask"]),
                          task=TaskToken.QRY)

# Dot-product scores, shape (num_queries, num_documents)
scores = q_embs @ d_embs.T
print(f"Relevance scores: {scores}")
```

## Matryoshka Dimensionality Reduction

To use a smaller embedding, keep the first *k* components (*k* in [32, 64, 128, 256]) and re-normalize:

```python
# Reuses `model`, `ids`, `mask`, and `TaskToken` from Quick Start
with torch.no_grad():
    full = model.encode(ids, mask, task=TaskToken.SYM)  # full 256-d embeddings

# Keep the first 32 dimensions, then re-normalize to unit length
small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1)  # 32-d
```

## Architecture

| Component | Details |
|-----------|---------|
| Parameters | 32.37M |
| Layers | 9 |
| Hidden dim | 512 |
| Output dim | 256 |
| Heads | 8 |
| Max seq len | 1024 |
| Matryoshka | [32, 64, 128, 256] |
| Pooling | Mean |
| Positional | RoPE |
| FFN | SwiGLU |
| Tokenizer | SentencePiece Unigram (30K) |

## MTEB Results (66/66 tasks)

| Category | ogma-large |
|----------|------------|
| Classification | 68.4 |
| Clustering | 41.6 |
| PairClassification | 84.0 |
| Reranking | 53.1 |
| Retrieval | 43.7 |
| STS | 83.7 |
| Summarization | 30.9 |
| **Overall** | **57.38** |

Benchmarked with MTEB v2.10.7 on the standard 66-task English benchmark using category averaging (same methodology as the MTEB
leaderboard).

## Ogma Model Family

| Model | Params | MTEB-66 | Best For |
|-------|--------|---------|----------|
| [ogma-large](https://huggingface.co/axiotic/ogma-large) | 32.37M | 57.38 | Maximum quality |
| [ogma-base](https://huggingface.co/axiotic/ogma-base) | 13.32M | 56.54 | General purpose |
| [ogma-small](https://huggingface.co/axiotic/ogma-small) | 8.60M | 55.79 | Best sub-10M |
| [ogma-mini](https://huggingface.co/axiotic/ogma-mini) | 3.51M | 51.42 | Edge deployment |
| [ogma-micro](https://huggingface.co/axiotic/ogma-micro) | 2.32M | 49.77 | Extreme edge |

## License

This model is licensed under [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/). Commercial use requires a separate license from Axiotic AI.
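
## Example: Retrieval with Truncated Embeddings

As a follow-up to the Retrieval and Matryoshka Dimensionality Reduction sections, the sketch below shows one way to rank documents at each Matryoshka dimension. It depends only on PyTorch and is not part of the Ogma code: the helpers `truncate_and_normalize` and `rank_documents` are hypothetical names introduced here, and random tensors stand in for real `model.encode(...)` outputs, so the printed rankings carry no meaning.

```python
import torch
import torch.nn.functional as F

# Dimensions listed in the Architecture table of this card.
MATRYOSHKA_DIMS = (32, 64, 128, 256)


def truncate_and_normalize(embs: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` components and rescale each row to unit length."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}, got {dim}")
    return F.normalize(embs[:, :dim], p=2, dim=-1)


def rank_documents(q_embs: torch.Tensor, d_embs: torch.Tensor, top_k: int = 3):
    """Return (scores, indices) of the best `top_k` documents for each query."""
    # Rows are unit-length here, so the dot product equals cosine similarity.
    scores = q_embs @ d_embs.T
    k = min(top_k, d_embs.shape[0])
    return torch.topk(scores, k=k, dim=-1)


# Stand-in data (assumption): random vectors instead of real embeddings.
torch.manual_seed(0)
q_full = torch.randn(2, 256)  # 2 queries
d_full = torch.randn(5, 256)  # 5 documents

for dim in MATRYOSHKA_DIMS:
    q = truncate_and_normalize(q_full, dim)
    d = truncate_and_normalize(d_full, dim)
    top_scores, top_idx = rank_documents(q, d, top_k=3)
    print(f"dim={dim}: top-3 document indices per query = {top_idx.tolist()}")
```

With real embeddings, comparing the rankings across dimensions on your own data is a quick way to judge how much quality you give up for storage: going from 256 to 32 dimensions cuts per-vector storage by 8x.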