SentenceTransformer based on unsloth/Qwen3-Embedding-4B

This model was finetuned with Unsloth.

This is a sentence-transformers model finetuned from unsloth/Qwen3-Embedding-4B. It maps sentences & paragraphs to a 2560-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: unsloth/Qwen3-Embedding-4B
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 2560 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 2560, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
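
The pooling and normalization stages listed above can be illustrated with a minimal pure-Python sketch. It assumes one sequence at a time, a 0/1 attention mask, and toy 3-dimensional hidden states; the function names last_token_pool and l2_normalize are hypothetical and are not part of the Sentence Transformers API.

```python
import math


def last_token_pool(token_embeddings, attention_mask):
    """Return the hidden state of the last non-padding token of one sequence."""
    # Index of the last position whose mask is 1 (works for left or right padding).
    last_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return token_embeddings[last_index]


def l2_normalize(vector):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy sequence: 4 positions, 3-dimensional hidden states, last position is padding.
token_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 4.0, 0.0],
    [9.0, 9.0, 9.0],  # padding position, ignored by the mask
]
attention_mask = [1, 1, 1, 0]

sentence_embedding = l2_normalize(last_token_pool(token_embeddings, attention_mask))
print(sentence_embedding)  # [0.6, 0.8, 0.0]
```

In the real model the hidden states are 2560-dimensional and the same two steps are applied to every sequence in a batch.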

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Thermostatic/qwen3-4b-embeddings-akkadian")
# Run inference
sentences = [
    '37 ku-ta-ni 34 {túg}šu-ru-tum a-na lá-qé-pí-im áp-qí-id IGI i-dí-{d}IŠKUR IGI i-dí-a-šur DUMU sú-e-ta-ta 7 TÚG.HI.A ša li-wi-tim IGI a-šùr-SIPA a-dí-šu-um',
    "37 -textiles (and) 34 dark textiles I entrusted to Lā-qēpum in the presence of Iddin-Adad and of Iddin-Aššur, son of Suettata. 7 textiles for wrapping I gave him in the presence of Aššur-rē'ī.",
    'To Ešarra and Ab-šalim from Ennam-Aššur: 10 shekels of silver and an undergarment sealed by me is for Ešarra. 10 shekels of silver and 2 sashes are for Ab-šalim and the girl. 2 shekels of silver is for <big_gap> sister Ištar-lamassī <big_gap>',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 2560)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7541, 0.0110],
#         [0.7541, 1.0000, 0.0221],
#         [0.0110, 0.0221, 1.0000]])
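
To make the similarity matrix above easier to interpret, here is a small sketch of pairwise cosine similarity and nearest-neighbour lookup in plain Python. The 2-dimensional toy vectors stand in for the output of model.encode, and the helper names (cosine_similarity, similarity_matrix, most_similar) are assumptions made for illustration, not library functions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Square matrix of cosine similarities between all pairs of embeddings."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


def most_similar(matrix, query_index):
    """Index of the highest-scoring entry in a row, excluding the query itself."""
    candidates = [
        (score, j) for j, score in enumerate(matrix[query_index]) if j != query_index
    ]
    return max(candidates)[1]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
print(most_similar(matrix, 0))  # 1
```

Read this way, the example output above pairs the transliteration with its own translation (0.7541) and scores the unrelated letter near zero.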

Evaluation

Metrics

Semantic Similarity

Metric           Value
pearson_cosine   0.9124
spearman_cosine  0.8625
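
As a rough illustration of what these two metrics measure, the sketch below computes Pearson and Spearman correlation between predicted cosine scores and gold similarity labels in plain Python. The score lists are made-up toy values, not data from this model's evaluation, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values only: predicted cosine similarities and gold labels for four pairs.
predicted = [0.10, 0.40, 0.35, 0.90]
gold = [0.0, 0.5, 0.6, 1.0]
print(round(spearman(predicted, gold), 4))  # 0.8
```

Spearman depends only on the ordering of the scores, which is why it can differ from Pearson even when both are computed on the same pairs.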

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,137 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
             anchor                  positive
    type     string                  string
    details  • min: 13 tokens        • min: 7 tokens
             • mean: 229.79 tokens   • mean: 137.38 tokens
             • max: 579 tokens       • max: 442 tokens
  • Samples:
    Sample 1
      anchor: 1 ma-na KÙ.BABBAR big_gap 4 {túg}ku-ta-ni big_gap ni-ik-na-x- big_gap KÙ.BABBAR a-ha-ma big_gap 2 GÍN KÙ.BABBAR big_gap ša áb-na-tim kà-ú-nam big_gap áb-na-tim big_gap uk-ta-in big_gap KÙ.BABBAR i-za-az big_gap ŠU.NÍGIN 1 ma-na 2 GÍN KÙ.BABBAR i li-bi big_gap IGI pì-lá-ah- big_gap IGI a-šur-na-da šu-ma KÙ.BABBAR a-na big_gap lá iš-ta-qá-al big_gap iš-tù ha-mu-uš-tim ša a-šur-be-el-a-wa-tim 1 ma-na-um 3 GÍN.TA ṣí-ib-tám ú-ṣa-áb i-na ITU.KAM a-ma-nu-šu-um
      positive: 1 mina of silver 4 kutānu-textiles silver; further, 2 shekels of silver of the stones confirm He has confirmed the stones. The silver stands ready. In all: 1 mina 2 shekels of silver is owed by Witnessed by Pilah- , by Aššur-nādā. If he has not paid the silver in I shall count interest for him reckoned from the week of Aššur-bēl-awātim at the rate 3 shekels per mina per month.
    Sample 2
      anchor: ŠU.NÍGIN KÙ.BABBAR-pì-kà 15 ma-na 10 GÍN lu ša AN.NA ú ṣú-ba-tí-kà ku-nu-ki-ni ṣí-li-a na-áš-a-ku-um
      positive: Total of your silver: 15 minas 10 shekels, Ṣilliya brings you under our seal - both that from the tin and that from your textiles.
    Sample 3
      anchor: 1 ma-na 7.5 GÍN KÙ.BABBAR ṣa-ru-pá-am i-ṣé-er a-mur-IŠTAR DUMU da-da e-la-ma i-šu iš-tù ha-muš-tim ša a-la-hi-im ú {d}MAR.TU-ba-ni a-na 11 ha-am-ša-tim i-ša-qal šu-ma lá iš-qú-ul 1½ GÍN.TA ṣí-ib-tám a-na ma-na-im i-na ITU.1.KAM ú-ṣa-áb ITU.KAM ša sà-ra-tim li-mu-um ša qá-té DINGIR-šu-GAL DUMU ba-zi-a IGI im-dí-lim DUMU šu-lá-ba-an IGI e-me-me-i DUMU a-zu-ta-a
      positive: 1 mina 7.5 shekels of refined silver Āmur-Ištar, son of Dada, owes to Elamma. From the week of Ali-ahum and Amurrum-bāni he will pay in 11 weeks; if he does not pay he will add 1.5 shekel as interest per mina per month. Month II, eponymy of the successor of Ilšu-rabi, son of Baziya. In the presence of Imdī-ilum, son of Šu-Labān, of Ememe'i, son of Azutaya.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
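
The objective configured above can be sketched as a cross-entropy over scaled cosine similarities, where each anchor's own positive is the correct class and the other positives in the batch act as negatives. The following is a simplified pure-Python illustration under that reading, not the library's implementation; the function name and the toy 2-dimensional vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every positive in the batch.
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability assigned to the matching positive.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch: perfectly aligned pairs give a loss near zero.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(multiple_negatives_ranking_loss(anchors, aligned) < 1e-6)   # True
print(multiple_negatives_ranking_loss(anchors, swapped) > 19.0)   # True
```

With a scale of 20.0, a mismatched pair is penalised heavily, which is why larger batches of distinct pairs (see the no_duplicates batch sampler) give the loss more informative negatives.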
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_pin_memory: False
  • batch_sampler: no_duplicates
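
To show how learning_rate, lr_scheduler_type, and warmup_ratio interact, here is a minimal sketch of a linear-warmup, cosine-decay schedule. It assumes a single device and 36 optimizer steps per epoch (1,137 samples at batch size 32, rounded up), for 108 steps over 3 epochs; the function name is hypothetical and the formula is a simplified illustration rather than the trainer's exact code.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count: single device, no gradient accumulation.
steps_per_epoch = math.ceil(1137 / 32)
total_steps = steps_per_epoch * 3
print(steps_per_epoch, total_steps)  # 36 108

for step in (0, 5, 11, 60, 108):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

Under these assumptions the rate peaks at 2e-05 after 11 warmup steps and decays to zero by step 108.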

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  akkadian_val_spearman_cosine
0.1389  5     2.402          -
0.2778  10    2.3992         -
0.4167  15    2.1648         -
0.5556  20    1.8975         -
0.6944  25    1.4115         0.7776
0.8333  30    1.0211         -
0.9722  35    0.6742         -
1.1111  40    0.4176         -
1.25    45    0.2966         -
1.3889  50    0.2419         0.8580
1.5278  55    0.2028         -
1.6667  60    0.1523         -
1.8056  65    0.1445         -
1.9444  70    0.106          -
2.0833  75    0.0906         0.8614
2.2222  80    0.1198         -
2.3611  85    0.0625         -
2.5     90    0.1019         -
2.6389  95    0.0474         -
2.7778  100   0.0945         0.8625
2.9167  105   0.1227         -

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.9.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model Repository

  • Repository: Thermostatic/qwen3-4b-embeddings-akkadian
  • Model size: 4B params (Safetensors)
  • Tensor type: BF16