---
language:
- multilingual
license: cc-by-nc-4.0
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:13717
- loss:BinaryCrossEntropyLoss
base_model: jinaai/jina-reranker-v2-base-multilingual
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: cometadata/jina-reranker-v2-multilingual-affiliations
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: affiliation val
      type: affiliation-val
    metrics:
    - type: map
      value: 0.9294
      name: Map
    - type: mrr@10
      value: 0.9294
      name: Mrr@10
    - type: ndcg@10
      value: 0.9564
      name: Ndcg@10
---
# cometadata/jina-reranker-v2-multilingual-affiliations
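This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [jinaai/jina-reranker-v2-base-multilingual](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual) using the [sentence-transformers](https://www.SBERT.net) library. It computes a score for each pair of texts, which can be used for text reranking and semantic search. The training and evaluation pairs are institutional affiliation strings; in the labeled samples shown in this card, label 1 appears to mark two strings that refer to the same institution and label 0 a mismatch.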
## Model Details
### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [jinaai/jina-reranker-v2-base-multilingual](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual)
- **Maximum Sequence Length:** 1024 tokens
- **Number of Output Labels:** 1 label
- **Language:** multilingual
- **License:** cc-by-nc-4.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("cometadata/jina-reranker-v2-multilingual-affiliations")
# Get scores for pairs of texts
pairs = [
    ["Centre sur le handicap et l'intégration, School of Economics and Political Science Université de Saint‐Gall", "College of Saint Benedict and Saint John's University, Collegeville, MN, United States"],
    ['Swiss Federal Institute of Technology (ETH) Zurich, Institute of Quantum Electronics, Laser Spectroscopy and Sensing Laboratory, Hoenggerberg, HPF D19, CH-8093\u2009Zurich, Switzerland', 'Laboratory of Crystallography, ETH Zurich, CH-8093 Zurich, Switzerland'],
    ['Swiss Federal Institute of Technology (ETH) Zurich, Institute of Quantum Electronics, Laser Spectroscopy and Sensing Laboratory, Hoenggerberg, HPF D19, CH-8093\u2009Zurich, Switzerland', "Laboratoire d'Electrochimie Physique et Analytique, École Polytechnique Fédérale de Lausanne Station 6, CH-1015 Lausanne, Switzerland"],
    ['Institute for Advanced Study, Technische Universität München 2 , Lichtenbergstr. 2a, D-85748 Garching, Germany', 'Department of Surgery, Technical University of Munich, School of Medicine, Munich, Germany'],
    ['Institute for Advanced Study, Technische Universität München 2 , Lichtenbergstr. 2a, D-85748 Garching, Germany', 'Lehrstuhl für BioMolekulare Optik, Ludwig-Maximilians-Universität München, Oettingenstrasse 67, 80538 München (Germany)'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    "Centre sur le handicap et l'intégration, School of Economics and Political Science Université de Saint‐Gall",
    [
        "College of Saint Benedict and Saint John's University, Collegeville, MN, United States",
        'Laboratory of Crystallography, ETH Zurich, CH-8093 Zurich, Switzerland',
        "Laboratoire d'Electrochimie Physique et Analytique, École Polytechnique Fédérale de Lausanne Station 6, CH-1015 Lausanne, Switzerland",
        'Department of Surgery, Technical University of Munich, School of Medicine, Munich, Germany',
        'Lehrstuhl für BioMolekulare Optik, Ludwig-Maximilians-Universität München, Oettingenstrasse 67, 80538 München (Germany)',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
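The list returned by `model.rank` can be post-processed to pick a single best-matching affiliation. The following is a minimal sketch, not part of the library: `pick_match` and the `min_score` cutoff of 0.5 are hypothetical, and the scores in `example_ranks` are made-up illustrative values rather than real model output. A suitable cutoff depends on the score range the model produces and would need to be tuned on held-out data.

```python
from typing import Optional, Sequence, Tuple


def pick_match(
    ranks: Sequence[dict],
    candidates: Sequence[str],
    min_score: float = 0.5,  # hypothetical cutoff, tune on your own data
) -> Optional[Tuple[str, float]]:
    """Return (candidate, score) for the highest-scoring entry, or None.

    `ranks` is expected to look like the output of `model.rank`:
    a list of dicts with 'corpus_id' and 'score' keys.
    """
    if not ranks:
        return None
    top = max(ranks, key=lambda r: r["score"])
    if top["score"] < min_score:
        return None  # no candidate is confident enough
    return candidates[top["corpus_id"]], float(top["score"])


# Illustrative values only, NOT real model output.
example_candidates = [
    "College of Saint Benedict and Saint John's University, Collegeville, MN, United States",
    "Laboratory of Crystallography, ETH Zurich, CH-8093 Zurich, Switzerland",
]
example_ranks = [
    {"corpus_id": 1, "score": 0.91},
    {"corpus_id": 0, "score": 0.07},
]

result = pick_match(example_ranks, example_candidates)
print(result)
```

Returning `None` below the cutoff lets a caller treat "no confident match" differently from "matched the top candidate", which matters when the true institution may be absent from the candidate list.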
## Evaluation
### Metrics
#### Cross Encoder Reranking
* Dataset: `affiliation-val`
* Evaluated with [CrossEncoderRerankingEvaluator](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
```json
{
"at_k": 10,
"always_rerank_positives": true
}
```
| Metric | Value |
|:------------|:---------------------|
| map | 0.9294 (-0.0706) |
| mrr@10 | 0.9294 (-0.0706) |
| **ndcg@10** | **0.9564 (-0.0436)** |
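To make the three metrics concrete, here is a small sketch that computes them for one query from a list of binary relevance labels ordered by descending model score. These are textbook definitions written for illustration, not the evaluator's own code, and the helper names are hypothetical. It also shows that average precision and reciprocal rank coincide when a query has exactly one positive, which would be consistent with the identical `map` and `mrr@10` values above (an inference, not something the card states).

```python
import math
from typing import Sequence


def reciprocal_rank(labels: Sequence[int], k: int = 10) -> float:
    """1 / rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(labels[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(labels: Sequence[int]) -> float:
    """Mean of precision@rank taken at each relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg(labels: Sequence[int], k: int = 10) -> float:
    """DCG of the given order divided by DCG of the ideal order."""
    def dcg(ordered: Sequence[int]) -> float:
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(ordered[:k], start=1))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal else 0.0


# One query whose single positive was ranked second.
ranked_labels = [0, 1, 0]
print(reciprocal_rank(ranked_labels), average_precision(ranked_labels), ndcg(ranked_labels))
```

The reported figures are averages of such per-query values over all evaluation queries.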
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 13,717 training samples
* Columns: query, document, and label
* Approximate statistics based on the first 1000 samples:
| | query | document | label |
|:-----|:-------|:---------|:------|
| type | string | string | int |
* Samples:
| query | document | label |
|:------|:---------|:------|
| Department of Otolaryngology-Head and Neck Surgery; National Defense Medical College; Saitama Japan | . Department of Otolaryngology-Head and Neck Surgery, National Defense Medical College, Japan. | 1 |
| Department of Otolaryngology-Head and Neck Surgery; National Defense Medical College; Saitama Japan | EOG Resources, Inc | 0 |
| School of Science and Engineering The Chinese University of Hong Kong,Shenzhen,China | School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China, | 1 |
* Loss: [BinaryCrossEntropyLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
```json
{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
```
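With an identity activation and no `pos_weight`, the loss is binary cross-entropy applied directly to the model's raw logit for each pair, which in PyTorch terms corresponds to `torch.nn.BCEWithLogitsLoss`. The sketch below reproduces that per-example formula in plain Python so the behavior can be checked by hand; `bce_with_logits` is a hypothetical helper written for illustration, not the library's implementation.

```python
import math


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy on a raw logit z with label y in {0, 1}.

    Numerically stable form of
        -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# A confident correct prediction costs little, a confident wrong one costs a lot.
for logit, label in [(4.0, 1), (4.0, 0), (0.0, 1)]:
    print(f"logit={logit:+.1f} label={label} loss={bce_with_logits(logit, label):.4f}")
```

A logit of 0 corresponds to a probability of 0.5, so its loss is `log(2)` regardless of the label.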
### Evaluation Dataset
#### Unnamed Dataset
* Size: 2,421 evaluation samples
* Columns: query, document, and label
* Approximate statistics based on the first 1000 samples:
| | query | document | label |
|:-----|:-------|:---------|:------|
| type | string | string | int |
* Samples:
| query | document | label |
|:------|:---------|:------|
| Centre sur le handicap et l'intégration, School of Economics and Political Science Université de Saint‐Gall | College of Saint Benedict and Saint John's University, Collegeville, MN, United States | 0 |
| Swiss Federal Institute of Technology (ETH) Zurich, Institute of Quantum Electronics, Laser Spectroscopy and Sensing Laboratory, Hoenggerberg, HPF D19, CH-8093 Zurich, Switzerland | Laboratory of Crystallography, ETH Zurich, CH-8093 Zurich, Switzerland | 1 |
| Swiss Federal Institute of Technology (ETH) Zurich, Institute of Quantum Electronics, Laser Spectroscopy and Sensing Laboratory, Hoenggerberg, HPF D19, CH-8093 Zurich, Switzerland | Laboratoire d'Electrochimie Physique et Analytique, École Polytechnique Fédérale de Lausanne Station 6, CH-1015 Lausanne, Switzerland | 0 |
* Loss: [BinaryCrossEntropyLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
```json
{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `push_to_hub`: True
- `hub_model_id`: cometadata/jina-reranker-v2-multilingual-affiliations
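As a rough worked example of what `warmup_ratio: 0.1` means for this run, the arithmetic below estimates the number of optimizer steps and warmup steps from the 13,717 training samples and the batch size of 16. It rests on assumptions the card does not state: a single device, no gradient accumulation, and 3 training epochs (the usual trainer default when `num_train_epochs` is not overridden). Treat the result as an estimate, not a record of the actual run.

```python
import math

train_samples = 13_717   # from the training dataset section
batch_size = 16          # per_device_train_batch_size
warmup_ratio = 0.1       # warmup_ratio

# ASSUMPTIONS (not stated in the card): one device, no gradient
# accumulation, and 3 epochs.
num_epochs = 3

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
```

Under these assumptions the learning rate would ramp up to 2e-05 over roughly the first tenth of training before the scheduler starts decaying it.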
#### All Hyperparameters