How to use antoinelouis/splade-max-camembert-base-mmarcoFR with Transformers:
    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("antoinelouis/splade-max-camembert-base-mmarcoFR")
    model = AutoModelForMaskedLM.from_pretrained("antoinelouis/splade-max-camembert-base-mmarcoFR")
Update README.md
README.md
@@ -99,8 +99,8 @@ with BM25 negatives.
 #### Implementation
 
 The model is initialized from the [almanach/camembert-base](https://huggingface.co/almanach/camembert-base) checkpoint and optimized via a combination of the InfoNCE
-ranking loss with a temperature of 0.05 and the FLOPS regularization loss with quadratic increase of lambda until step 33k after which it remains constant with lambda_q
-
+ranking loss with a temperature of 0.05 and the FLOPS regularization loss with quadratic increase of lambda until step 33k after which it remains constant with lambda_q=3e-4
+and lambda_d=1e-4. The model is fine-tuned on one 80GB NVIDIA H100 GPU for 100k steps using the AdamW optimizer with a batch size of 128, a peak learning rate
 of 2e-5 with warm up along the first 4000 steps and linear scheduling. The maximum sequence lengths for questions and passages length were fixed to 32 and 128 tokens.
 Relevance scores are computed with the cosine similarity.
 
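The training recipe described in the updated Implementation paragraph can be sketched in plain Python. This is an illustrative sketch, not the author's training code. The numeric constants come from the README text. The function names are hypothetical. The exact ramp shape `(step / 33k)^2`, the decay of the learning rate to zero at step 100k, and the FLOPS definition (sum over vocabulary terms of the squared mean absolute weight in the batch) are assumptions based on the usual SPLADE setup.

```python
import math

# Values quoted in the README's Implementation paragraph.
TEMPERATURE = 0.05
LAMBDA_Q, LAMBDA_D = 3e-4, 1e-4
LAMBDA_RAMP_STEPS = 33_000
PEAK_LR, WARMUP_STEPS, TOTAL_STEPS = 2e-5, 4_000, 100_000


def flops_weight(step, target, ramp_steps=LAMBDA_RAMP_STEPS):
    """Quadratic ramp from 0 to `target` (assumed shape), then constant."""
    return target * min(1.0, step / ramp_steps) ** 2


def learning_rate(step):
    """Linear warm-up to PEAK_LR, then linear decay (assumed to reach zero)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(query, passages, positive_index=0, temperature=TEMPERATURE):
    """Negative log-softmax of the positive passage's temperature-scaled score."""
    logits = [cosine(query, p) / temperature for p in passages]
    top = max(logits)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[positive_index]


def flops(batch):
    """FLOPS regularizer: sum over terms of (mean absolute weight in batch)^2."""
    n = len(batch)
    return sum((sum(abs(x) for x in column) / n) ** 2 for column in zip(*batch))


def total_loss(step, ranking_loss, query_vectors, passage_vectors):
    """Ranking loss plus the two step-dependent FLOPS penalties."""
    return (
        ranking_loss
        + flops_weight(step, LAMBDA_Q) * flops(query_vectors)
        + flops_weight(step, LAMBDA_D) * flops(passage_vectors)
    )
```

Under these assumptions, `flops_weight(16_500, LAMBDA_Q)` is a quarter of `LAMBDA_Q`, since the ramp is quadratic and step 16.5k is halfway to 33k, and `learning_rate(2_000)` is half the peak rate.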