This repository contains the tokenizer for tencent/KaLM-Embedding-Gemma3-12B-2511, converted to be compatible with transformers>=5.0.0 (and sentence-transformers>=5.3.0).
To load the model with sentence-transformers, use:
import sentence_transformers.sentence_transformer.modules as st_modules
from sentence_transformers import SentenceTransformer

# Load the base model weights, but take the tokenizer from the converted repo.
transformer_module = st_modules.Transformer(
    "tencent/KaLM-Embedding-Gemma3-12B-2511",
    tokenizer_name_or_path="minhnguyent546/KaLM-Embedding-Gemma3-12B-2511-tokenizer-for-transformers-v5",
    model_kwargs={
        "torch_dtype": "bfloat16",
        "attn_implementation": "flash_attention_2",
    },
)

# Use the hidden state of the last token as the sentence embedding.
pooling_module = st_modules.Pooling(
    transformer_module.get_embedding_dimension(),
    pooling_mode="lasttoken",
    include_prompt=True,
)

# Scale each embedding to unit length.
normalize_module = st_modules.Normalize()

model = SentenceTransformer(modules=[transformer_module, pooling_module, normalize_module])
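
The pipeline above ends with last-token pooling (`pooling_mode="lasttoken"`) followed by normalization. The sketch below is a rough illustration of those two steps on toy data, in pure Python, so it runs without downloading the 12B model. The helpers `last_token_pool` and `l2_normalize` are hypothetical names for this example and are not part of sentence-transformers. It assumes the attention mask marks real tokens with 1 and padding with 0.

```python
import math


def last_token_pool(token_embeddings, attention_mask):
    """Return the embedding of the last non-padding token (hypothetical helper)."""
    # Find the last position whose mask value is 1; this works for
    # both left-padded and right-padded sequences.
    last_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return token_embeddings[last_index]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy input: 4 token positions with 2-dimensional embeddings;
# the final position is padding and must be ignored.
token_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 1, 0]

sentence_embedding = l2_normalize(last_token_pool(token_embeddings, attention_mask))
print(sentence_embedding)
```

Because the embeddings have unit length after normalization, the dot product of two of them equals their cosine similarity, which is the usual scoring function for retrieval.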