Sentence Similarity
sentence-transformers
Safetensors
modernbert
feature-extraction
dense
Generated from Trainer
dataset_size:800640
loss:MultipleNegativesRankingLoss
text-embeddings-inference
Instructions to use Shuu12121/Owl-ph2-len2048 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Shuu12121/Owl-ph2-len2048 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
- Notebooks
- Google Colab
- Kaggle
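The `similarities` matrix in the snippet above holds one score per pair of sentences. As a minimal sketch, assuming the model uses cosine similarity (the sentence-transformers default when no other similarity function is configured), each entry can be reproduced by hand. The function name `cosine_matrix` and the toy vectors are illustrative only and are not part of the library.

```python
import math
from typing import List, Sequence


def cosine_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between all rows of `embeddings`."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in embeddings]
    matrix = []
    for row_a, norm_a in zip(embeddings, norms):
        scores = []
        for row_b, norm_b in zip(embeddings, norms):
            dot = sum(x * y for x, y in zip(row_a, row_b))
            denom = norm_a * norm_b
            # Guard against zero vectors, which have no direction.
            scores.append(dot / denom if denom else 0.0)
        matrix.append(scores)
    return matrix


# Hypothetical 3-dimensional vectors standing in for real model embeddings.
toy = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 0.0, 0.0]]
scores = cosine_matrix(toy)
print(len(scores), len(scores[0]))
```

Vectors that point in the same direction score 1.0 regardless of length, which is why the first and third toy vectors are treated as identical.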
Update README.md
README.md
CHANGED
@@ -64,7 +64,7 @@ model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
 
 ### Training Dataset
 
-This model was trained on the **Owl corpus**, a dataset constructed for code search and code-text retrieval.
+This model was trained on the [**Owl corpus**](https://huggingface.co/collections/Shuu12121/codesearch-datasets), a dataset constructed for code search and code-text retrieval.
 The training set contains approximately **100,000 samples per language**, resulting in **800,640 training pairs** in total.
 
 ### Training Hyperparameters
@@ -72,3 +72,23 @@ The training set contains approximately **100,000 samples per language**, result
 * **Learning rate:** 1e-5
 * **Epochs:** 1
 * **Loss:** MultipleNegativesRankingLoss
+
+## Integrations
+
+### Owl-CLI
+
+This model is used as the embedding model in **[Owl-CLI](https://github.com/Shun0212/Owl-CLI)**, a command-line tool for semantic code search.
+
+Owl-CLI indexes source code at the **function level**, generates dense embeddings using this model, and performs **vector similarity search** to retrieve relevant code for natural language queries.
+
+Key features of Owl-CLI include:
+
+- **Semantic code search** using dense embeddings
+- **Function-level indexing** with file paths and line numbers
+- **Automatic indexing** on first search
+- **Differential embedding cache** to avoid re-embedding unchanged files
+- **JSON output** for tool integration
+- **MCP server support** for integration with AI coding agents (e.g., Claude Code)
+
+Repository:
+https://github.com/Shun0212/Owl-CLI
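The pipeline described in the added Owl-CLI section (function-level indexing with paths and line numbers, embedding, vector similarity search, and a cache that skips unchanged files) can be sketched roughly as follows. This is not Owl-CLI's actual implementation: `FunctionRecord`, `CodeIndex`, `extract_functions`, `toy_embed`, and `TOY_VOCAB` are hypothetical names, the sketch handles Python sources only (via the standard `ast` module), and `toy_embed` is a keyword-counting stand-in so the example runs without downloading the model.

```python
import ast
import hashlib
import math
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Hypothetical fixed vocabulary for the stand-in embedder below.
TOY_VOCAB = ["add", "sum", "numbers", "read", "file", "path", "open", "return"]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (assumption): counts vocabulary words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in TOY_VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class FunctionRecord:
    path: str
    name: str
    lineno: int
    source: str


def extract_functions(path: str, source: str) -> List[FunctionRecord]:
    """Collect every function definition with its file path and start line."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node) or ""
            records.append(FunctionRecord(path, node.name, node.lineno, segment))
    return records


class CodeIndex:
    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        # path -> (content hash, [(record, vector), ...])
        self._files: Dict[str, Tuple[str, List[Tuple[FunctionRecord, Vector]]]] = {}
        self.functions_embedded = 0

    def add_file(self, path: str, source: str) -> None:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._files.get(path)
        if cached is not None and cached[0] == digest:
            return  # unchanged file: reuse the stored embeddings
        entries = []
        for record in extract_functions(path, source):
            entries.append((record, self.embed(record.source)))
            self.functions_embedded += 1
        self._files[path] = (digest, entries)

    def search(self, query: str, top_k: int = 3) -> List[Tuple[float, FunctionRecord]]:
        query_vec = self.embed(query)
        scored = [
            (cosine(query_vec, vec), record)
            for _, entries in self._files.values()
            for record, vec in entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


SAMPLE = '''def add_numbers(a, b):
    return a + b


def read_file(path):
    with open(path) as handle:
        return handle.read()
'''

index = CodeIndex(toy_embed)
index.add_file("utils.py", SAMPLE)
index.add_file("utils.py", SAMPLE)  # same content, so nothing is re-embedded
for score, record in index.search("add two numbers", top_k=1):
    print(f"{record.path}:{record.lineno} {record.name} score={score:.3f}")
```

To try the same structure with this model, one option is to pass `lambda text: model.encode(text).tolist()` as `embed`, where `model` is the `SentenceTransformer` instance from the usage snippet. The cache here hashes whole files, which is one plausible reading of "unchanged files"; the real tool may track changes differently.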