---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:800640
- loss:MultipleNegativesRankingLoss
base_model: Shuu12121/Owl-ph2-base-len2048
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# Shuu12121/Owl-ph2-len2048 🦉

```
 ██████╗ ██╗    ██╗██╗                 ██████╗██╗     ██╗
██╔═══██╗██║    ██║██║                ██╔════╝██║     ██║     ,______,
██║   ██║██║ █╗ ██║██║       ██████╗  ██║     ██║     ██║    ( O v O )
██║   ██║██║███╗██║██║       ╚═════╝  ██║     ██║     ██║    /   V   \
╚██████╔╝╚███╔███╔╝███████╗           ╚██████╗███████╗██║   /(       )\
 ╚═════╝  ╚══╝╚══╝ ╚══════╝            ╚═════╝╚══════╝╚═╝     ^^   ^^
```

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [Shuu12121/Owl-ph2-base-len2048](https://huggingface.co/Shuu12121/Owl-ph2-base-len2048)
- **Maximum Sequence Length:** 1024 tokens (2048 tokens during pretraining)
- **Output Dimensionality:** 768
- **Similarity Function:** Cosine Similarity

This model is a SentenceTransformer variant of **Shuu12121/Owl-ph2-base-len2048**. It was trained on the **Owl corpus** for **code search** and **code-text retrieval**. The training data consists of roughly **100,000 samples per language** (**800,640 pairs** in total), and the model was trained for **1 epoch** with a **learning rate of 1e-5**.
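
The card's metadata lists `MultipleNegativesRankingLoss` as the training objective. As an illustration only (not the actual training code), the NumPy sketch below computes that kind of in-batch-negatives loss: each query's paired code snippet is the positive, and every other snippet in the same batch acts as a negative. The `scale=20.0` value mirrors the Sentence Transformers default for this loss; the scale actually used for this model is not stated in this card, and the function name `mnrl_loss` is hypothetical.

```python
import numpy as np


def mnrl_loss(query_emb: np.ndarray, code_emb: np.ndarray, scale: float = 20.0) -> float:
    """Illustrative in-batch-negatives ranking loss over cosine similarities.

    query_emb[i] and code_emb[i] form a positive pair; every code_emb[j]
    with j != i is treated as a negative for query i.
    scale=20.0 is an assumption (the library default), not a confirmed setting.
    """
    # L2-normalize so that dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    # (batch, batch) matrix of scaled cosine similarities.
    logits = scale * (q @ c.T)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the diagonal (the matching pair) is the target class.
    return float(-np.mean(np.diag(log_probs)))
```

With well-aligned pairs the diagonal dominates and the loss approaches zero; if all code embeddings are identical, every row is uniform and the loss equals `log(batch_size)`. Larger batches therefore supply more negatives per query at no extra labeling cost.
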

### Model Sources

- **Base model:** [Shuu12121/Owl-ph2-base-len2048](https://huggingface.co/Shuu12121/Owl-ph2-base-len2048)
- **Sentence Transformers:** [Sentence Transformers Documentation](https://sbert.net)

### Full Model Architecture

```text
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Intended Uses

This model is intended for:

* code search
* code-text retrieval
* semantic similarity
* dense embedding generation for source code and natural language

## Usage

### Direct Usage (Sentence Transformers)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
```

## Training Details

### Training Dataset

This model was trained on the [**Owl corpus**](https://huggingface.co/collections/Shuu12121/codesearch-datasets), a dataset constructed for code search and code-text retrieval. The training set contains approximately **100,000 samples per language**, resulting in **800,640 training pairs** in total.

### Training Hyperparameters

* **Learning rate:** 1e-5
* **Epochs:** 1
* **Loss:** MultipleNegativesRankingLoss

---

## Integrations

### 🦉 Owl-CLI: Semantic Code Search in Your Terminal

> **Repository:** [https://github.com/Shun0212/Owl-CLI](https://github.com/Shun0212/Owl-CLI)

**Owl-ph2-len2048** is the embedding backbone of **[Owl-CLI](https://github.com/Shun0212/Owl-CLI)**, a command-line tool for semantic code search powered by dense retrieval.
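
Dense retrieval here means ranking pre-computed embeddings by cosine similarity to a query embedding. The NumPy sketch below shows only that ranking step; it is not Owl-CLI's actual code, `top_k` is a hypothetical helper, and the embeddings are assumed to come from this model's `encode` method.

```python
import numpy as np


def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Hypothetical helper: return (row index, cosine score) for the k best corpus rows."""
    # Normalize so the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q  # one cosine score per corpus row
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Possible wiring with this model (downloads the weights; not executed here):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
# corpus_emb = model.encode(code_snippets)      # shape (n, 768)
# query_emb = model.encode("parse config file")  # shape (768,)
# hits = top_k(query_emb, corpus_emb, k=5)
```

A brute-force scan like this is adequate for a single repository; much larger corpora would typically use an approximate nearest-neighbor index instead.
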

Owl-CLI indexes your codebase at the **function level**, encodes each function with this model, and runs **vector similarity search** to find code relevant to natural-language queries, all from your terminal.

#### Key Features

| Feature | Description |
|---|---|
| Semantic search | Natural language → relevant functions via dense embeddings |
| Function-level indexing | Functions are indexed with their file paths and line numbers |
| Differential cache | Only re-embeds changed files |
| JSON output | Easy integration with other tools and scripts |
| MCP server support | Plug into AI coding agents (e.g., Claude Code, Cursor) |

#### Example: Query Routing

![example-routing](https://raw.githubusercontent.com/Shun0212/Owl-CLI/main/docs/images/example-routing.png)

#### Example: Interactive Session

![example-session](https://raw.githubusercontent.com/Shun0212/Owl-CLI/main/docs/images/example-session.png)

#### Quick Start

```bash
# Clone the repository (see its README for the full installation steps)
git clone https://github.com/Shun0212/Owl-CLI.git

# Index your codebase and search
owl search "function that handles authentication"

# JSON output for tool integration
owl search "parse config file" --json

# Start MCP server for AI agent integration
owl mcp
```

For full documentation and installation instructions, see the [Owl-CLI repository](https://github.com/Shun0212/Owl-CLI).
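
As a rough illustration of the function-level indexing and differential cache listed under Key Features, the sketch below uses Python's standard `ast` module to extract functions with their line ranges, and SHA-256 content hashes to skip files that have not changed. It is not Owl-CLI's implementation: it handles Python sources only, the names `FunctionRecord`, `extract_functions`, and `DiffIndex` are hypothetical, and the embedding call is left as a comment.

```python
import ast
import hashlib
from dataclasses import dataclass

# Hypothetical sketch; not Owl-CLI's actual code.


@dataclass
class FunctionRecord:
    path: str
    name: str
    start_line: int
    end_line: int
    source: str


def extract_functions(path: str, text: str) -> list[FunctionRecord]:
    """Collect every (async) function definition, including nested ones."""
    tree = ast.parse(text)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append(FunctionRecord(
                path=path,
                name=node.name,
                start_line=node.lineno,
                end_line=node.end_lineno,
                source=ast.get_source_segment(text, node) or "",
            ))
    return records


class DiffIndex:
    """Re-extracts only files whose content hash changed since the last update."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}
        self.functions: dict[str, list[FunctionRecord]] = {}

    def update(self, files: dict[str, str]) -> list[str]:
        """Refresh from {path: source text}; return the paths that were re-indexed."""
        changed = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(path) == digest:
                continue  # unchanged: keep cached records (and their embeddings)
            self.hashes[path] = digest
            self.functions[path] = extract_functions(path, text)
            # Here each record's `source` would be passed to model.encode(...).
            changed.append(path)
        # Drop files that no longer exist in the codebase.
        for path in set(self.hashes) - set(files):
            del self.hashes[path]
            del self.functions[path]
        return changed
```

Only the paths returned by `update` need new embeddings, which is what keeps re-indexing cheap when a few files change.
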