---
license: apache-2.0
language:
- en
base_model: JusteLeo/Qwen3-0.6B-T5-xxl
tags:
- split
- encoder
- embedding
- Text Generation
---

# Qwen3-0.6B-T5-xxl-split

## Model Description

This repository provides the components of the `Qwen3-0.6B-T5-xxl` model, split into two parts. It is intended for advanced users who want to perform custom operations, such as GGUF conversion or other modifications to the model architecture.

Both components are provided in **float32** to preserve full precision for downstream tasks such as quantization.

## Repository Contents

- **/qwen_body/**: Contains the fine-tuned `Qwen3-0.6B` model body. This is a standard Hugging Face model directory. The model weights are in `float32`.
- **/projection_head/**: Contains the fine-tuned projection head as a single `projection_head.pth` file. This is a PyTorch state dictionary.

## How to Use

The two components must be loaded separately and chained at inference time. The body produces per-token hidden states. These states are mean-pooled over the attention mask. The pooled vector is then passed through the projection head to produce the final 4096-dimensional embedding.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# --- 1. Load Components ---
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model body
body_model = AutoModel.from_pretrained("./qwen_body").to(device)
tokenizer = AutoTokenizer.from_pretrained("./qwen_body")

# Load the projection head
# First, re-create the architecture (the layer order must match the saved state dict)
input_dim = body_model.config.hidden_size  # 1024
hidden_dim = 2048
output_dim = 4096

head_model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(hidden_dim, output_dim)
).to(device)

# Then, load the saved weights (map_location lets a GPU-saved file load on CPU)
head_model.load_state_dict(
    torch.load("./projection_head/projection_head.pth", map_location=device)
)

# Switch both parts to inference mode (this also disables Dropout)
body_model.eval()
head_model.eval()

# --- 2. Create a unified inference function ---
@torch.no_grad()
def get_final_embedding(text: str) -> torch.Tensor:
    # a) Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # b) Get the token-level hidden states from the body model
    outputs_body = body_model(**inputs)
    last_hidden_state = outputs_body.last_hidden_state

    # c) Perform mean pooling over non-padding tokens only
    attention_mask = inputs["attention_mask"]
    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
    pooled_embedding = sum_embeddings / sum_mask

    # d) Pass the pooled embedding through the projection head
    final_embedding = head_model(pooled_embedding)
    return final_embedding

# --- 3. Test the pipeline ---
prompt = "A high-tech laboratory with glowing vials and holographic displays."
embedding = get_final_embedding(prompt)

print("Inference successful!")
print(f"Output shape: {embedding.shape}")
# Expected output: Output shape: torch.Size([1, 4096])
```

## License

This repository is licensed under the **Apache License 2.0**.
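## Sketch: Batched Pooling and Head Layout

The inference function above handles one prompt at a time. The self-contained sketch below illustrates two details that matter when working with the split components. It does not load the real weights: the hidden states are random stand-ins, and the helper names `build_projection_head` and `masked_mean_pool` are hypothetical. The first detail is that masked mean pooling works unchanged on a padded batch. The second is that `GELU` and `Dropout` carry no parameters, so the projection head's state dict only contains keys for `nn.Sequential` indices `0` and `3`. Any converter, and any re-created head, must keep the same layer order for `load_state_dict` to succeed.

```python
import torch
from torch import nn


def build_projection_head(input_dim: int = 1024, hidden_dim: int = 2048,
                          output_dim: int = 4096) -> nn.Sequential:
    # Hypothetical helper: same layer order as the README example.
    # GELU (index 1) and Dropout (index 2) have no parameters, so only
    # the two Linear layers (indices 0 and 3) appear in the state dict.
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim),
        nn.GELU(),
        nn.Dropout(0.1),
        nn.Linear(hidden_dim, output_dim),
    )


def masked_mean_pool(last_hidden_state: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: (batch, seq, hidden) * (batch, seq, 1) zeroes out padding positions
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # number of real tokens per sequence
    return summed / counts


torch.manual_seed(0)
head = build_projection_head().eval()

# Parameter names and shapes (nn.Linear stores weight as (out_features, in_features))
print({k: tuple(v.shape) for k, v in head.state_dict().items()})
# {'0.weight': (2048, 1024), '0.bias': (2048,), '3.weight': (4096, 2048), '3.bias': (4096,)}

# Random stand-in for the body output: batch of 2, the second sequence has 2 padding positions
hidden = torch.randn(2, 5, 1024)
mask = torch.tensor([[1, 1, 1, 1, 1],
                     [1, 1, 1, 0, 0]])

pooled = masked_mean_pool(hidden, mask)
with torch.no_grad():
    out = head(pooled)
print(out.shape)  # torch.Size([2, 4096])
```

With real inputs, the batch would come from a call such as `tokenizer(texts, padding=True, return_tensors="pt")`, assuming the tokenizer defines a pad token. The padded positions then contribute nothing to the pooled vector.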