---
pipeline_tag: text-generation
library_name: transformers
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-30B-A3B
tags:
- text2sql
- clickhouse
- qwen3
- moe
- fine-tuned
---

# Ekaya-30B-A3B-community

Fine-tuned [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) for structured SQL generation with ClickHouse.

## What's Different

This model produces **structured JSON output** instead of free-form text:

```json
{
  "output": {
    "sql": {
      "query": "SELECT COUNT(*) FROM users",
      "dialect": "clickhouse",
      "tables_used": ["users"],
      "confidence": 0.95,
      "requires_review": false
    }
  }
}
```

| Metric | Base Qwen3-30B-A3B | This Model |
|--------|-------------------|------------|
| Valid JSON | 57% | **100%** |
| All fields present | 0% | **100%** |
| Correct types | 0% | **100%** |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ekaya-inc/Ekaya-30B-A3B-community",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ekaya-inc/Ekaya-30B-A3B-community")

prompt = """<|im_start|>system
You are an expert SQL assistant for ClickHouse. Output ONLY valid JSON.
<|im_end|>
<|im_start|>user
Schema: CREATE TABLE users (id UInt64, name String, email String)
Question: Count all users
<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Base model | Qwen3-30B-A3B (30B total, 3B active via MoE) |
| Fine-tuning method | QLoRA (r=32, alpha=64) |
| Quantization | 4-bit NF4 during training |
| Training samples | 2,907 synthetic text2sql examples |
| Domains | 8 diverse schemas (e-commerce, healthcare, finance, etc.) |
| Hardware | NVIDIA DGX Spark (Blackwell) |
| Training time | ~9 hours |

### LoRA Target Modules

Both attention and expert FFN layers:

- Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Expert FFN: `gate_proj`, `up_proj`, `down_proj` (all 128 experts)

## Limitations

- **SQL correctness not validated**: Format is correct, but query logic may have errors
- **No security training**: Not trained to detect SQL injection or enforce RLS
- **Confidence not calibrated**: The `confidence` field is not yet meaningful
- **ClickHouse only**: Trained specifically for ClickHouse dialect

## Intended Use

This is a **community preview** for developers exploring embeddable text2sql models. Production use should wait for:

- SQL quality validation
- Security awareness training
- Confidence calibration

## License

Apache 2.0 (same as base model)
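
## Validating the Output

The structured output shown under "What's Different" can be checked on the caller's side before the query is used. Below is a minimal sketch, not part of this repository, that pulls the `output.sql` object out of the decoded text and checks field presence and types. The helper names `extract_sql_payload` and `validate_sql_payload` are hypothetical, and the expected field types are inferred from the single example in this card rather than from a published schema. The scan over `{` positions is there because the decoded text in the Usage snippet also contains the prompt.

```python
import json

# Assumption: field names and types inferred from the example output in this card.
EXPECTED_FIELDS = {
    "query": str,
    "dialect": str,
    "tables_used": list,
    "confidence": (int, float),
    "requires_review": bool,
}


def extract_sql_payload(text: str) -> dict:
    """Return the first JSON object in `text` shaped like {"output": {"sql": {...}}}."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict):
            output = candidate.get("output")
            if isinstance(output, dict) and isinstance(output.get("sql"), dict):
                return output["sql"]
        # Not a match at this brace; try the next one.
        start = text.find("{", start + 1)
    raise ValueError("no output.sql JSON object found")


def validate_sql_payload(sql: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks as expected."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in sql:
            problems.append(f"missing field: {name}")
        elif isinstance(sql[name], bool) and expected is not bool:
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {name}")
        elif not isinstance(sql[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


if __name__ == "__main__":
    decoded = (
        "assistant\n"
        '{"output": {"sql": {"query": "SELECT COUNT(*) FROM users", '
        '"dialect": "clickhouse", "tables_used": ["users"], '
        '"confidence": 0.95, "requires_review": false}}}'
    )
    sql = extract_sql_payload(decoded)
    print(sql["query"], validate_sql_payload(sql))
```

A shape check like this says nothing about whether the SQL is correct. Given the limitations listed above, a cautious caller would likely review every generated query regardless of the `confidence` and `requires_review` values.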