# LumynaX MaramaRoute Product Blueprint

## One-Sentence Product

MaramaRoute is the AbteeX AI Labs conversational CLI for selecting, pulling, running, and routing the LumynaX model family published under AbteeXAILab on Hugging Face.

## Core User Jobs

| User | Job | MaramaRoute Response |
| --- | --- | --- |
| Developer | See all LumynaX models without reading separate model cards first. | `MaramaRoute catalog` and `MaramaRoute models`. |
| Developer | Pick a model and talk to it from the terminal. | `MaramaRoute chat` with model search, pull prompt, and chat commands. |
| Local operator | Download a selected model artifact. | `MaramaRoute pull `. |
| Local operator | Run a pulled GGUF model. | `MaramaRoute run "prompt"`. |
| SovereignCode | Select a resident coding model for a governed workspace. | Code task route with NZ residency and JSON/tool gates. |
| Model publisher | Add new LumynaX releases without changing every client. | Registry compiler and aliases. |
| Enterprise tenant | Restrict models by region, license, runtime, and sensitivity. | Tenant policy packs and allowlists. |

## Product Pillars

1. AbteeX first: the model list is based on AbteeXAILab Hugging Face releases.
2. Sovereign default: New Zealand residency is the default route constraint.
3. Model provenance: every selectable model carries repo, artifact, license, runtime, modality, context, and validation metadata.
4. Deterministic audit: decisions are explainable and repeatable for the same registry and request.
5. Runtime independence: route selection is separate from the backend that runs llama.cpp, Transformers, embeddings, speech, or multimodal models.
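
The sovereign default, provenance, and deterministic audit pillars can be illustrated with a small sketch. This is not MaramaRoute's actual registry or router: `ModelRecord`, `route`, the `residency` and `quality` fields, and the `example-org/...` repo names are all hypothetical, chosen only to show how a fixed sort order makes a decision repeatable for the same registry and request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    # Fields mirror the provenance list in pillar 3; the names are hypothetical.
    repo: str
    artifact: str
    license: str
    runtime: str
    modality: str
    context: int
    validated: bool
    residency: str = "NZ"  # assumed field backing the sovereign default
    quality: int = 0       # assumed rank, used only for ordering


def route(registry, residency="NZ"):
    """Return (selected, fallbacks) for the given residency constraint."""
    eligible = [m for m in registry if m.residency == residency and m.validated]
    # Sort on a total order (quality, then repo name) so that the order in
    # which the registry lists its models never changes the outcome.
    ranked = sorted(eligible, key=lambda m: (-m.quality, m.repo))
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]


registry = [
    ModelRecord("example-org/model-b", "b.gguf", "apache-2.0", "llama.cpp", "text", 8192, True, quality=2),
    ModelRecord("example-org/model-a", "a.gguf", "apache-2.0", "llama.cpp", "text", 8192, True, quality=2),
    ModelRecord("example-org/model-c", "c.gguf", "apache-2.0", "llama.cpp", "text", 4096, True, residency="US", quality=9),
]

selected, fallbacks = route(registry)
same_selected, _ = route(list(reversed(registry)))
print(selected.repo, [m.repo for m in fallbacks], selected == same_selected)
```

In this sketch the non-resident model is excluded even though it has the highest rank, and the tie between the two resident models is broken by repo name rather than by list position.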

## Minimum Local Loop

```text
install package
  -> open MaramaRoute chat
  -> choose or search a runnable LumynaX model
  -> pull selected Hugging Face artifact when missing
  -> chat with local GGUF model or route request
  -> return selected model, fallbacks, scores, and audit metadata
```

## Model Alias Strategy

| Alias | Intended Use | Route Bias |
| --- | --- | --- |
| `lumynax/auto` | General application calls. | Best resident general model. |
| `lumynax/code` | Coding agents and repo work. | Coder tags, JSON support, tool support. |
| `lumynax/reasoning` | Planning, analysis, evaluation. | Reasoning tags and stronger quality rank. |
| `lumynax/multimodal` | Image plus text requests. | Multimodal runtime and policy-permitted residency. |
| `lumynax/local` | Sensitive tenant work. | Local GGUF or resident runtime only. |

## Runtime Adapters

| Adapter | Purpose | First Implementation |
| --- | --- | --- |
| llama.cpp local | Run GGUF files from the MaramaRoute cache. | `llama-cpp-python` in `MaramaRoute run`. |
| llama.cpp HTTP | Run GGUF models behind a local or tenant endpoint. | Backend URL configured in `gateway.local.json`. |
| Transformers | Run safetensors or multimodal packages. | Python worker with model cache and VRAM guard. |
| Embeddings | Serve retrieval models. | Embedding route adapter after text route is stable. |
| Speech | Serve Whisper/Kokoro-style packages. | Separate speech endpoints after text route is stable. |
| Hosted LumynaX | Private hosted runtime. | Tenant auth, quotas, and audit export. |

## Commercial Controls

| Control | Why it matters |
| --- | --- |
| API keys | Required for IDEs and internal apps. |
| Tenant quotas | Prevent runaway local or hosted compute spend. |
| Model allowlists | Keep restricted tenants away from unsuitable models. |
| Route metadata | Lets customers prove why a model was used. |
| Prompt retention flag | Supports privacy-sensitive deployments by default. |
| Registry signing | Prevents silent model substitution. |

## First Non-Negotiables

- Do not silently route restricted NZ data to a non-resident model.
- Do not pick a model that lacks required modality, JSON, or tool support.
- Do not hide rejection reasons from route metadata.
- Do not retain prompts by default for high-sensitivity routes.
- Do not let runtime adapters substitute models outside the route decision.
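
The non-negotiables above could be expressed as explicit gates that record why each model was rejected. The following is a minimal sketch under stated assumptions, not the MaramaRoute implementation: the function names, dictionary keys, reason strings, and model ids are all hypothetical, and prompt retention is assumed to be opt-in.

```python
def check_gates(model, request):
    """Return a list of rejection reasons; an empty list means the model passes."""
    reasons = []
    if request.get("restricted_nz_data") and model.get("residency") != "NZ":
        reasons.append("non-resident model for restricted NZ data")
    if request.get("modality", "text") not in model.get("modalities", ()):
        reasons.append("missing required modality")
    if request.get("needs_json") and not model.get("json"):
        reasons.append("missing JSON support")
    if request.get("needs_tools") and not model.get("tools"):
        reasons.append("missing tool support")
    return reasons


def decide(models, request):
    """Build a route decision that keeps every rejection reason visible."""
    accepted, rejected = [], {}
    for model in models:
        reasons = check_gates(model, request)
        if reasons:
            rejected[model["id"]] = reasons
        else:
            accepted.append(model["id"])
    return {
        "selected": accepted[0] if accepted else None,
        "fallbacks": accepted[1:],
        "rejected": rejected,
        # Assumption: retention is opt-in, so nothing is retained by default.
        "retain_prompt": bool(request.get("retain_prompt", False)),
    }


def adapter_may_run(decision, model_id):
    """A runtime adapter may only run models named in the route decision."""
    return model_id == decision["selected"] or model_id in decision["fallbacks"]


models = [
    {"id": "offshore-coder", "residency": "US", "modalities": ("text",), "json": True, "tools": True},
    {"id": "resident-chat", "residency": "NZ", "modalities": ("text",), "json": False, "tools": False},
    {"id": "resident-coder", "residency": "NZ", "modalities": ("text",), "json": True, "tools": True},
]
request = {
    "restricted_nz_data": True,
    "modality": "text",
    "needs_json": True,
    "needs_tools": True,
    "sensitivity": "high",
}

decision = decide(models, request)
print(decision)
```

Keeping the rejected models and their reasons in the returned decision, rather than only logging them, is one way to make the "do not hide rejection reasons" rule checkable by the caller.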