#!/usr/bin/env python3
from __future__ import annotations

import argparse
import json
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Place src at dst, preferring a hardlink, then a symlink, then a copy.

    Returns the name of the method that succeeded.
    """
    # Remove any stale file or dangling symlink at the destination first.
    if dst.exists() or dst.is_symlink():
        dst.unlink()
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Hardlinks fail across filesystems; fall back to a symlink.
        try:
            dst.symlink_to(src)
            return "symlink"
        except OSError:
            shutil.copy2(src, dst)
            return "copy"


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Build an Agents-A1 HF snapshot with the MTPLX MTP sidecar indexed."
    )
    parser.add_argument("--source", type=Path, required=True)
    parser.add_argument("--donor-index", type=Path, required=True)
    parser.add_argument("--mtp-sidecar", type=Path, required=True)
    parser.add_argument("--output", type=Path, required=True)
    parser.add_argument("--donor-repo", default="wang-yang/Agents-A1-MTPLX-Q4")
    args = parser.parse_args()

    source = args.source.resolve()
    output = args.output.resolve()
    donor_index_path = args.donor_index.resolve()
    mtp_sidecar = args.mtp_sidecar.resolve()

    # Fail early if any required input is missing.
    if not (source / "config.json").exists():
        raise FileNotFoundError(source / "config.json")
    if not (source / "model.safetensors.index.json").exists():
        raise FileNotFoundError(source / "model.safetensors.index.json")
    if not donor_index_path.exists():
        raise FileNotFoundError(donor_index_path)
    if not mtp_sidecar.exists():
        raise FileNotFoundError(mtp_sidecar)

    output.mkdir(parents=True, exist_ok=True)

    # Mirror the source snapshot into the output directory. The config, the
    # weight index, and the sidecar are skipped because they are rewritten below.
    linked: dict[str, str] = {}
    skip_names = {"config.json", "model.safetensors.index.json", "mtp.safetensors"}
    for item in sorted(source.iterdir()):
        if item.name in skip_names:
            continue
        dst = output / item.name
        if item.is_dir():
            if dst.exists():
                shutil.rmtree(dst)
            shutil.copytree(item, dst, symlinks=True)
            linked[item.name] = "copytree"
        elif item.is_file() or item.is_symlink():
            linked[item.name] = link_or_copy(item.resolve(), dst)

    mtp_link_method = link_or_copy(mtp_sidecar, output / "mtp.safetensors")

    # Declare one MTP layer in the config and record where the graft came from.
    config = json.loads((source / "config.json").read_text())
    config["mtp_num_hidden_layers"] = 1
    text_config = config.setdefault("text_config", {})
    text_config["mtp_num_hidden_layers"] = 1
    config["_mtp_graft_provenance"] = {
        "donor_repo": args.donor_repo,
        "sidecar_file": "mtp.safetensors",
        "method": "HF safetensors sidecar indexed before llama.cpp Qwen3.5-MoE conversion",
    }
    (output / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")

    # Merge the donor's mtp.* tensor names into the base weight map, pointing
    # each of them at the sidecar file.
    base_index = json.loads((source / "model.safetensors.index.json").read_text())
    donor_index = json.loads(donor_index_path.read_text())
    base_weight_map = dict(base_index.get("weight_map", {}))
    donor_weight_map = donor_index.get("weight_map", {})

    mtp_keys = sorted(k for k in donor_weight_map if k.startswith("mtp."))
    if not mtp_keys:
        raise ValueError(f"No mtp.* keys found in donor index {donor_index_path}")
    overlapping = sorted(k for k in mtp_keys if k in base_weight_map)
    if overlapping:
        raise ValueError(f"MTP keys already present in base index, first overlap: {overlapping[:5]}")
    for key in mtp_keys:
        base_weight_map[key] = "mtp.safetensors"

    # Update the index metadata. The sidecar's on-disk size is added to
    # total_size only when the base index already carries an integer value.
    metadata = dict(base_index.get("metadata", {}))
    total_size = metadata.get("total_size")
    if isinstance(total_size, int):
        metadata["total_size"] = total_size + mtp_sidecar.stat().st_size
    metadata["mtp_sidecar_total_size"] = mtp_sidecar.stat().st_size
    metadata["mtp_sidecar_keys"] = len(mtp_keys)
    metadata["mtp_sidecar_repo"] = args.donor_repo

    merged_index = {
        "metadata": metadata,
        "weight_map": dict(sorted(base_weight_map.items())),
    }
    (output / "model.safetensors.index.json").write_text(
        json.dumps(merged_index, indent=2, sort_keys=True) + "\n"
    )

    # Write a summary of what was done next to the snapshot and echo it.
    report = {
        "source": str(source),
        "output": str(output),
        "donor_repo": args.donor_repo,
        "donor_index": str(donor_index_path),
        "mtp_sidecar": str(mtp_sidecar),
        "mtp_sidecar_link_method": mtp_link_method,
        "source_entries_linked": linked,
        "base_weight_count": len(base_index.get("weight_map", {})),
        "mtp_weight_count": len(mtp_keys),
        "merged_weight_count": len(base_weight_map),
        "config_mtp_num_hidden_layers": config.get("mtp_num_hidden_layers"),
        "text_config_mtp_num_hidden_layers": text_config.get("mtp_num_hidden_layers"),
    }
    (output / "mtp_snapshot_report.json").write_text(
        json.dumps(report, indent=2, sort_keys=True) + "\n"
    )
    print(json.dumps(report, indent=2, sort_keys=True))


if __name__ == "__main__":
    main()
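
The weight-map merge performed in `main()` can be tried in isolation on in-memory dictionaries. The sketch below is only an illustration: the helper name `merge_mtp_weight_map` is hypothetical (it is not part of the script), and the tensor and shard names are made up. It follows the same rules as the script: only donor keys starting with `mtp.` are taken, any overlap with the base map is rejected, and every grafted key is pointed at `mtp.safetensors`. It assumes Python 3.9 or newer for the built-in generic annotations.

```python
def merge_mtp_weight_map(
    base_weight_map: dict[str, str],
    donor_weight_map: dict[str, str],
    sidecar_name: str = "mtp.safetensors",
) -> dict[str, str]:
    """Hypothetical helper mirroring the weight-map merge in main()."""
    # Only donor tensors under the "mtp." prefix are grafted.
    mtp_keys = sorted(k for k in donor_weight_map if k.startswith("mtp."))
    if not mtp_keys:
        raise ValueError("No mtp.* keys found in donor weight map")

    # Refuse to overwrite tensors the base index already declares.
    overlapping = [k for k in mtp_keys if k in base_weight_map]
    if overlapping:
        raise ValueError(f"MTP keys already present in base index: {overlapping[:5]}")

    merged = dict(base_weight_map)
    for key in mtp_keys:
        merged[key] = sidecar_name
    return dict(sorted(merged.items()))


# Made-up tensor and shard names, for illustration only.
base = {"model.layers.0.weight": "model-00001-of-00002.safetensors"}
donor = {
    "mtp.fc.weight": "model-00002-of-00002.safetensors",
    "model.layers.0.weight": "model-00001-of-00002.safetensors",
}
merged = merge_mtp_weight_map(base, donor)
print(merged)
```

In this example the donor's non-MTP entry is ignored, and the single `mtp.` key ends up mapped to the sidecar file rather than to the donor's original shard.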