AICoven Llama 3.2 3B — MCP Tool Calling

A LoRA fine-tuned version of Llama 3.2 3B Instruct (4-bit MLX) optimized for MCP (Model Context Protocol) tool calling in the AICoven app.

This model runs 100% on-device on Apple Silicon via MLX.

Model Details

Base Model mlx-community/Llama-3.2-3B-Instruct-4bit
Architecture LlamaForCausalLM
Quantization 4-bit (group size 64)
Fine-tuning LoRA (rank 16, alpha 32, 16 layers)
Framework MLX / mlx-lm
Size ~1.8 GB

Training

Fine-tuned on Apple M4 (24GB) using mlx-lm LoRA with gradient checkpointing.

  • Dataset: 177 synthetic examples in Chat-ML format
    • 114 single tool calls
    • 32 multi-turn (tool → result → response chains)
    • 31 no-tool / conversational (negative examples to reduce over-triggering)
  • Tool Coverage: 30 MCP tools across GitHub, Google Workspace (Gmail, Calendar, Drive, Docs, Sheets, Slides), Slack, Notion, Trello, TickTick, GA4, and more
  • Hyperparameters: lr=1e-4, batch_size=1, 150 iterations, max_seq_length=3072
  • Training Loss: Converged from 1.004 → 0.010 (val: 0.013)
  • Peak Memory: 6.4 GB

Evaluation

Tested on a 50-example novel test set (prompts never seen during training):

Metric Result
Accuracy 86% (43/50)
Refusals 0%
Format Errors 2% (1/50)
Unambiguous Tool Selection 94%+

The model outputs strict JSON tool calls without markdown code fences or conversational fluff.

Intended Use

This model is designed for the AICoven macOS/iOS app to provide local, private AI agent capabilities. It selects and invokes MCP tools based on user requests, supporting:

  • Single tool calls (e.g., "What time is it in Tokyo?")
  • Multi-step reasoning chains (e.g., "Find Python files in Documents and count them")
  • Graceful no-tool responses for conversational queries

How to Use

from mlx_lm import load, generate

model, tokenizer = load("aicoven/Llama-3.2-3B-Instruct-4bit-MCP-LoRA")
response = generate(model, tokenizer, prompt="What's the weather like?", max_tokens=256)

Limitations

  • Optimized specifically for AICoven's tool schema; may not generalize to arbitrary tool-calling formats
  • 3B parameter model — best for well-defined tool selection, not open-ended reasoning
  • Requires Apple Silicon (M1+) for MLX inference

License

This model inherits the Llama 3.2 Community License from Meta.

Built with Llama.

Downloads last month
20
Safetensors
Model size
0.5B params
Tensor type
F16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for aicoven/Llama-3.2-3B-Instruct-4bit-MCP-LoRA