How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Delentia/delentia-slm-jitna-scribe"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Delentia/delentia-slm-jitna-scribe",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
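The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_request` and `complete` are illustrative and not part of vLLM. The response parsing assumes the OpenAI-compatible completions schema, where the generated text is under `choices[0].text`.

```python
import json
import urllib.request

# Endpoint and model name taken from the curl example above.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "Delentia/delentia-slm-jitna-scribe"


def build_request(prompt, max_tokens=512, temperature=0.5):
    """Build the POST request carrying the same JSON payload as the curl call."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-compatible completions response shape.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string.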
Use Docker
docker model run hf.co/Delentia/delentia-slm-jitna-scribe:Q4_K_M

Delentia SLM: The Scribe (slm-jitna-scribe)

The Scribe is a specialized context-compression LoRA adapter in the Delentia OS 1+4 Pillar Architecture. It addresses context window saturation by condensing long conversation history before it fills the window.

Core Mechanics

  1. Recursive Summarization: Condenses long historical chat context into a structured, minimal TOON representation.
  2. Noise Reduction: Filters out colloquial conversational elements, keeping only actionable parameters.
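The recursive summarization idea can be sketched as a reduction over the chat history: summarize fixed-size chunks, then summarize the summaries until one remains. This is an illustration under stated assumptions, not the adapter's actual procedure. The `summarize` callable is a hypothetical stand-in for a call to the model, the chunk size is arbitrary, and the TOON output format is not modeled here.

```python
from typing import Callable, List


def chunk_messages(messages: List[str], chunk_size: int) -> List[List[str]]:
    """Split the message list into consecutive chunks of at most chunk_size items."""
    return [messages[i:i + chunk_size] for i in range(0, len(messages), chunk_size)]


def recursive_summarize(
    messages: List[str],
    summarize: Callable[[str], str],  # hypothetical: wraps a call to the model
    chunk_size: int = 4,
) -> str:
    """Repeatedly summarize chunks until a single summary remains."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each round shrinks the input")
    if not messages:
        return ""
    level = list(messages)
    while True:
        # One round: each chunk of the current level becomes one summary.
        level = [summarize("\n".join(chunk)) for chunk in chunk_messages(level, chunk_size)]
        if len(level) == 1:
            return level[0]
```

Because every round replaces each chunk with a single summary, the number of items shrinks by roughly a factor of `chunk_size` per round, so the loop terminates for any finite history.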

Technical Specifications

  • Base Model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
  • Format: GGUF Q4_K_M (Quantized via llama.cpp)
  • Primary Metrics:
    • TOON v0.2 Compliance: $\ge 90\%$
    • Token Savings: $\ge 15\%$
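The card does not define how token savings is measured. One plausible reading is the relative reduction in token count between the original context and its compressed form. The sketch below assumes that definition and uses whitespace splitting as a crude stand-in for the model's tokenizer; both are assumptions, and the function names are illustrative.

```python
def token_savings(original_tokens: int, compressed_tokens: int) -> float:
    """Relative reduction in token count: 1 - compressed / original (assumed definition)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1.0 - compressed_tokens / original_tokens


def count_tokens_whitespace(text: str) -> int:
    """Crude token count by whitespace splitting (stand-in for the real tokenizer)."""
    return len(text.split())


def meets_savings_target(original: str, compressed: str, target: float = 0.15) -> bool:
    """Check whether the compressed text reaches the savings target (default 15%)."""
    savings = token_savings(
        count_tokens_whitespace(original),
        count_tokens_whitespace(compressed),
    )
    return savings >= target
```

For a meaningful measurement, the whitespace counter would be replaced with the tokenizer of the base model, since token counts differ substantially between tokenizers.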