Instructions to use lunahr/CeluneNorm-0.6B-v1.1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use lunahr/CeluneNorm-0.6B-v1.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lunahr/CeluneNorm-0.6B-v1.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lunahr/CeluneNorm-0.6B-v1.1")
model = AutoModelForCausalLM.from_pretrained("lunahr/CeluneNorm-0.6B-v1.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use lunahr/CeluneNorm-0.6B-v1.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lunahr/CeluneNorm-0.6B-v1.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lunahr/CeluneNorm-0.6B-v1.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/lunahr/CeluneNorm-0.6B-v1.1

SGLang

How to use lunahr/CeluneNorm-0.6B-v1.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lunahr/CeluneNorm-0.6B-v1.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lunahr/CeluneNorm-0.6B-v1.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lunahr/CeluneNorm-0.6B-v1.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lunahr/CeluneNorm-0.6B-v1.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use lunahr/CeluneNorm-0.6B-v1.1 with Docker Model Runner:
```
docker model run hf.co/lunahr/CeluneNorm-0.6B-v1.1
```

A newer version of this model is available: lunahr/CeluneNorm-0.6B-v2.0-ctx1024

Model Card for CeluneNorm-0.6B-v1.1

Model Details

Model Description

CeluneNorm is a lightweight text normalization model designed for TTS and general preprocessing pipelines.

It converts poorly formatted input into clean, readable text while preserving the original meaning.

Example:

Input: this is a badly formed sentence
Output: This is a badly formed sentence.

The model is conservative by design:

It does not rewrite sentences
It avoids changing meaning
It preserves domain-specific tokens (e.g. URLs, commands, names)
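
A pipeline that relies on this conservative behavior can verify it after the fact. The following is a minimal sketch, not part of the model or its tooling: the hypothetical helper `only_surface_changes` assumes that an acceptable normalization alters only casing, punctuation, and spacing, so it compares just the letters and digits of the input and the output.

```python
def _alnum_stream(text: str) -> str:
    """Case-fold the text and keep only letters and digits."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


def only_surface_changes(source: str, normalized: str) -> bool:
    """Return True if the strings differ only in casing, punctuation, or spacing.

    Hypothetical guard (an assumption, not part of CeluneNorm): a False result
    suggests that words were added, dropped, or rewritten.
    """
    return _alnum_stream(source) == _alnum_stream(normalized)


print(only_surface_changes("this is a badly formed sentence",
                           "This is a badly formed sentence."))  # True
print(only_surface_changes("this is a badly formed sentence",
                           "This sentence is badly formed."))    # False
```

A failed check does not say what went wrong; it only flags outputs that deserve a closer look before they reach a TTS engine.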

Usage

The model expects input in the following format:

YOUR INPUT<NORM>

It will generate the normalized version of the input.
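
When prompts are built as plain strings, the format can be produced by hand. This is a minimal sketch, assuming `<NORM>` is appended directly after the input with no separator, exactly as the pattern above shows; `build_prompt` is a hypothetical helper name.

```python
NORM_MARKER = "<NORM>"  # marker taken from the input format shown above


def build_prompt(text: str) -> str:
    """Append the normalization marker to the raw input.

    Assumption: no whitespace or separator goes between the input and the marker.
    """
    return f"{text}{NORM_MARKER}"


print(build_prompt("this is a badly formed sentence"))
# this is a badly formed sentence<NORM>
```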

Inference example:

import torch
from transformers import AutoTokenizer, pipeline

model_id = "lunahr/CeluneNorm-0.6B-v1.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,  # reuse the tokenizer loaded above
    # Use the first GPU when available; CPU also works but is slower.
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

def normalize(text: str) -> str:
    """Return the normalized version of `text`."""
    # Build the prompt string with the model's chat template.
    history = [
        {"role": "user", "content": text}
    ]
    prompt = tokenizer.apply_chat_template(history, tokenize=False)

    out = pipe(
        prompt,
        max_new_tokens=512,
        do_sample=False,         # greedy decoding, no sampling
        return_full_text=False,  # return only the generated continuation
    )

    return out[0]["generated_text"].strip()

# Example
print(normalize("if i type something more complicated into celune it will fix it"))

Caution: CeluneNorm only works reliably on sequences below 128 tokens. Longer inputs may cause problems.
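
One way to respect this limit is to check input length before calling the model and to split longer text into smaller pieces. The sketch below is an illustration under stated assumptions: `count_tokens` stands for any callable that returns a token count (for example `lambda s: len(tokenizer.encode(s))` with the tokenizer from the inference example), the 128-token budget comes from the caution above, and splitting on line breaks is only one possible strategy. `split_within_budget` is a hypothetical helper name.

```python
from typing import Callable, List

MAX_TOKENS = 128  # reliability limit stated in the caution above


def split_within_budget(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Greedily pack lines of `text` into chunks of fewer than `max_tokens` tokens.

    A single line that already reaches the budget is returned as its own chunk,
    so callers should still expect occasional oversized pieces.
    """
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        candidate = f"{current}\n{line}" if current else line
        if current and count_tokens(candidate) >= max_tokens:
            # Adding this line would reach the budget: close the current chunk.
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Stand-in counter for demonstration: whitespace-separated words, not real tokens.
word_count = lambda s: len(s.split())
print(split_within_budget("one two\nthree four\nfive", word_count, max_tokens=4))
```

Each chunk can then be normalized separately and the results joined in order. Whether the model handles line breaks inside a chunk well is not stated in this card, so normalizing line by line may be the safer choice.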

Key Characteristics

Deterministic (no sampling required)
Preserves structure and intent
Handles mixed text (natural language + technical content)
Conservative punctuation (prefers . over ! unless explicit)
Supports multi-sentence normalization when boundaries are clear

Developed by: https://huggingface.co/lunahr
Model type: Causal Language Model
Language(s): English
License: MIT
Base model: Qwen/Qwen3-0.6B-Base

Limitations

This model is not intended to be a full grammar correction system.

Possible limitations include:

May miss some punctuation or casing corrections
May be conservative with contractions (e.g. "there s" may be left unchanged)
May preserve ambiguous casing when intent is unclear
Does not expand slang or rewrite informal language

The model prioritizes safety and meaning preservation over aggressive correction.

Training Details

Dataset

Trained on: https://huggingface.co/datasets/lunahr/normalization-data-mixed

The dataset includes a mix of:

Formal text (Wikipedia-style)
Conversational text (PersonaChat)
Synthetic edge cases
Quoted text handling

This combination helps the model generalize across both clean and noisy inputs.

Training Procedure

Fine-tuned from Qwen3-0.6B-Base
Hardware: Kaggle dual NVIDIA T4 (FP16)
Training time: ~1.5 hours
Epochs: 3

Training configuration highlights:

Learning rate: 8e-5
Gradient clipping: 1.0
Warmup: 200 steps (~10%)
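
The warmup figures allow a rough estimate of the run length. The sketch below derives it and shows the learning rate during warmup. It assumes that "~10%" is measured against total optimizer steps and that warmup is linear; the card states neither, so the results are approximations only.

```python
PEAK_LR = 8e-5          # learning rate from the configuration above
WARMUP_STEPS = 200      # warmup length from the configuration above
WARMUP_FRACTION = 0.10  # "~10%", assumed to be relative to total optimizer steps
EPOCHS = 3

# Implied totals (approximate, because the 10% figure is itself approximate).
total_steps = round(WARMUP_STEPS / WARMUP_FRACTION)
steps_per_epoch = total_steps / EPOCHS


def warmup_lr(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup from 0 to PEAK_LR.

    The schedule after warmup is not stated, so this simply holds the peak.
    """
    fraction = min(step, WARMUP_STEPS) / WARMUP_STEPS
    return PEAK_LR * fraction


print(total_steps)             # 2000
print(round(steps_per_epoch))  # 667
print(warmup_lr(100))          # 4e-05
```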

Metrics

Final training loss: 0.08841
Mean token accuracy: 97.53%

These metrics reflect token-level accuracy. Real-world normalization quality is somewhat lower: roughly 90–95% correctness as judged by a human reader, which is the more representative figure.
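
A gap between token-level and output-level figures is expected: a single wrong token lowers token accuracy only slightly but makes the whole output incorrect. The following sketch illustrates the two measures on made-up token sequences. The function names and data are hypothetical and do not reproduce the author's evaluation.

```python
from typing import Sequence


def token_accuracy(preds: Sequence[Sequence[str]], refs: Sequence[Sequence[str]]) -> float:
    """Fraction of reference positions where the predicted token matches."""
    correct = total = 0
    for pred, ref in zip(preds, refs):
        total += len(ref)
        correct += sum(p == r for p, r in zip(pred, ref))
    return correct / total


def exact_match(preds: Sequence[Sequence[str]], refs: Sequence[Sequence[str]]) -> float:
    """Fraction of examples whose prediction equals the reference exactly."""
    return sum(list(p) == list(r) for p, r in zip(preds, refs)) / len(refs)


# Made-up data: the second prediction differs only in its final token.
refs = [["This", "is", "fine", "."], ["Hello", "there", "."]]
preds = [["This", "is", "fine", "."], ["Hello", "there", "!"]]

print(token_accuracy(preds, refs))  # 6 of 7 tokens match
print(exact_match(preds, refs))     # 0.5
```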