Model Card for CeluneNorm-0.6B-v2.0-ctx1024
Model Details
Model Description
CeluneNorm is a lightweight text normalization model designed for TTS and general preprocessing pipelines.
It converts poorly formatted input into clean, readable text while preserving the original meaning.
Example:
- Input: this is a badly formed sentence
- Output: This is a badly formed sentence.
The model is conservative by design:
- It does not rewrite sentences
- It avoids changing meaning
- It preserves domain-specific tokens (e.g. URLs, commands, names)
Update
Version 2.0 improves performance on longer contexts compared to version 1.3.
Version 2.0 is recommended if you need to normalize inputs longer than 128 tokens, up to a maximum of 1024 tokens.
The model should retain the base capabilities of version 1.3 across longer context windows.
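Inputs longer than 1024 tokens fall outside the range described above. One possible workaround, which the model card does not prescribe, is to split the text into chunks and normalize each one separately. The sketch below packs whitespace-separated words greedily. `split_for_context` and `rough_token_count` are hypothetical helpers, and the word counter is only a stand-in for the real tokenizer, which will usually report more tokens than words.

```python
from typing import Callable, List

MAX_TOKENS = 1024  # context length stated for version 2.0


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_for_context(
    text: str,
    max_tokens: int = MAX_TOKENS,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Greedily pack words into chunks of at most max_tokens tokens.

    Assumption: it is unstated whether the 1024-token limit covers the
    prompt only or prompt plus output, so a smaller budget may be safer.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        # Start a new chunk when adding this word would exceed the budget.
        if current and count_tokens(" ".join(current + [word])) > max_tokens:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_for_context("a b c d e", max_tokens=2))
```

Chunk boundaries can fall in the middle of a sentence, so casing and punctuation near a boundary may need a second look. Passing a real tokenizer-based counter as `count_tokens` gives a tighter fit than the word count.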
Usage
The model expects input in the following format:
YOUR INPUT<NORM>
It will generate the normalized version of the input.
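If you build prompts by hand rather than through the tokenizer's chat template, the format above amounts to appending the marker to the raw text. This is a minimal sketch: `build_prompt` and `NORM_TAG` are hypothetical names, and it is an unverified assumption that the chat template produces exactly this string.

```python
NORM_TAG = "<NORM>"  # marker named in the input format above


def build_prompt(text: str) -> str:
    """Append the <NORM> marker to the raw input text."""
    # Assumption: leading and trailing whitespace carries no meaning,
    # so it is trimmed before the marker is added.
    return text.strip() + NORM_TAG


print(build_prompt("this is a badly formed sentence"))
```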
Inference example:
from transformers import pipeline, AutoTokenizer

model_id = "lunahr/CeluneNorm-0.6B-v2.0-ctx1024"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=model_id,
    device="cuda:0",  # "cpu" for CPU-only, slower
)

def normalize(text: str) -> str:
    history = [
        {"role": "user", "content": text}
    ]
    prompt = tokenizer.apply_chat_template(history, tokenize=False)
    out = pipe(
        prompt,
        max_new_tokens=1024,
        do_sample=False,
        return_full_text=False,
    )
    return out[0]["generated_text"].strip()

# example
print(normalize("if i type something more complicated into celune it will fix it"))
Key Characteristics
- Deterministic (no sampling required)
- Preserves structure and intent
- Handles mixed text (natural language + technical content)
- Conservative punctuation (prefers "." over "!" unless explicit)
- Supports multi-sentence normalization when boundaries are clear
- Supports long-context normalization of up to 1024 tokens
- Developed by: https://huggingface.co/lunahr
- Model type: Causal Language Model
- Language(s): English
- License: MIT
- Base model: Qwen/Qwen3-0.6B-Base
Limitations
This model is not intended to be a full grammar correction system.
Possible limitations include:
- May miss some punctuation or casing corrections
- May be conservative with contractions (e.g. "there s" → unchanged)
- May preserve ambiguous casing when intent is unclear
- Does not expand slang or rewrite informal language
- May miss some characters at extended context lengths
The model prioritizes safety and meaning preservation over aggressive correction.
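Because characters can occasionally be dropped, a cheap post-check can flag outputs that differ from the input in more than casing, punctuation, or spacing. The sketch below is one way to do that. `content_key` and `content_preserved` are hypothetical helpers, and the check assumes the model never legitimately adds or removes letters or digits.

```python
def content_key(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def content_preserved(source: str, normalized: str) -> bool:
    """True when both strings contain the same letters and digits in order.

    Casing, punctuation, and whitespace differences are ignored, so a
    repaired contraction such as "there s" -> "there's" still passes.
    """
    return content_key(source) == content_key(normalized)


print(content_preserved(
    "this is a badly formed sentence",
    "This is a badly formed sentence.",
))
```

An output that fails the check is not necessarily wrong, but it is a reasonable candidate for falling back to the original text or for manual review.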
Training Details
Dataset
Trained on: https://huggingface.co/datasets/lunahr/normalization-data-mixed
The dataset includes a mix of:
- Formal text (Wikipedia-style)
- Conversational text (PersonaChat)
- Synthetic edge cases
- Quoted text handling
This combination helps the model generalize across both clean and noisy inputs.
This version was also tuned on an additional 10k rows of casing data to improve accuracy.
Long context coherence was trained with 5k rows of 1024-context-length data to stop the model from looping on itself.
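Looping can also be caught at inference time with a simple heuristic. The sketch below flags text in which a short run of words repeats back to back. `has_loop` and its thresholds are hypothetical and not part of the model or its training pipeline, and legitimately repetitive input (for example "no no no") will also trigger it.

```python
def has_loop(text: str, max_period: int = 8, repeats: int = 3) -> bool:
    """Return True if a run of 1..max_period words repeats `repeats` times in a row."""
    words = text.split()
    for period in range(1, max_period + 1):
        span = period * repeats
        for start in range(len(words) - span + 1):
            unit = words[start:start + period]
            # Compare each following block of `period` words with the first one.
            if all(
                words[start + i * period:start + (i + 1) * period] == unit
                for i in range(1, repeats)
            ):
                return True
    return False


print(has_loop("fix it fix it fix it"))
print(has_loop("the quick brown fox jumps over the lazy dog"))
```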
Training Procedure
- Fine-tuned from Qwen3-0.6B-Base
- Hardware: Kaggle dual NVIDIA T4 (FP16, cast to BF16 for the output model)
- Training time: ~1.5 hours + ~5 minutes (casing CFT) + ~40 minutes (long-context CFT)
- Epochs: 3 + 1 (casing CFT) + 1 (long-context CFT)
Training configuration highlights:
- Learning rate: 8e-5
- Gradient clipping: 1.0
- Warmup: 200 steps (~10%)
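The warmup figures above imply a rough total step count, which the short calculation below works out. It is a sketch under assumptions: the warmup is taken to be linear from zero, the 10% is taken to refer to a single training run, and the schedule after warmup is not stated in the card, so the function simply holds the peak rate. All names are hypothetical.

```python
WARMUP_STEPS = 200      # from the configuration above
WARMUP_FRACTION = 0.10  # "~10%" from the configuration above
PEAK_LR = 8e-5          # learning rate from the configuration above

# 200 warmup steps at ~10% of training implies roughly 2000 steps in total.
approx_total_steps = round(WARMUP_STEPS / WARMUP_FRACTION)


def warmup_lr(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup from 0 to PEAK_LR.

    Assumption: the rate is held at PEAK_LR after warmup; the real
    schedule (for example a decay) is not documented.
    """
    return PEAK_LR * (min(max(step, 0), WARMUP_STEPS) / WARMUP_STEPS)


print(approx_total_steps)
print(warmup_lr(100))
```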
Metrics
- Final training loss: 0.08841 (0.06989 for casing CFT, 0.1516 for long-context CFT)
- Mean token accuracy: 97.53% (99.77% for casing CFT, 98.03% for long-context CFT)
These metrics measure token-level accuracy. Real-world normalization quality is slightly lower (~90–95% human-level correctness), and that figure is more representative of practical use.
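To illustrate why a token-level score can sit above a whole-output score, the sketch below computes both on a toy example. How the ~90–95% figure was actually measured is not stated, so this is only an illustration. `mean_token_accuracy` and `exact_match_rate` are hypothetical helpers, and the tokens are plain words rather than real tokenizer output.

```python
from typing import Sequence


def mean_token_accuracy(
    preds: Sequence[Sequence[str]], targets: Sequence[Sequence[str]]
) -> float:
    """Fraction of target positions whose predicted token matches."""
    correct = total = 0
    for pred, target in zip(preds, targets):
        total += len(target)
        correct += sum(1 for p, t in zip(pred, target) if p == t)
    return correct / total if total else 0.0


def exact_match_rate(
    preds: Sequence[Sequence[str]], targets: Sequence[Sequence[str]]
) -> float:
    """Fraction of examples whose whole output matches the target."""
    pairs = list(zip(preds, targets))
    if not pairs:
        return 0.0
    return sum(1 for pred, target in pairs if list(pred) == list(target)) / len(pairs)


targets = [["This", "is", "fine", "."], ["Hello", "world", "."]]
preds = [["This", "is", "fine", "."], ["hello", "world", "."]]

print(mean_token_accuracy(preds, targets))  # 6 of 7 tokens match
print(exact_match_rate(preds, targets))     # 1 of 2 outputs match exactly
```

A single wrong token leaves the token-level score high while making the whole output count as incorrect, which is why per-output correctness tends to be the lower number.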