Instructions to use oliverguhr/fullstop-punctuation-multilang-large with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use oliverguhr/fullstop-punctuation-multilang-large with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("token-classification", model="oliverguhr/fullstop-punctuation-multilang-large")# Load model directly from transformers import AutoTokenizer, AutoModelForTokenClassification tokenizer = AutoTokenizer.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large") model = AutoModelForTokenClassification.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large") - Inference
- Notebooks
- Google Colab
- Kaggle
GGUF for fullstop-punctuation in CrispASR (drop-in --punc-model for any ASR backend)
Thanks for the multilingual punctuation model! It's wired into CrispASR as a drop-in --punc-model post-processor for any ASR backend.
src/fireredpunc.cpp (despite the name, also handles the XLM-R-large fullstop-punc family — same dispatch path on the CTC + label head). 24L XLM-R-large encoder + 6-class punctuation head. Q4_K is the recommended default at ~254 MB; F16 is 1.6 GB and produces identical output on JFK.
This is the default punctuation post-processor for the CrispASR CTC backends that don't emit punctuation natively (Wav2Vec2 / HuBERT / Data2Vec / FastConformer-CTC / OmniASR-CTC / FireRed-ASR). The CN+EN sibling cstr/fireredpunc-GGUF (BERT-base) is preferred for Mandarin.
Also reachable from the Python / Rust / Dart bindings as crispasr.PuncModel.
Pre-quantised GGUFs (MIT): cstr/fullstop-punc-multilang-GGUF
./build/bin/crispasr --backend wav2vec2 \
-m wav2vec2-xlsr-de-q4_k.gguf -f audio.wav \
--punc-model fullstop-punc-q4_k.gguf
(All quants tested identical on JFK — quantisation is essentially free for this model.)