litagin/Galgame_Speech_ASR_16kHz
How to use AkitoP/whisper-large-v3-japense-phone_accent with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="AkitoP/whisper-large-v3-japense-phone_accent")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("AkitoP/whisper-large-v3-japense-phone_accent")
model = AutoModelForSpeechSeq2Seq.from_pretrained("AkitoP/whisper-large-v3-japense-phone_accent")

This is a Whisper model designed to transcribe Japanese speech into katakana with pitch-accent annotations. The model is built on whisper-large-v3-turbo and was fine-tuned on a subset (1/20) of the Galgame-Speech dataset together with the jsut-5000 dataset.
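As a rough illustration of how the pipeline might be applied to an audio file, here is a minimal sketch. The helper names `load_asr` and `transcribe_file` and the file name `sample.wav` are hypothetical and not part of the model card. It assumes `transformers` is installed and that ffmpeg is available so the pipeline can decode a file path. The exact form of the accent annotations in the output is not shown, since the card does not specify it.

```python
MODEL_ID = "AkitoP/whisper-large-v3-japense-phone_accent"


def load_asr(model_id: str = MODEL_ID):
    """Build the speech-recognition pipeline (downloads weights on first use)."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe_file(asr, audio_path: str) -> str:
    """Run the pipeline on one audio file and return the transcription text."""
    # The ASR pipeline returns a dict whose "text" key holds the transcription.
    result = asr(audio_path)
    return result["text"].strip()


if __name__ == "__main__":
    asr = load_asr()
    # "sample.wav" is a placeholder; substitute a real Japanese speech recording.
    print(transcribe_file(asr, "sample.wav"))
```

Keeping the loading step separate from the transcription step means the model is downloaded and initialized once and can then be reused across many files.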
We are currently seeking Japanese pitch accent annotated datasets. If you have such data, please reach out!
Base model
openai/whisper-large-v3