license: apache-2.0
tags:
  - speech-enhancement
  - denoising
  - coreml
  - apple-silicon
  - deepfilternet
  - int8
  - palettization
base_model: Rikorose/DeepFilterNet3
library_name: coreml
pipeline_tag: audio-to-audio

DeepFilterNet3 — CoreML INT8

Real-time speech enhancement for Apple Silicon. Removes background noise from speech audio and runs on the Neural Engine via CoreML.

2.1M params, INT8 k-means palettization, 2.2 MB
48 kHz native, 10 ms frames
Requires macOS 14+ / iOS 17+
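
For intuition about where the 2.2 MB figure comes from: k-means palettization clusters the weights into a small palette of centroids and stores one palette index per weight. Assuming "INT8" here means 8-bit indices, that is about one byte per parameter, so 2.1M parameters come to roughly 2.1 MB plus the palette tables, versus about 4.2 MB at two bytes per parameter in FP16. The Swift sketch below is a toy one-dimensional version of that idea. It is not the conversion pipeline used for this repo, which the card does not describe, and `palettize` with its parameters is a hypothetical helper.

```swift
import Foundation

/// Toy 1-D k-means palettization (illustrative; `palettize` is a hypothetical helper).
/// Returns a palette of 2^bits centroids and one palette index per weight.
func palettize(_ weights: [Float], bits: Int, iterations: Int = 20)
    -> (palette: [Float], indices: [UInt8]) {
    precondition(!weights.isEmpty, "need at least one weight")
    precondition((1...8).contains(bits), "indices are stored as UInt8")
    let k = 1 << bits
    let lo = weights.min()!
    let hi = weights.max()!
    let step = (hi - lo) / Float(k)

    // Start with centroids spread evenly over the weight range.
    var palette: [Float] = (0..<k).map { lo + step * (Float($0) + 0.5) }

    // Index of the centroid closest to `w`.
    func nearest(_ w: Float) -> Int {
        var best = 0
        for j in 1..<k where abs(w - palette[j]) < abs(w - palette[best]) {
            best = j
        }
        return best
    }

    for _ in 0..<iterations {
        var sums = [Float](repeating: 0, count: k)
        var counts = [Int](repeating: 0, count: k)
        // Assignment step: accumulate each weight into its nearest centroid.
        for w in weights {
            let j = nearest(w)
            sums[j] += w
            counts[j] += 1
        }
        // Update step: move each non-empty centroid to the mean of its members.
        for j in 0..<k where counts[j] > 0 {
            palette[j] = sums[j] / Float(counts[j])
        }
    }

    // Final assignment against the converged palette.
    let indices = weights.map { UInt8(nearest($0)) }
    return (palette, indices)
}

// Four weights compressed to a 1-bit palette (two centroids).
let weights: [Float] = [-1.0, -0.9, 1.0, 1.1]
let (palette, indices) = palettize(weights, bits: 1)
let restored = indices.map { palette[Int($0)] }  // dequantized weights
print(indices, restored)
```

In this example the two negative weights share one centroid and the two positive weights share the other, so each restored weight is within about 0.05 of its original value. With 8 bits the palette has 256 entries, which is why the storage cost is dominated by the per-weight indices.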

Quality

Measured on 30 VoiceBank-DEMAND test clips using a Python CoreMLBackend that replaces only the neural-network forward pass; the PyTorch STFT, ERB, and deep-filter post-processing are kept intact.

Variant	PESQ	STOI	SI-SDR (dB)	Size
PyTorch FP32 (reference)	2.900	0.947	18.19	—
CoreML FP16	2.901	0.947	18.19	4.2 MB
CoreML INT8 (this repo)	2.907	0.947	18.11	2.2 MB

INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR −0.08 dB from the table values, STOI identical) while cutting size by 48% (4.2 MB to 2.2 MB).
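
For readers unfamiliar with the SI-SDR column, here is a minimal Swift sketch of the standard scale-invariant SDR definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. The card does not say which implementation produced the numbers above, so `siSDR` is a hypothetical helper, and it assumes both signals are zero-mean, time-aligned, and of equal length.

```swift
import Foundation

/// Scale-invariant signal-to-distortion ratio in dB (hypothetical helper).
/// Assumes zero-mean, time-aligned signals of equal length.
func siSDR(reference s: [Double], estimate e: [Double]) -> Double {
    precondition(s.count == e.count && !s.isEmpty, "signals must match in length")
    let dot = zip(s, e).reduce(0.0) { $0 + $1.0 * $1.1 }
    let energy = s.reduce(0.0) { $0 + $1 * $1 }
    // Optimal scaling of the reference so that it best matches the estimate.
    let alpha = dot / energy
    var target = 0.0
    var residual = 0.0
    for (x, y) in zip(s, e) {
        let t = alpha * x
        target += t * t
        residual += (y - t) * (y - t)
    }
    return 10 * log10(target / residual)
}

let clean: [Double] = [1, -1, 1, -1]
let noisy: [Double] = [1.1, -0.9, 0.9, -1.1]
print(siSDR(reference: clean, estimate: noisy))                 // about 20 dB
print(siSDR(reference: clean, estimate: noisy.map { 3 * $0 }))  // unchanged by gain
```

Because the metric ignores overall gain, a difference of under 0.1 dB between variants reflects changes in the residual rather than in output level.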

Latency (M2 Max)

Audio duration	Processing time	RTF
5 s	0.65 s	0.13
10 s	1.2 s	0.12
20 s	4.8 s	0.24
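
RTF is the real-time factor: processing time divided by audio duration, so values below 1 mean the model runs faster than real time. The short Swift sketch below recomputes the column from the other two; `realTimeFactor` is a hypothetical helper, not part of any library.

```swift
import Foundation

/// Real-time factor: processing time divided by audio duration.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

// (audio duration, processing time) pairs copied from the table above.
let runs: [(audio: Double, processing: Double)] = [(5, 0.65), (10, 1.2), (20, 4.8)]
for run in runs {
    let rtf = realTimeFactor(processingSeconds: run.processing, audioSeconds: run.audio)
    print(String(format: "%2.0f s of audio: RTF %.2f", run.audio, rtf))
}
```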

Files

File	Size	Description
`DeepFilterNet3.mlmodelc`	2.2 MB	Pre-compiled CoreML model (runs on Neural Engine)
`auxiliary.npz`	126 KB	ERB filterbank, Vorbis window, normalization states

Usage

Add speech-swift to Package.swift:

.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
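
In context, the dependency line might sit in a manifest like the sketch below. The package and target name `DenoiseDemo` is hypothetical, the platform versions mirror the requirements stated above, and the product name `SpeechEnhancement` is an assumption based on the module imported in the next step; check the speech-swift manifest for the actual product names.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "DenoiseDemo",  // hypothetical package name
    platforms: [.macOS(.v14), .iOS(.v17)],  // matches "Requires macOS 14+ / iOS 17+"
    dependencies: [
        .package(url: "https://github.com/soniqo/speech-swift", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "DenoiseDemo",
            dependencies: [
                // Assumption: the library product is named after the imported module.
                .product(name: "SpeechEnhancement", package: "speech-swift")
            ]
        )
    ]
)
```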

Then denoise:

import SpeechEnhancement

let enhancer = try await SpeechEnhancer.fromPretrained()
let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)

CLI:

swift run audio denoise noisy.wav --output clean.wav

Source

Base model: Rikorose/DeepFilterNet3 (Apache-2.0)

License

Model weights: Apache-2.0 / MIT dual license
CoreML conversion: Apache-2.0

Reference

DeepFilterNet3 paper
