Fine-tuned VGG-16 Model for Gunshot Detection

This is a fine-tuned VGG-16 model for detecting gunshots in audio recordings. The model was trained on a dataset of audio clips labeled as either "gunshot" or "background".

Model Details

Fine-tuned by: Ranabir Saha
Fine-tuned on: The tropical forest gunshot classification training audio dataset accompanying "Automated detection of gunshots in tropical forests using convolutional neural networks" (Katsis et al., 2022)
Dataset Source: https://doi.org/10.17632/x48cwz364j.3
Input: 4-second .wav audio files, converted to 224x224x3 mel-spectrograms during preprocessing
Output: Binary classification (Gunshot/Background)

Training

The model was trained using the following parameters:

Base Model: VGG-16 pre-trained on ImageNet
Optimizer: Adam (learning rate 1e-4 for initial training, 1e-5 for fine-tuning)
Loss Function: Categorical cross-entropy
Metrics: Accuracy, Precision, Recall
Batch Size: 32
Initial Training: Up to 25 epochs with early stopping (patience=5) on validation loss
Fine-tuning: Last 8 layers unfrozen, up to 10 epochs with early stopping (patience=5)
Class Weights: Balanced to handle class imbalance
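The card does not say how the balanced weights were computed. A common convention, used by scikit-learn's `class_weight="balanced"` option, gives each class the weight n_samples / (n_classes × n_samples_in_class), so the rarer class counts more in the loss. Below is a minimal NumPy sketch that assumes this convention. The helper name `balanced_class_weights` and the 0/1 label encoding are hypothetical.

```python
import numpy as np

def balanced_class_weights(labels):
    """Return {class_index: weight} using the 'balanced' heuristic
    n_samples / (n_classes * count_per_class).

    This is an assumed convention, not necessarily the exact one used for this model.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)  # classes come back sorted
    n_samples = labels.size
    n_classes = classes.size
    weights = n_samples / (n_classes * counts)
    # Keras Model.fit(class_weight=...) expects a dict keyed by integer class index
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical labels: three clips of one class, one clip of the other
example_labels = [0, 0, 0, 1]
print(balanced_class_weights(example_labels))
```

The returned dictionary has the form accepted by the `class_weight` argument of Keras `Model.fit`.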

Usage

To run inference, download the model from the Hugging Face Hub, convert each 4-second .wav file into a 224x224x3 mel-spectrogram as in the example below, and pass the result to the model.

Example

import numpy as np
import tensorflow as tf
import librosa
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(repo_id="ranvir-not-found/vgg16-sda_gunshot-detection", filename="vgg16_model.keras")
model = tf.keras.models.load_model(model_path)

# Convert a .wav file into a 224x224x3 VGG-16 input tensor
def load_and_preprocess_wav(file_path):
    # Load audio at its native sample rate (sr=None disables resampling)
    audio_data, sr = librosa.load(file_path, sr=None)
    # 224-band mel power spectrogram, with the top mel band capped at 4 kHz
    mel_spectrogram = librosa.feature.melspectrogram(
        y=audio_data, sr=sr, n_mels=224, fmax=4000
    )
    # Convert power to decibels relative to the loudest bin
    mel_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)
    # Min-max scale to [0, 255]; a constant spectrogram becomes all zeros
    spec_min = np.min(mel_spectrogram)
    spec_max = np.max(mel_spectrogram)
    if spec_max > spec_min:
        mel_spectrogram = 255 * (mel_spectrogram - spec_min) / (spec_max - spec_min)
    else:
        mel_spectrogram = np.zeros_like(mel_spectrogram)
    mel_spectrogram = mel_spectrogram.astype(np.float32)
    # Add a channel axis and resize (n_mels, frames) to 224x224
    mel = tf.image.resize(mel_spectrogram[..., np.newaxis], (224, 224))
    # Repeat the single channel to get the 3 channels VGG-16 expects
    mel = tf.repeat(mel, 3, axis=-1)
    # Apply the standard VGG-16 (ImageNet) input preprocessing
    mel = tf.keras.applications.vgg16.preprocess_input(mel)
    return mel

# Example usage
wav_path = "path/to/your/audio.wav"
input_data = load_and_preprocess_wav(wav_path)
input_data = tf.expand_dims(input_data, axis=0)  # Add batch dimension
predictions = model.predict(input_data)
class_names = ['gunshot', 'background']  # order must match the label indices used in training
predicted_class = class_names[np.argmax(predictions[0])]
print(f"Predicted class: {predicted_class}, Probabilities: {predictions[0]}")
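The model expects 4-second clips, so a longer field recording would need to be split into 4-second windows before preprocessing. Below is a minimal NumPy sketch. It assumes non-overlapping windows by default and zero-padding of the final partial window. The function name `split_into_windows` and its `hop_s` parameter are hypothetical and not part of this repository.

```python
import numpy as np

def split_into_windows(audio, sr, window_s=4.0, hop_s=4.0):
    """Cut a 1-D signal into fixed-length windows, zero-padding the last one.

    window_s matches the 4-second clip length the model expects; hop_s is a
    hypothetical parameter (hop_s < window_s gives overlapping windows).
    """
    audio = np.asarray(audio, dtype=np.float32)
    win = int(round(window_s * sr))
    hop = int(round(hop_s * sr))
    if audio.size <= win:
        # Short input: pad a single window up to full length
        return np.pad(audio, (0, win - audio.size))[np.newaxis, :]
    # Enough windows to cover every sample, the last one possibly partial
    n_windows = 1 + int(np.ceil((audio.size - win) / hop))
    padded_len = (n_windows - 1) * hop + win
    audio = np.pad(audio, (0, padded_len - audio.size))
    return np.stack([audio[i * hop : i * hop + win] for i in range(n_windows)])

sr = 8000  # hypothetical sample rate
ten_seconds = np.random.default_rng(0).standard_normal(10 * sr)
windows = split_into_windows(ten_seconds, sr)
print(windows.shape)  # (3, 32000)
```

Each row could then go through the same mel-spectrogram steps as `load_and_preprocess_wav`. As written, that function reads from a file path, so it would need adapting to accept an array and its sample rate. Overlapping windows (for example `hop_s=2.0`) reduce the chance of a gunshot being split across a window boundary, at the cost of more predictions per recording.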

Evaluation

The model was evaluated on a validation set, and the following metrics were computed:

Classification report (accuracy, precision, recall)

The model was optimized for recall on the 'gunshot' class.
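The function below sketches what these metrics measure. It computes accuracy, plus precision and recall for one positive class, from true and predicted class indices. It assumes gunshot is index 0, matching `class_names` in the usage example, and the function name is hypothetical. Recall here is the fraction of true gunshots that the model flags as gunshots.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=0):
    """Accuracy, plus precision and recall for the class treated as positive.

    positive=0 assumes gunshot is index 0, as in the usage example's
    class_names; verify this against the labels used in training.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # gunshots caught
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false alarms
    fn = np.sum((y_pred != positive) & (y_true == positive))  # gunshots missed
    accuracy = float(np.mean(y_pred == y_true))
    precision = float(tp / (tp + fp)) if tp + fp else 0.0
    recall = float(tp / (tp + fn)) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

y_true = [0, 0, 1, 1, 0]  # hypothetical validation labels
y_pred = [0, 1, 1, 0, 0]  # hypothetical argmax predictions
print(binary_metrics(y_true, y_pred))
```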

For more details, please refer to the training script and logs.
