File size: 3,991 Bytes

27e4f2b

---
language: en
license: mit
tags:
- image-classification
- coffee
- hybrid-model
- cnn
- transformer
- resnet
- vision-transformer
datasets:
- USK-Coffee
metrics:
- accuracy
- precision
- recall
- f1
library_name: pytorch
---

# Green Arabica Coffee Bean Classification (R50-V Hybrid Model)

## Model Description

This model is a hybrid deep learning architecture combining **ResNet-50** (CNN-based) and **Vision Transformer** for classifying green arabica coffee beans into four quality classes. The model achieved **91.88% accuracy** on the USK-Coffee dataset, outperforming previous single-model approaches.

**Model Type:** Hybrid CNN-Transformer  
**Architecture:** ResNet-50 + Vision Transformer (R50-V)  
**Task:** Multi-class Image Classification  
**Classes:** 4 (Peaberry, Longberry, Premium, Defect)

## Model Architecture

The hybrid model processes input images through two parallel branches:

1. **CNN Branch (ResNet-50)**: Extracts local visual features such as texture, edges, and patterns
2. **Transformer Branch (Vision Transformer)**: Captures global context and long-range dependencies

Features from both branches are concatenated and passed through a classification head.

## How to Use

### Installation

```bash
pip install torch torchvision pillow huggingface_hub
```

### Code

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import (
    resnet50, vit_b_16,
    ResNet50_Weights, ViT_B_16_Weights
)
from huggingface_hub import hf_hub_download

class HybridModel(nn.Module):
    def __init__(self, num_classes=4):
        super(HybridModel, self).__init__()
        self.resnet = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        resnet_in_features = self.resnet.fc.in_features
        self.resnet.fc = nn.Identity()

        self.vit = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        vit_in_features = self.vit.heads.head.in_features
        self.vit.heads = nn.Identity()

        self.fc = nn.Sequential(
            nn.Linear(resnet_in_features + vit_in_features, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x_resnet, x_vit = x
        local_features = self.resnet(x_resnet)
        global_features = self.vit(x_vit)
        combined_features = torch.cat((local_features, global_features), dim=1)
        return self.fc(combined_features)

def load_model():
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model_path = hf_hub_download(
        repo_id="kelvinandreas/Green_Arabica_Coffee_Bean_Classification_R50-V",
        filename="best_model.pth"
    )

    model = HybridModel(num_classes=4)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.to(device)
    model.eval()

    return model, device

def preprocess_image(image_path, device):
    image = Image.open(image_path).convert("RGB")

    preprocess_resnet = ResNet50_Weights.IMAGENET1K_V1.transforms()
    preprocess_vit = ViT_B_16_Weights.IMAGENET1K_V1.transforms()

    x_resnet = preprocess_resnet(image).unsqueeze(0).to(device)
    x_vit = preprocess_vit(image).unsqueeze(0).to(device)

    return x_resnet, x_vit

def predict(image_path):
    CLASS_LABELS = ['Defect', 'Longberry', 'Peaberry', 'Premium']

    model, device = load_model()
    x_resnet, x_vit = preprocess_image(image_path, device)

    with torch.no_grad():
        logits = model((x_resnet, x_vit))

    probs = F.softmax(logits, dim=1)
    pred_idx = torch.argmax(probs, dim=1).item()
    confidence = probs[0][pred_idx].item()

    return {
        "prediction": CLASS_LABELS[pred_idx],
        "confidence_percent": round(confidence * 100, 2)
    }

image_path = "your_coffee_bean_image.jpg"
result = predict(image_path)
print(result)
```

## Evaluation Results

| Metric | Score |
|--------|-------|
| **Accuracy** | **91.88%** |
| **Precision** | 92.12% |
| **Recall** | 91.88% |
| **F1-Score** | 91.85% |