Update README.md
README.md
CHANGED
|
@@ -35,6 +35,7 @@ The model is part of a larger pipeline that generates contextual music based on
|
|
| 35 |
|
| 36 |
**Limitations:**
|
| 37 |
- Limited to 4 specific scene categories (cafe, gym, library, outdoor)
|
|
|
|
| 38 |
- Trained on relatively small dataset extracted from videos
|
| 39 |
- May not generalize well to significantly different scene compositions
|
| 40 |
- Performance may degrade on low-quality or heavily edited images
|
|
@@ -81,13 +82,14 @@ Training was conducted over 3 epochs with consistent loss reduction:
|
|
| 81 |
|
| 82 |
| Epoch | Training Loss | Status |
|
| 83 |
|:-----:|:-------------:|:------:|
|
| 84 |
-
| 1 | 0.
|
| 85 |
-
| 2 | 0.
|
| 86 |
-
| 3 | 0.
|
| 87 |
|
| 88 |
Note: Formal validation metrics were not computed during training. Model was validated qualitatively on held-out images.
|
| 89 |
|
| 90 |
## Usage
|
|
|
|
| 91 |
|
| 92 |
### Loading the model
|
| 93 |
|
|
@@ -145,4 +147,6 @@ ResNet-18 Structure:
|
|
| 145 |
|
| 146 |
This model was developed as part of a course project (24-679) exploring multimodal AI systems.
|
| 147 |
It serves as the visual classification component in an image-to-music generation pipeline that combines scene recognition,
|
| 148 |
-
metadata extraction, weather context, and music synthesis.
|
|
|
|
|
|
|
|
|
| 35 |
|
| 36 |
**Limitations:**
|
| 37 |
- Limited to 4 specific scene categories (cafe, gym, library, outdoor)
|
| 38 |
+
- Limited to scenes from the Carnegie Mellon University (CMU) campus
|
| 39 |
- Trained on relatively small dataset extracted from videos
|
| 40 |
- May not generalize well to significantly different scene compositions
|
| 41 |
- Performance may degrade on low-quality or heavily edited images
|
|
|
|
| 82 |
|
| 83 |
| Epoch | Training Loss | Status |
|
| 84 |
|:-----:|:-------------:|:------:|
|
| 85 |
+
| 1 | 0.3395 | ✓ |
|
| 86 |
+
| 2 | 0.0111 | ✓ |
|
| 87 |
+
| 3 | 0.0041 | ✓ |
|
| 88 |
|
| 89 |
Note: Formal validation metrics were not computed during training. Model was validated qualitatively on held-out images.
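As a quick check on the table above, the relative drop in training loss between epochs can be computed directly from the three reported values. This is a minimal sketch; the names `epoch_losses` and `relative_reductions` are illustrative and not part of the project.

```python
# Training losses as reported in the table above (epochs 1-3).
epoch_losses = [0.3395, 0.0111, 0.0041]


def relative_reductions(losses):
    """Return the fractional drop in loss from each epoch to the next."""
    return [(prev - cur) / prev for prev, cur in zip(losses, losses[1:])]


reductions = relative_reductions(epoch_losses)
for epoch, drop in enumerate(reductions, start=2):
    print(f"epoch {epoch - 1} -> {epoch}: {drop:.1%} lower training loss")
```

These figures describe training loss only. As the note says, no formal validation metrics were computed, so they say nothing on their own about performance on unseen images.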
|
| 90 |
|
| 91 |
## Usage
|
| 92 |
+
This model classifies an input image into one of four classes: Library, Cafe, Gym, or Outdoor.
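A minimal sketch of the final classification step, assuming the model emits one raw score (logit) per class. The label order in `CLASS_NAMES` and the helper `predict_label` are hypothetical; the actual index-to-label mapping should be taken from the model's own configuration.

```python
import math

# Assumed label order; verify against the model's actual class index mapping.
CLASS_NAMES = ["cafe", "gym", "library", "outdoor"]


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(CLASS_NAMES):
        raise ValueError("expected one logit per class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Made-up logits for illustration only, not real model output.
label, confidence = predict_label([0.2, -1.3, 2.9, 0.4])
print(label, round(confidence, 3))
```

In practice the logits would come from a forward pass of the loaded model on a preprocessed image. The length check guards against feeding scores from a model with a different number of output classes.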
|
| 93 |
|
| 94 |
### Loading the model
|
| 95 |
|
|
|
|
| 147 |
|
| 148 |
This model was developed as part of a course project (24-679) exploring multimodal AI systems.
|
| 149 |
It serves as the visual classification component in an image-to-music generation pipeline that combines scene recognition,
|
| 150 |
+
metadata extraction, weather context, and music synthesis.
|
| 151 |
+
|
| 152 |
+
AI tools (ChatGPT, Claude) were used in the creation of this model and dataset.
|