---
base_model: Omnifact/conditional-detr-resnet-101-dc5
datasets:
- Voxel51/fisheye8k
library_name: transformers
license: mit
tags:
- generated_from_trainer
pipeline_tag: object-detection
model-index:
- name: fisheye8k_Omnifact_conditional-detr-resnet-101-dc5
  results: []
---

# fisheye8k_Omnifact_conditional-detr-resnet-101-dc5

This model is a fine-tuned version of [Omnifact/conditional-detr-resnet-101-dc5](https://huggingface.co/Omnifact/conditional-detr-resnet-101-dc5) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It is part of the **Mcity Data Engine** project.

This model was presented in the paper [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614).

## Model description

This model is a fine-tuned object detection model specifically designed for identifying objects within fisheye camera data, particularly relevant for **Intelligent Transportation Systems (ITS)**. It is a key artifact of the **Mcity Data Engine**, an open-source system that provides a complete data-based development cycle, from data acquisition to model deployment, for continuously improving machine learning models.

The Mcity Data Engine focuses on addressing the challenge of detecting **rare and novel long-tail classes** in large amounts of unlabeled data through an **open-vocabulary data selection process**. This model checkpoint demonstrates the application of this iterative improvement framework to enhance perception capabilities in complex transportation environments.

## Intended uses & limitations

### Intended uses

* **Object detection** in fisheye camera imagery within Intelligent Transportation Systems (ITS).
* Identifying both common and **long-tail object classes** such as vehicles (Bus, Bike, Car, Truck) and Vulnerable Road Users (Pedestrian).
* Integration into **iterative model improvement pipelines** using the Mcity Data Engine framework.
* Research and development in autonomous driving and roadside perception, particularly for data-centric AI approaches.

### Limitations

* Performance may vary on datasets significantly different from the training distribution (Fisheye8K), especially for camera types other than fisheye.
* While designed for open-vocabulary data selection, the model's generalization to entirely novel or highly obscured objects may require further iterative data enrichment and fine-tuning.
* Optimal performance is achieved when integrated within the continuous data improvement loop enabled by the Mcity Data Engine.

## Training and evaluation data

This model was fine-tuned on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset. The Fisheye8K dataset is specifically curated for object detection in fisheye camera images, capturing diverse urban and suburban scenarios relevant to intelligent transportation. The data originates from vehicle fleets and roadside perception systems, providing a rich source for training robust object detection models.

## Usage

You can use this model directly with the Hugging Face `transformers` library for object detection.

```python
from transformers import pipeline
from PIL import Image
import requests
from io import BytesIO

# Load the object detection pipeline
model_id = "mcity-data-engine/fisheye8k_Omnifact_conditional-detr-resnet-101-dc5"
detector = pipeline("object-detection", model=model_id)

# Example image (replace with your fisheye image or a relevant ITS image)
# This example uses a generic image. For best results, use an image from the model's domain.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/conditional_detr_image.png"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# Perform inference
predictions = detector(image)

# Print detected objects
for pred in predictions:
    print(f"Label: {pred['label']}, Score: {pred['score']:.2f}, Box: {pred['box']}")

# Example output format:
# [{'box': {'xmin': 10, 'ymin': 20, 'xmax': 100, 'ymax': 120}, 'score': 0.98, 'label': 'Car'}]
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0147        | 1.0   | 5288  | 1.5035          |
| 0.9144        | 2.0   | 10576 | 1.4618          |
| 0.8685        | 3.0   | 15864 | 1.3823          |
| 0.8375        | 4.0   | 21152 | 1.5128          |
| 0.7715        | 5.0   | 26440 | 1.5045          |
| 0.7664        | 6.0   | 31728 | 1.6914          |
| 0.7073        | 7.0   | 37016 | 1.6101          |
| 0.6966        | 8.0   | 42304 | 1.6175          |

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Links

* **Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)
* **Project Documentation**: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)
* **GitHub Repository**: [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine)
* **Google Colab Demo**: [Mcity Data Engine Web Demo](https://colab.research.google.com/github/mcity/mcity_data_engine/blob/main/fish_eye_8k_colab.ipynb)

## Acknowledgements

Mcity would like to thank Amazon Web Services
(AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends.

## Citation

If you use the Mcity Data Engine in your research, feel free to cite the project:

```bibtex
@article{bogdoll2025mcitydataengine,
  title={Mcity Data Engine},
  author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
  year={2025}
}
```
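
## Post-processing sketch

The Usage section shows that the detection pipeline returns a list of dictionaries with `box`, `score`, and `label` keys. The following is a minimal sketch of how such a list might be filtered by confidence and class before further use. The helper names `filter_detections` and `count_by_label`, the 0.5 threshold, and the sample values are illustrative assumptions; they are not part of this model or of the `transformers` API, and the sample list is hand-written rather than real model output.

```python
from collections import defaultdict


def filter_detections(predictions, min_score=0.5, keep_labels=None):
    """Keep detections scoring at least min_score, optionally limited to keep_labels."""
    kept = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        if keep_labels is not None and pred["label"] not in keep_labels:
            continue
        kept.append(pred)
    return kept


def count_by_label(predictions):
    """Count how many detections were kept for each class label."""
    counts = defaultdict(int)
    for pred in predictions:
        counts[pred["label"]] += 1
    return dict(counts)


# Hand-written sample in the pipeline's output format (not real model output).
sample_predictions = [
    {"box": {"xmin": 10, "ymin": 20, "xmax": 100, "ymax": 120}, "score": 0.98, "label": "Car"},
    {"box": {"xmin": 15, "ymin": 25, "xmax": 90, "ymax": 110}, "score": 0.35, "label": "Car"},
    {"box": {"xmin": 200, "ymin": 40, "xmax": 230, "ymax": 130}, "score": 0.81, "label": "Pedestrian"},
    {"box": {"xmin": 300, "ymin": 60, "xmax": 420, "ymax": 180}, "score": 0.62, "label": "Bus"},
]

# Drop low-confidence detections, then look only at vulnerable road users.
confident = filter_detections(sample_predictions, min_score=0.5)
vulnerable = filter_detections(sample_predictions, min_score=0.5, keep_labels={"Pedestrian"})

print(count_by_label(confident))
print(len(vulnerable))
```

In practice the threshold would be tuned on held-out fisheye data rather than fixed at 0.5, since a suitable cutoff depends on the class and on how false positives and misses are weighted in the application.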