---
base_model: jozhang97/deta-swin-large
datasets:
- Voxel51/fisheye8k
library_name: transformers
tags:
- generated_from_trainer
pipeline_tag: object-detection
license: mit
model-index:
- name: fisheye8k_jozhang97_deta-swin-large
  results: []
---

# fisheye8k_jozhang97_deta-swin-large

This model is a fine-tuned version of [jozhang97/deta-swin-large](https://huggingface.co/jozhang97/deta-swin-large) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed as part of the **Mcity Data Engine** project, an open-source system designed for iterative model improvement through open-vocabulary data selection.

It achieves the following results on the evaluation set:
- Loss: 17.9701

**Paper**: [Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://huggingface.co/papers/2504.21614)

**Project Page**: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/)

**Code**: [GitHub Repository](https://github.com/mcity/mcity_data_engine)

## Model description

This model is a key component of the **Mcity Data Engine**, a comprehensive, open-source system for the complete data-based development cycle of machine learning models. It specifically targets challenges in Intelligent Transportation Systems (ITS), where the goal is to detect rare and novel classes in vast amounts of unlabeled data, such as those generated by vehicle fleets and roadside perception systems.

This `fisheye8k_jozhang97_deta-swin-large` model is an object detection model fine-tuned using the Mcity Data Engine's methodologies. It focuses on identifying specific object categories relevant to ITS, trained on data from fisheye cameras. The engine facilitates iterative model improvements by intelligently selecting and labeling data, especially for long-tail classes.

## Intended uses & limitations

**Intended Uses**: This model is primarily intended for object detection tasks within Intelligent Transportation Systems (ITS).
It detects `Bus`, `Bike`, `Car`, `Pedestrian`, and `Truck` objects in images, particularly those from fisheye cameras, as part of the iterative data selection and model training processes facilitated by the Mcity Data Engine. It serves as a practical demonstration and artifact of the engine's capabilities.

**Limitations**: Because the model was fine-tuned on a single dataset (Fisheye8K), its performance may vary on data with significantly different characteristics, environmental conditions, or object distributions. It is most useful when integrated into the broader Mcity Data Engine framework for continuous improvement and adaptation to novel classes.

## Training and evaluation data

This model was fine-tuned on the [Voxel51/fisheye8k](https://huggingface.co/datasets/Voxel51/fisheye8k) dataset, which provides the fisheye-camera data underlying the model's application in Intelligent Transportation Systems. The training process leverages the open-vocabulary data selection capabilities of the Mcity Data Engine to identify and incorporate relevant samples, including rare and long-tail classes.

The model detects the following classes: `Bus`, `Bike`, `Car`, `Pedestrian`, `Truck`.

## Sample Usage

You can use this model directly with the Hugging Face `transformers` library for object detection:

```python
import torch
from transformers import AutoImageProcessor, AutoModelForObjectDetection
from PIL import Image
import requests

# Load an example image (replace with your fisheye image if available).
# This example uses a standard COCO image for demonstration purposes; it shows
# an indoor scene, so it may yield no detections for the traffic classes.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load image processor and model from the Hugging Face Hub.
# Set model_name to the Hub repository ID under which this checkpoint is published.
model_name = "jozhang97/fisheye8k_jozhang97_deta-swin-large"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)

# Process image and get predictions
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process outputs to get bounding boxes, labels, and scores
target_sizes = torch.tensor([image.size[::-1]])  # (height, width) for post-processing
results = image_processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]

# Print detected objects
print("Detected objects:")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"  Detected {model.config.id2label[label.item()]} "
        f"with confidence {round(score.item(), 3)} at location {box}"
    )
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36
- mixed_precision_training: Native AMP

### Training results

The table lists results through epoch 7; the evaluation loss reported at the top of this card (17.9701) is the epoch 7 value.

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 13.7551       | 1.0   | 5288  | 17.5573         |
| 12.6537       | 2.0   | 10576 | 17.4879         |
| 12.023        | 3.0   | 15864 | 17.6520         |
| 11.4167       | 4.0   | 21152 | 18.5138         |
| 10.8161       | 5.0   | 26440 | 17.7264         |
| 10.5346       | 6.0   | 31728 | 17.9145         |
| 10.1203       | 7.0   | 37016 | 17.9701         |

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Acknowledgements

Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!

## Citation

If you use the Mcity Data Engine in your research, feel free to cite the project:

```bibtex
@article{bogdoll2025mcitydataengine,
  title={Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection},
  author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  journal={arXiv preprint arXiv:2504.21614},
  year={2025}
}
```
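
## Appendix: learning-rate schedule sketch

The training hyperparameters list `learning_rate: 5e-05`, `lr_scheduler_type: cosine`, and `num_epochs: 36`, and the results table logs 5288 steps per epoch. The following is a minimal sketch of what such a schedule could look like, not the exact configuration used for this model. It assumes the common half-cosine decay to zero, no warmup (the card does not state a warmup setting), and a decay horizon covering all 36 planned epochs. The names `cosine_lr`, `WARMUP_STEPS`, and `TOTAL_STEPS` are illustrative only.

```python
import math

# Values taken from the hyperparameter list and results table in this card.
BASE_LR = 5e-05
NUM_EPOCHS = 36
STEPS_PER_EPOCH = 5288  # step count logged at epoch 1.0 (train_batch_size: 1)

# Assumptions, not stated in the card: no warmup, decay over the full planned run.
WARMUP_STEPS = 0
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step under a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear warmup branch; unused while WARMUP_STEPS is 0.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # clamp once the planned run is over
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few epoch boundaries.
    for epoch in (0, 1, 7, 18, 36):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2} (step {step:>6}): lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate at epoch 7, the last epoch in the results table, is still close to its initial value, because the decay is spread over all 36 planned epochs.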