Skip to content

csahithi/object-detection-and-tracking

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Object Detection and Tracking

Object detection and tracking are important computer vision tasks to identify and track objects within video footage or images. Deep Learning models such as SSD (Single Shot Multibox Detector), and YOLO (You Only Look Once) are trained to recognize specific objects in varying environments, ensuring accurate identification in real-time applications. Object tracking, on the other hand, focuses on maintaining the identity of these detected objects as they move across multiple frames. Algorithms like SORT (Simple Online and Realtime Tracking) and DeepSORT are frequently used to track objects and address challenges like obstruction, overlap, and changes in movement patterns. This project aims to develop a real-time object detection and tracking system to count pedestrians and vehicles in traffic footage. Leveraging advanced deep learning models like YOLOv7, the system will automatically detect people and cars within video streams, ensuring accurate identification across various traffic environments. The system will deliver reliable performance under different lighting and weather conditions by fine-tuning these models on specific datasets. The tracking module will utilize the DeepSORT algorithm to maintain object identities across multiple frames, ensuring accurate counting as people and cars move through a defined region of interest (ROI). This system will support real-time traffic monitoring, providing valuable insights for urban planners and traffic managers to enhance road safety, reduce congestion, and optimize traffic flow.

Implementation

Data Preparation

The project utilizes two prominent datasets:

  1. COCO dataset - Features over 200,000 labeled images with annotations for 80 object categories, including cars and people, making it suitable for training models to recognize objects in various traffic scenarios. Its diverse urban scenes help the model generalize across different lighting conditions and interactions.
  2. PASCAL VOC dataset - Provides additional labeled images focused on 20 object categories, with annotations for both detection and segmentation tasks. Together, these datasets form a solid foundation for developing the object detection and tracking system, enhancing accuracy and effectiveness in real-time traffic analysis. We will use the COCO and VOC datasets to train and fine-tune the YOLOv7 model. These datasets contain labeled data with bounding boxes for cars, trucks, buses, and other vehicle types, providing robust training samples for the model.

Model Training

The YOLOv7 model will initially undergo pre-training on the COCO dataset, followed by fine-tuning with the VOC dataset to enhance accuracy for vehicle-specific object categories. Transfer learning methods will be used to adjust the pre-trained model for the traffic video domain. The fine-tuning process will concentrate on important object classes, including cars, trucks, and buses.

Object Detection and Tracking

• Detection: YOLOv7’s bounding box predictions will identify vehicles in each video frame. We’ll utilize three scales to detect objects of different sizes, which is crucial for handling the varying distances of vehicles from the camera. • Tracking: The SORT algorithm, combined with a Deep Association Metric (Deep SORT), will be utilized to link detected vehicles across frames. This approach ensures seamless tracking of each vehicle as it moves through the scene, allowing for precise counting.

Counting Mechanism

The system will be optimized for real-time performance, allowing efficient object detection, tracking, and counting to operate smoothly.

How to run the project

cd yolov7 `
python detect_count_and_track.py --weights models/yolov7.pt --conf 0.55 --source data/test1.mp4 --view-img --nosave --no-trace

Related Work

Traffic surveillance now-a-days is leveraging object detection and tracking with multiple state-ofthe- art models such as Fast R-CNN, SSD, and YOLOv3. These are proven to be effective in real-time scenarios too. To balance the trade-off between speed and accuracy, recent works have also used YOLOv3 and hence making it apt for the real-time applications. Also, there are quite a few datasets that are known for evaluating the performance of deep learning models in object detection such as the COCO dataset and VOC dataset. The COCO dataset is known for its large-scale object detection, segmentation, and captioning tasks and the VOC dataset is known for object classification and detection. Key studies include:

  1. Redmon et al. (2018) introduced YOLOv3, which achieves state-of-the-art results by predicting bounding boxes at three different scales for better detection of small objects. This is better or ideal for vehicle detection.
  2. Bochkovskiy et al. (2020) improved on YOLOv3 with YOLOv4 where the accuracy and the speed of detection significantly increased, but YOLOv3 remains a great choice for real-time tracking due to its optimal trade-off between speed and resource usage.
  3. Other approaches, such as those using SSD (Single Shot Detector) or Fast R-CNN, provide significantly higher accuracy in some scenarios but are slower than YOLOv3, which makes them less suitable for real-time applications. The system will be optimized for real-time performance, allowing efficient object detection, tracking, and counting to operate smoothly.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •