This code repository focuses on detecting fires from both visible light and infrared images.
- A multi-modal detection pipeline covering both infrared and visible-light imagery.
- Lightweight models trained with YOLOv8n on the D-Fire dataset and our own processed FLAME2 dataset, combining strong performance with a small parameter count, which makes them easy to deploy.
- Our own frame matching method reduces missed detections between frames in infrared fire point detection. Links to the relevant papers will be provided in the future.
We trained YOLOv8 models on our self-built infrared dataset and on the D-Fire dataset to realize multi-modal fire point recognition and smoke detection. While maintaining high accuracy, we tuned the relevant hyperparameters so that the models also achieve high efficiency and fast inference.
Details of training and testing the models are given in the README.md file in the train_models directory.
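As a rough illustration of the training setup, the snippet below uses the pinned ultralytics API to fine-tune YOLOv8n on a YOLO-format dataset. The dataset file name and the hyperparameter values are placeholders, not the exact settings used for the released models; see train_models for the authoritative scripts.

```python
from ultralytics import YOLO

# Start from the pretrained YOLOv8n checkpoint.
model = YOLO("yolov8n.pt")

# "dfire.yaml" is a placeholder dataset description file (train/val paths, class names);
# the epochs/imgsz/batch values are illustrative, not the repo's tuned settings.
model.train(data="dfire.yaml", epochs=100, imgsz=640, batch=16)

# Validate and print mAP50 / mAP50-95 on the validation split.
metrics = model.val()
print(metrics.box.map50, metrics.box.map)
```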
The visible-light model in this project is trained on a processed, data-augmented version of the D-Fire dataset. We provide the relevant data-augmentation code; the unmodified D-Fire dataset is available here, and you can also download the processed dataset directly.
Visible light dataset: https://pan.baidu.com/s/14C1ePeKg6NYoMlIfsJ9lEg (password: 0n87)
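The repository's own augmentation code is the reference; as a minimal sketch, an albumentations pipeline like the one below can augment images together with their YOLO-format boxes. The specific transforms, probabilities, and file names are assumptions for illustration only.

```python
import cv2
import albumentations as A

# Illustrative augmentation pipeline for YOLO-format labels (normalized x_center y_center w h).
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
        A.MotionBlur(p=0.2),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("example.jpg")       # placeholder image path
bboxes = [[0.52, 0.48, 0.20, 0.15]]     # one placeholder YOLO box
class_labels = [0]                      # placeholder class id (e.g. fire)

augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
aug_image, aug_bboxes = augmented["image"], augmented["bboxes"]
```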
The infrared dataset is built from the infrared videos in the FLAME2 dataset through frame extraction and binarization-based labeling. We provide the relevant code as well as the processed dataset.
Infrared dataset: https://pan.baidu.com/s/1kl5r-iN5jHN2gYKWQHTdew (password: jx3r)
ultralytics==8.0.136
streamlit==1.24.0
py-cpuinfo
opencv-python==4.8.1.78
numpy==1.24.3
matplotlib==3.7.4
albumentations==1.3.1
torchvision==0.16.0
The specific process for building the infrared dataset is as follows:
- The video data is first cut into images, sampling one frame every 10 frames, and these images are then processed.
- Each original image is converted to grayscale, and binarization is then applied to make the fire points visible.
- The fire point positions are obtained through edge detection, and adjacent boxes are fused to produce the labeled data, giving Dataset v1 (a minimal sketch of this labeling step follows this list).
- Model v1 is trained on Dataset v1, and its knowledge-transfer ability is used to run recognition on the original images.
- The recognition results form Dataset v2, which is used to train Model v2, and a Frame Matching Algorithm is introduced.
- Dataset v3 is obtained by processing Dataset v2 with the Frame Matching Algorithm, and the final Model v3 is trained on Dataset v3.
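A minimal sketch of the binarization and box-fusion labeling step is given below. The threshold value, the fusion margin, and the function name are illustrative assumptions; the repository's own preprocessing code is the reference implementation.

```python
import cv2

def label_fire_points(image_path, thresh=200, margin=10):
    """Sketch of binarization-based labeling: returns fused (x, y, w, h) boxes."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Binarize: bright regions in the IR frame are treated as fire points.
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)

    # Contour (edge) detection gives candidate fire-point regions.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]

    # Crude adjacent-box fusion: merge boxes whose extents overlap within `margin` pixels.
    merged = []
    for x, y, w, h in sorted(boxes):
        for i, (mx, my, mw, mh) in enumerate(merged):
            if (x <= mx + mw + margin and mx <= x + w + margin and
                    y <= my + mh + margin and my <= y + h + margin):
                nx, ny = min(x, mx), min(y, my)
                merged[i] = (nx, ny, max(x + w, mx + mw) - nx, max(y + h, my + mh) - ny)
                break
        else:
            merged.append((x, y, w, h))
    return merged
```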
The Frame Matching Algorithm works as follows:
- Read the annotation data of two frames at the same time: the previous frame and the current frame.
- Use the regions indicated by the annotation boxes in the previous frame as matching targets in the current frame.
- Traverse and match the annotation boxes of the previous frame so that all targets in that frame are covered.
- Filter the positions in the current frame with the IoU algorithm: duplicate boxes are removed and boxes that the current frame missed are retained.
- Perform this process sequentially over the video so that spatial information from the preceding and following frames is exploited.
- By reversing the input order of the video frames and repeating the process, an image dataset whose neighbouring frames carry consistent target information is obtained (a minimal sketch of the IoU-based filtering is shown below).
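The sketch below illustrates the IoU-based filtering idea under simple assumptions (boxes as pixel corner coordinates, a placeholder IoU threshold); it is not the repository's exact implementation.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_frames(prev_boxes, curr_boxes, iou_thresh=0.5):
    """Carry over previous-frame boxes that have no IoU match in the current frame.

    A previous-frame box that overlaps a current-frame box above `iou_thresh`
    is treated as a duplicate and dropped; otherwise it is kept as a box the
    current frame missed. The threshold value is a placeholder.
    """
    kept = list(curr_boxes)
    for pb in prev_boxes:
        if all(iou(pb, cb) < iou_thresh for cb in curr_boxes):
            kept.append(pb)  # fire point seen in the previous frame but missing here
    return kept
```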
| Visible-Light Detection | mAP50 | mAP50-95 | Precision | Recall | Miss Rate |
|---|---|---|---|---|---|
| Fire | 74.7% | 42.8% | 77.0% | 68.1% | 31.9% |
| Smoke | 86.1% | 56.1% | 87.8% | 78.0% | 22.0% |

| IR Fire Detection Model | mAP50 | mAP50-95 | Precision | Recall | Miss Rate |
|---|---|---|---|---|---|
| Model v1 | 92.3% | 67.0% | 86.1% | 87.7% | 12.3% |
| Model v2 | 91.7% | 64.8% | 86.0% | 87.4% | 12.6% |
| Model v3 | 93.6% | 71.8% | 88.9% | 86.6% | 13.4% |






