
Model Training Workflow

Jim Groot edited this page Sep 8, 2020 · 3 revisions

About:

  • Workflow for model training.
  • Training an object detection model involves several steps that must run one after another.
  • This workflow automates that process using a Python wrapper.
  • The steps involved in model training are:
    1. Generate the training and testing data in TFRecord format.
    2. Download the base model files.
    3. Update the training config file (paths, number of steps, learning rate, etc.).
    4. Start model training.
    5. Run evaluation with the latest checkpoint.
    6. Freeze the model.
  • The naming convention of the output_dir is: "(model_version)_(architecture_name)"
  • The output directory contains the following files and directories:
    1. A checkpoint for each iteration over the dataset, except the last one, which is already in the training folder.
    2. The frozen graph file.
    3. An eval directory containing TensorBoard events for evaluation and visualization on the test data.
    4. Meta_data.txt, containing meta-information about the model.
    5. evaluation_results.json, containing the evaluation metrics.
    6. The model training config file.
    7. commands.txt, containing the commands to train the model locally; the paths in it must be updated first.
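
The naming convention and the ordered steps above can be sketched in Python roughly as follows. This is only an illustration, not the wrapper's actual code: the function names (output_dir_name, missing_outputs, run_steps) are hypothetical, and the output check covers only the three files this page names explicitly, since the names of the frozen graph, eval directory, and config file are not given here.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Only the output files that this page names explicitly.
NAMED_OUTPUTS = ["Meta_data.txt", "evaluation_results.json", "commands.txt"]


def output_dir_name(model_version: str, architecture_name: str) -> str:
    """Build the output directory name: "(model_version)_(architecture_name)"."""
    return f"{model_version}_{architecture_name}"


def missing_outputs(output_dir: Path) -> List[str]:
    """Return the named output files that are not present in output_dir."""
    return [name for name in NAMED_OUTPUTS if not (output_dir / name).is_file()]


def run_steps(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run the steps strictly in order and return the names that completed.

    An exception raised by a step propagates, so later steps never run
    after a failure (for example, no freezing without a trained model).
    """
    completed = []
    for name, step in steps:
        step()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder callables stand in for the real work (hypothetical).
    steps = [
        ("generate_tfrecords", lambda: None),
        ("download_base_model", lambda: None),
        ("update_config", lambda: None),
        ("train", lambda: None),
        ("evaluate", lambda: None),
        ("freeze", lambda: None),
    ]
    print(run_steps(steps))

    # "v1" is an illustrative version label, not one taken from this page.
    out = Path(tempfile.mkdtemp()) / output_dir_name("v1", "ssd_mobilenet_v2_coco")
    out.mkdir()
    print(out.name, "is missing:", missing_outputs(out))
```

Running the steps as an ordered list keeps the dependency between them explicit: each step consumes what the previous one produced, so a failure stops the run instead of producing a partial output directory.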

List of available architectures on S3:

  • faster_rcnn_inception_resnet_v2_atrous_coco (batch size: 1)
  • faster_rcnn_resnet101_coco_model (batch size: 1)
  • ssd_resnet50_fpn_coco_model (batch size: 1)
  • ssd_inception_v2_coco_model (batch size: 16)
  • rfcn_resnet101_coco_model (batch size: 1)
  • faster_rcnn_resnet50_coco_model (batch size: 1)
  • faster_rcnn_nas (batch size: 1)
  • ssd_mobilenet_v1_coco (batch size: 16)
  • ssd_mobilenet_v2_coco (batch size: 16)
  • ssd_mobilenet_v1_fpn_coco_model (batch size: 16)
  • faster_rcnn_resnet101_kitti (batch size: 1)
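
If the wrapper needs these batch sizes programmatically, one option is a plain lookup table like the sketch below. The names DEFAULT_BATCH_SIZE and default_batch_size are hypothetical, and the values assume that the dash in entries such as "batch size -1" is a separator rather than a minus sign.

```python
# Batch size per architecture, copied from the list above.
DEFAULT_BATCH_SIZE = {
    "faster_rcnn_inception_resnet_v2_atrous_coco": 1,
    "faster_rcnn_resnet101_coco_model": 1,
    "ssd_resnet50_fpn_coco_model": 1,
    "ssd_inception_v2_coco_model": 16,
    "rfcn_resnet101_coco_model": 1,
    "faster_rcnn_resnet50_coco_model": 1,
    "faster_rcnn_nas": 1,
    "ssd_mobilenet_v1_coco": 16,
    "ssd_mobilenet_v2_coco": 16,
    "ssd_mobilenet_v1_fpn_coco_model": 16,
    "faster_rcnn_resnet101_kitti": 1,
}


def default_batch_size(architecture: str) -> int:
    """Return the listed batch size, rejecting unknown architecture names."""
    try:
        return DEFAULT_BATCH_SIZE[architecture]
    except KeyError:
        known = ", ".join(sorted(DEFAULT_BATCH_SIZE))
        raise ValueError(
            f"Unknown architecture {architecture!r}; expected one of: {known}"
        ) from None
```

Failing early on an unknown name avoids starting a download or training run for an architecture that is not available.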
