State-Of-The-Art Object Detection Using YOLOv5 and Custom Dataset!

YOLOv5 is lightweight, extremely easy to use, trains quickly, inferences quickly, and performs well.

YOLO is an acronym for “You Only Look Once”. Many computer vision and machine learning practitioners consider it the first choice for real-time object detection because it is state of the art in terms of performance (FPS), ease of use (setup and configuration), and versatility (models can be converted to serve different devices).

In this tutorial we are going to train a real-time object detection model using YOLOv5 (the latest but unofficial version of YOLO).

Let’s take a look at the brief history of YOLO just before we dive into the details of setting up and using YOLOv5 for training on custom datasets.

Joseph Redmon introduced the first version of YOLO in his May 2016 paper, “You Only Look Once: Unified, Real-Time Object Detection.”

Redmon subsequently introduced YOLOv2 in a December 2016 paper titled “YOLO9000: Better, Faster, Stronger.”

He and his advisor also published “YOLOv3: An Incremental Improvement” in April 2018. It is by far the most popular and stable version of YOLO and has many forks on GitHub. Notably, Glenn Jocher (the author and publisher of YOLOv5), who is unaffiliated with Joseph Redmon, created a popular YOLOv3 implementation in PyTorch.

Redmon announced he was stepping away from computer vision research in February 2020.

From that point forward, it became unclear who, if anyone, should continue to use the name “YOLO” to refer to new model architectures. Some have considered YOLOv3 to be “the last YOLO.”

Alexey Bochkovskiy published YOLOv4 on April 23, 2020. He appears to be affiliated with Joseph Redmon, and after the release Redmon gave YOLOv4 a sort of endorsement.

A few weeks later, on May 29, 2020, Glenn Jocher of Ultralytics created a repository called YOLOv5 that didn’t contain any model code, and on June 9, 2020, he added a commit to his YOLOv3 implementation with the message “YOLOv5 greetings.”

Jocher’s YOLOv5 implementation differs from prior releases in a few notable ways. First, Jocher did not (yet) publish a paper to accompany his release. Second, Jocher implemented YOLOv5 natively in the Ultralytics PyTorch framework, which is very intuitive to use and fast at inference, whereas all prior models in the YOLO family leveraged Darknet (an open source neural network framework written in C and CUDA that is fast, easy to install, and supports CPU and GPU computation).

Jocher’s YOLOv5 repository is far from his first involvement in the YOLO project: he’s made 2,379 commits to his YOLOv3 (PyTorch) implementation.

Notably, Jocher is also credited with creating mosaic data augmentation and including it in his YOLOv3 repository, which is one of the many novel data augmentations leveraged in YOLOv4. He is given an acknowledgement in the YOLOv4 paper.

(Note: Glenn Jocher’s YOLOv5 is under active development. Jocher has stated he plans to publish a YOLOv5 summary as a firmer checkpoint of performance later this year.)

YOLOv4 tops YOLOv5 in mAP on the COCO benchmark. (Credit: Bochkovskiy)
Performance of YOLOv5 vs EfficientDet (updated 6/23) (source)


To learn more about YOLOv5 and its previous releases, visit the Ultralytics site, the official repo for YOLOv5, and the official repo for YOLOv3.

Now let’s dive right in…

Training Of YOLOv5 Model with Custom Dataset

If you want to train on Google Colab, you can use the notebook from the official Ultralytics repo here, or this excellent notebook provided by Roboflow here.

You can also download and explore code for this tutorial here.

  • Environmental set up and Installation of YOLOv5 dependencies
  • Preprocessing Custom Dataset
  • Define YOLOv5 Model Configuration and Architecture
  • Train a custom YOLOv5 Detector
  • Evaluate YOLOv5 performance
  • Visualize YOLOv5 training data
  • Run YOLOv5 Inference on test images
  • Export Saved YOLOv5 Weights for Future Inference

To start off with YOLOv5, we first have to clone the YOLOv5 repository and install its dependencies. This sets up our programming environment so it is ready to run object detection training and inference commands.

Requirements: Python 3.7 or later with all requirements.txt dependencies installed, including torch >= 1.5.

To install YOLOv5 and its dependencies run:

$ git clone # clone repo
$ pip install -U -r yolov5/requirements.txt # install dependencies
$ cd /content/yolov5 # change directory into project folder

contents of requirements.txt file:


Note: For this tutorial I used macOS Catalina version 10.15. I subsequently tested it on Ubuntu 18.04 and was able to run it with no bugs or errors.

You are ready to move into the next step if you have succeeded in installing YOLOv5 and its dependencies with no issues.

In order to train your object detector model using YOLOv5, your custom dataset needs to be labeled and annotated in YOLO format.

Here are some custom object detection datasets in YOLOv5 format from Roboflow; you can choose and download any dataset you want to use for this tutorial.

In this tutorial, I collected some images of antennas using a drone and I annotated and labeled these images with LabelImg.

Note: If you have unlabeled images, you will first need to label them. For free open source labeling tools, we recommend the following guides on getting started with LabelImg or getting started with CVAT annotation tools. Try labeling at least 100 images to proceed in this tutorial. To improve your model’s performance later, you will want to label more images.

You can watch this YouTube video on how to label data using LabelImg.

Labeling custom images with LabelImg

Note: After labeling, you should split your dataset into a training set, a validation set, and a test set, each in a separate folder, and make sure you keep each image and its annotation in the same directory. A recommended split is 70% of the data in the training set, 20% in the validation set, and 10% in the test set. You should then copy the training and validation folders into the data directory in the project folder.
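The 70/20/10 split described above can be automated. Below is a minimal sketch, not part of the original tutorial code: the function name split_dataset, the folder names train, val and test, and the assumption that images are .jpg files with .xml or .txt annotations sharing the same base name are all mine.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, out_dir, ratios=(0.7, 0.2, 0.1), seed=0):
    """Copy image/annotation pairs from src_dir into train/val/test folders."""
    src, out = Path(src_dir), Path(out_dir)
    images = sorted(src.glob("*.jpg"))
    random.Random(seed).shuffle(images)  # fixed seed gives a reproducible split

    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],  # whatever is left goes to test
    }

    for name, files in splits.items():
        target = out / name
        target.mkdir(parents=True, exist_ok=True)
        for image in files:
            shutil.copy2(image, target / image.name)
            # Keep the annotation next to its image, if one exists
            for ext in (".xml", ".txt"):
                label = image.with_suffix(ext)
                if label.exists():
                    shutil.copy2(label, target / label.name)
    return {name: len(files) for name, files in splits.items()}
```

Calling split_dataset("raw_images", "yolov5/data") (both paths hypothetical) copies the files and returns the number of images placed in each split, which is a quick way to confirm the proportions.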

Converting Annotation Format

You can use the script below if you need to convert your annotations from Pascal VOC XML to the YOLOv5 txt format. In my case, after annotating and labeling my images with LabelImg, the annotations were in XML, so I used the script to convert them to the txt format that YOLOv5 supports. The script also automatically generates two text files containing the full paths of all the images in the training and validation folders. These text files are referenced in the configuration YAML file, which we will get to in a bit.

Note: This script should be in the same directory as your training and validation folders (they should all be inside the data folder, which is in turn inside the yolov5 project folder).

Here is my modified version of the script:

import glob
import os
import xml.etree.ElementTree as ET

dirs = ['train', 'val']
classes = ['antenna']


def getImagesInDir(dir_path):
    # Collect the paths of all .jpg images in dir_path
    image_list = []
    for filename in glob.glob(dir_path + '/*.jpg'):
        image_list.append(filename)
    return image_list


def convert(size, box):
    # size is (width, height); box is (xmin, xmax, ymin, ymax) in pixels.
    # Returns the normalized (x_center, y_center, width, height) used by YOLO.
    # The "- 1" offset is kept from the original script.
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)


def convert_annotation(dir_path, output_path, image_path):
    basename = os.path.basename(image_path)
    basename_no_ext = os.path.splitext(basename)[0]
    # Parse the Pascal VOC XML that sits next to the image
    tree = ET.parse(dir_path + '/' + basename_no_ext + '.xml')
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    with open(output_path + basename_no_ext + '.txt', 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip unknown classes and objects flagged as difficult
            if cls not in classes or int(difficult)==1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w,h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


cwd = os.getcwd()
for dir_path in dirs:
    full_dir_path = cwd + '/' + dir_path
    output_path = full_dir_path +'/'
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    image_paths = getImagesInDir(full_dir_path)
    # train.txt / val.txt list the full path of every image in the folder
    with open(full_dir_path + '.txt', 'w') as list_file:
        for image_path in image_paths:
            list_file.write(image_path + '\n')
            convert_annotation(full_dir_path, output_path, image_path)
    print("Finished processing: " + dir_path)
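To sanity-check converted labels, it can help to map a YOLO line back to pixel corners and compare it with the XML. The sketch below is my own illustration, with hypothetical function names voc_to_yolo and yolo_to_voc. Note that it uses the plain box-center formula and deliberately leaves out the "- 1" offset that the script above applies, so its numbers differ from the script's output by one pixel before normalization.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Pixel corners -> normalized (x_center, y_center, width, height)."""
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h


def yolo_to_voc(img_w, img_h, x_c, y_c, w, h):
    """Normalized YOLO box -> pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = (x_c - w / 2.0) * img_w
    ymin = (y_c - h / 2.0) * img_h
    xmax = (x_c + w / 2.0) * img_w
    ymax = (y_c + h / 2.0) * img_h
    return xmin, ymin, xmax, ymax


# A 640x480 image with a box from (100, 120) to (300, 360)
yolo_box = voc_to_yolo(640, 480, 100, 120, 300, 360)
print(yolo_box)                          # (0.3125, 0.5, 0.3125, 0.5)
print(yolo_to_voc(640, 480, *yolo_box))  # (100.0, 120.0, 300.0, 360.0)
```

All four YOLO values should fall between 0 and 1; a value outside that range usually means the width and height were swapped or the box extends past the image.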

Before training, you need to modify the YAML file that specifies the location (path) of the training folder and the validation folder, as well as the names and number of our classes.

#train and val datasets (image directory or *.txt file with image paths)
train: /Users/macbook/Desktop/antenna/yolov5/data/train.txt
val: /Users/macbook/Desktop/antenna/yolov5/data/val.txt
# number of classes
nc: 1
# class names
names: ['antenna']
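If you would rather generate this file than edit it by hand, a few lines of Python are enough. This is only a sketch under my own assumptions: write_dataset_yaml is a hypothetical helper, and it writes the same five keys and comments shown above using plain string formatting, so no YAML library is needed. Deriving nc from the length of names keeps the two values from drifting apart.

```python
from pathlib import Path


def write_dataset_yaml(path, train_list, val_list, names):
    """Write a dataset config with train/val paths, nc and names."""
    lines = [
        "# train and val datasets (image directory or *.txt file with image paths)",
        f"train: {train_list}",
        f"val: {val_list}",
        "# number of classes",
        f"nc: {len(names)}",  # always consistent with the names list
        "# class names",
        f"names: {list(names)!r}",
    ]
    Path(path).write_text("\n".join(lines) + "\n")
    return len(names)
```

For example, write_dataset_yaml("data/antenna.yaml", "data/train.txt", "data/val.txt", ["antenna"]) would produce a file equivalent to the one above, with relative paths standing in for the absolute ones.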

In my case the configuration file was named antenna.yaml; it is the dataset configuration file for our custom YOLOv5 object detector. For this tutorial, we chose the smallest and fastest base model of YOLOv5, which is yolov5s. You can pick any of the four YOLOv5 models:

  1. yolov5-s
  2. yolov5-m
  3. yolov5-l
  4. yolov5-x
comparisons between YOLOv5 models and EfficientDet

For further comparison of YOLOv5 models you can check here.

Next, download the pretrained weights for transfer learning from the Ultralytics Google Drive folder. After downloading your preferred model, move the downloaded weights file into the weights folder in the root project folder.

Next, modify the YAML configuration file of the YOLOv5 model version you just downloaded and chose for training: from your YOLOv5 folder, change directory to the models folder and set the number of classes, nc, to 1.

# parameters
nc: 1 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
# anchors
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

In my case I used yolov5s, so I opened the yolov5s.yaml file and changed the number of classes to 1, since I am trying to detect only one object (antenna) at this time.

Finally, we have everything configured and are ready to begin training our custom YOLOv5 object detector on the custom dataset.

To kick off training we run the training command with the following options:

  • img: define input image size
  • batch: determine batch size
  • epochs: define the number of training epochs. (Note: often, 3000+ are common here!)
  • data: set the path to our yaml file
  • cfg: specify our model configuration
  • weights: specify a custom path to weights
  • name: result names
  • nosave: only save the final checkpoint
  • cache: cache images for faster training
  • device: to select the training device, “0” for GPU, and “cpu” for CPU.
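The options above map directly onto command-line flags. As a rough sketch of how they fit together, the helper below assembles an argument list from a dictionary. Everything beyond the option names is an assumption on my part: build_command is a hypothetical helper, train.py is the script name I assume the repository uses, and weights/yolov5s.pt is a placeholder path for whichever weights file you downloaded.

```python
import shlex


def build_command(script, options, flags=()):
    """Return an argv list such as ['python3', script, '--img', '640', ...]."""
    argv = ["python3", script]
    for key, value in options.items():
        argv += [f"--{key}", str(value)]       # options that take a value
    argv += [f"--{flag}" for flag in flags]    # boolean switches, e.g. nosave
    return argv


train_argv = build_command(
    "train.py",  # assumed script name
    {
        "img": 640,
        "batch": 1,
        "epochs": 30,
        "data": "./data/antenna.yaml",
        "cfg": "./models/yolov5s.yaml",
        "weights": "weights/yolov5s.pt",  # placeholder path
        "device": "cpu",
    },
)
# Print a shell-safe version of the command for copy and paste
print(" ".join(shlex.quote(arg) for arg in train_argv))
```

Keeping the options in one dictionary makes it easy to rerun training with a different batch size or epoch count without retyping the whole command.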

First, make sure you change directory to the root project directory, then run the training command below:

$ python3 --img 640 --batch 1 --epochs 30 --data ./data/antenna.yaml --cfg ./models/yolov5s.yaml --weights --device cpu

The command above will successfully start the training process if all the steps were done correctly. I used 1 as my batch size and trained my model for 30 epochs; however, you can increase the batch size and the number of training epochs for even better performance.

During training, you want to be watching the mAP@0.5 to see how your detector is performing — see this post on breaking down mAP.

Once training has completed, the trained model is saved in your “weights” directory, and we can evaluate how well the training procedure performed by looking at the validation metrics. The training script writes TensorBoard logs to the runs directory. We visualize those here:

Visualizing tensorboard results on our custom dataset

If you can’t visualize TensorBoard for whatever reason, the results can also be plotted with utils.plot_results and saved as result.png.

You want to take the trained model weights at the point where the validation mAP reaches its highest.
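Picking that checkpoint amounts to finding the epoch with the highest validation mAP. Here is a minimal sketch, assuming you have already collected one mAP@0.5 value per epoch into a list; best_epoch is a hypothetical helper, and the numbers are made up purely for illustration, not results from this tutorial.

```python
def best_epoch(map_per_epoch):
    """Return (epoch_index, mAP) for the highest validation mAP@0.5."""
    if not map_per_epoch:
        raise ValueError("need at least one epoch of metrics")
    # On ties, max() keeps the earliest epoch
    best = max(range(len(map_per_epoch)), key=map_per_epoch.__getitem__)
    return best, map_per_epoch[best]


# Made-up values: mAP rises, peaks at epoch 3, then dips
example_map = [0.12, 0.31, 0.48, 0.55, 0.53]
print(best_epoch(example_map))  # (3, 0.55)
```

A peak followed by a decline, as in the made-up list, is the usual sign that further epochs are starting to overfit.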

During training, the YOLOv5 training pipeline creates batches of training data with augmentations. We can visualize the training data ground truth as well as the augmented training data.

training data ground truth
augmented training data

Now we take our trained model and run inference on test images. For inference we invoke those weights along with a conf value specifying the model confidence threshold (requiring a higher confidence yields fewer predictions) and an inference source. The source can be a directory of images, individual images, video files, or a device's webcam port. For source, I have moved my test1/*jpg to test_infer/.

python3 --source ./inference/images/test1.jpg  --weights weights/ --conf 0.5

Your results will be saved in the inference directory.
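To make the effect of the conf value concrete, the sketch below filters a list of detections by confidence. It illustrates the idea only and is not YOLOv5's own code: filter_by_confidence is a hypothetical helper and the detections are invented.

```python
def filter_by_confidence(detections, conf=0.5):
    """Keep only detections whose confidence is above the threshold."""
    return [d for d in detections if d["confidence"] > conf]


# Invented detections for illustration
detections = [
    {"label": "antenna", "confidence": 0.91},
    {"label": "antenna", "confidence": 0.62},
    {"label": "antenna", "confidence": 0.34},
]

print(len(filter_by_confidence(detections, conf=0.5)))  # 2
print(len(filter_by_confidence(detections, conf=0.8)))  # 1
```

Raising the threshold trades recall for precision: fewer false positives, but more missed objects.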


I hope you enjoyed following my tutorial on training your custom YOLOv5 detector and found it useful.

You can fork and explore code for this tutorial here.

If you have any feedback or suggestions, please leave them in the response section.

Connect with me on LinkedIn, Twitter, and GitHub.

Special thanks to Ultralytics and Roboflow.

Thanks a lot for following my tutorial.

Engineer | Developer | Writer