
Introduction and background on YOLOR

YOLOR (You Only Learn One Representation) is an object detection algorithm released in 2021. It uses a single unified network to encode implicit knowledge and explicit knowledge simultaneously. Object detection is the task of finding instances of semantic objects of certain classes (such as humans, buildings, or cars) in digital images and videos. A detector combines localization with classification: for every object it finds, it outputs a bounding box labeled with the object's class.

How YOLOR accounts for implicit knowledge:

Manifold Space Reduction:

The authors assert that a good representation should be able to find an appropriate projection in the manifold space to which it belongs (1). For example, as shown in the figure below, the best outcome is when the target categories can be successfully classified by a hyper-plane in the projection space (1).
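To make the hyper-plane idea concrete, here is a toy illustration that is not taken from YOLOR: two classes of points on concentric rings cannot be split by a straight line in 2-D, but after the hand-picked projection (x, y) to (x, y, x² + y²) a single plane separates them. YOLOR learns its representation instead of using a fixed mapping like this one, and every name below is hypothetical.

```python
import numpy as np

# Illustrative sketch only: a fixed, hand-picked projection, not YOLOR's learned one.

def make_rings(n=100, inner_r=1.0, outer_r=3.0, seed=0):
    """Two classes on concentric rings: not linearly separable in 2-D."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0, 2 * np.pi, size=2 * n)
    radii = np.concatenate([np.full(n, inner_r), np.full(n, outer_r)])
    points = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return points, labels

def project(points):
    """Map (x, y) -> (x, y, x^2 + y^2)."""
    sq_norm = np.sum(points ** 2, axis=1, keepdims=True)
    return np.hstack([points, sq_norm])

def hyperplane_classify(projected, threshold=4.0):
    """Hyper-plane w . p + b = 0 with w = (0, 0, 1) and b = -threshold."""
    w = np.array([0.0, 0.0, 1.0])
    return (projected @ w - threshold > 0).astype(int)

points, labels = make_rings()
pred = hyperplane_classify(project(points))
print("accuracy:", (pred == labels).mean())  # 1.0: the plane separates the rings
```

In the original 2-D space no line achieves this; the extra coordinate is what makes the categories linearly separable.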

Kernel Alignment:

As the figure below (without alignment) shows, kernel space misalignment is a frequent problem in multi-task and multi-head neural networks. To deal with it, the authors propose adding the implicit representation to the output features and multiplying the output features by it, so that each output kernel space of the network can be translated, rotated, and scaled into alignment, as shown in Figure (b) (1).
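The add and multiply operations can be sketched as a per-channel vector combined with a feature map by broadcasting: adding translates each channel and multiplying rescales it. The NumPy sketch below illustrates the arithmetic under simple assumptions and is not YOLOR's code: the implicit vectors are fixed numbers where the network would learn them, the function names are hypothetical, and only translation and scaling (not rotation) are shown.

```python
import numpy as np

# Illustrative sketch: fixed "implicit" vectors stand in for trainable parameters.

def implicit_add(features, z):
    """Translate each channel of an (N, C, H, W) feature map by the offset z[c]."""
    return features + z.reshape(1, -1, 1, 1)

def implicit_mul(features, z):
    """Scale each channel of an (N, C, H, W) feature map by the factor z[c]."""
    return features * z.reshape(1, -1, 1, 1)

# Two output heads whose value ranges disagree (misaligned "kernel spaces").
rng = np.random.default_rng(0)
head_a = rng.normal(loc=0.0, scale=1.0, size=(2, 3, 4, 4))
head_b = rng.normal(loc=5.0, scale=2.0, size=(2, 3, 4, 4))

# Hypothetical implicit vectors, one value per channel.
shift = np.full(3, -5.0)
scale = np.full(3, 0.5)

aligned_b = implicit_mul(implicit_add(head_b, shift), scale)
# Compare the statistics of head_a with head_b before and after alignment.
print(head_a.mean(), head_a.std())
print(head_b.mean(), head_b.std(), "->", aligned_b.mean(), aligned_b.std())
```

Because the offsets and factors are per-channel vectors, the operations cost almost nothing compared with the convolutions that produce the features.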

Capabilities of YOLOR

The YOLOR algorithm detects and classifies objects and surrounds each one with a labeled bounding box. As the figure below shows, when trained on the same dataset as YOLOX and Scaled-YOLOv4, YOLOR performs better in terms of both batch-1 latency and average precision.

You can read more on YOLOR here.

Train a YOLOR model and use it to detect stamps

Install YOLOR Dependencies

Clone the base YOLOR repository and install the necessary requirements:

# clone YOLOR repository
!git clone
%cd yolor
!git reset --hard eb3ef0b7472413d6740f5cde39beb1a2f5b8b5d1

# install dependencies as necessary
!pip install -qr requirements.txt

Next, install Mish CUDA so that the Mish activation can be used:

# Install Mish CUDA
!git clone
%cd mish-cuda
!git reset --hard 6f38976064cbcc4782f4212d7c0c5f6dd5e315a8
!python build install
%cd ..
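Mish is defined as x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). The mish-cuda package provides a CUDA implementation for the GPU; the NumPy reference below is only a sketch for checking values on the CPU and is not part of the YOLOR code base (the function name is hypothetical).

```python
import numpy as np

def mish(x):
    """Reference Mish activation: x * tanh(softplus(x)). Sketch, not the mish-cuda kernel."""
    x = np.asarray(x, dtype=np.float64)
    # np.logaddexp(0, x) computes ln(1 + e^x) without overflowing for large x.
    return x * np.tanh(np.logaddexp(0.0, x))

print(mish([-2.0, 0.0, 2.0]))
```

Unlike ReLU, Mish is smooth and lets small negative values through, while behaving almost like the identity for large positive inputs.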

To use the DWT downsampling module, we will need PyTorch Wavelets:

# Install PyTorch Wavelets
!git clone
%cd pytorch_wavelets
!pip install .
%cd ..
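Downsampling with a discrete wavelet transform (DWT) reduces spatial resolution without discarding information: one level of the 2-D Haar transform turns each 2×2 block into one low-frequency value and three detail values, so height and width are halved while the channel count grows fourfold. The NumPy function below is a hand-written single-level orthonormal Haar transform for illustration only; it is not the pytorch_wavelets API, whose modules are what the model itself relies on.

```python
import numpy as np

def haar_dwt_downsample(x):
    """Single-level orthonormal 2-D Haar DWT of a (C, H, W) array with even H and W.

    Illustrative sketch (hypothetical helper). Returns a (4 * C, H // 2, W // 2)
    array: the low-pass band for every channel, then the three detail bands.
    """
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0
    detail_cols = (a - b + c - d) / 2.0  # left vs. right column differences
    detail_rows = (a + b - c - d) / 2.0  # top vs. bottom row differences
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return np.concatenate([low, detail_cols, detail_rows, detail_diag], axis=0)

image = np.arange(2 * 4 * 4, dtype=np.float64).reshape(2, 4, 4)
out = haar_dwt_downsample(image)
print(out.shape)  # (8, 2, 2)
```

Because the transform is orthonormal, the total energy (sum of squares) of the output equals that of the input, which is a quick way to check that nothing was lost.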

Custom YOLOR Object Detection Data

You can use any available public dataset or create your own for training. In our case, we use Roboflow to create the stamp dataset and label the images, but you are free to use any other labeling tool you are comfortable with.

Labeling Images

Note: the images in the public dataset are already labeled. If your images are not labeled yet, you can label them in Roboflow; labeled images are a prerequisite for training your custom object detector.

Click on the version of the dataset you have prepared. In the pop-up that appears, select the YOLOv5 PyTorch dataset format, then select “show download code” and continue:

Then copy the generated Jupyter Notebook command and use it to replace this line in the notebook:

!curl -L "[YOUR-KEY-HERE]" >; unzip; rm
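In the YOLOv5 PyTorch export format, each image has a matching .txt label file with one object per line: a class index followed by the box center x, center y, width, and height, all normalized to the range 0 to 1 by the image size. The helper below is a minimal sketch (its name and the sample label line are hypothetical) that converts one such line back to pixel corner coordinates, which is handy for spot-checking exported labels.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Convert 'class cx cy w h' (normalized) into (class_id, x1, y1, x2, y2) in pixels.

    Hypothetical helper for checking labels; not part of YOLOR or Roboflow.
    """
    class_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    return int(class_id), round(x1), round(y1), round(x2), round(y2)

# Made-up label: a stamp centred in a 416x416 image, a quarter of its width and height.
print(yolo_line_to_pixels("0 0.5 0.5 0.25 0.25", 416, 416))  # (0, 156, 156, 260, 260)
```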

Prepare Pre-Trained Weights for YOLOR

YOLOR comes with pre-trained weights. Download the implicit and explicit weights of the network with the provided shell script:

%cd /content/yolor
!bash scripts/

Run YOLOR Training

Run the training with the following options:

  • img: define the input image size (two values: the train and test sizes)
  • batch-size: determine the batch size
  • epochs: define the number of training epochs. (Note: often, 3000+ are common here!)
  • data: set the path to our yaml file (this file comes with the dataset we downloaded from Roboflow)
  • cfg: specify our model configuration
  • weights: specify a custom path to weights. (Note: we can specify the pre-trained weights we downloaded above with the shell script)
  • device: select the GPU to train on (0 is the first CUDA device)
  • name: the name of the run; results are saved under runs/train/<name>
  • hyp: define the hyperparameters for training

And run the training command:

!python --batch-size 8 --img 416 416 --data '../data.yaml' --cfg cfg/yolor_p6.cfg --weights '/content/yolor/' --device 0 --name yolor_p6 --hyp '/content/yolor/data/hyp.scratch.1280.yaml' --epochs 50
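If you prefer to launch training from Python instead of a notebook cell, the options can be kept in one dictionary and turned into a command-line argument list. This is a hypothetical convenience helper, not part of YOLOR; it reproduces only flags that appear in the command above, and it leaves out the training script and the weights path because the command above does not show them in full.

```python
def build_train_args(options):
    """Turn {flag: value} into a flat argv list; list values expand to several arguments.

    Hypothetical helper; flag names are copied from the training command above.
    """
    args = []
    for flag, value in options.items():
        args.append(f"--{flag}")
        if isinstance(value, (list, tuple)):
            args.extend(str(v) for v in value)
        else:
            args.append(str(value))
    return args

options = {
    "batch-size": 8,
    "img": [416, 416],  # two values, as in the command above
    "data": "../data.yaml",
    "cfg": "cfg/yolor_p6.cfg",
    "device": 0,
    "name": "yolor_p6",
    "hyp": "/content/yolor/data/hyp.scratch.1280.yaml",
    "epochs": 50,
}
print(" ".join(build_train_args(options)))
```

The resulting list can be appended to the interpreter and training-script path and passed to subprocess.run, which avoids quoting problems with paths that contain spaces.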

Export Saved YOLOR Weights for Future Inference

Copy the trained YOLOR weights to Google Drive so they can be reused for inference in other projects:

from google.colab import drive
# Mount Google Drive so that /content/gdrive exists before copying
drive.mount('/content/gdrive')

%cp /content/yolor/runs/train/yolor_p6/weights/ /content/gdrive/My\ Drive
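The same export can be done in plain Python, which avoids escaping the space in "My Drive". The sketch below copies every .pt checkpoint it finds in the run's weights folder, so it does not depend on exact checkpoint file names; the function name is hypothetical, and it assumes Google Drive is already mounted at /content/gdrive.

```python
import shutil
from pathlib import Path

def export_checkpoints(weights_dir, dest_dir):
    """Copy every *.pt file from weights_dir into dest_dir and return the copied paths.

    Hypothetical helper; assumes checkpoints use the .pt extension.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for ckpt in sorted(Path(weights_dir).glob("*.pt")):
        # copy2 also preserves file timestamps
        copied.append(Path(shutil.copy2(ckpt, dest / ckpt.name)))
    return copied

# Example call for the training run above (paths follow the Colab layout of this tutorial):
# export_checkpoints("/content/yolor/runs/train/yolor_p6/weights", "/content/gdrive/My Drive")
```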

Example stamp detection

With the detector in place, let us write a Python script to detect our stamps:

import cv2
import numpy as np
import time
import datetime
from google.colab.patches import cv2_imshow
import os

def imageM(pathf, pathff):
    # pathf: full path to the image; pathff: bare file name (used in the log line)
    confidenceThreshold = 0.5
    NMSThreshold = 0.3

    # Note: these paths point to a Darknet-format (YOLOv4) model, as in the original script.
    # cv2.dnn cannot load YOLOR's PyTorch .pt checkpoints directly.
    modelConfiguration = 'darknet/cfg/custom-yolov4-detector.cfg'
    modelWeights = 'darknet/backup/custom-yolov4-detector_last.weights'

    labelsPath = 'darknet/data/coco.names'
    with open(labelsPath) as f:
        labels = f.read().strip().split('\n')

    # One random color per class for drawing boxes
    COLORS = np.random.randint(0, 255, size=(len(labels), 3), dtype="uint8")

    net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)

    image = cv2.imread(pathf)
    if image is None:  # not an image OpenCV can read; skip it
        return
    (H, W) = image.shape[:2]

    # Determine output layer names (works on both older and newer OpenCV 4.x releases)
    layerName = net.getUnconnectedOutLayersNames()

    start = time.time()
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)  # the blob must be set as the input before calling forward()
    layersOutputs = net.forward(layerName)
    end = time.time()
    print('{}: inference took {:.3f} s'.format(pathff, end - start))

    boxes = []
    confidences = []
    classIDs = []

    for output in layersOutputs:
        for detection in output:
            # Each row: center x, center y, width, height, objectness, then one score per class
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            if confidence > confidenceThreshold:
                # Coordinates are relative to the image size; scale them to pixels
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype('int')
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(int(classID))

    # Apply Non-Maximum Suppression to drop overlapping duplicate boxes
    detectionNMS = cv2.dnn.NMSBoxes(boxes, confidences, confidenceThreshold, NMSThreshold)

    if len(detectionNMS) > 0:
        for i in np.array(detectionNMS).flatten():
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
            conff = round(confidences[i] * 100, 2)
            text = '{}: {:.2f}%'.format(labels[classIDs[i]], conff)
            cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # Display the annotated image in the notebook
    cv2_imshow(image)


# Run the detector on every file under the trainSet folder
for subdir, dirs, files in os.walk('trainSet'):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        imageM(filepath, filename)
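The cv2.dnn.NMSBoxes call in imageM performs non-maximum suppression: after discarding boxes below the confidence threshold, it keeps the highest-scoring box and drops any remaining box whose overlap with it, measured as intersection over union (IoU), exceeds the NMS threshold. The plain NumPy version below is a sketch for understanding that step with the same [x, y, width, height] boxes; it is not OpenCV's implementation, the function names are hypothetical, and results may differ from OpenCV's in edge cases.

```python
import numpy as np

# Illustrative greedy NMS; not OpenCV's implementation.

def iou(box, others):
    """IoU between one [x, y, w, h] box and an (N, 4) array of [x, y, w, h] boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union

def nms(boxes, scores, score_threshold, iou_threshold):
    """Return the indices of the kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    # Drop low-confidence boxes, then visit the rest from best to worst score.
    order = [i for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while order:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        if not rest:
            break
        overlaps = iou(boxes[best], boxes[rest])
        # Keep only the boxes that do not overlap the chosen box too much.
        order = [i for i, o in zip(rest, overlaps) if o <= iou_threshold]
    return keep

boxes = [[10, 10, 100, 100], [15, 15, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, score_threshold=0.5, iou_threshold=0.3))  # [0, 2]
```

Here the second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap at all and survives.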

