Truck logo recognition looks like a simple classification task, much like telling a car from a truck in a vehicle dataset. In practice, it is a multi-class classification problem: the model has to tell the logos of different trucking companies apart.
Here is how we approached it. We started with image classification, which assigns a single class to an entire image. Suppose there are two pictures, one containing a truck and the other a car. We classify the first image as "truck" and the second as "car". Image classification with localization goes one step further: it assigns a class to the image and also finds where the object is in that image. The location is reported by drawing a bounding box around the object.
For this, we took ten different truck logos. We used 700 images for training and 300 images for validation. We trained two models on these images, MobileNet V2 and Yolo V4.
MobileNet V2
MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks. As a result, MobileNet models take up little space: about 17 MB, with about 4.2 million parameters. Because of this small size, MobileNets are often used on mobile devices.
MobileNets are faster and smaller than many larger neural networks, but they trade some accuracy for that speed and size.
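To make the size saving concrete, here is a small sketch that counts the weights in a standard convolution and in a depth-wise separable one. The layer shape (3x3 kernel, 128 input channels, 256 output channels) is an assumption chosen for illustration and is not taken from our model; biases are left out.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (bias omitted)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depth-wise convolution followed by a 1x1 point-wise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not a layer from our trained model).
standard = standard_conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)
print(standard, separable, round(standard / separable, 1))  # 294912 33920 8.7
```

For this example layer, the separable version needs roughly one ninth of the weights, which is where most of the size reduction comes from.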
For this model, we performed image classification. Here are the results of MobileNet V2:
The drawback of plain image classification is that when a single image contains multiple classes, the model labels the whole image with only the class that gets the highest score. It is also hard for the model to pick up a class that occupies only a small portion of the image, because the rest of the image acts as noise.
Let’s look at this image. The car occupies only a small portion of it, and everything around the car counts as noise. That makes it difficult for the model to learn that there is a car in the image, and the model gets confused.
This changes when we train the model for image classification with localization. A model trained this way learns the region where the object is present and ignores the surroundings. Another advantage is that the model can report multiple classes if they are present in the same image.
Truck logo recognition works better with image classification with localization, because images will contain this kind of noise once the model is deployed in real time.
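The first drawback can be seen in a few lines. The sketch below assumes a classifier that outputs one raw score per class and turns them into probabilities with softmax; the class names and scores are hypothetical. Even when two logos score almost the same, only the top one is returned.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top1(class_names, logits):
    """Return the single most likely class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical scores for an image that contains two different logos.
names = ["logo_a", "logo_b", "logo_c"]
scores = [2.0, 1.8, -1.0]
print(top1(names, scores))  # only "logo_a" is reported; "logo_b" is lost
```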
Yolo V4
Yolo stands for You Only Look Once and is a popular object detection model. While training the model, we give it an image and the coordinates of the object present in that image. The coordinates look like this: (bx, by, bh, bw). Here bx and by are the coordinates of the center of the object's bounding box, and bh and bw are the height and width of that box. The Yolo model has a ConvNet (a stack of convolution layers), a deep neural network, and an output layer with a softmax activation. The output variable is a vector and, for three classes, looks like the following: (Pc, bx, by, bh, bw, c1, c2, c3).
Pc tells whether an object is present in the image or not. bx, by, bh and bw are the object coordinates. c1, c2, and c3 are the class predictions.
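As a rough sketch of this label format, the helpers below build the vector for a single object and convert a center-format box into corner coordinates. The function names are hypothetical, the coordinates are assumed to be normalized to the range 0 to 1, and filling the "no object" vector with zeros is an assumption made for simplicity.

```python
def make_target(box, class_index, num_classes):
    """Build [Pc, bx, by, bh, bw, c1..cN] for an image that contains one object."""
    bx, by, bh, bw = box
    classes = [0.0] * num_classes
    classes[class_index] = 1.0  # one-hot class label
    return [1.0, bx, by, bh, bw] + classes


def empty_target(num_classes):
    """Vector for 'no object': Pc is 0 and the remaining entries are ignored (zeros here)."""
    return [0.0] * (5 + num_classes)


def to_corners(bx, by, bh, bw):
    """Convert a center/size box into (x_min, y_min, x_max, y_max)."""
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)


# A box centered at (0.5, 0.4), height 0.2, width 0.3, belonging to the second of three classes.
target = make_target((0.5, 0.4, 0.2, 0.3), class_index=1, num_classes=3)
print(target)
print(to_corners(0.5, 0.4, 0.2, 0.3))
```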
The YOLO algorithm divides the input image into an n x n grid of cells. It then makes a prediction for each grid cell: whether an object is present, where its bounding box is, and which class it belongs to.
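A simplified way to picture the grid step: each object is assigned to the cell that contains the center of its box. The sketch below assumes normalized center coordinates between 0 and 1 and a hypothetical helper name; the real Yolo V4 pipeline also uses anchor boxes and several grid sizes, which are left out here.

```python
def grid_cell(bx, by, n):
    """Return (row, col) of the cell in an n x n grid that contains the center (bx, by)."""
    col = min(int(bx * n), n - 1)  # clamp so that bx == 1.0 stays inside the grid
    row = min(int(by * n), n - 1)
    return row, col


# On a 3 x 3 grid, a box centered at (0.5, 0.4) falls in the middle cell.
print(grid_cell(0.5, 0.4, 3))  # (1, 1)
```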
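Intersection over Union (IoU) is the metric that evaluates how well the model has drawn a box. It is the area of overlap between the predicted box and the ground-truth box, divided by the area of their union, so it always lies between 0 and 1.
If IoU is greater than or equal to 0.5, the prediction is considered good; otherwise it is considered bad.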
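Here is a minimal sketch of the IoU calculation. It assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max); the function name and the example boxes are ours, for illustration only.

```python
def iou(pred, truth):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = truth

    # Width and height of the overlapping rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    intersection = inter_w * inter_h

    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - intersection
    return intersection / union if union > 0 else 0.0


# The predicted box covers exactly the top half of the ground-truth box.
print(iou((0, 0, 10, 5), (0, 0, 10, 10)))  # 0.5
```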
A single object can end up with multiple predictions. This is solved by non-max suppression. When there are multiple predictions, the one with the highest Pc value is retained, and the overlapping ones are removed. Which predictions get removed is decided by the IoU score: after picking the prediction with the highest Pc, the model computes its IoU with each of the other predictions. If the IoU is greater than 0.5, the other prediction most likely covers the same object, and it is safe to remove it.
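The procedure can be sketched as follows. This is a simplified, single-class version under our own assumptions: each prediction is a (Pc, box) pair, boxes use corner coordinates (x_min, y_min, x_max, y_max), and the function names and the 0.5 default threshold are illustrative rather than taken from the Yolo V4 code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(predictions, iou_threshold=0.5):
    """Keep the highest-Pc prediction and drop others that overlap it too much."""
    remaining = sorted(predictions, key=lambda p: p[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest remaining Pc
        kept.append(best)
        # Drop predictions whose IoU with the kept box exceeds the threshold.
        remaining = [p for p in remaining if iou(best[1], p[1]) <= iou_threshold]
    return kept


predictions = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # overlaps the first box heavily, so it is removed
    (0.7, (50, 50, 60, 60)),  # a different object, so it is kept
]
print(non_max_suppression(predictions))
```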
Here are the results of the Yolo V4 model:
Looking at the classification reports of the two models, we can tell that Yolo performs better than MobileNet. The Yolo model is the better fit for image classification with localization.