IMPLEMENTED A CNN-BASED SINGLE SHOT MEAL DETECTOR (SSMD) ON THE DARKNET ARCHITECTURE.
Developed a single shot meal detector (SSMD) that detects 14 meal classes and localizes each detected item in an image or video sequence with a bounding box. SSMD is based on the darknet architecture, which has proven to be fast and accurate for multi-object detection.
TRAINING DATASET:
The 14 meal classes were: salad, pasta, hotdog, frenchfries, burger, apple, banana, broccoli, pizza, egg, tomato, rice, strawberry, and cookie.
The training dataset combined hand-labelled images with labelled images from the ImageNet and UEC256 food datasets.
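As an illustration of how a hand-labelled box might be stored for training, the sketch below converts a pixel bounding box into a darknet-style label line (class index, then box center and size normalized by the image dimensions). The annotation format used for SSMD is not stated above, so the format, the class index order, and the helper name `to_darknet_label` are assumptions.

```python
# The 14 meal classes in the order listed above.
# Assumption: the class index follows this order.
CLASS_NAMES = [
    "salad", "pasta", "hotdog", "frenchfries", "burger", "apple", "banana",
    "broccoli", "pizza", "egg", "tomato", "rice", "strawberry", "cookie",
]


def to_darknet_label(class_name, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one label line.

    Hypothetical helper: the line holds the class index followed by the
    box center and size, each divided by the image width or height.
    """
    x_min, y_min, x_max, y_max = box
    class_id = CLASS_NAMES.index(class_name)
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: a pizza occupying pixels (100, 50) to (300, 250) in a 400x500 image.
print(to_darknet_label("pizza", (100, 50, 300, 250), 400, 500))
```

Normalizing by the image dimensions keeps the labels valid when images are resized for training.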
NETWORK ARCHITECTURE:
A fully convolutional network (FCN) based on the darknet architecture was used for SSMD. It contains 53 convolutional layers, each followed by a batch normalization layer and a Leaky ReLU activation. No form of pooling was used; stride-2 convolution was used instead to downsample the feature maps.
This helps prevent the loss of low-level features often attributed to pooling.
Because it is fully convolutional, SSMD accepts input images of different sizes and can be applied to still images as well as video sequences.
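The effect of stride-2 downsampling on the feature map size can be sketched with the standard convolution output-size formula. The 3x3 kernel, padding of 1, five downsampling stages, and the 416 and 608 pixel input sizes are assumptions chosen for illustration; they are not stated in the description above.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_sizes(input_size, num_downsamples=5):
    """Sizes after each stride-2 convolution, starting with the input size.

    Assumption: five downsampling stages, each a 3x3 stride-2 convolution.
    """
    sizes = [input_size]
    for _ in range(num_downsamples):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


# Different input sizes pass through the same layers and simply
# produce feature maps of different sizes.
print(feature_map_sizes(416))  # [416, 208, 104, 52, 26, 13]
print(feature_map_sizes(608))  # [608, 304, 152, 76, 38, 19]
```

Each stride-2 convolution halves the spatial resolution, so no layer depends on a fixed input size.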
OUTPUT:
SSMD was trained on the darknet architecture with the YOLOv3 algorithm.
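For readers unfamiliar with YOLOv3, the sketch below shows how a raw network prediction is commonly decoded into a bounding box: the center offsets pass through a sigmoid and are added to the grid cell position, and the anchor size is scaled exponentially. The function name, the anchor size, and the stride of 32 are illustrative assumptions, not values taken from SSMD.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one YOLOv3-style prediction into (center_x, center_y, width, height).

    Assumptions: anchors are given in input-image pixels, and `stride` is
    the number of input pixels covered by one grid cell.
    """
    center_x = (sigmoid(tx) + cell_x) * stride
    center_y = (sigmoid(ty) + cell_y) * stride
    width = anchor_w * math.exp(tw)
    height = anchor_h * math.exp(th)
    return center_x, center_y, width, height


# Zero offsets place the box at the middle of grid cell (6, 6)
# with exactly the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, 6, 6, 116, 90, 32))
```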
CONCLUSION:
A single shot meal detector was trained on the darknet architecture.
The meal detector was trained using 7532 images for 21300 epochs.
It can detect all 14 classes in an image or a video sequence within 0.1-0.2 s.
SSMD provides the class label, prediction confidence and bounding box for each detected meal object.
On the competition dataset, SSMD showed an average true detection rate of 68% and average true detection rate of 98% across all 14 classes.
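The per-object output described above (class label, prediction confidence, bounding box) could be represented as in the following minimal sketch. The `Detection` type, the `filter_detections` helper, and the 0.5 confidence threshold are hypothetical and only illustrate how downstream code might consume such output.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # one of the 14 meal classes
    confidence: float  # prediction confidence in [0, 1]
    box: tuple         # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(detections, min_confidence=0.5):
    """Keep detections at or above the threshold, most confident first."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


# Made-up detections for a single frame.
frame_detections = [
    Detection("pizza", 0.91, (40, 60, 300, 320)),
    Detection("salad", 0.35, (310, 80, 420, 200)),
    Detection("egg", 0.72, (150, 340, 230, 410)),
]
for detection in filter_detections(frame_detections):
    print(detection.label, detection.confidence, detection.box)
```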
RESULTS:
Full results can be found in the report attached below: