Detecting Pests Using Pre-Trained SSD Models

Sergey L. Gladkiy

Dec 16, 2020

CPOL

In this article, we’ll use a pre-trained DNN to detect pests on video.

Introduction

Unruly wildlife can be a pain for businesses and homeowners alike. Animals like deer, moose, and even cats can cause damage to gardens, crops, and property.

In this article series, we’ll demonstrate how to detect pests (such as a moose) in real time (or near-real time) on a Raspberry Pi and then take action to get rid of the pest. Since we don’t want to cause any harm, we’ll focus on scaring the pest away by playing a loud noise.

You are welcome to download the source code of the project. We are assuming that you are familiar with Python and have a basic understanding of how neural networks work.

In the previous article in the series, we compared two DNN types we can use to detect pests: detectors and classifiers. The detectors won. In this article, we’ll develop Python code for detecting pests using a pre-trained detection DNN.

Selecting Network Architecture

There are several common network architectures for object detection, such as Faster-RCNN, Single-Shot Detector (SSD), and You Only Look Once (YOLO).

Since our network needs to run on an edge device that has limited memory and CPU, we’re going to use the MobileNet Single Shot Detector (SSD) architecture. MobileNet SSD is a lightweight object detector network that performs well on mobile and edge devices. It was trained on the Pascal VOC 2012 dataset, which contains some classes that may represent pests, such as cat, cow, dog, horse, and sheep.

We’ll use the same algorithm for pest detection on video as the algorithm used for human detection in this prior article series.

Code for Pest Detection

First, we need to modify the MobileNet code to make it detect pests.

Let’s start by creating some utility classes to make this task easier:

import cv2
import numpy as np

class CaffeModelLoader:
    @staticmethod
    def load(proto, model):
        # Load a Caffe network from its .prototxt definition and .caffemodel weights
        net = cv2.dnn.readNetFromCaffe(proto, model)
        return net

class FrameProcessor:
    def __init__(self, size, scale, mean):
        self.size = size    # side of the square network input, in pixels
        self.scale = scale  # multiplier applied after mean subtraction
        self.mean = mean    # value subtracted from every pixel
    
    def get_blob(self, frame):
        img = frame
        (h, w, c) = frame.shape
        if w > h:
            # Crop the central h x h square of a landscape frame
            dx = int((w - h) / 2)
            img = frame[0:h, dx:dx+h]
            
        # Pass interpolation by keyword: the third positional argument of cv2.resize is dst
        resized = cv2.resize(img, (self.size, self.size), interpolation=cv2.INTER_AREA)
        blob = cv2.dnn.blobFromImage(resized, self.scale, (self.size, self.size), self.mean, False, False)
        return blob

class Utils:
    @staticmethod
    def draw_object(obj, label, color, frame):
        # obj is (confidence in percent, (x, y, width, height))
        (confidence, (x1, y1, w, h)) = obj
        x2 = x1 + w
        y2 = y1 + h
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        # Draw the caption just above the box
        y3 = y1 - 12
        text = label + " " + str(confidence) + "%"
        cv2.putText(frame, text, (x1, y3), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1, cv2.LINE_AA)
        
    @staticmethod
    def draw_objects(objects, label, color, frame):
        for obj in objects:
            Utils.draw_object(obj, label, color, frame)

The CaffeModelLoader class loads a Caffe model from disk using the provided paths to the network definition (.prototxt) and weights (.caffemodel) files.

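The next utility class, FrameProcessor, converts frames to blobs (the 4D arrays used as CNN input). Before the conversion, it crops the central square out of a landscape frame and resizes it to the network’s square input size, so the image isn’t distorted.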
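To make the preprocessing concrete, here is a NumPy-only sketch of what get_blob does, assuming the parameters used later in this article (scale 1/127.5, mean 127.5). The helper names center_crop_square and to_blob are hypothetical. to_blob only approximates cv2.dnn.blobFromImage with swapRB=False and crop=False for an image that has already been resized, so it skips the resize step.

```python
import numpy as np

def center_crop_square(frame):
    """Crop the central h x h square of a landscape frame, as FrameProcessor.get_blob does."""
    h, w = frame.shape[:2]
    if w > h:
        dx = (w - h) // 2
        return frame[:, dx:dx + h]
    # Portrait or square frames are left as-is (get_blob does not crop them either)
    return frame

def to_blob(img, scale=1.0 / 127.5, mean=127.5):
    """Approximate blobFromImage for an already-resized HxWxC image:
    subtract the mean, multiply by the scale, and reorder HWC -> NCHW."""
    x = (img.astype(np.float32) - mean) * scale
    # Move channels first and add the batch dimension: shape (1, C, H, W)
    return x.transpose(2, 0, 1)[np.newaxis, ...]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
square = center_crop_square(frame)
print(square.shape)  # (480, 480, 3)

blob = to_blob(np.full((300, 300, 3), 255, dtype=np.uint8))
print(blob.shape)  # (1, 3, 300, 300)
```

With a mean of 127.5 and a scale of 1/127.5, pixel values in the 0 to 255 range map to the -1 to 1 range that MobileNet SSD expects.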

Finally, the Utils class draws bounding rectangles, labeled with the class name and confidence, around any objects detected in a frame. Most of the methods our utility classes use come from the Python version of the OpenCV library.
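The geometry behind draw_object is simple: a detection is stored as (confidence, (x1, y1, w, h)), so the opposite corner is (x1 + w, y1 + h), and the caption is drawn 12 pixels above the box. The pure-Python sketch below isolates that arithmetic. The function name box_geometry is hypothetical, and clamping the caption so it stays inside the frame when a box touches the top edge is an addition that the original code does not make.

```python
def box_geometry(obj, label, text_offset=12):
    """Return the rectangle corners, caption text, and caption origin for one detection.

    obj has the same layout as in Utils.draw_object: (confidence, (x1, y1, w, h)).
    """
    confidence, (x1, y1, w, h) = obj
    top_left = (x1, y1)
    bottom_right = (x1 + w, y1 + h)
    text = f"{label} {confidence}%"
    # Assumption (not in the original): keep the caption on-screen for boxes near the top edge
    text_origin = (x1, max(y1 - text_offset, text_offset))
    return top_left, bottom_right, text, text_origin

print(box_geometry((87, (40, 60, 100, 80)), "DOG"))
# ((40, 60), (140, 140), 'DOG 87%', (40, 48))
```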

That’s it for our utility classes. Next, we’ll write code that actually detects pests.

We’ll start with the SSD class, which detects objects of a specified class in a frame:

class SSD:
    def __init__(self, frame_proc, ssd_net):
        self.proc = frame_proc
        self.net = ssd_net
    
    def detect(self, frame):
        blob = self.proc.get_blob(frame)
        self.net.setInput(blob)
        # Output shape is (1, 1, N, 7); each row is
        # [image_id, class_id, confidence, x1, y1, x2, y2] with normalized coordinates
        detections = self.net.forward()
        # detected object count
        k = detections.shape[2]
        obj_data = []
        for i in range(k):
            obj = detections[0, 0, i, :]
            obj_data.append(obj)
            
        return obj_data

    def get_object(self, frame, data):
        confidence = int(data[2]*100.0)
        (h, w, c) = frame.shape
        if w > h:
            # get_blob cropped the central h x h square, so both axes scale by h
            # and x must be shifted by the crop offset
            sx, sy, dx = h, h, int((w - h) / 2)
        else:
            # The frame was resized as a whole, so x scales by w and y by h
            sx, sy, dx = w, h, 0
        r_x = int(data[3]*sx) + dx
        r_y = int(data[4]*sy)
        r_w = int((data[5]-data[3])*sx)
        r_h = int((data[6]-data[4])*sy)
        
        obj_rect = (r_x, r_y, r_w, r_h)
        
        return (confidence, obj_rect)
        
    def get_objects(self, frame, obj_data, class_num, min_confidence):
        objects = []
        for data in obj_data:
            obj_class = int(data[1])
            obj_confidence = data[2]
            # Keep only objects of the requested class that pass the confidence threshold
            if obj_class == class_num and obj_confidence >= min_confidence:
                obj = self.get_object(frame, data)
                objects.append(obj)
                
        return objects

The key methods in the class are detect and get_objects.

The detect method applies the loaded DNN model to each frame to detect objects of all possible classes.
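For reference, the forward pass of this Caffe MobileNet SSD returns an array of shape (1, 1, N, 7), where each of the N rows holds [image_id, class_id, confidence, x1, y1, x2, y2], with corner coordinates normalized to the 0 to 1 range of the network input. The NumPy sketch below uses a made-up two-row array (not real model output) to show how a row maps back to frame pixels. For landscape frames it follows the same logic as SSD.get_object: both axes scale by the frame height, and x is shifted by the crop offset. For portrait or square frames, which get_blob resizes without cropping, it scales x by the frame width. The function name to_frame_rect is hypothetical.

```python
import numpy as np

def to_frame_rect(row, frame_w, frame_h):
    """Map one SSD detection row to (x, y, w, h) in frame pixels."""
    x1, y1, x2, y2 = row[3:7]
    if frame_w > frame_h:
        # Landscape: the network saw the central frame_h x frame_h square
        sx, sy, dx = frame_h, frame_h, (frame_w - frame_h) // 2
    else:
        # Portrait or square: the whole frame was resized to the network input
        sx, sy, dx = frame_w, frame_h, 0
    return (int(x1 * sx) + dx, int(y1 * sy), int((x2 - x1) * sx), int((y2 - y1) * sy))

# Made-up output with the same layout as net.forward(): shape (1, 1, N, 7)
detections = np.array([[[
    [0, 12, 0.91, 0.25, 0.50, 0.75, 1.00],  # a "dog" box
    [0, 15, 0.40, 0.00, 0.00, 0.50, 0.50],  # a "person" box
]]], dtype=np.float32)

rows = detections[0, 0]  # shape (N, 7)
print(rows.shape)  # (2, 7)
print(to_frame_rect(rows[0], 640, 480))  # (200, 240, 240, 240)
```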

The get_objects method looks at the detected objects and keeps only those that belong to the specified class and whose confidence (the model’s estimated probability that the detection is correct) is at least min_confidence.
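get_objects filters the rows one at a time in a Python loop. The same filter can be written as a single NumPy boolean mask, which also makes it easy to accept several pest classes at once. This is a hedged alternative sketch, not the code used in this article: the function name filter_detections and the sample array are made up, and the class indices assume the standard Pascal VOC label order of MobileNet SSD (8 = cat, 12 = dog, 15 = person, 17 = sheep).

```python
import numpy as np

def filter_detections(rows, class_nums, min_confidence):
    """Keep rows whose class is in class_nums and whose confidence is >= min_confidence.

    rows has shape (N, 7): [image_id, class_id, confidence, x1, y1, x2, y2].
    """
    class_ids = rows[:, 1].astype(int)
    mask = np.isin(class_ids, list(class_nums)) & (rows[:, 2] >= min_confidence)
    return rows[mask]

# Made-up detections: a confident dog, a low-confidence cat, and a person
rows = np.array([
    [0, 12, 0.91, 0.25, 0.50, 0.75, 1.00],
    [0, 8, 0.15, 0.10, 0.10, 0.30, 0.30],
    [0, 15, 0.80, 0.00, 0.00, 0.50, 0.50],
], dtype=np.float32)

pests = filter_detections(rows, {8, 12, 17}, 0.2)
print(pests[:, 1])  # [12.]
```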

Then, we’ll write the VideoSSD class, which runs pest detection on an entire video clip:

class VideoSSD:
    def __init__(self, ssd):
        self.ssd = ssd
    
    def detect(self, video, class_num, min_confidence, class_name):
        detection_num = 0
        capture = cv2.VideoCapture(video)

        dname = 'Pest detections'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 1280, 960)
       
        # Process frames until the video ends or the user presses 'q'
        while True:
            (ret, frame) = capture.read()
            if not ret or frame is None:
                break
        
            obj_data = self.ssd.detect(frame)
            class_objects = self.ssd.get_objects(frame, obj_data, class_num, min_confidence)
            detection_num += len(class_objects)
            
            if len(class_objects) > 0:
                # Draw red boxes (BGR color order) around the detected pests
                Utils.draw_objects(class_objects, class_name, (0, 0, 255), frame)
            
            # Display the resulting frame
            cv2.imshow(dname, frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            
        capture.release()
        cv2.destroyAllWindows()
        
        return detection_num

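The only method in the class is detect. It processes all the frames extracted from a video file. In each frame, it detects all objects of the class specified by the class_num parameter whose confidence is at least min_confidence, and then displays the frame with bounding rectangles around the objects it detected. It returns the total number of detections across the clip; pressing q stops playback early.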
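On a Raspberry Pi that runs without a display, the cv2.imshow and cv2.waitKey calls are usually unnecessary. One way to keep the counting logic of VideoSSD.detect while making it easy to test is to pass the frame source and the detector in as plain arguments. The sketch below is a hypothetical refactoring, not part of the article’s code: count_detections and the stand-in data are made up.

```python
def count_detections(frames, detect_fn, on_detect=None):
    """Run detect_fn on every frame and return the total number of detected objects.

    frames: any iterable of frames (for example, frames read from cv2.VideoCapture).
    detect_fn: callable returning the objects found in one frame.
    on_detect: optional callable(frame, objects), called only for frames with detections,
               for example to draw boxes or play the scare sound.
    """
    total = 0
    for frame in frames:
        objects = detect_fn(frame)
        total += len(objects)
        if on_detect is not None and len(objects) > 0:
            on_detect(frame, objects)
    return total


# Usage with stand-in data: integers play the role of frames,
# and the fake detector "finds" one pest in every odd-numbered frame.
hits = []

def fake_detector(frame):
    return [("pest",)] if frame % 2 else []

total = count_detections(range(4), fake_detector, lambda frame, objs: hits.append(frame))
print(total, hits)  # 2 [1, 3]
```

With the article’s classes, detect_fn could be something like lambda f: ssd.get_objects(f, ssd.detect(f), pest_class, 0.2), with frames produced by a loop over capture.read().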

Does It Work?

Let’s launch our code and see how it handles a video file. The following code loads a video file and tries to detect dogs:

proto_file = r"C:\PI_PEST\net\mobilenet.prototxt"
model_file = r"C:\PI_PEST\net\mobilenet.caffemodel"
ssd_net = CaffeModelLoader.load(proto_file, model_file)

mobile_proc_frame_size = 300
ssd_proc = FrameProcessor(mobile_proc_frame_size, 1.0/127.5, 127.5)

pest_class = 12
pest_name = "DOG"

ssd = SSD(ssd_proc, ssd_net)

video_file = r"C:\PI_PEST\video\dog_1.mp4"

video_ssd = VideoSSD(ssd)
detections = video_ssd.detect(video_file, pest_class, 0.2, pest_name)

We set the value of pest_class to 12 because index 12 corresponds to "dog" in the MobileNet SSD label map (index 0 is reserved for the background, so "dog" is the 12th object class). Here is the video captured while running the above code.
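Rather than hard-coding 12, you can look the index up by name. The list below assumes the widely distributed MobileNet SSD Caffe model with the Pascal VOC label order, where index 0 is the background; if your model files come from elsewhere, check the label order against their documentation. VOC_CLASSES, class_index, and PEST_CLASSES are hypothetical names.

```python
# Pascal VOC label order used by the common MobileNet SSD Caffe model (index 0 = background)
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def class_index(name):
    """Return the network's class index for a label such as "dog" (case-insensitive)."""
    return VOC_CLASSES.index(name.lower())

pest_class = class_index("DOG")
print(pest_class)  # 12

# Classes in this label set that could plausibly be pests
PEST_CLASSES = {name: class_index(name) for name in ("cat", "cow", "dog", "horse", "sheep")}
print(PEST_CLASSES)  # {'cat': 8, 'cow': 10, 'dog': 12, 'horse': 13, 'sheep': 17}
```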

Will It Work on an Edge Device?

As you can see, our SSD detector successfully detected dogs in the video when run on a PC. What about an edge device? Will the detector process the feed fast enough to detect objects in real time? We can find out by testing the frame rate, measured in frames per second (FPS).
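A simple way to measure the frame rate is to time a fixed number of calls to the per-frame work and divide. The sketch below is a generic helper under that assumption (measure_fps is a hypothetical name). On the Pi, you would pass a function that reads one frame and runs ssd.detect on it; here, a short sleep stands in for inference.

```python
import time

def measure_fps(process_frame, num_frames=20):
    """Call process_frame() num_frames times and return the average frames per second."""
    start = time.perf_counter()
    for _ in range(num_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Stand-in workload: at least 10 ms per "frame", so the result should be a bit under 100 FPS
fps = measure_fps(lambda: time.sleep(0.01), num_frames=10)
print(f"{fps:.1f} FPS")
```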

In the article we quoted earlier, the model we borrowed ran at about 1.25 FPS on a Raspberry Pi 3. Is that enough to detect pests? We can assume that, on average, an animal will be on camera for at least 2 to 3 seconds. At 1.25 FPS, that gives us 2 to 3 frames in which to detect a pest and react to it. Sounds like decent odds.
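The frame-count estimate is the frame rate multiplied by the time the animal is in view, rounded down to whole frames. The sketch below makes that arithmetic explicit. It also shows how the odds improve with more frames if you assume, purely for illustration, that each frame independently has the same chance p of producing a detection. That independence assumption and the value p = 0.5 are hypothetical; consecutive video frames are strongly correlated in practice.

```python
import math

def frames_in_view(fps, seconds):
    """Number of complete frames processed while the animal is on camera."""
    return math.floor(fps * seconds)

def chance_of_detection(p_per_frame, num_frames):
    """Probability of at least one detection in num_frames independent frames."""
    return 1.0 - (1.0 - p_per_frame) ** num_frames

for seconds in (2, 3):
    n = frames_in_view(1.25, seconds)
    print(seconds, n, chance_of_detection(0.5, n))
# 2 2 0.75
# 3 3 0.875
```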

Next Steps

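So far, the results aren’t very promising for wildlife detection: the pre-trained model only recognizes the animal classes in its training data (such as cat, cow, dog, horse, and sheep), so it can’t detect wild animals like moose. But let’s not give up!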

In the next article, we’ll talk about some ideas for detecting "exotic" pests, such as moose and armadillos.