Here we provide a short explanation of the requirements for the dataset. Then we suggest approaches for collecting the data: searching for images on the internet, or searching for videos and extracting frames from them. We then provide references to some of the videos we found, explain the basic steps of data collection with the available tools, provide Python code for data preprocessing, and finally demonstrate some examples of the collected images.
Unruly wildlife can be a pain for businesses and homeowners alike. Animals like deer, moose, and even cats can cause damage to gardens, crops, and property.
In this article series, we’ll demonstrate how to detect pests (such as a moose) in real time (or near-real time) on a Raspberry Pi and then take action to get rid of the pest. Since we don’t want to cause any harm, we’ll focus on scaring the pest away by playing a loud noise.
You are welcome to download the source code of the project. We are assuming that you are familiar with Python and have a basic understanding of how neural networks work.
In the last article, we discussed ways of detecting "exotic" pests — those not considered in most pre-trained DNN models. We decided to develop and train our own classifier DNN model and use it in conjunction with a motion detection algorithm. In this article, we’ll see how we can prepare an adequate dataset for our DNN classifier.
Whether a specific dataset is suitable for a specific DNN model depends on the problem the model is expected to solve. The pest we’re going to detect is moose, and we’d like to detect this animal in real-life situations. Clearly, our dataset must contain many images featuring moose.
The animal in the images should be filmed from various angles and in various poses. The minimum acceptable number of images depends on the DNN model and the required accuracy. Generally, deep learning researchers recommend 1000 to 5000 images per object class.
To detect moose in a real-life setting, we should be able to distinguish the moose from all other objects that might be present in any given frame. So we need two object categories: Moose and background (not moose).
The first step is to gather the relevant images. The simplest way to do this is by searching the internet and saving the images you find. It’s a long and boring process, but we need plenty of images to train our own DNN model.
It’s also an important lesson. Data acquisition and preparation is often the most difficult part of an AI project. Unless you’re working on cutting-edge research, you’re probably not going to be designing new neural network architectures. If you don’t have a large, clean dataset already prepared for the problem you’re trying to solve, you can expect to spend 70% (or more!) of your AI project time here.
In this case, we can simplify and speed up the process by searching for videos instead of individual images. First, we search for and download videos featuring moose. Then we extract frames from the video files. Let’s take this path.
A quick search produces a surprisingly large list of videos with moose. Here are some examples:
Next, we download the video files and use VLC player to extract the relevant frames. VLC enables you to watch a video frame by frame and save a frame from any time marker to disk. In our case, this resulted in 174 images containing moose. Here are some example frames:
The frames with moose in them don’t yet comprise a complete dataset for DNN training. To train a classifier, all images in the dataset must be of the same size, and the object to classify must be fully visible in each of them; a tool such as Cascade Trainer GUI can help with the cropping. Don’t forget that we need samples for the background class, too. We can get these from the same frames by cropping the moose away.
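Part of the background-sample collection can also be scripted. Here is a hedged sketch that slices square patches from the left and right edges of a frame with plain NumPy indexing; the patch size and the assumption that the moose sits near the center of the frame are illustrative only:

```python
import numpy as np

def edge_patches(img, patch):
    # Cut one square patch from the left edge and one from the right edge,
    # vertically centered — these regions are likely background if the
    # animal is near the middle of the frame (an assumption to verify).
    h, w = img.shape[:2]
    patches = []
    if h >= patch and w >= 2 * patch:
        top = (h - patch) // 2
        patches.append(img[top:top + patch, 0:patch])      # left edge
        patches.append(img[top:top + patch, w - patch:w])  # right edge
    return patches
```

Any patch that accidentally contains part of the moose should, of course, be discarded by hand.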
As the cropped images are likely to be different sizes, let’s automatically resize them with some custom Python code:
import os
import cv2

class FileUtils:
    @staticmethod
    def get_files(folder):
        files = []
        filenames = os.listdir(folder)
        for (i, fname) in enumerate(filenames):
            fullpath = os.path.join(folder, fname)
            files.append(fullpath)
        return files

class Resizer:
    def __init__(self, size):
        self.size = size

    def process(self, source, dest):
        files = FileUtils.get_files(source)
        if not os.path.isdir(dest):
            os.makedirs(dest)
        for (i, fname) in enumerate(files):
            img = cv2.imread(fname)
            (h, w, c) = img.shape
            # Crop the centered square so resizing doesn't distort the object
            if w > h:
                dx = int((w - h) / 2)
                img = img[0:h, dx:dx + h]
            if h > w:
                dy = int((h - w) / 2)
                img = img[dy:dy + w, 0:w]
            resized = cv2.resize(img, (self.size, self.size),
                                 interpolation=cv2.INTER_AREA)
            f = os.path.basename(fname)
            dfname = os.path.join(dest, f)
            cv2.imwrite(dfname, resized)
The code uses functions from the OpenCV package to load sample images from the source folder, resize them, and save them to the destination directory. Note that the image resizer receives only one initialization parameter: the size (in pixels) of the square output image.
Most convolutional networks for image processing work with square input images, so we’ll make our resizer convert the raw images to squares. The resizing algorithm considers the fact that the initial image might not have been square. To avoid object distortion, the algorithm first crops the centered square segment from the image and then resizes it.
Now let’s run the code and process the images:
source = r"C:\PI_PEST\moose_cropped"
dest = r"C:\PI_PEST\moose_resized"
resizer = Resizer(128)
resizer.process(source, dest)
This gives us correctly sized samples for our dataset.
Now we’ve got 484 samples in total: 200 items for the moose category and 284 items for the background (not moose) objects.
A total of 484 images is not enough to train our DNN to high accuracy. We could find more video files and extract more frames to increase the dataset size, but there is a better way. In the next article, we’ll see how the same result can be achieved with data augmentation.