A couple of weeks ago we learned how to classify images using OpenCV 3.3's deep neural network (dnn) module. While that original blog post demonstrated how we can categorize an image into one of ImageNet's 1,000 separate class labels, it could not tell us where an object resides in the image. In order to obtain the bounding box (x, y)-coordinates for an object in an image we need to instead apply object detection. Object detection can not only tell us what is in an image but also where the object is.In the remainder of today's blog post we'll discuss how to apply object detection using deep learning and OpenCV.

Figure 2: (Left) Standard convolutional layer with batch normalization and ReLU. (Right) Depthwise separable convolution with depthwise and pointwise layers followed by batch normalization and ReLU (figure and caption from Liu et al.).

When building object detection networks we normally use an existing network architecture, such as VGG or ResNet, and then use it inside the object detection pipeline.
The problem is that these network architectures can be very large, on the order of 200-500MB. Network architectures such as these are unsuitable for resource-constrained devices due to their sheer size and resulting number of computations. Instead, we can use MobileNets (Howard et al., 2017), another paper by Google researchers. We call these networks "MobileNets" because they are designed for resource-constrained devices such as your smartphone.
MobileNets differ from traditional CNNs through the use of depthwise separable convolution (Figure 2 above). The general idea behind depthwise separable convolution is to split convolution into two stages:

1. A 3×3 depthwise convolution.
2. Followed by a 1×1 pointwise convolution.

This allows us to reduce the number of parameters in our network. The problem is that we sacrifice accuracy — MobileNets are normally not as accurate as their larger big brothers, but they are much more resource efficient. For more details on MobileNets please see the paper by Howard et al.

Combining MobileNets and Single Shot Detectors for fast, efficient deep learning-based object detection

If we combine both the MobileNet architecture and the Single Shot Detector (SSD) framework, we arrive at a fast, efficient deep learning-based method of object detection. The model we'll be using in this blog post is a Caffe version of the MobileNet architecture by Howard et al. and was trained by chuanqi305. The MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and was then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision). We can therefore detect 20 objects in images (+1 for the background class), including airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and TV monitors.

Deep learning-based object detection with OpenCV

In this section we will use the MobileNet SSD + deep neural network (dnn) module in OpenCV to build our object detector. I would suggest using the "Downloads" code at the bottom of this blog post to download the source code + trained network + example images so you can test them on your machine. Let's go ahead and get started building our deep learning object detector using OpenCV. Open up a new file, name it deep_learning_object_detection.py, and insert the following code.
# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

On Lines 2-4 we import the packages required for this script — the dnn module is included in cv2, again, making the assumption that you're using OpenCV 3.3. Then, we parse our command line arguments (Lines 7-16):

--image: The path to the input image.
--prototxt: The path to the Caffe prototxt file.
--model: The path to the pre-trained model.
--confidence: The minimum probability threshold to filter weak detections. The default is 20%.

Again, example files for the first three arguments are included in the "Downloads" section of this blog post. I urge you to start there while also supplying some query images of your own. Next, let's initialize the class labels and bounding box colors.

We start by looping over our detections, keeping in mind that multiple objects can be detected in a single image. We also apply a check to the confidence (i.e., probability) associated with each detection. If the confidence is high enough (i.e., above the threshold), then we'll display the prediction in the terminal as well as draw the prediction on the image with text and a colored bounding box.

Figure 8: Me and the family beagle are correctly detected as a "person" and a "dog" via deep learning, object detection, and OpenCV. The TV monitor is not recognized.

Unfortunately the TV monitor isn't recognized in this image, which is likely due to (1) me blocking it and (2) poor contrast around the TV. That being said, we have demonstrated excellent object detection results using OpenCV's dnn module.

Summary

In today's blog post we learned how to perform object detection using deep learning and OpenCV. Specifically, we used both MobileNets + Single Shot Detectors along with OpenCV 3.3's brand new (totally overhauled) dnn module to detect objects in images. As a computer vision and deep learning community we owe a lot to the contributions of Aleksandr Rybnikov, the main contributor to the dnn module, for making deep learning so accessible from within the OpenCV library.
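As a recap, the post-processing loop at the heart of this post (filter by confidence, look up the class index, and scale the box back to pixel coordinates) can be run in isolation. In the sketch below the detections array is synthetic, but it uses the same (1, 1, N, 7) layout that net.forward() returns, so the snippet runs without the model files; the variable names follow the post's conventions, and the drawing step is omitted.

```python
import numpy as np

# Each row of the last axis is:
# [image_id, class_id, confidence, x1, y1, x2, y2] (coords scaled to [0, 1]).
# A synthetic stand-in for detections = net.forward():
detections = np.array([[[
    [0.0, 12.0, 0.98, 0.1, 0.2, 0.5, 0.9],   # a confident detection
    [0.0,  8.0, 0.10, 0.0, 0.0, 0.3, 0.3],   # a weak one we want to filter
]]], dtype=np.float32)

(h, w) = (300, 400)          # image height and width in pixels
conf_threshold = 0.2         # same default as the --confidence argument

results = []
for i in range(detections.shape[2]):          # loop over the detections
    confidence = detections[0, 0, i, 2]       # probability for detection i
    if confidence > conf_threshold:           # filter out weak detections
        idx = int(detections[0, 0, i, 1])     # class label index
        # scale the relative box coordinates back to pixel coordinates
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        results.append((idx, float(confidence), (startX, startY, endX, endY)))

print(results)   # only the confident detection survives the filter
```

In the real script, the same loop is followed by cv2.rectangle and cv2.putText calls that draw the box and label on the image.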
You can find Aleksandr's original OpenCV example script — I have modified it for the purposes of this blog post. In a future blog post I'll be demonstrating how we can modify today's tutorial to work with real-time video streams, thus enabling us to perform deep learning-based object detection on videos. We'll be sure to leverage efficient frame I/O to increase the FPS throughout our pipeline as well. To be notified when future blog posts (such as the real-time object detection tutorial) are published here on PyImageSearch, simply enter your email address in the form below.

Hi Adrian, you have always inspired me with your tremendous innovation and have become my role model too. Now, coming back to the topic, I'm getting this error:

Traceback (most recent call last):
File "deep_learning_object_detection.py", line 32, in <module>
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
AttributeError: 'module' object has no attribute 'dnn'

Even after installing Lasagne, it is giving me the error: ImportError: Could not import Theano. Please make sure you install a recent enough version of Theano. See section 'Install from PyPI' in the installation docs for more details.

Your tutorials are really excellent! You get the impression that everything is so simple. On the basis of your code, which works perfectly, I would now like to identify (car / van / small trucks / large trucks). As you suggested, I looked into the Caffe Model Zoo. I tried to use GoogLeNet_cars by retrieving the model and the corresponding prototxt directly, but simply changing the model does not seem to be the right way to go.
What should I do? Yes, I am completely new to the subject. Thanks in advance.

Interestingly, running your code on my machine gives different object detection results than yours. For instance, on example 3, I can only detect the horse and one potted plant. On example 5 I get the same detections, plus the dog is also detected as a cat (with a higher probability) and the model is able to capture the person in the back, on the left side near the fence. Is this variation expected? I would have expected that the dnn model would behave the same on the same image for all repetitions of the experiment. Thanks for the great post!

Hi Adrian, I have come across some problems when understanding your code. In the line detections.shape[2], what does this mean when the blob is forward passed through the network via net.forward()? In the line confidence = detections[0, 0, i, 2], what do the four indices (0, 0, i, 2) mean and how do they extract the confidence of the detected object? In the line idx = int(detections[0, 0, i, 1]), what does the 1 signify in detections? In the line box = detections[0, 0, i, 3:7] * np.array([w, h, w, h]), what do you want to do by multiplying the NumPy array with detections? Why do you take the fourth index of detections as 3:7, and what does this mean? Why do you pass w, h, w, h to the NumPy array, and why do you pass the width and height twice? Please help, thanks in advance.

The detections object is a multi-dimensional NumPy array. The call to detections.shape[2] gives us the number of actual detections.
We can then extract the confidence for the i-th detection via detections[0, 0, i, 2]. The slice 3:7 gives us the bounding box coordinates of the object that was detected. We need to multiply these coordinates by the image width and height as they were relatively scaled by the SSD. Take a look at the detections NumPy array and play around with it. If you're new to NumPy, take the time to educate yourself on how array slices and vector multiplies work. This will help you learn more.

Hi Adrian, just to make sure I'm understanding what is going on here: SSD is an object detector that sits on top of an image classifier (in this case MobileNet).
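To make that indexing concrete, here is a tiny stand-alone illustration (the field names in the comment are descriptive labels, not part of OpenCV's API): the array from net.forward() has shape (1, 1, N, 7), and the last axis holds [image_id, class_id, confidence, x1, y1, x2, y2] with coordinates scaled to [0, 1], which is why the slice 3:7 is multiplied by [w, h, w, h].

```python
import numpy as np

# One detection in the (1, 1, N, 7) layout produced by net.forward();
# last axis = [image_id, class_id, confidence, x1, y1, x2, y2]
detections = np.array([[[[0.0, 7.0, 0.95, 0.25, 0.50, 0.75, 1.00]]]])

i = 0
num_detections = detections.shape[2]         # N, here 1
confidence = detections[0, 0, i, 2]          # field 2 -> 0.95
idx = int(detections[0, 0, i, 1])            # field 1 -> class index 7
(w, h) = (400, 200)                          # image width and height
box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])  # fields 3..6
print(num_detections, idx, confidence, box)  # 1 7 0.95 [100. 100. 300. 200.]
```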
So, technically, one can switch to a more accurate (but slower) image classifier such as Inception, and this would improve the detection results of SSD. Is this correct? I guess I can look at your other posts about using GoogLeNet and change a few lines in this example to swap MobileNet for GoogLeNet in OpenCV? Also, have you come across any implementations or blog posts that discuss playing around with various image classifiers + SSD in Keras to perform object detection? Thanks once again for your blog posts.
They have saved me hours and hours of time, and the hair on my head. Cheers!

This is a bit incorrect. In the SSD architecture, the bounding boxes and confidences for multiple categories are predicted directly within a single network.
We can modify an existing network architecture to fit the SSD framework and then train it to recognize objects, but they are not hot-swappable. For example, the base of the network could be VGG or ResNet through the final pooling layers. We then convert the FC layers to CONV layers. Additional layers are then used to perform the object detection. The loss function then minimizes over correct classifications and detections. A complete review of the SSD framework is outside the scope of this post, but I will be covering it in detail inside Deep Learning for Computer Vision with Python. There are one or two implementations I've seen of SSDs in Keras and mxnet, but from what I understand they are a bit buggy.

Yes, you are absolutely correct.
The ImageNet Bundle of Deep Learning for Computer Vision with Python will demonstrate how to train your own custom object detectors using deep learning. From there I'll also demonstrate how to create a custom image processing pipeline that will enable you to take an input image and obtain the output predictions + detections using your classifier. Secondly, I will be reviewing SSD inside the ImageNet Bundle. I won't be demonstrating how to implement it, but I will be discussing how it works and demonstrating how to use it.

Hi Adrian, first of all, thanks for this great tutorial! I have a short question: I am trying to rebuild your tutorial with the OpenCV C API. When I see the call to the function for the blob generation from the input image:

cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)

it is hard for me to match it up with the corresponding C API function. Could you give me a small hint how to match it? Especially the scalar values 0.007843 and 127.5 did not really match for me. Thanks for your help and again great work! Johannes.

Thanks for the clear tutorial, it really makes a difference in trying to figure this stuff out! This is what I don't get about how the dnn works (I'm a newbie with object detection): how does the model go through the blob to get the location?
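Regarding the scalar values asked about above: blobFromImage subtracts the mean (127.5) from every pixel and then multiplies by the scalefactor (0.007843, roughly 1/127.5), which maps 8-bit pixel values from [0, 255] into approximately [-1, 1]. A small NumPy sketch of that arithmetic (not using OpenCV itself, just reproducing the normalization):

```python
import numpy as np

scalefactor = 0.007843          # approximately 1 / 127.5
mean = 127.5

# An 8-bit "image" reduced to the extreme and middle pixel values:
pixels = np.array([0.0, 127.5, 255.0])

# The per-pixel arithmetic performed by
# cv2.dnn.blobFromImage(image, scalefactor, size, mean):
normalized = (pixels - mean) * scalefactor

print(normalized)   # approximately [-1, 0, 1]
```

In the C/C++ API the same two values appear as the scale and mean arguments of the corresponding blob-creation call, so matching them up is a matter of applying (pixel - 127.5) * 0.007843 in the same order.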
I mean, if the object recognition model is (presumably) trained with the object nicely framed in the middle of the image, how does the detection model find a small or partially covered object like a baseball glove? Does it somehow divide the image into segments?

Traceback (most recent call last):
File "real_time_object_detection.py", line 33, in <module>
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
AttributeError: 'module' object has no attribute 'dnn'

This is the issue I am getting while running the real_time_object_detection.py file. Is there anything wrong with my OpenCV installation? I installed it from the link provided by you and didn't run into any issue during installation, however while running the real_time_object_detection.py file I get the above error. Please help, anyone who came across such an issue.

Hi, I really enjoyed your tutorial because it gave me a good start with this interesting topic. So one question regarding object detection: is there an approach that will tell me if a general object is in my image or not? Let's say we have a background that stays the same and there is an object in the image.
I do not know what the object is, only that an object is there. At the moment I tend to solve this problem with "classic" computer vision; is there a deep learning approach? Maybe check if no object is matched (with a certain probability)? Thanks very much for this awesome tutorial.
I have one concern here though. You only take one image for detection. But this is not efficient.
If I have multiple images or a video file, I could read a bunch of images/frames and try to detect them all at once. That would be much faster. This is a huge problem that I'm facing right now with R-CNN. I can test one image, but I could not find any solution for how to do batch testing.
It would be really great if you could also do a post about it. Love your work btw 🙂 Thanks very much.

Nice example.

I looked at the script of Aleksandr Rybnikov you mentioned in the post and tried to adapt your example to use it with the TensorFlow pre-trained model. I adapted it to the 90 classes, used the ssd_mobilenet_v1_coco.pbtxt from opencv_extra, and downloaded ssd_mobilenet_v1_coco.tar.gz to get the frozen_inference_graph.pb. At first I used the graph.pbtxt included in the tar file, but that doesn't work with OpenCV 3.1.1 and your script. So I tried the ssd_mobilenet_v1_coco.pbtxt from opencv_extra. This sort of works (doesn't give errors) but the object recognition results are not good. Is there a way to generate an OpenCV 3.1.1-compatible .pbtxt to work with your script, or doesn't it work this way?

I built OpenCV 3.3 on a Raspberry Pi 3 following your Raspbian Stretch instructions and downloaded this sample code. Everything seems to work except my results don't quite match the results shown in this blog, particularly example05.jpg where I get:

INFO car: 99.49%
INFO cat: 61.79%
INFO dog: 50.56%
INFO horse: 99.80%
INFO person: 86.79%
INFO person: 26.94%

Instead of what you show:

INFO car: 99.87%
INFO dog: 94.88%
INFO horse: 99.97%
INFO person: 99.88%

It seems I should get the same results with the same code and test images, but apparently I don't. The "boxes" drawn on my images seem better located than those in your example, except for the cat, which is not really there and probably is drawn over by the box for the dog. I set up virtual environments for Python 3 and Python 2.7 and my results are the same in both environments, but different from yours.
I was not able to detect the 'background' class, even when testing it against a 'white background' image! Could you provide me with some idea of an image where the background class can be detected? Secondly, I wanted to ask if the training file is available for this? I wanted to train some classes on my own. Thirdly, is there any portal where datasets of multiple images are easily available that can be used to test this? I am hoping to receive your guidance at the earliest. Thank you so much 🙂.

Hi Adrian, thanks again for the great post. I am using the above code to get distance values from rectified stereo left and right images.
I detect the same object in both the left and right images using cv2.dnn.blobFromImage. Then, from the difference in the horizontal pixel location, I am finding the distance. But the blob returns different vertical pixel values for the same object; as the images are rectified, we should get the same value, right? Do you know why this happens? Also, the estimated distance is erroneous; is it due to the resize or scaling that we apply during the cv2.dnn.blobFromImage function? Thanks in advance!

You should always make annotations of the class ID + bounding boxes of each object in an image and save the annotations to a separate file (I recommend a simple CSV or JSON file). You can always use this information to later crop out bounding boxes and save the ROIs individually if you wish. The reverse is not true. Since SSDs and Faster R-CNNs have a concept of hard negatives (where they take a non-annotated ROI region and see if the network incorrectly classifies it), you'll want to supply the entire image to the network, not just a small crop of the ROI. Simply put:

1. A classification network will give you a class label of what the image contains.
2. An object detection network will give you multiple class labels AND bounding boxes that indicate where in the image each object is.

Keep in mind that it's impossible for a machine learning model to recognize classes or objects it was not trained on. It has to be trained on the classes to recognize them. If you're interested in learning more about classification, object detection, and deep learning, I would suggest taking a look at Deep Learning for Computer Vision with Python, where I discuss the techniques in detail (and with source code to help solidify the concepts).

First of all, love your work.
And especially love this tutorial for making ML easily understandable and usable with OpenCV. Just wanted to let you know that the MobileNet-SSD object detection model trained in TensorFlow, found by following the information in the OpenCV dnn samples ("mobilenet_ssd_accuracy.py"), has a lot higher accuracy (or more detections, if accuracy isn't the right word here). It detected the TV in the background of your last picture and detected relatively small people in a picture that the Caffe model provided here didn't, with roughly the same time for prediction.

Hey Bhavitha — explaining the entire process of how an image/volume is transformed layer-by-layer by a network is far too detailed to cover in a blog post comment, especially when you consider the different types of layers (convolution, activation, batch normalization, pooling, etc.). The gist is that an image is inputted to a network. A total of K convolutions are applied, resulting in an MxNxK volume.
We then pass through a non-linear activation (ReLU) and optionally a batch normalization (sometimes the order of activation and BN is swapped). Max pooling can be used to reduce the volume size, or convolutions can be used as well if their strides are large enough. This process repeats, reducing the size of the volume and increasing the depth as it passes through the network. Eventually we use one or more fully-connected layers to obtain the final predictions. If you're interested in learning more about CNNs, including:

– How they work
– The parameters used for each layer
– How to piece together the building blocks to build your own CNN architectures

Then I suggest you work through Deep Learning for Computer Vision with Python, where I discuss all of this in detail. I hope that helps!

Hey, first of all, I love your blog. Your posts are simple and easy to follow. Second of all, I have a question.
From the description above, I understand that --prototxt is the path to the Caffe prototxt file and --model is the path to the pre-trained model. I have installed Caffe successfully. I have OpenCV version 3.4.1 and I am using Python 3.5. So my question is: does MobileNetSSD_deploy.prototxt.txt get installed when one installs Caffe?
I could not find it in the "Caffe" (the installed) folder. Also, how do I train the model? For example, I want to train on an image with a different set of objects (not the ones mentioned above) and would like to have fewer neural network layers (since I do not have a complicated image to train on). How do I do that? I am new to deep learning and trying to understand the program. Thank you very much! Best regards.
Hello, Adrian! Great lesson, thank you!I have a question.
In your post you mentioned that this example is based on a combination of the MobileNet architecture and the Single Shot Detector (SSD) framework. If I understood you right, this example only suits the COCO dataset and was pretrained on it. What if I want to use this network for my own purposes? Do I need to gather my own database and train the network on it? If yes, what will the requirements for the images be, and where can I find them? And how do I train it?
Use Caffe, right? Just want to clarify these details and get any possible links. Thank you for your answer.

Hey Adrian, thanks for this wonderful article and for so many comments; I went through each of them and spawned multiple tabs. I have three questions, as below. 1) Are you aware of any trained dataset which consists of primitive geometrical shapes, viz.
squares, circles, rectangles, semi-circles, quadrilaterals, polygons, etc., where the shapes are just wire-frames and not the solid types filled with some colors? 2) If such a dataset exists, then can deep learning like in this article be applied to recognize multiple shapes of different sizes stacked together in a drawing? If yes, can their sizes be extracted too using some technique?

Thanks a lot Adrian, it saved my time in exploring such a dataset. Actually, there exist many 2D shape datasets, but they are very big and contain many different things, so they are probably ill-suited for my problem.
So what I understand from your advice is that I should generate all the shapes with different parameters and create a dataset of all shapes occurring in my drawings. After that I can train a model on this dataset and do object recognition using deep learning? Is deep learning the only solution if I want to have an AI-based solution? Thanks a lot.

Hi, thanks a lot for this amazing tutorial.
It's really very helpful. I am able to execute this code on Windows and get good results. But when I execute the same code on a Raspberry Pi, I get the following error:

(h, w) = image.shape[:2]
AttributeError: 'NoneType' object has no attribute 'shape'

The location it shows for this error is imutils/convenience.py (under site-packages), line 69. Please let me know what exactly I should change in the code. I have referred to the blog written for this, but I am still getting the same error. I am looking forward to your help/inputs. Thanks a lot!
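The usual cause of that 'NoneType' error is that cv2.imread() silently returns None when the image path is wrong (a typo in the command line argument, or a relative path run from another directory). A small defensive wrapper you could drop into the script; the function name is mine, and the reader argument stands in for cv2.imread so the sketch runs anywhere:

```python
import os

def checked_read(path, reader):
    """Load an image via reader (e.g. cv2.imread) and fail loudly on problems."""
    if not os.path.exists(path):
        raise FileNotFoundError("image path does not exist: " + path)
    image = reader(path)
    if image is None:
        # cv2.imread returns None instead of raising when it cannot decode
        raise ValueError("could not decode image: " + path)
    return image

# Example with a fake reader standing in for cv2.imread;
# os.devnull is just a path that is guaranteed to exist:
ok = checked_read(os.devnull, lambda p: "pixels")
print(ok)
```

In the real script you would call checked_read(args["image"], cv2.imread) and get a clear error message instead of the opaque AttributeError further down.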
Thanks a lot for your tutorial, it worked perfectly for me. I have a couple of questions though. 1) I tried the TensorFlow framework itself to implement the same task, object detection, using their example. It worked well but was much, much slower and, what was more important to me, it consumed a tremendous amount of memory (around 1GB for processing just one image). How would you explain that OpenCV does its work so much better (faster, less greedy)? 2) Do I understand properly that I can feed cv2.dnn any other supported model from other frameworks like TensorFlow? Thanks a lot!

Although I am getting this error:

INFO loading model
INFO computing object detections
INFO car: 99.95%
INFO car: 95.62%
Traceback (most recent call last):
File "deep_learning_object_detection.py", line 74, in <module>
cv2.imshow("Output", image)
cv2.error: OpenCV(3.4.2) /home/pi/opencv-3.4.2/modules/highgui/src/window.cpp:632: error: (-2:Unspecified error) The function is not implemented.
Rebuild the library with Windows, GTK+ 2.x or Carbon support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvShowImage'.

Hi Adrian, thank you for the tutorials. I am a beginner and your tutorials are of great help. I do have some questions. 1) I tried dnn-MobileNet SSD, using the 20 classes trained by chuanqi (same as yours). But are there any other pre-trained models that I can use?
I am actually trying to detect boxes, but sadly the 20 classes do not include boxes. 2) Inception V3's model does have cartons. But is it strictly for image classification? Can we use the model for object detection? If we can, how can we do that?
Thank you very much for a great post! I downloaded the code and it works really well. However, I obtain slightly different detection results than the ones you showed. For instance, one potted plant and the person are missing in my detections in the file example03.jpg (a horse jumping over a hurdle). I also get different bounding boxes in the first image of two cars on the highway. My question is: was the model retrained in the meantime? What I find surprising is that it seems significantly less accurate than the YOLO network you presented recently. Many thanks in advance for your answer!
Hello Mr. Adrian, thank you for sharing this article. I would like to ask something about object detection. I have a case where I want to classify whether a person holds a snack or not. If the person holds a snack it will be given the label "person bringing snack", and if not, the label "person not bringing snack". My question is: how do I train the data? Do I train the data separately? For instance, in this case there are two objects, person and snack, and I label which is the person and which is the snack. Or do I create the training data with the single label "person holds snack" with no separation? Thank you in advance.
Hi Adrian, thanks for the wonderful post. I learned a lot from it. However, can I double-check with you whether I have understood the "frameworks" correctly? To my understanding, MobileNet, VGG, GoogLeNet, etc. are all "base frameworks", and Caffe, YOLO, SSDs, etc. are so-called "object detection frameworks".
And we are connecting these two frameworks together to get the whole network to achieve the object detection task. If I understood the above concept correctly, then why don't we just make one big framework to do the object detection task instead of combining the two together? Thanks.
Nothing in this thread has worked for me.

import os
import cv2
import time
import argparse
import multiprocessing
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
%matplotlib inline
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

I am using Anaconda Jupyter on Ubuntu 16.04 LTS. This is my error (raised by the line from object_detection.utils import label_map_util):

ModuleNotFoundError: No module named 'object_detection'

Hello, I was able to figure out the problem: the object_detection library is not installed, so please run the below command inside the directory models/research:

sudo python setup.py install

I hope this will solve the problem.
If that solution does not work, then please execute the below commands one by one inside the directory models/research:

export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
sudo python setup.py install

This resolved the error for me. I too faced this error while creating a model from export_inference_graph.py; I followed the above procedure and the error was gone. I hope this solution may be of help to you. With regards, AI Sangam.