You’re interested in Computer Vision, Deep Learning, and OpenCV…but you don’t know how to get started.
Follow these steps to get OpenCV configured/installed on your system, learn the fundamentals of Computer Vision, and graduate to more advanced topics, including Deep Learning, Face Recognition, Object Detection, and more!
Step #1: Install OpenCV + Python on Your System (Beginner)
Before you can start learning OpenCV you first need to install the OpenCV library on your system.
By far the easiest way to install OpenCV is via pip:
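For example, at the time of this writing you can install OpenCV along with the contrib modules in a single command (the exact package name may change over time):

```
$ pip install opencv-contrib-python
```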
Once you have OpenCV installed on your Windows system all code examples included in my tutorials should work (just understand that I cannot provide support for them if you are using Windows).
If you are struggling to configure your development environment be sure to take a look at my book, Practical Python and OpenCV, which includes a pre-configured VirtualBox Virtual Machine.
All you need to do is install VirtualBox, download the VM file, import it, and load the pre-configured development environment.
And best of all, this VM will work on Linux, macOS, and Windows!
Step #2: Understand Command Line Arguments (Beginner)
Command line arguments aren’t a Computer Vision concept but they are used heavily here on PyImageSearch and elsewhere online.
If you intend to study advanced Computer Science topics such as Computer Vision and Deep Learning then you need to understand command line arguments:
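As a quick illustration, here is a minimal sketch of Python’s built-in argparse module (the script name and the --image switch are hypothetical):

```python
# parse_args.py -- a minimal example of handling command line arguments
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
args = vars(ap.parse_args())

# the value supplied via --image is now available as a string
print("Loading image from: {}".format(args["image"]))
```

You would run such a script via python parse_args.py --image example.jpg. From there, you can move on to the fundamentals of the OpenCV library itself, including: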
Drawing lines, rectangles, circles, and text on an image
Masking and bitwise operations
Contour and shape detection
…and more!
Additionally, if you want a consolidated review of the OpenCV library that will get you up to speed in less than a weekend, you should take a look at my book, Practical Python and OpenCV.
Step #4: Build OpenCV Mini-Projects (Beginner)
At this point you have learned the basics of OpenCV and have a solid foundation to build upon.
Take the time now to follow these guides and practice building mini-projects with OpenCV.
To start, I highly recommend you follow this guide on debugging common “NoneType” errors with OpenCV:
You’ll see these types of errors when (1) the path to an input image is incorrect, causing cv2.imread to return None, or (2) OpenCV cannot properly access your video stream.
Trust me, at some point in your Computer Vision/OpenCV career you’ll see this error — take the time now to read the article above to learn how to diagnose and resolve it.
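A minimal sketch of the failure mode (the image path below is a placeholder):

```python
import cv2

# attempt to load the image from disk
image = cv2.imread("path/to/image.jpg")

# cv2.imread does NOT raise an error on a bad path -- it silently
# returns None, which triggers a "NoneType" error later on
if image is None:
    raise ValueError("Could not load the image -- double-check the path!")
```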
The following tutorials will help you extend your OpenCV knowledge and build on the fundamentals:
Additionally, I recommend that you take these projects and extend them in some manner, enabling you to gain additional practice.
As you work through each tutorial, keep a notepad handy and jot down inspiration as it comes to you.
For example:
How might you apply the algorithm covered in a tutorial to your particular dataset of images?
What would you change if you wanted to filter out specific objects using contours?
Make notes to yourself and come back and try to solve these mini-projects later.
Step #5: Solve More Advanced OpenCV Projects (Intermediate)
Practice makes perfect, and Computer Vision/OpenCV are no different.
After working through the tutorials in Step #4 (and ideally extending them in some manner), you are now ready to apply OpenCV to more intermediate projects.
My first suggestion is to learn how to access your webcam using OpenCV.
The following tutorial will enable you to access your webcam in a threaded, efficient manner:
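As a rough sketch of what that looks like with the imutils library (pip install imutils), assuming a webcam at source 0:

```python
# read frames from a webcam using a dedicated polling thread
from imutils.video import VideoStream
import cv2
import time

vs = VideoStream(src=0).start()
time.sleep(2.0)  # allow the camera sensor to warm up

while True:
    frame = vs.read()  # grab the most recent frame from the thread
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()
vs.stop()
```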
From there, try your hand at image stitching. These algorithms utilize keypoint detection, local invariant descriptor extraction, and keypoint matching to build a program capable of stitching multiple images together, resulting in a panorama.
There is a dedicated Optical Character Recognition (OCR) section later in this guide, but it doesn’t hurt to gain some experience with it now:
Again, keep a notepad handy as you work through these projects.
Practice extending them in some manner to gain additional experience.
Step #6: Pick Your Niche (Intermediate)
Congratulations, you have now learned the fundamentals of Image Processing, Computer Vision, and OpenCV!
The Computer Vision field is comprised of subfields (i.e., niches), including Deep Learning, Medical Computer Vision, Face Applications, and many others.
Many of these fields overlap and intertwine as well — they are not mutually exclusive.
That said, as long as you follow this page you’ll always have the proper prerequisites for a given niche, so don’t worry!
Most readers jump immediately into Deep Learning, as it’s one of the most popular fields in Computer Science; however, make sure you have the fundamentals above down before you do.
Where to Next?
If you need additional help learning the basics of OpenCV, I would recommend you read my book, Practical Python and OpenCV.
This book is meant to be a gentle introduction to the world of Computer Vision and Image Processing through the OpenCV library.
And if you don’t know Python, don’t worry!
Since I explain every code example in the book line-by-line, thousands of PyImageSearch readers have used this book to learn not only OpenCV, but also Python, at the same time!
If you’re looking for a more in-depth treatment of the Computer Vision field, I would instead recommend the PyImageSearch Gurus course.
The PyImageSearch Gurus course is similar to a college survey course in Computer Vision, but much more hands-on and practical (including well documented source code examples).
Otherwise, my personal recommendation would be to jump into the Deep Learning section — most PyImageSearch readers who are interested in Computer Vision are also interested in Deep Learning.
Deep Learning
Deep Learning algorithms are capable of obtaining unprecedented accuracy in Computer Vision tasks, including Image Classification, Object Detection, Segmentation, and more.
Follow these steps and you’ll have enough knowledge to start applying Deep Learning to your own projects.
Step #1: Configure your Deep Learning environment (Beginner)
Before you can apply Deep Learning to your projects, you first need to configure your Deep Learning development environment.
The following guides will help you install Keras, TensorFlow, OpenCV, and all other necessary CV and DL libraries you need to be successful when applying Deep Learning to your own projects:
Pick up a copy of my book, Deep Learning for Computer Vision with Python, which includes a VirtualBox Virtual Machine with all the DL and CV libraries you need pre-configured and pre-installed.
All you need to do is install VirtualBox, download the VM file, import it, and load the pre-configured development environment.
And best of all, this VM will work on Linux, macOS, and Windows!
Step #2: Train Your First Neural Network (Beginner)
Provided that you have successfully configured your Deep Learning development environment, you can now move on to training your first Neural Network!
I recommend starting with this tutorial which will teach you the basics of the Keras Deep Learning library:
After that, you should read this guide on training LeNet, a classic Convolutional Neural Network that is both simple to understand and easy to implement:
Now that you understand what kernels and convolution are, you should move on to this guide, which will teach you how Keras utilizes convolution to build a CNN:
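To make that concrete, here is a minimal sketch of a small LeNet-style CNN in Keras (the layer sizes are illustrative, not the exact architecture from the tutorial):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # first CONV => RELU => POOL block
    Conv2D(20, (5, 5), padding="same", activation="relu",
           input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    # second CONV => RELU => POOL block
    Conv2D(50, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    # fully-connected layer followed by a softmax classifier
    Flatten(),
    Dense(500, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])
```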
So, you trained your own CNN from Step #5 — but your accuracy isn’t as good as you want it to be.
What now?
In order to obtain a highly accurate Deep Learning model, you need to tune your learning rate, the most important hyperparameter when training a Neural Network.
The following tutorial will teach you how to start training, stop training, reduce your learning rate, and continue training, a critical skill when training neural networks:
This guide will teach you about learning rate schedules and decay, a method that can be quickly implemented to slowly lower your learning rate when training, allowing it to descend into lower areas of the loss landscape, and ideally obtain higher accuracy:
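A sketch of a standard step-based decay schedule in Keras (the initial rate, drop factor, and schedule length below are hypothetical):

```python
from tensorflow.keras.callbacks import LearningRateScheduler

INIT_LR = 0.01     # initial learning rate
FACTOR = 0.5       # multiply the rate by this factor at each drop
DROP_EVERY = 10    # drop the learning rate every 10 epochs

def step_decay(epoch, lr):
    # compute the rate purely from the epoch number, ignoring `lr`
    return INIT_LR * (FACTOR ** (epoch // DROP_EVERY))

# then pass the schedule when training, e.g.:
# model.fit(trainX, trainY, epochs=50,
#     callbacks=[LearningRateScheduler(step_decay)])
```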
You should also read about Cyclical Learning Rates (CLRs), a technique used to oscillate your learning rate between an upper and lower bound, enabling your model to break out of local minima:
If you haven’t already, you will run into two important terms in Deep Learning literature:
Generalization: The ability of your model to correctly classify images that are outside the training set used to train the model.
Your model is said to “generalize well” if it can correctly classify images that it has never seen before.
Generalization is absolutely critical when training a Deep Learning model.
Imagine if you were working for Tesla and needed to train a self-driving car application used to detect cars on the road.
Your model worked well on the training set…but when you evaluated it on the testing set you found that the model failed to detect the majority of cars on the road!
In such a situation we would say that your model “failed to generalize”.
To fix this problem you need to apply regularization.
Regularization: The term “regularization” encompasses all techniques used to (1) prevent your model from overfitting and (2) help it generalize well to your validation and testing sets.
Regularization techniques include:
L2 regularization (also called weight decay)
Updating the CNN architecture to include dropout
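As a quick sketch of what both techniques from the list above look like in Keras (the layer sizes and rates are illustrative):

```python
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

# L2 regularization (weight decay) penalizes large weight values
fc = Dense(512, activation="relu", kernel_regularizer=l2(0.0005))

# dropout randomly disconnects 50% of neurons on each training pass
drop = Dropout(0.5)
```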
You can read the following tutorial for an introduction/motivation to regularization:
Step #8: Feature Extraction and Fine-tuning Pre-trained Networks (Intermediate)
So far we’ve trained our CNNs from scratch — but is it possible to take a pre-trained model and use it to classify images it was never trained on?
Yes, it absolutely is!
Taking a pre-trained model and using it to classify data it was never trained on is called transfer learning.
There are two types of transfer learning:
Feature extraction: Here we treat our CNN as an arbitrary feature extractor.
An input image is presented to the CNN.
The image is forward-propagated to an arbitrary layer of the network.
We take those activations as our output and treat them like a feature vector.
Given feature vectors for all input images in our dataset, we train an arbitrary Machine Learning model (ex., Logistic Regression or a Support Vector Machine (SVM)) on top of our extracted features.
When making a prediction, we:
Forward-propagate the input image.
Take the output features.
Pass them to our ML classifier to obtain our output prediction.
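A minimal sketch of feature extraction with a pre-trained network in Keras (the random batch below is a stand-in for your actual, preprocessed images):

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# include_top=False discards the FC head, leaving the final conv block
model = VGG16(weights="imagenet", include_top=False)

# stand-in for a (N, 224, 224, 3) batch of images from your dataset
batch = np.random.rand(8, 224, 224, 3).astype("float32")
features = model.predict(preprocess_input(batch))

# flatten the activations into feature vectors for an ML classifier
features = features.reshape((features.shape[0], -1))
print(features.shape)  # (8, 7 * 7 * 512) = (8, 25088)
```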
I’ll wrap up this section by saying that transfer learning is a critical skill for you to properly learn.
Use the above tutorials to help you get started, but for a deeper dive into my tips, suggestions, and best practices when applying Deep Learning and Transfer Learning, be sure to read my book:
Inside the text I not only explain transfer learning in detail, but also provide a number of case studies to show you how to successfully apply it to your own custom datasets.
Step #9: Video Classification (Advanced)
At this point you have a good understanding of how to apply CNNs to images — but what about videos?
Can the same algorithms and techniques be applied?
Video classification is an entirely different beast — typical algorithms you may want to use here include Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs).
However, before you start breaking out the “big guns” you should read this guide:
Inside you’ll learn how to use prediction averaging to reduce “prediction flickering” and create a CNN capable of applying stable video classification.
Step #10: Multi-Input and Multi-Output Networks (Advanced)
Imagine you are hired by a large clothing company (ex., Nordstrom, Neiman Marcus, etc.) and are tasked with building a CNN to classify two attributes of an input clothing image:
Clothing Type: Shirt, dress, pants, shoes, etc.
Color: The actual color of the item of clothing (i.e., blue, green, red, etc.).
To get started building such a model, you should refer to this tutorial:
Now, let’s imagine that for your next job you are hired by a real estate company and tasked with automatically predicting the price of a house based solely on input images.
You are given images of the bedroom, bathroom, living room, and house exterior.
You now need to train a CNN to predict the house price using just those images.
To accomplish that task you’ll need a multi-input network:
The OpenCV library ships with a number of pre-trained models for neural style transfer, black and white image colorization, holistically-nested edge detection, and others — you can learn about these models using the links below:
If you intend on deploying your models to production, and more specifically, behind a REST API, I’ve authored three tutorials on the topic, each building on top of each other:
Super practical walkthroughs that present solutions to actual, real-world image classification problems, challenges, and competitions.
Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.
Otherwise, I would recommend reading the following sections of this guide:
Object Detection: State-of-the-art object detectors, including Faster R-CNN, Single Shot Detectors (SSDs), YOLO, and RetinaNet, all rely on Deep Learning.
If you want to learn how to not only classify an input image but also locate where in the image the object is, then you’ll want to read these guides.
Medical Computer Vision: Apply Computer Vision and Deep Learning to medical image analysis and learn how to classify blood cells and detect cancer.
Face Applications
Using Computer Vision we can perform a variety of facial applications, including facial recognition, building a virtual makeover system (i.e., makeup, cosmetics, eyeglasses/sunglasses, etc.), or even aiding law enforcement to help detect, recognize, and track criminals.
Computer Vision is powering facial recognition at a massive scale — just take a second to consider that over 350 million images are uploaded to Facebook every day.
For each of those images, Facebook is running face detection (to detect the presence of faces) followed by face recognition (to actually tag people in photos).
In this section you’ll learn the basics of facial applications using Computer Vision.
Step #1: Install OpenCV, dlib, and face_recognition (Beginner)
Before you can build facial applications, you first need to configure your development environment.
Start by following Step #1 of the How Do I Get Started? section to install OpenCV on your system.
From there, you’ll need to install the dlib and face_recognition libraries.
The “Install your face recognition libraries” section of this tutorial will help you install both dlib and face_recognition.
Make sure you have installed OpenCV, dlib, and face_recognition before continuing!
Step #2: Detect Faces in Images and Video (Beginner)
In order to apply Computer Vision to facial applications you first need to detect faces in an input image.
Face detection is different than face recognition.
During face detection we are simply trying to locate where in the image faces are.
Our face detection algorithms do not know who is in the image, simply that a given face exists at a particular location.
Once we have our detected faces, we pass them into a facial recognition algorithm which outputs the actual identity of the person/face.
Thus, all Computer Vision and facial applications must start with face detection.
There are a number of face detectors that you can use, but my favorite is OpenCV’s Deep Learning-based face detector:
OpenCV’s face detector is accurate and able to run in real-time on modern laptops/desktops.
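A sketch of loading and running that detector (the prototxt and caffemodel filenames are the ones OpenCV’s face detector sample ships with; the input image is a placeholder):

```python
import cv2

# load the serialized face detector model from disk
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

image = cv2.imread("example.jpg")
(h, w) = image.shape[:2]

# construct an input blob, resizing and mean-subtracting the image
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

# each detection holds [_, _, confidence, startX, startY, endX, endY]
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        print("face:", box.astype("int"))
```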
That said, if you’re using a resource constrained device (such as the Raspberry Pi), the Deep Learning-based face detector may be too slow for your application.
In that case, you may want to utilize Haar cascades or HOG + Linear SVM instead:
Haar cascades are very fast but prone to false-positive detections.
It can also be a pain to properly tune the parameters to the face detector.
HOG + Linear SVM is a nice balance between the Haar cascades and OpenCV’s Deep Learning-based face detector.
This detector is slower than Haar but is also more accurate.
Here’s my suggestion:
If you need accuracy, go with OpenCV’s Deep Learning face detector.
If you need pure speed, go with Haar cascades.
And if you need a balance between the two, go with HOG + Linear SVM.
Finally, make sure you try all three detectors before you decide!
Gather a few example images and test out the face detectors.
Let your empirical results guide you — apply face detection using each of the algorithms, examine the results, and double-down on the algorithm that gave you the best results.
Step #3: Discover Facial Landmarks (Intermediate)
At this point you can detect the location of a face in an image.
But what if we wanted to localize various facial structures, including:
Nose
Eyes
Mouth
Jawline
Using facial landmarks we can do exactly that!
And best of all, facial landmark algorithms are capable of running in real-time!
Most of your computation is going to be spent detecting the actual face — once you have the face detected, facial landmarks are quite fast!
Start by reading the following tutorials to learn how to localize facial structures on a detected face:
You can then take the dataset you created and proceed to the next step to build your actual face recognition system.
Note: If you don’t want to build your own dataset you can proceed immediately to Step #6 — I’ve provided my own personal example datasets for the tutorials in Step #6 so you can continue to learn how to apply face recognition even if you don’t gather your own images.
Step #6: Face Recognition (Intermediate)
At this point you have either (1) created your own face recognition dataset using the previous step or (2) elected to use my own example datasets I put together for the face recognition tutorials.
To build your first face recognition system, follow this guide:
You’ll note that this tutorial does not rely on the dlib and face_recognition libraries — instead, we use OpenCV’s FaceNet model.
A great project for you would be to:
Replace OpenCV’s FaceNet model with the dlib and face_recognition packages.
Extract the 128-d facial embeddings.
Train a Logistic Regression or Support Vector Machine (SVM) on the embeddings extracted by dlib/face_recognition.
Take your time when implementing the above project — it will be a great learning experience for you.
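A minimal sketch of the embedding-extraction piece using the face_recognition library (the image filename is hypothetical):

```python
import face_recognition

# load an image and locate any faces in it using the HOG detector
image = face_recognition.load_image_file("person_01.jpg")
boxes = face_recognition.face_locations(image, model="hog")

# compute a 128-d embedding for each detected face
encodings = face_recognition.face_encodings(image, boxes)

# after looping over your dataset to build X (encodings) and y (names),
# train a classifier, e.g.:
#   from sklearn.svm import SVC
#   clf = SVC(kernel="linear", probability=True).fit(X, y)
```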
Step #7: Improve Your Face Recognition Accuracy (Intermediate)
Whenever I write about face recognition the #1 question I get asked is:
“How can I improve my face recognition accuracy?”
I’m glad you asked — and in fact, I’ve already covered the topic.
Make sure you refer to the Drawbacks, limitations, and how to obtain higher face recognition accuracy section (right before the Summary) of the following tutorial:
Inside that section I discuss how you can improve your face recognition accuracy.
Step #8: Detect Fake Faces and Perform Anti-Face Spoofing
You may have noticed that it’s possible to “trick” and “fool” your face recognition system by holding up a printed photo of a person or photo of the person on your screen.
In those situations your face recognition correctly recognizes the person, but fails to realize that it’s a fake/spoofed face!
The PyImageSearch Gurus course includes additional modules and lessons on face recognition.
Additionally, you’ll also find:
An actionable, real-world course on OpenCV and computer vision (similar to a college survey course on Computer Vision but much more hands-on and practical).
The most comprehensive computer vision education online today. The PyImageSearch Gurus course covers 13 modules broken out into 168 lessons, with over 2,161 pages of content. You won’t find a more detailed computer vision course anywhere else online, I guarantee it.
A community of like-minded developers, researchers, and students just like you, who are eager to learn computer vision and level-up their skills.
To learn more about the PyImageSearch Gurus course, just use the link below:
One of the first applications of Computer Vision was Optical Character Recognition (OCR).
OCR algorithms seek to (1) take an input image and then (2) recognize the text/characters in the image, returning a human-readable string to the user (in this case a “string” is assumed to be a variable containing the text that was recognized).
While OCR is a simple concept to comprehend (input image in, human-readable text out), it’s actually an extremely challenging problem that is far from solved.
The steps in this section will arm you with the knowledge you need to build your own OCR pipelines.
Step #1: Install OpenCV (Beginner)
Before you can apply OCR to your own projects you first need to install OpenCV.
Follow Step #1 of the How Do I Get Started? section above to install OpenCV on your system.
Once you have OpenCV installed you can move on to Step #2.
Step #2: Discover Tesseract for OCR (Beginner)
Tesseract is an OCR engine/API that was originally developed by Hewlett-Packard in the 1980s.
The library was open-sourced in 2005 and later adopted by Google in 2006.
Tesseract supports over 100 written languages, ranging from English to Punjabi to Yiddish.
Combining OpenCV with Tesseract is by far the fastest way to get started with OCR.
First, make sure you have Tesseract installed on your system:
Again, follow the guides and practice with them — they will help you learn how to apply OCR to your tasks.
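Once OpenCV and Tesseract are both installed, a minimal OCR sketch with the pytesseract bindings looks like this (the input filename is a placeholder):

```python
import cv2
import pytesseract

# load the input image and convert it from BGR (OpenCV's default)
# to RGB channel ordering, which Tesseract expects
image = cv2.imread("receipt.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# OCR the image and print the recognized text
text = pytesseract.image_to_string(rgb)
print(text)
```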
Step #5: Text Detection in Natural Scenes (Intermediate)
So far we’ve applied OCR to images that were captured under controlled environments (i.e., no major changes in lighting, viewpoint, etc.).
But what if we wanted to apply OCR to images in uncontrolled environments?
Imagine we were tasked with building a Computer Vision system for Facebook to handle OCR’ing the 350+ million new images uploaded to their system every day.
In that case, we can make zero assumptions regarding the environment in which the images were captured.
Some images may be captured using a high quality DSLR camera, others with a standard iPhone camera, and even others with a decade-old flip phone — again, we can make no assumptions regarding the quality, viewing angle, or even contents of the image.
In that case, we need to break OCR into a two-stage process:
Stage #1: Use the EAST Deep Learning-based text detector to locate where text resides in the input image.
Stage #2: Use an OCR engine (ex., Tesseract) to take the text locations and then actually recognize the text itself.
To perform Stage #1 (Text Detection) you should follow this tutorial:
If you’ve read the Face Applications section above you’ll note that our OCR pipeline is similar to our face recognition pipeline:
First, we detect the text in the input image (akin to detecting/locating a face in an image).
Then we take the regions of the image that contain the text and actually recognize it (which is similar to taking the location of a face and then actually recognizing who the face belongs to).
Step #6: Combine Text Detection with OCR (Advanced)
Now that we know where in the input image text resides, we can take those text locations and actually recognize the text.
To accomplish this task we’ll again be using Tesseract, but this time we’ll want to use Tesseract v4.
The v4 release of Tesseract contains an LSTM-based OCR engine that is far more accurate than previous releases.
You can learn how to combine Text Detection with OCR using Tesseract v4 here:
While the Google Vision API requires (1) an internet connection and (2) payment to utilize, in my opinion it’s one of the best OCR engines available to you.
OCR is undoubtedly one of the most challenging areas of Computer Vision.
If you need help building your own custom OCR systems or increasing the accuracy of your current OCR system, I would recommend joining the PyImageSearch Gurus course.
The course includes private forums where I hang out and answer questions daily.
It’s a great place to get expert advice, both from me and from the more advanced students in the course.
Object detection algorithms seek to detect the location of an object in an image.
These algorithms can be as simple as basic color thresholding or as advanced as training a complex deep neural network from scratch.
In the first part of this section we’ll look at some basic methods of object detection, working all the way up to Deep Learning-based object detectors including YOLO and SSDs.
Step #1: Configure Your Development Environment (Beginner)
Prior to working with object detection you’ll need to configure your development environment.
Install Keras and TensorFlow via Step #1 of the Deep Learningsection.
Provided you have OpenCV, TensorFlow, and Keras installed, you are free to continue with the rest of this tutorial.
Step #2: Create a Basic Object Detector/Tracker (Beginner)
We’ll keep our first object detector/tracker super simple.
We’ll rely strictly on basic image processing concepts, namely color thresholding.
To apply color thresholding, we define an upper and lower range in a given color space (such as RGB, HSV, L*a*b*, etc.).
Then, for an incoming image/frame, we use OpenCV’s cv2.inRange function to apply color thresholding, yielding a mask, where:
All foreground pixels are white
And all background pixels are black
Therefore, all pixels that fall into our upper and lower boundaries will be marked as foreground.
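Here is a minimal sketch of color thresholding in the HSV color space (the bounds below are hypothetical values for a green object):

```python
import cv2
import numpy as np

image = cv2.imread("frame.jpg")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

lower = np.array([29, 86, 6])      # lower HSV bound
upper = np.array([64, 255, 255])   # upper HSV bound

# pixels inside [lower, upper] become white (foreground), all else black
mask = cv2.inRange(hsv, lower, upper)
cv2.imshow("Mask", mask)
cv2.waitKey(0)
```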
Color thresholding methods, as the name suggests, are super useful when you know the color of the object you want to detect and track will be different than all other colors in the frame.
Furthermore, color thresholding algorithms are very fast, enabling them to run in super real-time, even on resource constrained devices, such as the Raspberry Pi.
Let’s go ahead and implement your first object detector now:
Once you’ve implemented the above two guides I suggest you extend the project by attempting to track your own objects.
Again, keep in mind that this object detector is based on color, so make sure the object you want to detect has a different color than the other objects/background in the scene!
Step #3: Basic Person Detection (Beginner)
Color-based object detectors are fast and efficient, but they do nothing to understand the semantic contents of an image.
For example, how would you go about defining a color range to detect an actual person?
Would you attempt to track based on skin tone?
That would fail pretty quickly — humans have a large variety of skin tones depending on ethnicity and exposure to the sun. Defining such a range would be impossible.
Would clothing work?
Well, maybe if you were at a soccer/football game and wanted to track players on the pitch via their jersey colors.
But for general purpose applications that wouldn’t work either — clothing comes in all shapes, sizes, colors, and designs.
I think you get my point here — trying to detect a person based on color thresholding methods alone simply isn’t going to work.
Instead, you need to use a dedicated object detection algorithm.
One of the most common object detectors is the Viola-Jones algorithm, also known as Haar cascades.
The Viola-Jones algorithm was published back in 2001 but is still used today (although Deep Learning-based object detectors obtain far better accuracy).
Now that we’ve seen how HOG + Linear SVM works in practice, let’s dissect the algorithm a bit.
To start, the HOG + Linear SVM object detector uses a combination of sliding windows, HOG features, and a Support Vector Machine to localize objects in images.
Image pyramids allow us to detect objects at different scales (i.e., objects that are closer to the camera as well as objects farther away):
Finally, you need to understand the concept of non-maxima suppression, a technique used in both traditional object detection as well as Deep Learning-based object detection:
When performing object detection you’ll end up locating multiple bounding boxes surrounding a single object.
This behavior is actually a good thing — it implies that your object detector is working correctly and is “activating” when it gets close to objects it was trained to detect.
The problem is that we now have multiple bounding boxes for one object.
To rectify the problem we can apply non-maxima suppression, which, as the name suggests, suppresses (i.e., ignores/deletes) weak, overlapping bounding boxes.
The term “weak” here is used to indicate bounding boxes of low confidence/probability.
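A quick sketch using the non-maxima suppression helper from the imutils library (the boxes below are made-up overlapping detections):

```python
import numpy as np
from imutils.object_detection import non_max_suppression

# hypothetical overlapping detections as (startX, startY, endX, endY)
boxes = np.array([
    [12, 84, 140, 212],
    [24, 84, 152, 212],
    [20, 90, 148, 218],
])

# suppress boxes that overlap by more than 30%
picked = non_max_suppression(boxes, overlapThresh=0.3)
print(picked)  # far fewer boxes than we started with
```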
If you are interested in learning more about the HOG + Linear SVM object detector, including:
How to train your own custom HOG + Linear SVM object detector
The inner-workings of the HOG + Linear SVM detector
Inside the course you’ll find 30+ lessons on HOG feature extraction and the HOG + Linear SVM object detection algorithm.
Step #5: Your First Deep Learning Object Detector (Intermediate)
For ~10 years HOG + Linear SVM (including its variants) was considered the state-of-the-art in terms of object detection.
However, Deep Learning-based object detectors, including Faster R-CNN, Single Shot Detectors (SSDs), You Only Look Once (YOLO), and RetinaNet, have obtained unprecedented object detection accuracy.
The OpenCV library is compatible with a number of pre-trained object detectors — let’s start by taking a look at this SSD:
Step #7: Deep Learning Object Detectors (Intermediate)
For a deeper dive into Deep Learning-based object detection, including how to filter/remove classes that you want to ignore/not detect, refer to this tutorial:
The YOLO object detector is designed to be super fast; however, it appears that the OpenCV implementation is actually far slower than the SSD counterparts.
I’m not entirely sure why that is.
Furthermore, OpenCV’s Deep Neural Network (dnn) module does not yet support NVIDIA GPUs, meaning that you cannot use your GPU to improve inference speed.
OpenCV is reportedly working on NVIDIA GPU support, but that support may not be available until 2020.
Step #8: Evaluate Deep Learning Object Detector Performance (Intermediate)
If you decide you want to train your own custom object detectors from scratch you’ll need a method to evaluate the accuracy of the model.
To do that we use two metrics: Intersection over Union (IoU) and mean Average Precision (mAP) — you can read about them here:
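As a quick illustration, a sketch of computing IoU for two axis-aligned boxes (the coordinates are made up):

```python
# each box is given as (startX, startY, endX, endY)
def intersection_over_union(boxA, boxB):
    # coordinates of the intersection rectangle
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    inter = max(0, xB - xA) * max(0, yB - yA)

    # areas of each box, then IoU = intersection / union
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    return inter / float(areaA + areaB - inter)

print(intersection_over_union((39, 63, 203, 112), (54, 66, 198, 114)))
```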
Step #9: From Object Detection to Semantic/Instance Segmentation (Intermediate)
If you’ve followed along so far, you know that object detection produces bounding boxes that report the location and class label of each detected object in an image.
But what if you wanted to extend object detection to produce pixel-wise masks?
These masks would not only report the bounding box location of each object, but also which individual pixels belong to the object.
Step #10: Object Detection on Embedded Devices (Advanced)
Deep Learning-based object detectors, while accurate, are extremely computationally hungry, making them incredibly challenging to apply to resource constrained devices such as the Raspberry Pi, Google Coral, and NVIDIA Jetson Nano.
As the name suggests, this book is dedicated to developing and optimizing Computer Vision and Deep Learning algorithms on resource constrained devices, including the:
Raspberry Pi
Google Coral
Intel Movidius NCS
NVIDIA Jetson Nano
Inside you’ll learn how to train your own object detectors, optimize/convert them for the RPi, Coral, NCS, and/or Nano, and then run the detectors in real-time!
Object Tracking
Object Tracking algorithms are typically applied after an object has already been detected; therefore, I recommend you read the Object Detection section first. Once you’ve read those sets of tutorials, come back here and learn about object tracking.
Object detection algorithms tend to be accurate, but computationally expensive to run.
It may be infeasible/impossible to run a given object detector on every frame of an incoming video stream and still maintain real-time performance.
Therefore, we need an intermediary algorithm that can accept the bounding box location of an object, track it, and then automatically update itself as the object moves about the frame.
We’ll learn about these types of object tracking algorithms in this section.
Step #1: Install OpenCV on Your System (Beginner)
Prior to working through this section you’ll need to install OpenCV on your system.
Make sure you follow Step #1 of the How Do I Get Started? section to configure and install OpenCV.
Additionally, I recommend reading the Object Detection section first, as object detection tends to be a prerequisite to object tracking.
Step #2: Your First Object Tracker (Beginner)
The first object tracker we’ll cover is a color-based tracker.
This algorithm combines both object detection and tracking into a single step, and in fact, is the simplest object tracker possible.
You can read more about color-based detection and tracking here:
Our color-based tracker was a good start, but the algorithm will fail if there is more than one object we want to track.
For example, let’s assume there are multiple objects in our video stream and we want to associate unique IDs with each of them — how might we go about doing that?
The answer is to apply a Centroid Tracking algorithm:
Using Centroid Tracking we can not only associate unique IDs with a given object, but also detect when an object is lost and/or has left the field of view.
Multi-object tracking is, by definition, significantly more complex in terms of the underlying programming, API calls, and computational efficiency.
Most multi-object tracking implementations instantiate a brand new Python/OpenCV class to handle object tracking, meaning that if you have N objects you want to track, you therefore have N object trackers instantiated — which quickly becomes a problem in crowded scenes.
Your CPU will choke on the load and your object tracking system will come to a grinding halt.
One way to overcome this problem is to use multiprocessing and distribute the load across multiple processes/cores, thus enabling you to reclaim some speed:
Step #6: Applied Object Tracking and Counting (Intermediate)
So far you’ve learned how to apply single object tracking and multi-object tracking.
Let’s put all the pieces together and build a person/footfall counter application capable of detecting, tracking, and counting the number of people that enter/exit a given area (i.e., convenience store, grocery store, etc.):
In particular, you’ll want to note how the above implementation takes a hybrid approach to object detection and tracking, where:
The object detector is only applied every N frames.
One object tracker is created per detected object.
The trackers enable us to track the objects.
Then, once we reach the N-th frame, we apply object detection, associate centroids, and then create new object trackers.
Such a hybrid implementation enables us to balance speed with accuracy.
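In pseudocode-style Python, the hybrid loop looks roughly like this (all of the helpers below are hypothetical stand-ins for your detector and tracker):

```python
SKIP_FRAMES = 30  # run the (expensive) detector every N frames

def run_object_detector(frame):
    return []  # stand-in: return a list of bounding boxes

def create_tracker(frame, box):
    return box  # stand-in: return a tracker primed on `box`

def update_tracker(tracker, frame):
    return tracker  # stand-in: return the tracker's updated box

video_frames = []  # stand-in: an iterable of frames from your stream
trackers = []

for total_frames, frame in enumerate(video_frames):
    if total_frames % SKIP_FRAMES == 0:
        # expensive path: detect objects and spawn fresh trackers
        boxes = run_object_detector(frame)
        trackers = [create_tracker(frame, b) for b in boxes]
    else:
        # cheap path: let each tracker update its bounding box
        boxes = [update_tracker(t, frame) for t in trackers]
    # the boxes then feed the centroid tracker to maintain object IDs
```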
Where to Next?
Object tracking algorithms are more of an advanced Computer Vision concept.
If you’re interested in studying Computer Vision in more detail, I would recommend the PyImageSearch Gurus course.
This course is similar to a college survey course in Computer Vision, but way more practical, including hands-on coding and implementations.
Instance Segmentation and Semantic Segmentation
There are three primary types of algorithms used for image understanding:
Image classification algorithms enable you to obtain a single label that represents the contents of an image. You can think of image classification as inputting a single image to a network and obtaining a single label as output.
Object detection algorithms are capable of telling you not only what is in an image, but also where in the image a given object is. Object detectors thus accept a single input image and return multiple values as output. The output itself is a list of values containing (1) the class label and (2) the bounding box (x, y)-coordinates of where the particular object is in the image.
Instance segmentation and semantic segmentation take object detection further. Instead of returning bounding box coordinates, instance/semantic segmentation methods yield pixel-wise masks that tell us (1) the class label of an object, (2) the bounding box coordinates of the object, and (3) the coordinates of the pixels that belong to the object.
These segmentation algorithms are intermediate/advanced techniques, so make sure you read the Deep Learning section above to ensure you understand the fundamentals.
Step #1: Configure Your Development Environment (Beginner)
In order to perform instance segmentation you need to have OpenCV, TensorFlow, and Keras installed on your system.
Make sure you follow Step #1 from the How Do I Get Started? section to install OpenCV.
From there, follow Step #1 from the Deep Learning section to ensure TensorFlow and Keras are properly configured.
Step #2: Segmentation vs. Object Detection (Intermediate)
Now that you have your deep learning machine configured, you can learn about instance segmentation.
Follow this guide to utilize your first instance segmentation network using OpenCV:
That guide will also teach you how instance segmentation is different from object detection.
Step #3: Applying Mask R-CNN (Intermediate)
Mask R-CNN is arguably the most popular instance segmentation architecture.
Mask R-CNNs have been successfully applied to self-driving cars (vehicle, road, and pedestrian detection), medical applications (automatic tumor detection/segmentation), and much more!
This guide will show you how to use Mask R-CNN with OpenCV:
Step #4: Semantic Segmentation with OpenCV (Intermediate)
When performing instance segmentation our goal is to (1) detect objects and then (2) compute pixel-wise masks for each object detected.
Semantic segmentation is a bit different — instead of labeling just the objects in an input image, semantic segmentation seeks to label every pixel in the image.
That means that if a given pixel doesn’t belong to any category/class, we label it as “background” (meaning that the pixel does not belong to any semantically interesting object).
Semantic segmentation algorithms are very popular for self-driving car applications as they can segment an input image/frame into components, including road, sidewalk, pedestrian, bicyclist, sky, building, background, etc.
To learn more about semantic segmentation algorithms, refer to this tutorial:
Applying Computer Vision and Deep Learning algorithms to resource constrained devices such as the Raspberry Pi, Google Coral, and NVIDIA Jetson Nano can be super challenging due to the fact that state-of-the-art CV/DL algorithms are computationally hungry — these resource constrained devices just don’t have enough CPU power and sufficient RAM to feed these hungry algorithm beasts.
But don’t worry!
You can still apply CV and DL to these devices — you just need to follow these guides first.
Step #1: Configure Your Embedded/IoT Device (Beginner)
Before you start applying Computer Vision and Deep Learning to embedded/IoT applications you first need to choose a device.
I suggest starting with the Raspberry Pi — it’s a super cheap ($35) and easily accessible device for your initial forays into embedded/IoT Computer Vision and Deep Learning.
These guides will help you configure your Raspberry Pi:
If I’ve said it once, I’ve said it a hundred times — the best way to learn Computer Vision is through practical, hands-on projects.
The same is true for Embedded Vision and IoT projects as well.
To gain additional experience building embedded CV projects, follow these guides to work with video on embedded devices, including working with multiple cameras and live streaming video over a network:
From there you’ll want to go through the steps in the Deep Learning section.
Finally, if you want to integrate text message notifications into the Computer Vision security system we built in the previous step, then read this tutorial:
Step #4: Image Classification on Embedded Devices (Intermediate)
If you followed Step #3 then you found out that running Deep Learning models on resource constrained devices such as the Raspberry Pi can be computationally prohibitive, preventing you from obtaining real-time performance.
In order to boost your Frames Per Second (FPS) throughput rate, you should consider using a coprocessor such as Intel’s Movidius NCS or Google’s Coral USB Accelerator:
This book is your one-stop shop for learning how to master Computer Vision and Deep Learning on embedded devices.
Computer Vision on the Raspberry Pi
At only $35, the Raspberry Pi (RPi) is an affordable piece of hardware that can be used by hobbyists, educators, and professionals/industry alike.
The Raspberry Pi 4 (the current model as of this writing) includes a Quad core Cortex-A72 running at 1.5GHz and either 1GB, 2GB, or 4GB of RAM (depending on which model you purchase) — all running on a computer the size of a credit card.
But don’t let its small size fool you!
The Raspberry Pi can absolutely be used for Computer Vision and Deep Learning (but you need to know how to tune your algorithms first).
Step #1: Install OpenCV on the Raspberry Pi (Beginner)
Prior to working through these steps, I recommend that you first work through the How Do I Get Started? section.
Not only will that section teach you how to install OpenCV on your Raspberry Pi, but it will also teach you the fundamentals of the OpenCV library.
If you find yourself struggling to get OpenCV installed on your Raspberry Pi, take a look at both:
Assuming you now have OpenCV installed on your RPi, you might be wondering about development best practices — what is the best way to write code on the RPi?
Should you install a dedicated IDE, such as PyCharm, directly on the Pi itself and code there?
Should you use a lightweight code editor such as Sublime Text?
Or should you SSH/VNC in to the RPi and edit the code that way?
You could potentially do all three of those, but my favorite is to use either PyCharm or Sublime Text on my laptop/desktop with an SFTP plugin:
Doing so enables me to code using my favorite IDE on my laptop/desktop.
Once I’m done editing a file, I save it, after which the file is automatically uploaded to the RPi.
It does take some additional time to configure your RPi and laptop/desktop in this manner, but once you do, it’s so worth it!
Step #3: Access your Raspberry Pi Camera or USB Webcam (Beginner)
Now that your development environment is configured, you should verify that you can access your camera, whether that be a USB webcam or the Raspberry Pi camera module:
Step #4: Your First Computer Vision App on the Raspberry Pi (Beginner)
The Raspberry Pi is naturally suited for home security applications, so let’s learn how we can utilize motion detection to detect when there is an intruder in our home:
Step #5: OpenCV, GPIO, and the Raspberry Pi (Beginner)
If you want to use the GPIO to control additional hardware, specifically Hardware on Top (HATs), you should study how OpenCV and GPIO can be used together on the Raspberry Pi:
Step #6: Facial Applications on the Raspberry Pi (Intermediate)
Facial applications, including face recognition, can be extremely tricky on the Raspberry Pi due to the limited computational horsepower.
Algorithms that worked well on our laptop/desktop may not translate well to our Raspberry Pi; therefore, we need to take care to perform additional optimizations.
These tutorials will get you started applying facial applications on the RPi:
Step #7: Apply Deep Learning on the Raspberry Pi (Intermediate)
Deep Learning algorithms are notoriously computationally hungry, and given the resource constrained nature of the RPi, CPU and memory come at a premium.
To discover why Deep Learning algorithms are slow on the RPi, start by reading these tutorials:
That book will teach you how to use the RPi, Google Coral, Intel Movidius NCS, and NVIDIA Jetson Nano for embedded Computer Vision and Deep Learning applications.
And just like all my tutorials, each chapter of the text includes well documented code and detailed walkthroughs, ensuring that you understand exactly what’s going on.
Computer Vision and Deep Learning algorithms have touched nearly every facet of Computer Science.
One area that CV and DL algorithms are making a massive impact on is the field of Medical Computer Vision.
Using Medical Computer Vision algorithms, we can now automatically analyze cell cultures, detect tumors, and even predict cancer before it metastasizes!
Step #1: Configure Your Development Environment (Beginner)
Step #2 and #3 of this section will require that you have OpenCV configured and installed on your machine.
Make sure you follow Step #1 from the How Do I Get Started? section to install OpenCV.
Step #4 covers how to use Deep Learning for Medical Computer Vision.
You will need to have TensorFlow and Keras installed on your system for those guides.
You should follow Step #1 from the Deep Learning section to ensure TensorFlow and Keras are properly configured.
Step #2: Your First Medical Computer Vision Project (Beginner)
Our first Medical Computer Vision project uses only basic Computer Vision algorithms, thus demonstrating how even basic techniques can make a profound impact on the medical community:
Step #4: Solve Real-World Medical Computer Vision Projects (Advanced)
Our previous sections dealt with applying Deep Learning to a small medical image dataset.
But what about largermedical datasets?
Can we apply DL to those datasets as well?
You bet we can!
The following two guides will show you how to use Deep Learning to automatically classify malaria in blood cells and perform automatic breast cancer detection:
Take your time working through those guides and make special note of how we compute the sensitivity and specificity of the model — two key metrics when working with medical imaging tasks that directly impact patients.
Where to Next?
As I mention on my About page, Medical Computer Vision is a topic near and dear to my heart.
Previously, my company has consulted with the National Cancer Institute and National Institutes of Health to develop image processing and machine learning algorithms to automatically analyze breast histology images for cancer risk factors.
I’ve also developed methods to automatically recognize prescription pills in images, thereby reducing the number of injuries and deaths that happen each year due to the incorrect medication being taken.
I continue to write about Medical Computer Vision, so if you’re interested in the topic, be sure to keep an eye on the PyImageSearch blog.
Once you’ve confirmed you can access the RPi camera module you can use the VideoStream class, which is compatible with both built-in/USB webcams and the RPi camera module:
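For example, a sketch of selecting the RPi camera module via the usePiCamera flag in imutils:

```python
from imutils.video import VideoStream
import time

# usePiCamera=True reads from the RPi camera module; leave it at the
# default (False) to read from a built-in/USB webcam instead
vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)  # camera warm-up

frame = vs.read()
vs.stop()
```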
Inevitably, there will be a time where OpenCV cannot access your camera and your script errors out, resulting in a “NoneType” error — this tutorial will help you diagnose and resolve such errors:
Face detection is a special class of object detection.
Object detectors can be trained to recognize just about any type of object.
The OpenCV library enables us to use pre-trained object detectors to detect common objects we encounter in our daily lives (people, cars, trucks, dogs, cats, etc.).
The following tutorials will teach you how to apply object detection to video streams:
Step #8: Video Classification with Deep Learning (Advanced)
For this step I’ll be making the assumption that you’ve worked through the first half of the Deep Learning section.
Provided that you have, you may have noticed that applying image classification to video streams results in a sort of prediction flickering.
A “prediction flicker” occurs when an image classification model reports Label A for Frame N, but then reports Label B (i.e., a different class label) for Frame N + 1 (i.e., the next frame in the video stream), despite the frames having near-identical contents!
Prediction flickering is a natural phenomenon in video classification.
It happens due to noise in the input frames confusing the classification model.
One simple method to rectify prediction flickering is to apply prediction averaging:
Using prediction averaging you can overcome the prediction flickering problem.
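A sketch of rolling prediction averaging (the queue length and the per-class probabilities below are made up):

```python
from collections import deque
import numpy as np

# store the last 128 frame-level predictions
Q = deque(maxlen=128)

# inside your frame-processing loop you would do something like:
#   preds = model.predict(np.expand_dims(frame, axis=0))[0]
preds = np.array([0.1, 0.7, 0.2])  # stand-in per-class probabilities
Q.append(preds)

# average the predictions over the queue, take the most likely label
label = np.array(Q).mean(axis=0).argmax()
```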
Additionally, you may want to look into more advanced Deep Learning-based image/video classifiers, including Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks (LSTMs).
Where to Next?
If you’re brand new to the world of Computer Vision and Image Processing, I would recommend you read Practical Python and OpenCV.
That book will teach you the basics of Computer Vision through the OpenCV library — and best of all, you can complete that book in only a single weekend.
It’s by far the fastest way to get up and running with OpenCV.
And furthermore, the book includes complete code templates and examples for working with video files and live video streams with OpenCV.
Content-based Image Retrieval (CBIR) encompasses all algorithms, techniques, and methods used to build an image search engine.
An image search engine functions similarly to a text search engine (ex., Google, Bing, etc.).
A user visits the search engine website, but instead of supplying a text query (ex., “How do I learn OpenCV?”), they instead supply an image as the query.
The goal of the image search engine is to accept the query image and find all visually similar images in a given dataset.
CBIR is the primary reason I started studying Computer Vision in the first place. I found the topic fascinating and am eager to share my knowledge with you.
Step #1: Install OpenCV on your System (Beginner)
Before you can perform CBIR or build your first image search engine, you first need to install OpenCV on your system.
Follow Step #1 of the How Do I Get Started? section above to configure OpenCV and install it on your machine.
Step #2: Build Your First Image Search Engine (Beginner)
The first image search engine you’ll build is also one of the first tutorials I wrote here on the PyImageSearch blog.
Using this tutorial you’ll learn how to search for visually similar images in a dataset using color histograms:
In Step #2 we built an image search engine that characterized the contents of an image based on color — but what if we wanted to quantify the image based on texture, shape, or some combination of all three?
How might we go about doing that?
In order to describe the contents of an image, we first need to understand the concept of image quantification:
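A minimal sketch of quantifying an image with a 3D color histogram (the filename and bin counts are illustrative):

```python
import cv2

image = cv2.imread("query.jpg")

# 8 bins per channel => a 512-d feature vector describing the image
hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist, hist).flatten()
print(hist.shape)  # (512,)
```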
As your CBIR system becomes more advanced you’ll start to include sub-steps between the main steps, but for now, understand that those four steps will be present in any image search engine you build.
In that case, you’ll want to find all duplicate/near-duplicate images in your dataset (as these duplicates provide no additional value to the dataset itself).
The techniques covered here will help you build your own basic image search engines.
The problem with these algorithms is they do not scale.
If you want to build more advanced image search engines that scale to millions of imagesyou’ll want to look into:
The Bag-of-Visual-Words model (BOVW)
k-Means clustering and forming a “codebook”
Vector quantization
Tf-idf weighting
Building an inverted index
The PyImageSearch Gurus course includes 40+ lessons on building image search engines, including how to scale your CBIR system to millions of images.
If you’re interested in learning more about the course, and extending your own CBIR knowledge, just use the link below:
You can learn Computer Vision, Deep Learning, and OpenCV — I am absolutely confident in that.
And if you’ve been following this guide, you’ve seen for yourself how far you’ve progressed.
However, we cannot spend all of our time neck deep in code and implementation — we need to come up for air, rest, and recharge our batteries.
When that happens, I suggest supplementing your technical education with a bit of light reading used to open your mind to what the world of Computer Vision and Deep Learning offers you.
After 5 years running the PyImageSearch blog I’ve seen countless readers dramatically change their lives, including changing their careers to CV/DL/AI, being awarded funding, winning Kaggle competitions, and even becoming CTOs of funded companies!
It’s truly a privilege and an honor to be taking this journey with you — thank you for letting me accompany you on it.
Below you’ll find some of my favorite interviews, case studies, and success stories.
Step #1: A Day in the Life of Adrian Rosebrock (Beginner)
Ever wonder what it’s like to work as a Computer Vision/Deep Learning researcher and developer?
You’re not alone.
Over the past 5 years running PyImageSearch, I have received 100s of emails and inquiries that are “outside” traditional CV, DL, and OpenCV questions.
They instead focus on something much more personal — my daily life.
To give you an idea of what it’s like to be me, I’m giving you a behind the scenes look at:
How I spend my day.
What it’s like balancing my role as a (1) computer vision researcher/developer and (2) a writer and owner of PyImageSearch.
The habits and practices I’ve spent years perfecting to help me get shit done.
In the podcast we discuss Computer Vision, Deep Learning, and what the future holds for the fields.
I highly recommend listening to this podcast, regardless if you are brand new to Computer Vision or already a seasoned expert — it’s both entertaining and educational at the same time.
Step #4: From Developer to CTO (Beginner)
Saideep Talari’s story holds a special place in my heart.
David Austin and his teammate, Weimin Wang, took home 1st place (and $25,000) in Kaggle’s Iceberg Classifier Challenge (Kaggle’s most competitive challenge ever).
Step #7: Landing a Research and Development (R&D) Position (Beginner)
Kapil Varshney was recently hired at Esri R&D as a Data Scientist focusing on Computer Vision and Deep Learning.
Kapil’s story is really important as it shows that, no matter what your background is, you can be successful in computer vision and deep learning — you just need the right education first!
Soon after reading DL4CV, Kapil competed in a challenge sponsored by Esri to detect and localize objects in satellite images (including cars, swimming pools, etc.).
He finished in 3rd-place out of 53 competitors.
Esri was so impressed with Kapil’s work that after the contest they called him in for an interview.
Kapil nailed the interview and was hired full-time at Esri R&D.
His work on satellite image analysis at Esri now impacts millions of people across the world daily — and it’s truly a testament to his hard work.
I can’t promise you that you’ll win a Kaggle competition like David or become the CTO of a Computer Vision company like Saideep did, but I can guarantee you that the books and courses I offer here on PyImageSearch are the best resources available today to help you master computer vision and deep learning.
If you’d like to follow in their steps, you can see what books and courses I offer here:
An in-depth dive into the world of computer vision and deep learning.
Whether this is the first time you’ve worked with machine learning and neural networks or you’re already a seasoned deep learning practitioner, DL4CV is engineered from the ground up to help you reach expert status.