In this article, we'll learn how to implement region proposal object detection with OpenCV, Keras, and TensorFlow.
In this step, we'll read the image and apply OpenCV's selective search method to it. This method returns a list of rectangles, which are our candidate regions of interest (ROIs). OpenCV provides two modes for selective search: a fast method and a quality (more accurate) method; which one to use depends on your use case.
Now that we have the rectangles, before going further let's visualize the regions of interest they represent.
These are all the regions of interest that our function returns after filtering out ROIs that are not sufficiently large; that is, if an ROI's width or height is less than 10% of the image's width or height, we don't consider it.
We'll create two separate lists: one containing the ROIs converted to RGB format, and another holding their bounding box coordinates. These lists will be used for prediction and for drawing bounding boxes, respectively. We'll also make sure we only run predictions on sufficiently large ROIs, say those whose width or height is at least 20% of the image's.
Now that we have our regions of interest, filtered and preprocessed, let's use them to generate predictions with our model.
We're using the ResNet50 model from Keras's pre-trained models, mainly because it's light on the machine while still being quite accurate. So, first, we'll create our model instance and then pass in our input (the list of ROIs) to generate predictions.
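A hedged sketch of the prediction step, assuming the ROIs were resized to 224x224 as above (the function name is my own; the ImageNet weights are downloaded automatically on first use):

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

def classify_rois(rois, model=None):
    """Run a batch of ROIs of shape (N, 224, 224, 3) through ResNet50
    and return the decoded top-1 label for each ROI."""
    if model is None:
        model = ResNet50(weights="imagenet")
    batch = preprocess_input(np.array(rois, dtype="float32"))
    preds = model.predict(batch)
    # decode_predictions returns one (class_id, label, probability) triple per ROI
    return decode_predictions(preds, top=1)
```

Passing all ROIs as one batch is much faster than predicting on them one by one.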
Now that we have predictions, let's show the result on the image.
In this step, we'll create a new dictionary that uses the label as the key and a list of (bounding box, probability) pairs as the value. This will let us easily access the predictions for each label and apply non_max_suppression to them. We can do this by looping through the predictions and keeping only those with more than 50% confidence (you can change this threshold to suit your needs). Let's see the code:
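The grouping step can be sketched as follows, where `decoded` is the output of `decode_predictions` (one top-1 triple per ROI) and `boxes` is the parallel list of bounding boxes; the helper name is illustrative:

```python
def group_predictions(decoded, boxes, min_conf=0.5):
    """Map each label to a list of (bounding_box, probability) pairs,
    keeping only predictions above the confidence threshold."""
    objects = {}
    for (pred, box) in zip(decoded, boxes):
        (_, label, prob) = pred[0]  # top-1 prediction for this ROI
        if prob < min_conf:
            continue
        objects.setdefault(label, []).append((box, float(prob)))
    return objects
```

The resulting dictionary has exactly the shape shown in the output below: one key per label, with a list of box/probability tuples as the value.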
{'img': [((126, 295, 530, 800), 0.5174897), ((166, 306, 497, 613), 0.510667), ((176, 484, 520, 656), 0.56631094), ((161, 304, 499, 613), 0.55209666), ((161, 306, 504, 613), 0.6020483), ((161, 306, 499, 613), 0.54256636), ((140, 305, 499, 800), 0.5012991), ((144, 305, 516, 800), 0.50028765), ((162, 305, 499, 642), 0.84315413), ((141, 306, 517, 800), 0.5257749), ((173, 433, 433, 610), 0.56347036)], 'matchstick': [((169, 633, 316, 800), 0.56465816), ((172, 633, 313, 800), 0.7206488), ((333, 639, 467, 800), 0.60068905), ((169, 633, 314, 800), 0.693922), ((172, 633, 314, 800), 0.70851576), ((167, 632, 314, 800), 0.6374499), ((172, 633, 316, 800), 0.5995729), ((169, 640, 307, 800), 0.67480534)], 'guillotine': [((149, 591, 341, 800), 0.59910816), ((149, 591, 338, 800), 0.7370558), ((332, 633, 469, 800), 0.5568006), ((142, 591, 341, 800), 0.6165994), ((332, 634, 468, 800), 0.63907826), ((332, 633, 468, 800), 0.57237893), ((142, 590, 321, 800), 0.6664309), ((331, 635, 467, 800), 0.5186203), ((332, 634, 467, 800), 0.58919555)], 'water_tower': [((144, 596, 488, 800), 0.50619787)], 'barber_chair': [((165, 465, 461, 576), 0.5565266)]}
As you can see, it's a dictionary where each label, for example 'matchstick', is a key, and the value is a list of tuples, each storing a bounding box and its probability for that label.
Look at the objects dictionary again: we have multiple bounding boxes for a single label, so if we draw them all directly, won't the image become cluttered?
Therefore, we'll use the non_max_suppression method, which solves this problem for us. To use this function we need an array of bounding boxes and an array of probabilities, and it returns an array of the bounding boxes to keep.
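Libraries such as imutils ship a ready-made `non_max_suppression`, but to make the idea concrete, here is a hand-rolled NumPy sketch of greedy non-max suppression (the overlap threshold of 0.3 is an illustrative default, not a value from this article):

```python
import numpy as np

def non_max_suppression(boxes, probs, overlap_thresh=0.3):
    """Greedy NMS: keep the highest-probability box, drop boxes that overlap
    it by more than overlap_thresh, and repeat. Boxes are (x1, y1, x2, y2)."""
    boxes = np.array(boxes, dtype="float")
    if len(boxes) == 0:
        return np.array([], dtype="int")
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = np.argsort(probs)  # ascending; the best box is last
    keep = []
    while len(order) > 0:
        i = order[-1]          # index of the most confident remaining box
        keep.append(i)
        # intersection of box i with every other remaining box
        xx1 = np.maximum(x1[i], x1[order[:-1]])
        yy1 = np.maximum(y1[i], y1[order[:-1]])
        xx2 = np.minimum(x2[i], x2[order[:-1]])
        yy2 = np.minimum(y2[i], y2[order[:-1]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[order[:-1]]
        # keep only boxes that don't overlap box i too much
        order = order[:-1][overlap <= overlap_thresh]
    return boxes[keep].astype("int")
```

Applied per label, this collapses the clusters of near-duplicate boxes in the dictionary above down to one box per detected object.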