Photo by Matt Noble on Unsplash

Create Visual Recognition Apps Without Writing Code

…and with no dedicated hardware

Cobus Greyling

--

You are Here…

So, you have seen many cool videos on LinkedIn and Twitter where someone built a visual recognition engine. And there they sit, holding up cups, phones, glasses etc. which are detected and tagged. Or faces and limbs are recognized. And you think, would I ever be able to build something like that?

The answer is simple: yes!

In General…

There are a few general impediments to building visual recognition software. Firstly, most environments require a GPU (Graphics Processing Unit). Most tutorials demand an Nvidia GPU for processing, and a default Intel GPU just won’t do. On top of that, the process of creating models and collecting images is arduous in most cases. The software installation process is daunting, to say the least: you precariously balance one software layer on top of another and run into constraints along the way, like version conflicts.

So you don’t want to invest in expensive hardware and you don’t want to build a precariously balanced software stack, installing software for hours just to end up with an incompatibility or a specific hardware requirement.

You just want to get to grips with the basics to create a visual recognition model and classify images against that model. Look no further…

We will be using IBM Watson Visual Recognition. It is our technology of choice for a few reasons. You can register with only an email address; no credit card details are required. Also, it runs in the cloud, with no hardware or software requirements on the user side.

The free tier offers enough functionality for us to come to grips with the concepts.

Shining a Light on Dark Data aka Unstructured Data

Visual recognition matters not only for monitoring live video feeds, but also for searching existing video and, with it, the audio associated with that video. Many conversational systems can benefit greatly from visual augmentation, in the same way that human hearing benefits from visual input.

This project can be found here.

Dark Vision is an application that processes videos to discover what’s inside of them. By analyzing individual frames and audio from videos with IBM Watson Visual Recognition and Natural Language Understanding, Dark Vision builds a summary with a set of tags, famous people or landmarks detected in the video. Use this summary to enhance video search and categorization.
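To make the idea of a per-video summary concrete, here is a minimal sketch of how tags detected in individual frames could be rolled up into one list. It is not Dark Vision's actual code: the function name `summarize_tags`, the `(tag, score)` input shape and the `min_score` cutoff are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: one list of (tag, score) pairs per analyzed frame.
FrameTags = List[Tuple[str, float]]


def summarize_tags(frames: Iterable[FrameTags], min_score: float = 0.5) -> List[Tuple[str, int]]:
    """Count in how many frames each tag appears with a score of at least min_score.

    Returns (tag, frame_count) pairs, most frequent first, ties broken alphabetically.
    """
    counts: Dict[str, int] = defaultdict(int)
    for frame_tags in frames:
        # Use a set so a tag is counted at most once per frame.
        confident_tags = {tag for tag, score in frame_tags if score >= min_score}
        for tag in confident_tags:
            counts[tag] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    example_frames = [
        [("city", 0.9), ("night", 0.4)],
        [("city", 0.8), ("river", 0.7)],
        [("river", 0.6)],
    ]
    print(summarize_tags(example_frames))
```

A summary like this is what you would index to make videos searchable by their content.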

This is an example of how videos and pictures can be analyzed.

Follow the instructions in this GitHub project and create your own environment to analyze video and images.

Audio Keywords, Entities, Concepts and Video Transcripts can be added

Cities from Space

This is also an ideal GitHub project to start off with; you can find it here. Once you have completed this Code Pattern, you will have a clear understanding of the following…

Using IBM Watson Visual Recognition Cities from Space

Utilize images, in this case from the International Space Station, to train a Visual Recognition custom classifier. In other words, you will create your own models and train them.

Create a Node.js server that can utilize the Watson Visual Recognition service for classifying images. Don’t be daunted by this task; it is very straightforward and no coding is required.

Your server will initialize a Visual Recognition custom classifier when it starts up.

This will help you to classify images of cities from space using Watson Visual Recognition.

Build Your Own Custom Models

The basis for this project can be found here.

This project allows you to easily create different models with as few as fifty images. Once these models are created, you can present them with a picture, and the application will tell you if there is a match between the presented image and any of the models.

The confidence is available to you as a percentage, should you have to make a decision based on the outcome.
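As a rough illustration of acting on that confidence, the sketch below picks the best-scoring model and only reports a match when it clears a cutoff. The score dictionary, the names `best_match` and `as_percentage`, and the 0.7 threshold are assumptions for this example, not part of the project or the Watson API.

```python
from typing import Dict, NamedTuple, Optional


class Match(NamedTuple):
    label: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def best_match(scores: Dict[str, float], threshold: float = 0.7) -> Optional[Match]:
    """Return the highest-scoring model if it reaches the threshold, otherwise None."""
    if not scores:
        return None
    label, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence < threshold:
        return None
    return Match(label, confidence)


def as_percentage(confidence: float) -> str:
    """Format a 0.0 to 1.0 confidence as a whole-number percentage."""
    return f"{confidence * 100:.0f}%"


if __name__ == "__main__":
    # Hypothetical scores for one image checked against two custom models.
    result = best_match({"car": 0.92, "truck": 0.31})
    if result is None:
        print("No confident match")
    else:
        print(f"{result.label}: {as_percentage(result.confidence)}")
```

Where to set the threshold depends on the cost of a wrong answer in your application; a stricter cutoff gives fewer but more trustworthy matches.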

Facial Recognition Solution built with this Framework

What makes this application so handy is that you can train the models on various subjects, be it faces, real-world items, shapes and the like.

An Implementation where cars can be Detected

But what if we want to search for an occurrence or multiple occurrences within a single image…

Analyzing Bigger Images

This project leverages the Watson Visual Recognition service with image pre-processing techniques to deliver localized image classification. For example, “show me where there is rust on the bridge”.

The basis for this project can be found here.

Traffic is analyzed for build-up on a Network of Roads

The user drags & drops an image onto the application within the browser, and the image is uploaded to the Node.js application. Once uploaded, the image is “chopped up” into smaller images (tiles) and each individual tile is analyzed by the Watson Visual Recognition service.
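The "chopping up" step can be sketched as a simple grid calculation. This is only an illustration of the tiling idea; the project itself does this in its Node.js application, and the names `Tile` and `tile_grid` as well as the fixed square tile size are assumptions.

```python
from typing import Iterator, NamedTuple


class Tile(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def tile_grid(image_width: int, image_height: int, tile_size: int) -> Iterator[Tile]:
    """Yield the tiles that cover an image, row by row.

    Tiles on the right and bottom edges are clipped so they stay inside the image.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for top in range(0, image_height, tile_size):
        for left in range(0, image_width, tile_size):
            yield Tile(
                left,
                top,
                min(tile_size, image_width - left),
                min(tile_size, image_height - top),
            )


if __name__ == "__main__":
    # A 500x300 image cut into 200-pixel tiles gives a 3x2 grid.
    for tile in tile_grid(500, 300, 200):
        print(tile)
```

Each tile's coordinates are kept alongside its classification result, which is what makes it possible to say where in the large image something was found.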

A Tornado Path is detected from an Aerial Photograph

Once complete, all results are visualized within the browser in a heatmap-like visualization, where colorization is based on the confidence scores being returned by the Visual Recognition service’s custom classifier.
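One simple way to turn a confidence score into a heatmap color is sketched below. The blue-to-red ramp, the `max_alpha` transparency cap and the function name `confidence_to_rgba` are assumptions chosen for illustration; the project may use a different color scheme.

```python
from typing import Tuple


def confidence_to_rgba(confidence: float, max_alpha: float = 0.6) -> Tuple[int, int, int, float]:
    """Map a 0.0 to 1.0 confidence to an (R, G, B, alpha) overlay color.

    Low confidence is blue and fully transparent; high confidence is red
    and more opaque, so confident tiles stand out on top of the image.
    """
    clamped = min(1.0, max(0.0, confidence))  # guard against out-of-range scores
    red = round(255 * clamped)
    blue = round(255 * (1.0 - clamped))
    alpha = round(max_alpha * clamped, 2)
    return (red, 0, blue, alpha)


if __name__ == "__main__":
    for score in (0.0, 0.5, 1.0):
        print(score, confidence_to_rgba(score))
```

Applying this color to each tile's rectangle produces the overlay that highlights the regions the classifier is most confident about.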

This is the GitHub example where Rust on Metal is Detected.

Now you should have a good understanding of creating and training a visual recognition model and testing it against real world examples. All in the cloud, all made available to you via API calls.
