Wednesday, 17 January 2018

Object Recognition using Microsoft CNTK

Lately I've been working on some advanced object recognition scenarios, and I was looking for appropriate tools to do the job. Naturally, I thought about using some kind of deep learning solution, but I was afraid of its complexity. I'm not a data scientist and have very limited knowledge of neural networks. I needed a tool that I could use as a black box: feed some data into it and then consume the result, without even understanding what's going on under the hood.

Domain experts advised me to check out the Microsoft Cognitive Toolkit (CNTK). The big advantage of CNTK is its rich documentation and the end-to-end examples that come with it. For my scenario (object recognition), I found the following resources particularly useful:

  • Object Detection using CNTK

    This tutorial was my entry point. It gives exact, step-by-step instructions on how to build an object recognition solution, together with sample data sets. It also provides some scientific background for those who want to learn how this works.

  • End-2-end solution

    This one builds on the tutorial from the first link. However, it goes further by introducing a complete E2E solution, including:

    • managing reference pictures
    • building a repository for metadata
    • training the object recognition model
    • managing training results
    • advanced reporting based on Power BI
    • publishing the CNTK model as a web service, so recognition results can be easily consumed

  • Upgrade to Faster R-CNN

    The first two tutorials use an algorithm called Fast R-CNN. It is good, but CNTK also ships an improved version called Faster R-CNN, and this tutorial provides scripts that use it. Because it is based on the same sample dataset, you can directly compare the results from both tutorials. In my experience, Faster R-CNN is not actually quicker, despite the name, but it provides better recognition rates.
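
Since both tutorials use the same sample dataset, one simple way to put their results side by side is to compare per-class average precision (AP). The snippet below is only a minimal sketch: `compare_ap` is a hypothetical helper, not part of CNTK, and it assumes you have copied the per-class AP values from each tutorial's evaluation output into plain dictionaries by hand. The class names and numbers in the usage lines are placeholders, not measured results.

```python
def compare_ap(fast_ap, faster_ap):
    """Compare per-class average precision from two runs.

    Both arguments are dicts mapping class name -> AP (0.0 to 1.0).
    Only classes present in both runs are compared.
    Returns (rows, mean_fast, mean_faster), where each row is
    (class_name, fast_value, faster_value, difference).
    """
    classes = sorted(set(fast_ap) & set(faster_ap))
    rows = [(c, fast_ap[c], faster_ap[c], faster_ap[c] - fast_ap[c])
            for c in classes]
    if not classes:
        return rows, 0.0, 0.0
    mean_fast = sum(fast_ap[c] for c in classes) / len(classes)
    mean_faster = sum(faster_ap[c] for c in classes) / len(classes)
    return rows, mean_fast, mean_faster


# Placeholder values for illustration only; replace them with the
# numbers reported by your own runs of the two tutorials.
fast_rcnn = {"class_a": 0.5, "class_b": 0.25}
faster_rcnn = {"class_a": 0.75, "class_b": 0.25}

rows, mean_fast, mean_faster = compare_ap(fast_rcnn, faster_rcnn)
for name, fast, faster, diff in rows:
    print(f"{name}: Fast R-CNN {fast:.2f}, Faster R-CNN {faster:.2f}, diff {diff:+.2f}")
print(f"mean AP: Fast R-CNN {mean_fast:.3f}, Faster R-CNN {mean_faster:.3f}")
```

A positive difference means the Faster R-CNN run scored higher for that class. Classes that appear in only one of the two dictionaries are skipped, so the means are always computed over the same set of classes.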

After going through these three resources, you should be able to build an object recognition solution on your own, without any data science background.