This post demonstrates a Onepanel use case: license plate detection and OCR.
As deep learning models become more accurate across a myriad of computer vision and natural language processing tasks, more and more companies are adopting them for various business use cases. Since most of these models are compute-intensive, managing them becomes increasingly difficult as usage grows. That is why there is increasing demand for systems that not only help create end-to-end pipelines but also scale as compute requirements increase. In this post, we will use license plate detection and license number recognition (OCR) as an example to demonstrate how Onepanel can help you streamline the process of building deep learning pipelines on Kubernetes that are easily scalable and portable.
We will build a license plate detector and then apply optical character recognition to read the number off the plate. We will start by collecting the data required to train both models, annotate the samples, train the models, and finally create Onepanel Workflows. These Workflows can run on any cloud provider or on a local machine, with auto-scaling. Since Onepanel runs on Kubernetes, it is easy to scale your Workspaces or Workflows, or change the machine backing a Workspace (e.g., JupyterLab) at any time without losing your data.
For license plate detection, we will use a mixture of data from Kaggle and other sources, such as this one. The following steps demonstrate how to train object detection models on Onepanel without writing any code.
First, we need to launch a CVAT Workspace, where we will annotate our images; we can also train a model directly from CVAT. Go to the WORKSPACES page, click CREATE WORKSPACE, and select CVAT. For more information on the various parameters, see this guide.
Once the CVAT Workspace is up and running, create a task in CVAT with the license plate images. There will be only one label: license. In the demo environment, the data can be found in the raw-input/license-plates directory.
Once the annotation is done, you can train an object detection (or semantic segmentation) model from CVAT. Click Execute training Workflow on a CVAT task and select the TF Object Detection Training Workflow. Here, you can change some parameters or use the defaults as they are. For this demo, a Faster R-CNN ResNet-101 model was trained on a K80 GPU.
If you want to annotate more images, you can use this model to pre-annotate them by clicking Automatic annotation. Note that you need to upload the model first.
Since we will eventually be creating a Workflow, we need a Python script that can run inference on images. When you run the TF Object Detection Training Workflow on Onepanel, it also exports a frozen graph for inference. We can write an inference script along the following lines.
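Here is a minimal sketch of such a script, assuming TensorFlow 1.x and the standard tensor names produced by the TF Object Detection API export; the model and output paths are illustrative placeholders, not the exact ones from our Workflow:

```python
# Minimal inference sketch for a frozen TF Object Detection API graph.
# Assumes TF 1.x; paths below are hypothetical, adjust to your artifacts.
import json

import numpy as np
import tensorflow as tf
from PIL import Image

FROZEN_GRAPH = '/mnt/data/models/frozen_inference_graph.pb'  # hypothetical path
CONFIDENCE_THRESHOLD = 0.5

def load_graph(path):
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    return graph

def detect(graph, image_path):
    image = np.array(Image.open(image_path).convert('RGB'))
    with tf.Session(graph=graph) as sess:
        boxes, scores = sess.run(
            ['detection_boxes:0', 'detection_scores:0'],
            feed_dict={'image_tensor:0': image[np.newaxis, ...]})
    # Boxes are normalized [ymin, xmin, ymax, xmax]; keep confident ones only.
    keep = scores[0] > CONFIDENCE_THRESHOLD
    return boxes[0][keep].tolist()

if __name__ == '__main__':
    graph = load_graph(FROZEN_GRAPH)
    results = {'example.jpg': detect(graph, 'example.jpg')}
    with open('/mnt/output/detections.json', 'w') as f:
        json.dump(results, f)
```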
We only consider bounding boxes with a confidence greater than 0.5. You can find the complete inference script here.
Now that we have a model that can detect license plates, we will train a model that reads the text on the detected plates. We will use a popular model called Attention OCR for this task. TensorFlow has a nice implementation of Attention OCR here, which we will use for this demo. The following steps demonstrate how to train this model on Onepanel.
This implementation requires an image and a corresponding text file, so the annotation cannot be done in CVAT. However, you can use a VS Code or JupyterLab Workspace to annotate these images. The images were cropped from the same dataset we used for license plate detection. We tried two different annotation approaches: in the first, we included all the information on the plate, such as the state and other text; in the second, we only kept the license plate number. An annotated dataset can be found in the Onepanel demo environment in the annotation-dump/ directory.
Also, since we just need the license plate region in this case, the original images need to be cropped. You can dump the annotations from CVAT and use them for cropping as follows.
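A minimal cropping sketch, assuming the annotations were dumped from CVAT in COCO format; the file and directory names are hypothetical:

```python
# Crop license plates out of the original images using a COCO-format
# annotation dump from CVAT. Paths are hypothetical; adjust to your data.
import json
import os

from PIL import Image

ANNOTATIONS = 'annotation-dump/instances_default.json'  # hypothetical file name
IMAGES_DIR = 'raw-input/license-plates'
OUTPUT_DIR = 'cropped-plates'

os.makedirs(OUTPUT_DIR, exist_ok=True)

with open(ANNOTATIONS) as f:
    coco = json.load(f)

# Map image ids to file names so we can look them up per annotation.
id_to_name = {img['id']: img['file_name'] for img in coco['images']}

for i, ann in enumerate(coco['annotations']):
    x, y, w, h = ann['bbox']  # COCO boxes are [x, y, width, height]
    image = Image.open(os.path.join(IMAGES_DIR, id_to_name[ann['image_id']]))
    crop = image.crop((int(x), int(y), int(x + w), int(y + h)))
    crop.save(os.path.join(OUTPUT_DIR, 'plate_%d.png' % i))
```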
Like many other TensorFlow implementations, this one requires the input data to be in TFRecord format. Note that the input needs to be of a fixed size, so we pad the input with null characters if its length is less than the maximum length. You can use a JupyterLab Workspace to perform such operations. A Python script used to generate the TFRecords can be found here.
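The padding and serialization step looks roughly like the sketch below. The feature keys follow the FSNS-style format that Attention OCR reads, but the charset mapping, null id, and maximum length here are assumptions; verify them against the implementation and your charset file:

```python
# Sketch of writing one padded example in the FSNS-style TFRecord format
# that Attention OCR expects. Charset, null id and lengths are assumptions.
import tensorflow as tf

MAX_LENGTH = 37          # fixed sequence length the model is configured for
NULL_CHAR_ID = 133       # id used for padding; must match the charset file

def encode_text(text, charset, max_length=MAX_LENGTH):
    """Map characters to ids and pad with nulls up to max_length."""
    ids = [charset[c] for c in text]
    return ids + [NULL_CHAR_ID] * (max_length - len(ids))

def make_example(image_bytes, text, charset):
    padded = encode_text(text, charset)
    features = tf.train.Features(feature={
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        'image/format': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b'png'])),
        'image/text': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[text.encode('utf8')])),
        'image/class': tf.train.Feature(
            int64_list=tf.train.Int64List(value=padded)),
    })
    return tf.train.Example(features=features)

# Usage: write all examples into a single shard (TF 1.x API).
# with tf.python_io.TFRecordWriter('train.tfrecord') as writer:
#     writer.write(
#         make_example(img_bytes, 'ABC1234', charset).SerializeToString())
```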
The implementation requires a config file like this one for each dataset we want to train the model on. Once this is in place, model training can be started using the train.py script. We can fine-tune our model from the provided pre-trained checkpoint.
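The dataset config is a Python module in the implementation's datasets/ package. A hedged sketch of what our custom.py might contain follows; all sizes, shapes, and ids are illustrative and must match your TFRecords and charset:

```python
# Sketch of a dataset config (datasets/custom.py) for Attention OCR.
# All values are illustrative; match them to your TFRecords and charset.
from datasets import fsns  # reuse the FSNS reader that ships with the code

DEFAULT_DATASET_DIR = '/mnt/data/datasets/license-plates'  # hypothetical

DEFAULT_CONFIG = {
    'name': 'custom',
    'splits': {
        'train': {'size': 900, 'pattern': 'train.tfrecord'},
        'test': {'size': 100, 'pattern': 'test.tfrecord'},
    },
    'charset_filename': 'charset_size.txt',
    'image_shape': (150, 600, 3),      # height, width, channels
    'num_of_views': 1,                 # a single image per example
    'max_sequence_length': 37,
    'null_code': 133,
    'items_to_descriptions': {
        'image': 'A single view of a license plate.',
        'label': 'Characters of the plate number.',
        'text': 'A unicode string of the plate number.',
    },
}

def get_split(split_name, dataset_dir=None, config=None):
    if not dataset_dir:
        dataset_dir = DEFAULT_DATASET_DIR
    if not config:
        config = DEFAULT_CONFIG
    return fsns.get_split(split_name, dataset_dir, config)
```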
The TensorFlow implementation comes with a demo inference script that we can use to run inference on test images. It requires a dataset config file (custom.py). The script needs to be modified as shown below to run inference on a standalone image and write the output to a file.
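A hedged sketch of the modification, reusing the run() helper from the implementation's demo_inference.py; the checkpoint, image pattern, and output paths are hypothetical:

```python
# Sketch: run Attention OCR inference on cropped plate images and write the
# predicted text to a file, reusing the implementation's demo_inference.py.
import tensorflow as tf

import demo_inference  # ships with the Attention OCR implementation

CHECKPOINT = '/mnt/data/models/model.ckpt-10000'  # hypothetical checkpoint
IMAGE_PATTERN = '/mnt/output/crops/plate_%d.png'  # hypothetical crop paths
OUTPUT_FILE = '/mnt/output/ocr_output.txt'

def main(_):
    # run() builds the model for the 'custom' dataset, restores the
    # checkpoint, and returns the predicted strings for matched images.
    predictions = demo_inference.run(
        CHECKPOINT, batch_size=1, dataset_name='custom',
        image_path_pattern=IMAGE_PATTERN)
    with open(OUTPUT_FILE, 'w') as f:
        for text in predictions:
            if isinstance(text, bytes):
                text = text.decode('utf-8')
            f.write(text + '\n')

if __name__ == '__main__':
    tf.app.run()
```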
The complete script can be found here.
The Workflow for object detection training ships by default with Onepanel CE. But since we are working with a custom model, we need to create a Workflow for it so that it is easily reproducible and scalable. If you are new to Onepanel, it might be helpful to take a look at this guide and this one to better understand the concept of Workflows.
Since Onepanel does not include a training Workflow for this model by default, we need to create a new one. We can use the TF Object Detection Training Workflow as a base template. We already have everything in place; we just need to update the parameters and the commands we execute.
We added some parameters that we take from the user. The dump-format parameter is used when the Workflow is executed from CVAT, since CVAT needs to know which format to dump the annotations in. The current implementation accepts the COCO format.
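For illustration, the parameters section of the Workflow template might look roughly like this; the parameter names and default values here are assumptions based on our setup, not the exact template:

```yaml
# Sketch of the Workflow template parameters (names and values illustrative).
arguments:
  parameters:
  - name: source
    value: https://github.com/tensorflow/models.git
  - name: dataset-path
    value: annotation-dump/ocr-dataset
  - name: num-steps
    value: 10000
  - name: dump-format
    value: cvat_coco        # tells CVAT which annotation format to dump
```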
Now that both models are ready, we can create the final Workflow, which takes images as input and generates the OCR output. Unlike the previous Workflows, this one has multiple steps.
We will first define a container that detects license plates and writes its output to a JSON file. This is done by the detection script referenced earlier.
Here is what the DAG step for this container looks like.
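The step is roughly of the following shape (Argo-style DAG task syntax, as used by Onepanel Workflows); the task name, template name, and paths are illustrative:

```yaml
# Sketch of the detection DAG task. Task/template names and mount paths
# are assumptions; only the overall shape follows Argo DAG task syntax.
- name: detect-plates
  template: tf-object-detection-inference
  arguments:
    parameters:
    - name: model-path
      value: '{{workflow.parameters.detection-model-path}}'
    artifacts:
    - name: data               # input images to run detection on
      path: /mnt/data/datasets/
```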
Next, this is followed by the OCR step.
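A hedged sketch of the OCR task; the detect-ocr name matches the node referenced below, while the upstream task name and paths are assumptions:

```yaml
# Sketch of the OCR DAG task; it depends on the detection step and also
# mounts the original images so it can crop the detected plates.
- name: detect-ocr
  template: attention-ocr-inference
  dependencies: [detect-plates]
  arguments:
    artifacts:
    - name: data              # original images, needed for cropping
      path: /mnt/data/datasets/
    - name: detections        # JSON produced by the previous step
      from: '{{tasks.detect-plates.outputs.artifacts.output}}'
      path: /mnt/input/detections/
```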
Note that this container also takes the original data as input, since it crops license plates using the coordinates produced by the previous step. Once you run this Workflow, the output will be saved under /mnt/output/, which you can view directly from the Onepanel Workflow page by clicking the detect-ocr node and then Artifacts. Also note that the final Workflow has one more node, preprocess-input-data, which we are not using in this demo; if you want to add any pre-processing, feel free to modify it.
In this post, you saw how to leverage various components of Onepanel to easily build end-to-end deep learning pipelines that are portable and scalable.
You can also try out this Workflow and other components such as CVAT on Onepanel.