Create a custom Appsody stack with support for Python Flask and Tesseract

When you need to extract text from scanned documents to update them or for further processing, you can use Tesseract, an Optical Character Recognition (OCR) engine that supports more than 100 languages. In this tutorial, I show you how to create a custom Appsody stack with Python Flask and Tesseract support, so you can quickly create an OCR service for any of the supported languages. I also show you how to build and test the stack with sample code.

As a refresher, Appsody is an open source project that includes a command line interface (CLI) and a set of preconfigured technology stacks. The stacks, like the Python Flask stack in this tutorial, build a preconfigured Docker image that is ready for you to deploy in a cloud environment. These Docker images can include any amount of customized content and allow stack builders to decide which parts are fixed (stack image) and which parts application developers can modify or extend (templates).

Tesseract is an OCR engine with support for Unicode and the ability to recognize more than 100 languages out of the box. It can be trained to recognize other languages. Learn more about this open source project.

Learning objectives

After completing this tutorial, you will understand how to:

  • Create a custom Appsody Python Flask stack with Tesseract support.
  • Build and test the stack with sample code.
  • Deploy an image to an OpenShift cluster on IBM Cloud.

Prerequisites

To complete the steps in this tutorial, you need to:

Estimated time

Completing this tutorial should take about 30 minutes.

Steps

  1. Create a copy of the Python Flask Appsody stack.
  2. Modify the Python Flask stack to add support for Tesseract.
  3. Build the stack.
  4. Create an Appsody project using the new stack.
  5. Test the stack.
  6. Deploy to an OpenShift cluster on IBM Cloud.

1. Create a copy of an Appsody Python Flask stack

Run the following command to make a copy:

appsody stack create python-flask-tesseract --copy incubator/python-flask

You should see a python-flask-tesseract folder created.

2. Modify the Python Flask stack to add support for Tesseract

Now that you have your stack, let’s add support for Tesseract.

  1. Use the following command to initiate the customization:

     $ cd python-flask-tesseract
    
  2. Open the file Dockerfile-stack under the image folder.

  3. In the file, under FROM python:3.7, add the code below. Tesseract has support for many languages, so for the purpose of this tutorial, we chose to test for hin (Hindi), which is what you see in the code.

     RUN apt-get update
     RUN apt-get -y install \
         tesseract-ocr \
         tesseract-ocr-hin
     RUN apt-get clean
     RUN pip install --upgrade pip; \
         pip install \
         pillow \
         pytesseract \
         argparse
    

    Note: Depending on the language support you need, replace the tesseract-ocr-hin entry in the script above with the package for the language that you want.

  4. Save the file.

  5. Next, open the file Dockerfile under the image/project folder. Add the same lines after the first line, FROM python:3.7, so that the top of the file looks like the code below.

FROM python:3.7

RUN apt-get update
RUN apt-get -y install \
    tesseract-ocr \
    tesseract-ocr-hin
RUN apt-get clean
RUN pip install --upgrade pip; \
    pip install \
    pillow \
    pytesseract \
    argparse
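
The note in step 3 about swapping the language package can be sketched in Python. This is a minimal illustration, not part of the stack: the helper names are hypothetical, and the naming rule (the prefix tesseract-ocr- followed by the Tesseract language code, with underscores replaced by hyphens) is an assumption based on how the Debian packages used in this tutorial are named. Check that the package exists for your language before you rely on it.

```python
def tesseract_apt_package(lang_code: str) -> str:
    """Return the apt package name assumed to provide a Tesseract language.

    Hypothetical helper: the naming rule is an assumption, not an official API.
    """
    code = lang_code.strip()
    if not code:
        raise ValueError("language code must not be empty")
    # Tesseract codes may contain underscores (chi_sim); package names use hyphens.
    return "tesseract-ocr-" + code.replace("_", "-")


def apt_install_line(lang_codes):
    """Build the apt-get command for the OCR engine plus the requested languages."""
    packages = ["tesseract-ocr"] + [tesseract_apt_package(c) for c in lang_codes]
    return "apt-get -y install " + " ".join(packages)


if __name__ == "__main__":
    # Prints the install command used in this tutorial for Hindi.
    print(apt_install_line(["hin"]))
```

Whichever language you install, the same code has to be passed as the lang argument when the application calls pytesseract later in this tutorial.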

Congratulations! You’ve added support for Tesseract to your Python Flask stack. Now let’s package the stack.

3. Build the stack

Go to the python-flask-tesseract folder in your project and run the following command:

  appsody stack package

This builds the stack into a local Appsody repository (called dev.local). You can now create Appsody projects based on the newly created stack.

4. Create an Appsody project using the new stack

  1. Create a new empty folder anywhere on your local file system and name it; for this tutorial, we named our folder example.

  2. Create an Appsody project inside the newly created folder by running the following command:

      $ cd example
      $ appsody init dev.local/python-flask-tesseract
    
  3. Create a folder named templates.

      $ mkdir templates
      $ cd templates
    
  4. Add a file index.html to the templates folder with the following content:

     <!doctype html>
     <html lang="en">
         <p class="text-left">Demonstration of OCR using Python, Tesseract 4.0.</p>
         <p>Upload an image of a Hindi document for OCR.</p>
         <div class="upload-form">
           <form action="/uploader" method="POST"
             enctype="multipart/form-data">
             <input type="file" name="file" />
             <input type="submit" />
           </form>
         </div>
     </html>
    
  5. Add a file text.html to the templates folder with the following content:

     <!doctype html>
     <html lang="en">
         <div>
             <p class="text-left">OCR Text from processed Image</p>
             <textarea cols="80" rows="60">{{ displaytext }}</textarea>
         </div>
     </html>
    
  6. Modify the __init__.py file as described in the following steps.

  7. Make changes to the existing import statements and add other required import statements. The import statements section should look like the one below:

     from flask import Flask, redirect, render_template, request
     from werkzeug.utils import secure_filename
     import os
     import sys
     from PIL import Image
     import pytesseract
     import argparse
     from flasgger import Swagger
     from server import app
     from server.routes.prometheus import track_requests
    
  8. Create and initialize variables

    Add the following statements below the import section. These statements tell the Flask application that the HTML files are in the templates folder. They also indicate the upload folder path for the images or scanned documents from which the text needs to be extracted.

     app=Flask(__name__,template_folder='templates')
     UPLOAD_FOLDER = '.'
     app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
    
  9. Create a function and add a route to index.html

     @app.route("/home")
     def home():
       return render_template("index.html")
    
  10. Create a function and add a route for uploader

     @app.route('/uploader', methods=['GET', 'POST'])
     @track_requests
     def upload_file():
         if request.method == 'POST':
             f = request.files['file']

             # create a secure filename
             filename = secure_filename(f.filename)

             # save the file to the upload folder
             filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
             f.save(filepath)

             # perform OCR on the saved image, which contains Hindi text
             text = pytesseract.image_to_string(Image.open(filepath), lang='hin')

             return render_template("text.html", displaytext=text, fname=filename)

         # a GET request carries no file, so send the user back to the upload form
         return redirect('/home')
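
The uploader route above passes whatever file it receives to Tesseract. As a hedged sketch, assuming you only want to accept common image formats, a small check like the following could run before the file is saved. The function name and the list of extensions are illustrative assumptions, not part of the stack or of the original sample.

```python
import os

# Assumption: only these common image types should be sent to Tesseract.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def is_allowed_image(filename: str) -> bool:
    """Return True when the file name ends in one of the allowed image extensions."""
    _, ext = os.path.splitext(filename)
    # Compare case-insensitively so that SCAN.PNG is treated like scan.png.
    return ext.lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ("scan.PNG", "notes.txt", "noextension"):
        print(name, is_allowed_image(name))
```

One way to use it is to call is_allowed_image(filename) right after secure_filename and return to the upload form when it reports False.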
    

5. Test your stack

  1. Go to the example folder in your project directory and run the following commands to build and run the project:

    $ appsody build
    $ appsody run
    
  2. Open the URL: http://localhost:8080/home.


  3. To test the service with a sample image of Hindi text, follow these steps:

    1. Click Browse and select the image from a local folder.
    2. Click Submit to upload the image.

      The extracted text is displayed in the text area of the results page.

  4. You can see the health of the container at http://localhost:8080/health. If the status is “UP”, the container is healthy.

     {"status":"UP"}
    
  5. You can check your application’s metrics at: http://localhost:8080/metrics

     ...
     # HELP requests_for_routes_total Number of requests for specififed routes
     # TYPE requests_for_routes_total counter
     requests_for_routes_total{endpoint="/home",method="GET"} 2.0
     requests_for_routes_total{endpoint="/uploader",method="POST"} 2.0
     # TYPE requests_for_routes_created gauge
     requests_for_routes_created{endpoint="/home",method="GET"} 1.5712948702805943e+09
     requests_for_routes_created{endpoint="/uploader",method="POST"} 1.571294892532074e+09
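
The health and metrics checks in steps 4 and 5 can also be scripted. The following is a minimal sketch that only parses text in the two formats shown above. The function names are hypothetical, and the regular expression assumes that the counter lines look exactly like the sample output, with the endpoint label followed by the method label.

```python
import json
import re

# Matches sample lines such as:
# requests_for_routes_total{endpoint="/home",method="GET"} 2.0
_COUNTER_LINE = re.compile(
    r'^requests_for_routes_total\{endpoint="(?P<endpoint>[^"]*)",'
    r'method="(?P<method>[^"]*)"\}\s+(?P<value>\S+)$'
)


def is_healthy(health_body: str) -> bool:
    """Return True when the /health response body reports the status UP."""
    return json.loads(health_body).get("status") == "UP"


def route_counts(metrics_text: str) -> dict:
    """Map (endpoint, method) to the request count found in the /metrics text."""
    counts = {}
    for line in metrics_text.splitlines():
        match = _COUNTER_LINE.match(line.strip())
        if match:
            key = (match.group("endpoint"), match.group("method"))
            counts[key] = float(match.group("value"))
    return counts


if __name__ == "__main__":
    print(is_healthy('{"status":"UP"}'))
    print(route_counts('requests_for_routes_total{endpoint="/home",method="GET"} 2.0'))
```

To run it against the local container, you could read both URLs with urllib.request.urlopen from the standard library and pass the decoded response bodies to these functions.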
    

6. Deploy to an OpenShift cluster on IBM Cloud

The appsody build command builds a Docker image of your Appsody project locally. You can confirm that the image exists by listing it:

$ docker images example
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example             latest              e04e2c3f263f        12 seconds ago      1.09GB

To deploy the image, complete the following steps:

  1. Log in to OpenShift.

     oc login https://xxxx.containers.cloud.ibm.com:xxxxx --token=xxxxxxxxxxx
    
  2. If a route to your Docker registry does not exist yet, you need to create one. Start by switching to the default project and listing its services:

      $ oc project default
      $ oc get svc
    

    The output appears as shown below:

      NAME               TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
      docker-registry    ClusterIP      172.21.xxx.xx    <none>           5000/TCP                     18h
      kubernetes         ClusterIP      172.21.x.x       <none>           443/TCP,53/UDP,53/TCP        18h
      myfirstosdeploy    ClusterIP      172.21.xx.xxx    <none>           5000/TCP                     17h
      registry-console   ClusterIP      172.21.xxx.xxx   <none>           9000/TCP                     18h
      router             LoadBalancer   172.21.xx.x      169.47.xxx.xxx   80:31297/TCP,443:30385/TCP   18h
    
  3. Run the following command to create a route to the Docker registry.

      $ oc create route reencrypt --service=docker-registry
    
  4. Check the details of the created route.

      $ oc get route docker-registry
    

    The output appears as shown below:

     NAME              HOST/PORT                                                                                                            PATH      SERVICES          PORT       TERMINATION   WILDCARD
      docker-registry   docker-registry-default.clustersiteam-5290cxxxxxxxxxxd1b85xxx-0001.us-east.containers.appdomain.cloud               docker-registry   5000-tcp   reencrypt     None
    
  5. Note the Docker registry URL that is displayed. It follows the pattern docker-registry-default.<cluster_name>-<ID_string>.<region>.containers.appdomain.cloud.

    Set it as a variable.

     export IMAGE_REGISTRY=docker-registry-default.<cluster_name>-<ID_string>.<region>.containers.appdomain.cloud
    
  6. Log in to the Docker registry.

      docker login -u $(oc whoami) -p $(oc whoami -t) $IMAGE_REGISTRY
    
  7. Create a new project.

     oc new-project example
    
  8. Deploy the image to the registry on OpenShift.

     appsody deploy --tag example/example:latest --push-url $IMAGE_REGISTRY --push --pull-url docker-registry.default.svc:5000
    
  9. Create a new OpenShift app.

     oc new-app --image-stream=example --name=example
    
  10. Expose the route.

     oc expose svc/example
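
Steps 4 and 5 above copy the registry host from the route details by hand. As a hedged sketch, the same value can be picked out of the command output with a few lines of Python. The function name is hypothetical, and the parsing assumes the column layout shown in step 4, where the NAME column in front of HOST/PORT is never empty.

```python
def route_host(oc_get_route_output: str) -> str:
    """Return the HOST/PORT value from the first data row of the route listing."""
    lines = [line for line in oc_get_route_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and at least one route row")
    header = lines[0].split()
    if "HOST/PORT" not in header:
        raise ValueError("HOST/PORT column not found")
    # Assumption: every column before HOST/PORT has a value, so splitting the
    # row on whitespace keeps the host at the same index as in the header.
    return lines[1].split()[header.index("HOST/PORT")]


if __name__ == "__main__":
    sample = (
        "NAME              HOST/PORT                               PATH   SERVICES\n"
        "docker-registry   docker-registry-default.example.cloud          docker-registry\n"
    )
    print(route_host(sample))
```

The returned value is what step 5 stores in the IMAGE_REGISTRY variable. The host name in the sample is a placeholder, not a real cluster address.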
    

You can see the application deployed under the example project on the OpenShift web console.


Balaji Kadambi
Rahul Reddy Ravipally