Unforseen Image Analysis With Google Cloud Vision And Python

Quite recently, I’ve built a web app to manage user’s personal expenses. Its main features are to scan shopping receipts and extract data for further processing. Google Vision API turned out to be a great tool to get a text from a photo. In this article, I will guide you through the development process with Python in a sample project.

If you’re a novice, don’t worry. You will only need a very basic knowledge of this programming language — with no other skills required.

Let’s get started, shall we?

Ever Heard Of Google Cloud Vision?

It’s an API that allows developers to analyze the content of an image through extracted data. For this purpose, Google utilizes machine learning models trained on a large dataset of images. All of that is available with a single API request. The engine behind the API classifies images, detects objects, people’s faces, and recognizes printed words within images.

To give you an example, let’s bring up the well-liked Giphy. They’ve adopted the API to extract caption data from GIFs, what resulted in significant improvement in user experience. Another example is realtor.com, which uses the Vision API’s OCR to extract text from images of For Sale signs taken on a mobile app to provide more details on the property.

Machine Learning:

Let’s start with answering the question many of you have probably heard before what is the Machine Learning?

The broad idea is to develop a programmable model that finds patterns in the data its given. The higher quality data you deliver and the better the design of the model you use, the smarter outcome will be produced. With ‘friendly machine learning’ (as Google calls their Machine Learning through API services), you can easily incorporate a chunk of Artificial Intelligence into your applications.

How To Get Started With Google Cloud

Let’s start with the registration to Google Cloud. Google requires authentication, but it’s simple and painless — you’ll only need to store a JSON file that’s including API key, which you can get directly from the Google Cloud Platform.

Download the file and add it’s path to environment variables:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/apikey.json

Alternatively, in development, you can support yourself with the from_serivce_account_json() method, which I’ll describe further in this article. To learn more about authentication, check out Cloud’s official documentation.

Google provides a Python package to deal with the API. Let’s add the latest version of google-cloud-vision==0.33 to your app. Time to code!

HOW TO COMBINE GOOGLE CLOUD VISION WITH PYTHON

Firstly, let’s import classes from the library.

from google.cloud import vision
from google.cloud.vision import types

When that’s taken care of, now you’ll need an instance of a client. To do so, you’re going to use a text recognition feature.

client = vision.ImageAnnotatorClient()

If you won’t store your credentials in environment variables, at this stage you can add it directly to the client.

client = vision.ImageAnnotatorClient.from_service_account_file(
'/path/to/apikey.json'
)

Assuming that you store images to be processed in a folder ‘images’ inside your project catalog, let’s open one of them.

An example of a simple receipt that could be processed by Google Cloud Vision.
image_to_open = 'images/receipt.jpg'

with open(image_to_open, 'rb') as image_file:
    content = image_file.read()

Next step is to create a Vision object, which will allow you to send a request to proceed with text recognition.

image = vision.types.Image(content=content)

text_response = client.text_detection(image=image)

The response consists of detected words stored as description keys, their location on the image, and a language prediction. For example, let’s take a closer look at the first word:

[
...
description: "SHOPPING"
bounding_poly {
  vertices {
    x: 1327
    y: 1513
  }
  vertices {
    x: 1789
    y: 1345
  }
  vertices {
    x: 1821
    y: 1432
  }
  vertices {
    x: 1359
    y: 1600
  }
}
...
]

As you can see, to filter text only, you need to get a description “on all the elements”. Luckily, with help comes Python’s powerful list comprehension.

texts = [text.description for text in 
text_response.text_annotations]
['SHOPPING STORE\nREG 12-21\n03:22 PM\nCLERK
2\n618\n1 MISC\n1
STUFF\n$0.49\n$7.99\n$8.48\n$0.74\nSUBTOTAL
\nTAX\nTOTAL\nCASH\n6\n$9. 22\n$10.00\nCHANGE\n$0.78\nNO REFUNDS\
nNO EXCHANGES\nNO
RETURNS\n', 'SHOPPING', 'STORE', 'REG',
'12-21', '03:22', 'PM', 'CLERK', '2', '618', '1', 'MISC', '1', 'STUFF',
'$0.49', '$1.99', '$1.48', '$0.74',
'SUBTOTAL', 'TAX', 'TOTAL', 'CASH',
'3', '$1.', '55', '$63.00', 'CHANGE',
'$0.00', 'NO', 'REFUNDS', 'NO', 'EXCHANGES', 'NO', 'RETURNS']

If you look carefully, you can notice that the first element of the list contains all text detected in the image stored as a string, while the others are separated words. Let’s print it out.

print(texts[0])

SHOPPING STORE
REG 12-21
03:22 PM
CLERK 2
618
1 MISC
1 STUFF
$0.49
$7.99
$8.48
$0.74
SUBTOTAL
TAX
TOTAL
CASH
6
$9. 22
$10.00
CHANGE
$0.78
NO REFUNDS
NO EXCHANGES
NO RETURNS

Pretty accurate, right? And obviously quite useful.



Leave a Reply

Your email address will not be published. Required fields are marked *