Hands-on Optical Character Recognition (OCR)

Vaishnavi Sonawane
4 min read · May 21, 2022


Lockdown era. It’s exam season. You get up. You set up your machines. The exam starts. You open the question paper. You jot down your answers. You scan your answers using CamScanner (which later got banned; MAKE IN INDIA), Adobe Scan or some other application. Hit upload. Phew.

This very concept of scanning information from physical documents & *digitizing* it is known as Optical Character Recognition (OCR). We are living in the era of data & of making that data publicly available, and for that, digitizing information is the way to go. Now, we can digitize it in 2 ways:

  • Typing it out (Sounds like work)
  • Scanning it & letting the machine do the work (All Hail OCR)

Alright. OCR is awesome. Now, how does it actually work?

(Basic working of OCR)

It comprises 2 main steps: scanning the document & then applying OCR to the scanned copy. The scanning part is not much of a hassle, as we can do it using scanners or by taking pictures of documents. But we don’t know how to apply OCR yet. So, let’s take a deep dive into it, shall we?

Let’s say we have to implement an OCR algorithm from scratch. How should we approach this? We need to “recognize” the text in an image. Ah, Machine Learning & Image Processing shall come in handy, right? So, now we know what to apply. Next, let’s see how to apply it.

This shall work in the following steps:

  • Acquiring Training Data
  • Pre-processing Data
  • Image Segmentation & Feature Extraction
  • Model Training & Optimization

Classic ol’ Machine Learning life-cycle. But, hold up! Let’s not get all caught up in Machine Learning right now. My goal is to familiarize you with OCR & its basic working, hands-on. So for now, we will be using the pytesseract library, which is a wrapper for Google’s Tesseract-OCR Engine. This open-source library is nothing short of a blessing, as it supports a wide range of image formats such as JPEG, PNG, GIF, TIFF, etc. & helps automate the text extraction procedure.
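To get a feel for how little code this takes, here’s a minimal sketch. It assumes the Tesseract engine itself is installed & available on your PATH, and the file name below is just a placeholder:

```python
# Minimal pytesseract sketch: pull the text out of a single image.
# Assumes the Tesseract engine is installed & on the system PATH;
# "scanned_page.png" is a placeholder file name.
from PIL import Image
import pytesseract

image = Image.open("scanned_page.png")     # jpeg, png, gif, tiff, etc. all work
text = pytesseract.image_to_string(image)  # run OCR & return the extracted text
print(text)
```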

Alright. Pytesseract makes the whole text extraction part easy. But, to make sure that the accuracy of text recognition & extraction is good, it’s on us to pre-process the images. So, what image pre-processing techniques could we use? Mostly, we have to customize the techniques according to our input images, but to generalize, we can do the following:

  • Convert the image to grayscale
  • Apply thresholding & binarization
  • Apply skew correction if the image is rotated

Now, let’s not get overwhelmed with these terms just yet. To give you an overview, here’s what they mean:

a. Grayscaling: To convert an image from another color space (RGB, HSV, CMYK etc.) to shades of gray
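With OpenCV, this is a one-liner. A small sketch (the file paths are placeholders):

```python
import cv2

# Load a colour (BGR) image & convert it to shades of gray.
image = cv2.imread("input.jpg")                 # placeholder input path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # BGR -> grayscale
cv2.imwrite("gray.jpg", gray)                   # save for the next step
```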

b. Thresholding: To separate the foreground & background of an image. Meaning, to ‘segment’ the image into 2 parts. We can also call it ‘binarizing’ the image, since the output is just 2 colors: Black & White.
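One convenient way to do this with OpenCV is Otsu’s method, which picks the threshold value automatically. A sketch, continuing from the grayscale image above:

```python
import cv2

# Binarize the grayscale image: pixels above the threshold become white,
# everything else becomes black. THRESH_OTSU picks the threshold for us.
gray = cv2.imread("gray.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("binary.jpg", binary)
```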

c. Skew Correction: Skew, here, is a measure of how much the scanned text is tilted away from the horizontal. We can estimate it from the distribution of foreground pixels (for instance, by plotting a projection histogram of the image). Based on that evaluation, we check if the image/text is rotated & rotate it back to make the image/text recognizable.
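Here is one common deskewing recipe with OpenCV: fit a minimum-area rectangle around the foreground pixels, read off its angle, & rotate the whole image back by that angle. Treat it as a sketch, not the exact method used below; note also that the angle convention of minAreaRect differs between OpenCV versions.

```python
import cv2
import numpy as np

def deskew(binary_image):
    # Coordinates of all foreground (non-zero) pixels.
    coords = np.column_stack(np.where(binary_image > 0)).astype(np.float32)
    # Angle of the minimum-area rectangle enclosing those pixels.
    angle = cv2.minAreaRect(coords)[-1]
    # Map the reported angle to a small correction angle. This follows the
    # classic recipe for OpenCV builds that report angles in [-90, 0).
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    (h, w) = binary_image.shape[:2]
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(binary_image, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
```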

Let’s try it out. Suppose we have to extract text from this image:

We use the OpenCV library for the image pre-processing, Tkinter to create a GUI for selecting input images & pytesseract for the text extraction.
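Roughly, the glue code could look like the sketch below. The function & file names are illustrative, not from this exact project, & it reuses the pre-processing ideas from above:

```python
import cv2
import pytesseract
from tkinter import Tk
from tkinter.filedialog import askopenfilename

def preprocess(path):
    # Grayscale + Otsu binarization, as described earlier.
    image = cv2.imread(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

root = Tk()
root.withdraw()                                  # hide the empty root window
path = askopenfilename(title="Select an image")  # simple file-picker dialog
processed = preprocess(path)
text = pytesseract.image_to_string(processed)    # pytesseract also accepts numpy arrays
print(text)
```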

Now, since our input image is already in B/W format, there’s no immediate significant difference visible after applying grayscaling & thresholding.

Now, I am saving the output of our algorithm to a text file.
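(Something as simple as this does the job; “output.txt” is just a placeholder name.)

```python
# Dump the extracted text into a plain text file.
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)
```

Here’s our output: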

Our algorithm successfully extracted text from our input image. Now let’s try it on this image:

After applying grayscaling & thresholding, we get —

Now, since our image is rotated, let’s see how well our skew correction works.

We didn’t achieve a 100% accurate output. Meaning, we have to calibrate our skew correction parameters & also look into how pytesseract handles special characters like “!” & “)”.

Thanks for reading this far! Hit clap if you found this helpful, drop a comment if you would like to share your insights with us all, stay curious, stay hydrated & keep an eye out for upcoming articles, will you?
