
OCR

OCR in the wild

OCR: Scene Text Detection and Recognition

This project detects text in natural scenes with the EAST (Efficient and Accurate Scene Text) detector and recognizes the detected text with Tesseract OCR.

Overview

The goal is to detect and recognize text in images, including photographs of natural scenes where text appears on signs, labels, or other surfaces. The pipeline first locates text regions with the EAST model, then passes each region to Tesseract OCR to extract the text.
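To make the detection step more concrete, here is a minimal sketch of how the two EAST output maps (a score map and a geometry map) are commonly decoded into candidate boxes. It is not taken from this project's script: the function name `decode_east` and the `min_confidence` parameter are assumptions, and the code uses the widely seen axis-aligned approximation rather than true rotated boxes.

```python
import numpy as np


def decode_east(scores, geometry, min_confidence=0.5):
    """Turn EAST output maps into axis-aligned boxes (illustrative sketch).

    scores:   array of shape (1, 1, H, W) with text probabilities
    geometry: array of shape (1, 5, H, W): distances to the top, right,
              bottom and left box edges, then the rotation angle
    Returns (boxes, confidences); each box is (start_x, start_y, end_x, end_y)
    in the coordinate system of the network input.
    """
    num_rows, num_cols = scores.shape[2:4]
    boxes, confidences = [], []
    for y in range(num_rows):
        for x in range(num_cols):
            score = float(scores[0, 0, y, x])
            if score < min_confidence:
                continue  # skip weak detections
            # The output maps are 4x smaller than the network input.
            offset_x, offset_y = x * 4.0, y * 4.0
            top, right, bottom, left = (geometry[0, i, y, x] for i in range(4))
            angle = geometry[0, 4, y, x]
            cos, sin = np.cos(angle), np.sin(angle)
            height = top + bottom
            width = right + left
            end_x = int(offset_x + cos * right + sin * bottom)
            end_y = int(offset_y - sin * right + cos * bottom)
            start_x = int(end_x - width)
            start_y = int(end_y - height)
            boxes.append((start_x, start_y, end_x, end_y))
            confidences.append(score)
    return boxes, confidences
```

In practice, overlapping boxes are usually merged with non-maximum suppression and scaled back to the original image size before each region is cropped and handed to Tesseract.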

Requirements

1. Install Tesseract OCR

To recognize text from an image, you must install Tesseract OCR. For best results, use Tesseract version 4 or higher.
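If you are unsure which version is installed, a small check like the following may help. This is only a sketch: the helper names are hypothetical, and it assumes the `tesseract` executable is on your PATH and prints a line such as `tesseract 4.1.1` when run with `--version`.

```python
import re
import shutil
import subprocess


def parse_major_version(version_output):
    """Return the major version found in `tesseract --version` output, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_tesseract_major():
    """Return the installed Tesseract major version, or None if not found."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # tesseract is not on PATH
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some builds print the version to stderr, so inspect both streams.
    return parse_major_version(result.stdout + result.stderr)
```

A result of `None` or a value below 4 suggests that Tesseract needs to be installed or upgraded.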

Download and install the Tesseract binaries for your operating system from the Tesseract GitHub Wiki.

2. Download EAST Text Detection Model

You will also need the pre-trained EAST model, which is required for text detection:
Download frozen_east_text_detection.pb from a trusted source.
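Once the file is downloaded, it can be loaded through OpenCV's `dnn` module. The sketch below assumes the model sits in the working directory; the helper name `load_east` is hypothetical, and the two layer names are the ones commonly used with this frozen graph rather than something confirmed by this project.

```python
from pathlib import Path

# Output layers commonly requested from frozen_east_text_detection.pb:
# the first gives text scores, the second gives box geometry.
EAST_OUTPUT_LAYERS = [
    "feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3",
]


def load_east(model_path="frozen_east_text_detection.pb"):
    """Load the pre-trained EAST network, failing early if the file is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"EAST model not found: {path}")
    # Imported here so the path check above works even without OpenCV.
    import cv2
    return cv2.dnn.readNet(str(path))
```

After loading, the network is typically fed a blob whose width and height are multiples of 32, and `net.forward(EAST_OUTPUT_LAYERS)` returns the score and geometry maps.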

3. Additional Dependencies

The project also needs the Python libraries opencv-python, numpy, and pytesseract, which can be installed with pip as shown under Installation.

Installation

Install the required dependencies:

  pip install opencv-python numpy pytesseract
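To confirm the installation worked, a quick check along these lines can report any package that Python still cannot import. It is a sketch with a hypothetical helper name; note that opencv-python is imported as `cv2`.

```python
import importlib.util

# Map each pip package name to the module name used in `import` statements.
REQUIRED = {
    "opencv-python": "cv2",
    "numpy": "numpy",
    "pytesseract": "pytesseract",
}


def missing_packages(required=None):
    """Return the pip names of packages whose modules cannot be found."""
    required = REQUIRED if required is None else required
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```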

Usage

After setting up the environment, you can use the project to detect and recognize text in images.

Run the script:

  python ocr_text_detection.py --image path_to_image.jpg
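The script itself is not shown here, so the following is only a guess at how the `--image` flag from the command above might be parsed. The function name and help text are assumptions; only the flag name comes from the command line.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `argv=None` means use the real arguments."""
    parser = argparse.ArgumentParser(
        description="Detect and recognize text in an image."
    )
    parser.add_argument(
        "--image", required=True, help="path to the input image"
    )
    return parser.parse_args(argv)
```

With this layout, `parse_args().image` holds the path passed on the command line, ready to be read with OpenCV before detection starts.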

Some Testing Images

Here are some example images tested with this setup:

test Images

test Images2