
raunak-shr/OCR-Projects


OCR-Projects

Toolkit


1. Tamil Nadu Voter Information Extraction

This project demonstrates Optical Character Recognition (OCR) with Python and Pytesseract, extracting voter information from documents written in Tamil. OCR is a technology that extracts text from images or scanned documents. The project uses Pytesseract together with NumPy, Pandas, PyPDF2, PIL (Python Imaging Library), and Google Translator to run the OCR tasks in a Jupyter Notebook.

Sample page - image

Prerequisites

Make sure you have the following installed:

Additionally, install the required Python libraries:

pip install numpy pandas PyPDF2 pytesseract Pillow googletrans==4.0.0-rc1

Usage

Language Translation (Optional):

  • Modify the notebook to translate extracted text using Google Translator if multilingual support is needed.
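A minimal sketch of the optional translation step, assuming the `googletrans==4.0.0-rc1` package pinned above, where `Translator.translate` is a synchronous call that returns an object with a `.text` attribute. The names `translate_lines` and `demo` and the sample strings are hypothetical and are not taken from the notebook.

```python
def translate_lines(lines, translator, src="ta", dest="en"):
    """Translate each non-empty line from `src` to `dest`; blank lines are skipped."""
    translated = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # translator is expected to behave like googletrans.Translator
        result = translator.translate(line, src=src, dest=dest)
        translated.append(result.text)
    return translated


def demo():
    """Hypothetical usage; needs googletrans installed and network access."""
    # Imported here so translate_lines can be reused without googletrans.
    from googletrans import Translator

    sample = ["பெயர்", "", "வயது"]  # hypothetical OCR output lines
    print(translate_lines(sample, Translator()))
```

Calling `demo()` from a notebook cell sends the sample lines to the translation service. Passing the translator in as an argument keeps the helper easy to exercise offline with a stand-in object.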

Example

import pytesseract
from PIL import Image

# Read an image from file
image_path = 'images/sample_image.png'
image = Image.open(image_path)

# Perform OCR using Pytesseract
extracted_text = pytesseract.image_to_string(image)

# Print the extracted text
print("Extracted Text:")
print(extracted_text)
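The example above runs Tesseract with its default language. For Tamil pages, a variant along the following lines may be closer to what the project needs. It is a sketch that assumes the Tamil trained data (`tam`) is installed for Tesseract; the helper names `non_empty_lines` and `ocr_tamil_page` are hypothetical.

```python
def non_empty_lines(text):
    """Split OCR output into stripped lines, dropping blank ones."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ocr_tamil_page(image_path):
    """Run Tesseract on one page image using the Tamil model."""
    # Imported lazily so non_empty_lines can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        # lang="tam" assumes the Tamil traineddata file is available to Tesseract.
        text = pytesseract.image_to_string(image, lang="tam")
    return non_empty_lines(text)
```

For instance, `ocr_tamil_page('images/sample_image.png')` would return the recognized text as a list of lines, ready to be passed to a translation step or loaded into a Pandas DataFrame.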

Contributor

Feel free to contribute, open issues, or provide feedback.🚀

About

Problems on OCR
