OCR-Projects

Toolkit

1. Tamil Nadu Voter Information Extraction

This project demonstrates Optical Character Recognition (OCR) with Python and Pytesseract, applied to voter information written in Tamil. OCR extracts text from images or scanned documents. The project uses Pytesseract together with NumPy, Pandas, PyPDF2, PIL (Python Imaging Library), and Google Translator to run OCR tasks in a Jupyter Notebook environment.

Sample page: [image not included]

Prerequisites

Make sure you have the following installed:

Python: Download Python
Jupyter Notebook: Installation Guide
Pytesseract: pip install pytesseract
Tesseract OCR Engine: Tesseract Installation Guide

Additionally, install the required Python libraries:

pip install numpy pandas PyPDF2 pillow googletrans==4.0.0-rc1
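To confirm the setup before opening the notebook, a quick check like the one below may help. It is a minimal sketch, not part of the project: the names `REQUIRED_MODULES` and `check_environment` are made up for illustration, and it assumes the Tesseract binary is on your PATH as `tesseract` (on Windows it is often installed elsewhere, in which case the check reports it as missing). It also looks for the Tamil language data (`tam`) using the `tesseract --list-langs` command.

```python
import importlib.util
import shutil
import subprocess

# Import names of the libraries listed above (PIL is the import name of Pillow).
# This list and the function below are illustrative, not part of the project.
REQUIRED_MODULES = ["pytesseract", "numpy", "pandas", "PyPDF2", "PIL", "googletrans"]


def check_environment() -> dict:
    """Return a name -> bool report of which prerequisites appear to be available."""
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in REQUIRED_MODULES
    }

    # Assumption: the Tesseract engine is reachable on PATH as "tesseract".
    tesseract_path = shutil.which("tesseract")
    report["tesseract"] = tesseract_path is not None

    # Tamil OCR needs the "tam" language data in addition to the engine itself.
    report["tamil_data"] = False
    if tesseract_path:
        result = subprocess.run(
            [tesseract_path, "--list-langs"], capture_output=True, text=True
        )
        # Depending on the Tesseract version, the list goes to stdout or stderr.
        report["tamil_data"] = "tam" in (result.stdout + result.stderr).split()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `tamil_data` is reported as missing, install the Tamil trained data for Tesseract before running OCR on Tamil pages.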

Usage

Language Translation (Optional):

Modify the notebook to translate extracted text using Google Translator if multilingual support is needed.
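One possible way to add the translation step is sketched below. It assumes the `googletrans==4.0.0-rc1` package from the prerequisites, where `Translator().translate(...)` returns an object with a `.text` attribute, and it needs network access. The helper names `google_translate_ta_to_en` and `translate_lines` are hypothetical and do not come from the notebooks. The translation function is passed in as a parameter so the line handling can be tried without calling the online service.

```python
from typing import Callable, List


def google_translate_ta_to_en(text: str) -> str:
    """Translate Tamil text to English with googletrans (requires network access)."""
    # Imported lazily so the rest of this sketch works without googletrans.
    from googletrans import Translator

    return Translator().translate(text, src="ta", dest="en").text


def translate_lines(text: str, translate_fn: Callable[[str], str]) -> List[str]:
    """Translate each non-empty line of OCR output separately, skipping blank lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [translate_fn(line) for line in lines]
```

Usage, assuming `extracted_text` holds the OCR output: `translate_lines(extracted_text, google_translate_ta_to_en)`. Translating line by line keeps each OCR line paired with its translation, at the cost of one request per line.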

Example

import pytesseract
from PIL import Image

# Read an image from file
image_path = 'images/sample_image.png'
image = Image.open(image_path)

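# Perform OCR using Pytesseract (the default language is English;
# pass lang='tam' for Tamil text if the Tamil language data is installed)
extracted_text = pytesseract.image_to_string(image)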

# Print the extracted text
print("Extracted Text:")
print(extracted_text)
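For Tamil pages specifically, the same call can be pointed at the Tamil language data. The sketch below is an illustration rather than the notebook's actual code: `ocr_tamil` and `clean_ocr_text` are hypothetical helpers, the page segmentation mode `--psm 6` (treat the image as a single uniform block of text) is only a guess at a reasonable setting, and it assumes the `tam` trained data is installed for Tesseract.

```python
def clean_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace within each line and drop empty lines."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_tamil(image_path: str, psm: int = 6) -> str:
    """Run Tesseract with the Tamil language data on one image file."""
    # Imported lazily so clean_ocr_text can be used without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        # lang="tam" selects Tamil; --psm sets Tesseract's page segmentation mode.
        raw = pytesseract.image_to_string(image, lang="tam", config=f"--psm {psm}")
    return clean_ocr_text(raw)
```

Usage: `print(ocr_tamil('images/sample_image.png'))`. Other `--psm` values may suit pages with several columns or boxed entries better, so it is worth trying a few on a sample page.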

Contributor

Shreeyansh Das

Feel free to contribute, open issues, or provide feedback.🚀


Repository: raunak-shr/OCR-Projects
