- Jupyter Book: https://ancient-world-citation-analysis.github.io/OCR_Sumerian/
- Medium article: https://medium.com/@austinpereira6602/unearthing-the-past-the-journey-from-cuneiform-inscriptions-to-ai-translations-5948c2dccd45
Welcome to the OCR (Optical Character Recognition) JupyterBook! This guide walks you through character recognition and translation, focusing on transliterating cuneiform text and translating it into modern English using OCR tools such as Tesseract from Python.
This repository contains code and documentation for Optical Character Recognition (OCR) using Tesseract, an open-source OCR engine whose development was sponsored by Google for many years.
In this JupyterBook, you'll learn how optical character recognition works and how it can be applied to ancient scripts and languages. We'll go step by step through using Python with OCR tools such as Tesseract to perform these transformations.
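As a rough idea of what calling Tesseract from Python looks like, here is a minimal sketch. It assumes the `pytesseract` wrapper and Pillow are installed alongside the Tesseract binary; the function names, the default `lang="eng"`, and the whitespace clean-up step are illustrative choices and not taken from the project's notebooks.

```python
def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace inside each line and drop empty lines."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image file and return cleaned-up text.

    Assumption: pytesseract, Pillow, and the Tesseract binary are installed,
    and a trained language model named `lang` is available to Tesseract.
    """
    # Imported lazily so normalize_ocr_text works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return normalize_ocr_text(raw)
```

Calling `ocr_image("page.png")` on a scanned page (a hypothetical file name) would return the recognized text as a string; which language model suits transliterated cuneiform depends on the models the project trains, which this sketch does not assume.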
To get started, run the setup cell in Google Colaboratory. This connects the Colab session to your Google Drive, allowing you to execute the notebooks and follow along with the examples.
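A typical Colab setup cell for this looks something like the sketch below. It uses Colab's `google.colab.drive.mount` call; the helper name `mount_drive` and the mount point `/content/drive` are assumptions, and the project's own setup cell may differ.

```python
def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab.

    Returns True if the mount call was made, False when the google.colab
    module is unavailable (for example, when running locally).
    """
    try:
        from google.colab import drive  # only present in the Colab runtime
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization on first use
    return True


mounted = mount_drive()
print("Drive mounted" if mounted else "Not running in Colab; skipping Drive mount")
```

Outside Colab the helper does nothing, so the same notebook can still be run locally.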
Cuneiform scripts, among the earliest known systems of writing, have held the secrets of ancient civilizations for millennia. By learning how to transliterate and translate these scripts with the help of modern OCR technology, you'll be better equipped to uncover and understand the rich history they contain.
This JupyterBook is designed for anyone curious about the world of OCR, from beginners who want to grasp the basics to advanced users seeking to tackle complex transliteration and translation tasks. Whether you're an archaeologist, historian, linguist, or simply an enthusiast eager to decode ancient texts, you'll find valuable insights here.
- Introduction to OCR
- Setting Up Your Environment
- Tesseract Walkthrough
- OCR Demonstration
- Advanced OCR
- Transliteration Techniques
- Statistics on Our Models
- Optional Modules
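Prerequisites include the Tesseract OCR engine, a Python wrapper for it such as pytesseract, Pandas, and other required dependencies. Tesseract itself is a separate program and must be installed on the system; the Python packages can be installed using pip.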
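One way to confirm the setup is to check that the Python packages can be found and that the `tesseract` executable is on the PATH. The sketch below assumes the PyPI package names `pytesseract` and `pandas` and a hypothetical helper called `check_environment`; the project's actual dependency list may be longer.

```python
import importlib.util
import shutil

# Assumed install command for the Python side (package names are an assumption):
#   pip install pytesseract pandas


def check_environment(packages=("pytesseract", "pandas"), binary="tesseract"):
    """Report which Python packages and which OCR binary can be found."""
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return {
        "packages": found,                           # package name -> importable?
        "binary": shutil.which(binary) is not None,  # executable on PATH?
    }


if __name__ == "__main__":
    report = check_environment()
    for name, ok in report["packages"].items():
        print(f"{name}: {'ok' if ok else 'missing'}")
    print(f"tesseract binary: {'ok' if report['binary'] else 'missing'}")
```

A missing binary usually means Tesseract has not been installed through the operating system's package manager, which pip does not handle.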
- `notebook.ipynb`: Jupyter Notebook containing the OCR implementation.
- Austin Pereira
- Adam Anderson
This project is licensed under the MIT License.
Special thanks to prior OCR tools used for cuneiform extraction.
We welcome contributions to improve this project. Feel free to provide feedback or seek support through GitHub issues.