
A Jupyter Notebook that uses OCR for text transliterations of Sumerian and other languages written in cuneiform


ancient-world-citation-analysis/OCR_Sumerian


OCR_Sumerian

Welcome to the OCR (Optical Character Recognition) JupyterBook! This guide walks you through character recognition and translation. It focuses on transliterations of texts written in cuneiform scripts and their translation into modern English, using OCR tools such as Tesseract from Python.

Overview

This repository contains code and documentation for Optical Character Recognition (OCR) using Tesseract, an open-source OCR engine originally developed at Hewlett-Packard, long sponsored by Google, and now maintained by its open-source community.

What You'll Learn

In this JupyterBook, you'll learn how optical character recognition works and how it can be applied to ancient scripts and languages. We'll show, step by step, how to use Python with OCR engines such as Tesseract to turn page images into machine-readable text.
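A minimal sketch of the core OCR step follows. It assumes the Tesseract engine and the pytesseract and Pillow packages are installed; the notebook's own code may differ. `pytesseract.image_to_string` hands the image to Tesseract, and the `--psm` option selects Tesseract's page segmentation mode (mode 6 treats the page as a single uniform block of text). The helper names `tesseract_config` and `ocr_page` are hypothetical.

```python
"""Minimal OCR sketch: run Tesseract on one page image through pytesseract.

Assumes the Tesseract binary plus the pytesseract and Pillow packages are
installed. The helper names here are hypothetical, not the notebook's API.
"""


def tesseract_config(psm: int = 6) -> str:
    """Build a Tesseract config string that selects a page segmentation mode.

    --psm 6 asks Tesseract to treat the image as one uniform block of text.
    """
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes run from 0 to 13")
    return f"--psm {psm}"


def ocr_page(image_path: str, lang: str = "eng", psm: int = 6) -> str:
    """Return the text Tesseract recognizes in the image at image_path."""
    # Imported lazily so this module still loads where the OCR packages are absent.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=tesseract_config(psm)
        )
```

Usage, with a hypothetical file name: `text = ocr_page("tablet_transliteration.png")`. Transliterations use diacritics and subscript numerals, so the default `eng` model may misread some of those characters.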

Getting Started

To get started, run the notebook's setup cell in Google Colaboratory. It mounts your Google Drive in the Colab runtime, making the notebooks and their files accessible so you can execute the code and follow along with the examples.
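If you are recreating the setup step yourself, here is a hedged sketch of mounting Google Drive with Colab's `google.colab.drive.mount`. The wrapper `mount_google_drive` is hypothetical. `/content/drive` is the mount point conventionally used in Colab examples, and the notebook may use a different path.

```python
def mount_google_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive into a Colab runtime (hypothetical helper).

    Returns True after mounting inside Colab, and False when the google.colab
    package is unavailable, for example in a local Jupyter session.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # Colab asks for authorization on first use
    return True
```

Outside Colab the function simply returns False. This lets the same notebook run in a local Jupyter session that reads files from disk instead.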

Why This Book?

Cuneiform scripts, among the earliest known systems of writing, preserve the records of ancient Near Eastern civilizations. By learning how to transliterate and translate these texts with modern OCR technology, you'll be better equipped to uncover and understand the history they contain.

Who Should Read This Book

This JupyterBook is designed for anyone curious about the world of OCR, from beginners who want to grasp the basics to advanced users seeking to tackle complex transliteration and translation tasks. Whether you're an archaeologist, historian, linguist, or simply an enthusiast eager to decode ancient texts, you'll find valuable insights here.

Features

  • Introduction to OCR
  • Setting Up Your Environment
  • Tesseract Walkthrough
  • OCR Demonstration
  • Advanced OCR
  • Transliteration Techniques
  • Statistics on Our Models
  • Optional Modules
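The "Statistics on Our Models" module reports how well the OCR performs. One common measure for this is character error rate (CER): the Levenshtein edit distance between the OCR output and a hand-checked reference, divided by the reference length. This metric is offered as an assumption and may not be the one the notebook actually uses. The function names below are hypothetical.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between OCR output and reference, over reference length."""
    # NFC normalization makes a precomposed and a decomposed "š" compare equal.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `character_error_rate("kitten", "sitting")` is 0.5: three edits against a six-character reference. Normalizing first means a diacritic such as the caron in š counts as part of one character instead of inflating the error count.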

Usage

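Prerequisites include the Tesseract OCR engine, a Python wrapper for it such as pytesseract, Pandas, and the other dependencies the notebook imports. The Python packages can be installed with pip. Tesseract itself is a standalone program and must be installed separately, for example with your operating system's package manager.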
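Tesseract is a standalone executable rather than a pip package, so a quick check that both kinds of prerequisite are present can save debugging later. The sketch below assumes the Python dependencies are `pytesseract` and `pandas`; adjust the tuple to match the notebook's actual imports. `missing_prerequisites` is a hypothetical helper.

```python
import importlib.util
import shutil


def missing_prerequisites(packages=("pytesseract", "pandas"), binary="tesseract"):
    """Return the names of prerequisites that cannot be found (hypothetical helper).

    Python packages are located with importlib without importing them; the
    Tesseract engine is a separate executable, so it is looked up on the PATH.
    """
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    if shutil.which(binary) is None:
        missing.append(binary)
    return missing


if __name__ == "__main__":
    print(missing_prerequisites() or "All prerequisites found")
```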

File Structure

  • notebook.ipynb: Jupyter Notebook containing the OCR implementation.

Contributors

  • Austin Pereira
  • Adam Anderson

License

This project is licensed under the MIT License.

Acknowledgments

Special thanks to prior OCR tools used for cuneiform extraction.

Feedback and Support

We welcome contributions to improve this project. Feel free to provide feedback or seek support through GitHub issues.
