Commit

don't process if the PDF already has text
joecorall committed Oct 18, 2024
1 parent 2ccf3e0 commit 0f4c012
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion examples/ocrpdf/cmd.sh
@@ -6,8 +6,15 @@ TMP_DIR=$(mktemp -d)

 cd "$TMP_DIR"

+cat > input.pdf
+
+# don't process if the PDF already has text
+if pdftotext input.pdf - | grep -q '[a-zA-Z0-9]'; then
+  exit 1
+fi
+
 # split pdf into PNG files
-magick - page-%d.png > /dev/null 2>&1
+magick input.pdf page-%d.png > /dev/null 2>&1

 # add OCR to each PNG
 for i in page-*.png; do
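
The text-layer check in this change can be tried on its own, outside cmd.sh. The following is a minimal sketch, not code from the repository: the function names `has_text` and `pdf_has_text` are hypothetical, and it assumes `pdftotext` (from Poppler) is on the PATH. It reuses the same grep pattern as the commit, so a PDF counts as "already has text" as soon as its extracted text contains one ASCII letter or digit.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in cmd.sh): succeeds (exit status 0) when stdin
# contains at least one ASCII letter or digit, the pattern used in the commit.
has_text() {
  grep -q '[a-zA-Z0-9]'
}

# Hypothetical wrapper: extract any existing text layer to stdout with
# pdftotext and test it. The pipeline's status is that of has_text.
pdf_has_text() {
  pdftotext "$1" - 2>/dev/null | has_text
}
```

With these defined, `pdf_has_text input.pdf && exit 1` would behave like the `if` block added above. Note that the pattern only matches ASCII characters, so a text layer made up entirely of non-Latin script would not be detected by this check.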
