Running OCR gives no results and `NS_ERROR_FILE_NOT_FOUND` #88

TrakJohnson · 2024-12-17T16:50:49Z

Hi, just installed the plugin, when trying to OCR my first file I get the following error in the developer console:

NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]

I first thought that I had misconfigured tesseract/pdftoppm, but everything looks fine. Is there any way to investigate this further? I read through #87, but it doesn't seem related. Thanks!

Here's my configuration:

Zotero 7.0.11
Fedora Linux 41 / Linux 6.11.10-300.fc41.x86_64
Libraries:

❯ /usr/bin/pdftoppm -v
pdftoppm version 24.08.0
Copyright 2005-2024 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
❯ /usr/bin/tesseract -v
tesseract 5.4.1
 leptonica-1.84.1
  libgif 5.2.2 : libjpeg 6b (libjpeg-turbo 3.0.2) : libpng 1.6.40 : libtiff 4.6.0 : zlib 1.3.1.zlib-ng : libwebp 1.4.0
 Found AVX512BW
 Found AVX512F
 Found AVX512VNNI
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libcurl/8.9.1 OpenSSL/3.2.2 zlib/1.3.1.zlib-ng libidn2/2.3.7 nghttp2/1.62.1

Zotero-OCR settings:

aborel · 2024-12-19T05:34:53Z

First we can check whether the problem happens at the pdftoppm stage or at the tesseract stage. Are the PNG images saved to the Zotero item folder?

zzyzx-dc · 2025-01-03T20:54:05Z

I am having the same issue and came here to see if anyone else was. Zotero 7.0.11 on Fedora Workstation 41.

Could not get children of file(/opt) because it does not exist
Error code: NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory] zotero-ocr.js:87

Additionally, like the original poster, I had to set the file paths manually to /usr/bin/tesseract and /usr/bin/pdftoppm; otherwise it returned `OperationError: Could not parse path (tesseract): NS_ERROR_FILE_UNRECOGNIZED_PATH`. The documentation helped me realize I needed to locate the file paths myself. Thanks!

aborel · 2025-01-03T21:05:12Z

Since the OP didn't answer, maybe you can check whether pdftoppm did its job?

zzyzx-dc · 2025-01-03T21:20:10Z

Sure thing - I am not sure how to check so you might have to walk me through it. When I go to the item folder (Zotero item - right click - Show file) there is only the PDF file.

aborel · 2025-01-04T09:22:36Z

If you have selected "Save the intermediate PNGs as well in the folder" like the OP, then pdftoppm has not worked at all.
Can you post a screenshot of your Zotero-OCR settings?
What is the output in a shell if you run
/usr/bin/tesseract
and
/usr/bin/pdftoppm
?

zzyzx-dc · 2025-01-04T20:30:27Z

~~Do you think it is a problem with Fedora or with the dnf package for pdftoppm? I can try a different installation method if you wish, or I can see if I still have an Ubuntu laptop...~~ Never mind, I tried on an Ubuntu machine and pdftoppm also hung on that system.

> If you have selected "Save the intermediate PNGs as well in the folder" like the OP, then pdftoppm has not worked at all. Can you post a screenshot of your Zotero-OCR settings?
>
> What is the output in a shell if you run /usr/bin/tesseract and /usr/bin/pdftoppm ?

pdftoppm seems to hang:

$ /usr/bin/tesseract
Usage:
  /usr/bin/tesseract --help | --help-extra | --version
  /usr/bin/tesseract --list-langs
  /usr/bin/tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.

$ /usr/bin/pdftoppm
^C

$ pdftoppm
^C

$ pdftoppm -h
pdftoppm version 24.08.0
Copyright 2005-2024 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Usage: pdftoppm [options] [PDF-file [PPM-file-prefix]]
  -f <int>                                 : first page to print
  -l <int>                                 : last page to print
  -o                                       : print only odd pages
  -e                                       : print only even pages
  -singlefile                              : write only the first page and do not add digits
  -scale-dimension-before-rotation         : for rotated pdf, resize dimensions before the rotation
  -r <fp>                                  : resolution, in DPI (default is 150)

zzyzx-dc · 2025-01-04T20:35:04Z

pdftoppm version 0.86.1
Copyright 2005-2020 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

zzyzx-dc · 2025-01-04T21:00:01Z

Some troubleshooting I tried:

- Installing the newer version from source: I tried, but compiling it was a bit above my abilities.
- Running the Poppler tools on the PDF myself, in a different directory: this worked. I have PNG files which are viewable and look good.

~/Downloads$ pdftoppm -png DeVries\ -\ 2006\ -\ Medieval\ Warfare\ and\ the\ Value\ of\ a\ Human\ Life.pdf testfile
~/Downloads$ ls testfile*
testfile-01.png  testfile-05.png  testfile-09.png  testfile-13.png  testfile-17.png  testfile-21.png  testfile-25.png  testfile-29.png  
testfile-02.png  testfile-06.png  testfile-10.png  testfile-14.png  testfile-18.png  testfile-22.png  testfile-26.png  
testfile-03.png  testfile-07.png  testfile-11.png  testfile-15.png  testfile-19.png  testfile-23.png  testfile-27.png  
testfile-04.png  testfile-08.png  testfile-12.png  testfile-16.png  testfile-20.png  testfile-24.png  testfile-28.png

TrakJohnson · 2025-01-05T09:51:07Z

Hi, sorry for the delay and thank you for your help! I have the same results as @zzyzx-dc. pdftoppm and tesseract work fine on their own (tested on the same pdf file), but using the plugin the PNGs don't get generated.

aborel · 2025-01-05T18:29:48Z

Thanks for the details!
pdftoppm being stuck (or waiting for some input) when run without any argument seems to be the normal behaviour; I wasn't aware of that.
I suspected an incorrect location for the tesseract executable, as I recently noticed a bug in the location check code (it fails to display an error window in some cases), but @zzyzx-dc's report indicates that this is not the case.

pdftoppm manual execution: interesting. Are you sure that the command line is using the same executable? Without an explicit path there could be several versions on your system. Try to run
/usr/bin/pdftoppm -png DeVries\ -\ 2006\ -\ Medieval\ Warfare\ and\ the\ Value\ of\ a\ Human\ Life.pdf testfile
ls testfile*
and/or
which pdftoppm
to make sure it's the same one as for Zotero-OCR.
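
The comparison suggested above can be scripted. The following is only a minimal sketch: `same_executable` is a hypothetical helper name, not something shipped with Zotero-OCR, and it assumes a `readlink` that supports `-f` (as on GNU coreutils). It resolves symlinks on both sides and reports whether the path configured in the plugin and the command found on `PATH` are the same file.

```shell
# Hypothetical helper: does the configured path point at the same file
# that the shell would run for the bare command name?
same_executable() {
    configured="$1"   # e.g. the path entered in the Zotero-OCR settings
    name="$2"         # e.g. pdftoppm

    # Look the bare name up on PATH; give up if it is not there at all.
    found="$(command -v "$name")" || { echo "$name: not on PATH"; return 2; }

    # Resolve symlinks on both sides before comparing.
    if [ "$(readlink -f "$configured")" = "$(readlink -f "$found")" ]; then
        echo "match: $found"
    else
        echo "differ: configured=$configured PATH=$found"
        return 1
    fi
}
```

For example, `same_executable /usr/bin/pdftoppm pdftoppm` would cover the case discussed here. Note that this only describes the environment of the shell it runs in, which is not necessarily the environment Zotero sees.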

zzyzx-dc · 2025-01-05T18:44:31Z

Yeah it's the same one:

$ which pdftoppm
/usr/bin/pdftoppm

Running /usr/bin/pdftoppm yields viewable pngs.

aborel · 2025-01-05T19:03:26Z

So apparently your pdftoppm installation is OK, that's a good data point, thank you.
The only suggestion I have right now is to set the OCR language to eng. I am not sure that it will help, but with the current settings you might have a problem later.
I'll spend some more time on this in the next few days. The code should probably be improved to make diagnostics easier... but I hope we can solve this case without a new release (which might introduce a few new bugs as well). Sorry about the inconvenience!

aborel · 2025-01-10T08:44:57Z

@zzyzx-dc I'm rewriting the pdftoppm/tesseract detection code for cases where no full path has been provided, so problems like this won't happen so much in the future.
But at the moment, if you set the OCR language to eng, does the plugin work for you or not? If not, what are the exact error messages you see in the console log?

TrakJohnson · 2025-01-11T12:27:42Z

Hi, I reran things with the OCR language set to eng in the Zotero settings; sadly, I get the same results.

I have, however, just discovered Zotero's debug output feature, in case it is helpful:

[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]" {file: "jar:file:///home/theo/.zotero/zotero/ujicwv30.default/extensions/[email protected]!/zotero-ocr.js" line: 87}]

[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]" {file: "jar:file:///home/theo/.zotero/zotero/ujicwv30.default/extensions/[email protected]!/zotero-ocr.js" line: 87}]

[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]" {file: "jar:file:///home/theo/.zotero/zotero/ujicwv30.default/extensions/[email protected]!/zotero-ocr.js" line: 87}]

[JavaScript Error: "TypeError: this.gViewSourceUtils is undefined" {file: "resource://devtools/client/webconsole/webconsole.js" line: 223}]
viewSource@resource://devtools/client/webconsole/webconsole.js:223:5
onViewSource@resource://devtools/client/webconsole/service-container.js:43:35
onClick@resource://devtools/client/shared/components/Frame.js:265:18
invokeGuardedCallbackImpl@resource://devtools/client/shared/vendor/react-dom.js:74:10
invokeGuardedCallback@resource://devtools/client/shared/vendor/react-dom.js:111:29
invokeGuardedCallbackAndCatchFirstError@resource://devtools/client/shared/vendor/react-dom.js:125:25
executeDispatch@resource://devtools/client/shared/vendor/react-dom.js:346:42
executeDispatchesInOrder@resource://devtools/client/shared/vendor/react-dom.js:362:22
executeDispatchesAndRelease@resource://devtools/client/shared/vendor/react-dom.js:462:29
executeDispatchesAndReleaseTopLevel@resource://devtools/client/shared/vendor/react-dom.js:470:10
forEachAccumulated@resource://devtools/client/shared/vendor/react-dom.js:444:8
runEventsInBatch@resource://devtools/client/shared/vendor/react-dom.js:598:21
runExtractedEventsInBatch@resource://devtools/client/shared/vendor/react-dom.js:606:19
handleTopLevel@resource://devtools/client/shared/vendor/react-dom.js:4272:30
batchedUpdates$1@resource://devtools/client/shared/vendor/react-dom.js:15752:12
batchedUpdates@resource://devtools/client/shared/vendor/react-dom.js:1882:12
dispatchEvent@resource://devtools/client/shared/vendor/react-dom.js:4351:19
interactiveUpdates$1/<@resource://devtools/client/shared/vendor/react-dom.js:15803:14
unstable_runWithPriority@resource://devtools/client/shared/vendor/react.js:617:12
interactiveUpdates$1@resource://devtools/client/shared/vendor/react-dom.js:15802:12
interactiveUpdates@resource://devtools/client/shared/vendor/react-dom.js:1901:10
dispatchInteractiveEvent@resource://devtools/client/shared/vendor/react-dom.js:4328:21


[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]" {file: "jar:file:///home/theo/.zotero/zotero/ujicwv30.default/extensions/[email protected]!/zotero-ocr.js" line: 87}]

[JavaScript Error: "Could not get children of file(/opt) because it does not exist" {file: "chrome://zotero/content/xpcom/file.js" line: 339}]

[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]" {file: "jar:file:///home/theo/.zotero/zotero/ujicwv30.default/extensions/[email protected]!/zotero-ocr.js" line: 87}]

[JavaScript Error: "NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory]" {file: "jar:file:///home/theo/.zotero/zotero/ujicwv30.default/extensions/[email protected]!/zotero-ocr.js" line: 87}]

appName => Zotero, version => 7.0.11 (x64), os => Linux 6.12.8-200.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jan  2 19:26:03 UTC 2025, locale => en-US, extensions => Zotero OCR (0.8.1, extension)

aborel · 2025-01-12T07:30:23Z

The error is happening while the plugin is checking your tesseract path preference. I don't understand why this is the case, your screenshot says that it is /usr/bin/tesseract and it looks correct according to your shell tests.

However, I am not convinced that the failing code is really necessary - I am prepared to remove it. Still, a similar error could happen in a more useful check that is executed a few steps later, so I'd really like to understand the underlying situation. Before I create a new pre-version, could you try to run the following?

In your normal shell:

ls -l /usr/bin/tesseract

In Zotero (menu Tools > Developer > Error Console):

let ocrEngine = '/usr/bin/tesseract';
let pathOrFile = FileUtils.File(ocrEngine);
pathOrFile.isDirectory()
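
For the shell side, a slightly more detailed check than `ls -l` can be sketched as follows. `check_tool` is a hypothetical helper, not part of the plugin; it classifies a path as missing, a directory, an executable file, or a non-executable file, as seen from the shell it runs in. The "missing" case is presumably the one that corresponds to the `NS_ERROR_FILE_NOT_FOUND` thrown by `isDirectory()`, but that mapping is an assumption.

```shell
# Hypothetical helper (not part of Zotero-OCR): classify a configured tool path
# as seen from the current environment.
check_tool() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"          # nothing exists at this path
    elif [ -d "$path" ]; then
        echo "directory"        # a folder was given instead of the binary
    elif [ -x "$path" ]; then
        echo "executable"       # what we expect for a working setup
    else
        echo "not-executable"   # file exists but cannot be run
    fi
}

check_tool /usr/bin/tesseract
check_tool /usr/bin/pdftoppm
```

If the shell reports `executable` while the Zotero console still throws, the two are probably not looking at the same filesystem view.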

alex-ca1123 · 2025-01-20T07:46:04Z

I am on Ubuntu 24.04.1 and I get the same result. In my case this is because I am being bitten by the snap package, which isolates the application runtime. You should advise Linux users against snap packages or any containerized deployment.

https://forums.zotero.org/discussion/108471/installation-and-use-of-libreoffice-plugin-fails-on-ubuntu-22-04-3-using-zotero-snap

alex-ca1123 · 2025-01-20T07:53:14Z

By the way, there are probably some workarounds to let snap see paths of the base system, but I don't think it is worth the hassle. Advise users to use https://github.com/retorquere/zotero-deb

aborel · 2025-01-20T07:54:46Z

@alex-ca1123 While this could indeed be useful, I still wish the other users could provide the requested information.

alex-ca1123 · 2025-01-20T08:05:19Z

@aborel Fedora has Flatpak, which follows the same basic principle of evil vendorization efforts that fragment the open-source community. https://discussion.fedoraproject.org/t/zotero-bibliography-manager-tarball-on-fedora-40-kde-how-i-got-it-working/132509 I also ran your commands: as expected, the containerized app can't see the host's raw paths.

aborel · 2025-01-20T12:36:46Z

I get your point, it is certainly relevant, but it doesn't tell me what I wanted to know. The output of the requested commands is welcome.

q-wertz · 2025-01-29T14:30:00Z

Having the same issue on Manjaro with the GNOME desktop, with Zotero installed as a Flatpak.

I also get the following message on Browser Console:

NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.isDirectory] 2 zotero-ocr.js:87

Running the commands returns:

$ ls -l /usr/bin/tesseract
-rwxr-xr-x 1 root root 47256 11. Nov 09:22 /usr/bin/tesseract

Zotero:

In my very limited understanding of Flatpak, it requires either bundling the binaries with the application or using flatpak-spawn.
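
To tell whether a given shell (and therefore the tools it can see) is running inside such a sandbox, a rough check could look like the sketch below. `detect_sandbox` is a hypothetical helper; it relies on two markers I believe are standard, the `/.flatpak-info` file that Flatpak places at the sandbox root and the `SNAP` environment variable that snapd sets for snap applications. The optional root prefix argument is only there so the check can be tried against a fake directory tree.

```shell
# Hypothetical helper: report which sandbox, if any, the current shell runs in.
detect_sandbox() {
    root="${1:-}"                           # optional prefix, for dry runs only
    if [ -f "$root/.flatpak-info" ]; then
        echo "flatpak"                      # Flatpak marker file at sandbox root
    elif [ -n "${SNAP:-}" ]; then
        echo "snap"                         # SNAP is set inside snap apps
    else
        echo "none"
    fi
}

detect_sandbox
```

Run from a normal terminal this should print `none`; the interesting case is running it, together with `ls -l /usr/bin/tesseract`, from a shell started inside the sandbox (for Flatpak, something like `flatpak run --command=sh <app-id>`, where the application ID of the Zotero package is left for the reader to look up).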

aborel self-assigned this Dec 19, 2024
