
Test script for issue #104 (with fix/workaround) #157

Open · wants to merge 9 commits into master

Conversation

danielweck

#104 (comment)

Stacktrace / debug info:

events.js:174
      throw er; // Unhandled 'error' event
      ^

Error: EBADF: bad file descriptor, read
Emitted 'error' event at:
    at lazyFs.read (internal/fs/streams.js:165:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:467:17)

node --version
=>
v10.16.3

https://github.com/nodejs/node/blob/v10.16.3/lib/fs.js#L467

  function wrapper(err, bytesRead) {
    // Retain a reference to buffer so that it can't be GC'ed too soon.
    callback && callback(err, bytesRead || 0, buffer);
  }

https://github.com/nodejs/node/blob/v10.16.3/lib/internal/fs/streams.js#L165

  // the actual read.
  lazyFs().read(this.fd, pool, pool.used, toRead, this.pos, (er, bytesRead) => {
    if (er) {
      if (this.autoClose) {
        this.destroy();
      }
      this.emit('error', er);
    }


danielweck commented Oct 17, 2019

The bug occurs randomly, so it can take a few tries to observe it. Sometimes it triggers immediately.

I use the following CLI to repeat the process:

node "issue-104.js" "./testData/" "100" "1"

CLI arguments:

  • 100 is the number of iterations over each ZIP file inside the testData folder. Each iteration consists of reading the entire ZIP directory, i.e. streaming each ZIP entry into /dev/null (so to speak; the actual technique is to read the incoming data buffers and ignore them).
  • 1 means that only the unzipper lib is used to perform the test. If this argument is removed, the libs node-zip-stream and yauzl are used as well (this is useful for comparing performance metrics, as printed to stdout).
  • Note that "./testData/" can be replaced with an absolute or relative path to a single ZIP file (it does not have to be a folder).


danielweck commented Oct 17, 2019

Here is an example output (macOS):

Note that each dot . is a single ZIP entry from the directory.

node "issue-104.js" "./testData" "100" "1"
=>

process.cwd():
/xx/node-unzipper
__dirname:
/xx/node-unzipper
args:
[ './testData', '100', '1' ]
./testData
=====================================
testData/accessible_epub_3.epub
=====================================
1/100 ......................................
2/100 ......................................
3/100 ......................................
4/100 ......................................
5/100 ......................................
6/100 ......................................
7/100 ......................................
8/100 ......................................
9/100 ......................................
10/100 ......................................
11/100 ......................................
12/100 ......................................
13/100 ......................................
14/100 ......................................
15/100 ......................................
16/100 ......................................
17/100 ......................................
18/100 ......................................
19/100 ......................................
20/100 ......................................
21/100 ......................................
22/100 ......................................
23/100 ......................................
24/100 ......................................
25/100 ...................events.js:174
      throw er; // Unhandled 'error' event
      ^

Error: EBADF: bad file descriptor, read
Emitted 'error' event at:
    at lazyFs.read (internal/fs/streams.js:165:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:467:17)


danielweck commented Oct 17, 2019

Another hit:

Note that this is with a single iteration instead of 100, and all 3 ZIP libs are used (only unzipper randomly crashes; I have never managed to make node-zip-stream or yauzl crash).

node "issue-104.js" "./testData" "1"
=>

process.cwd():
/xx/node-unzipper
__dirname:
/xx/node-unzipper
args:
[ './testData', '1' ]
./testData
=====================================
testData/accessible_epub_3.epub
=====================================
1/1 ......................................
1/1 ......................................
1/1 ..events.js:174
      throw er; // Unhandled 'error' event
      ^

Error: EBADF: bad file descriptor, read
Emitted 'error' event at:
    at lazyFs.read (internal/fs/streams.js:165:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:467:17)


danielweck commented Oct 17, 2019

I managed to reproduce the issue for a single ZIP file (which eliminates the possibility of folder scanning being a factor). See output below.

Note that this PR includes accessible_epub_3.epub (a public-domain ebook publication) because it is representative of the type of ZIP files I deal with. I am otherwise able to reproduce the unzipper crash on other files, albeit randomly. Feel free to try other ZIPs (just make sure the ZIP directory contains many small/medium entries rather than a few large ones, as we need to stress-test the opening/closing of streams).

node "issue-104.js" "./testData/accessible_epub_3.epub" "500" "1"
=>

process.cwd():
/xx/node-unzipper
__dirname:
/xx/node-unzipper
args:
[ './testData/accessible_epub_3.epub', '500', '1' ]
./testData/accessible_epub_3.epub
=====================================
./testData/accessible_epub_3.epub
=====================================
1/500 ......................................
2/500 ......................................
...
...
464/500 ......................................
465/500 .......events.js:174
      throw er; // Unhandled 'error' event
      ^

Error: EBADF: bad file descriptor, read
Emitted 'error' event at:
    at lazyFs.read (internal/fs/streams.js:165:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:467:17)

@danielweck (Author)

I implemented a fix by replacing fs.createReadStream() (graceful-fs or regular fs) with fs.open() (which provides a raw file descriptor) and a simple "subclass" of Stream.Readable called DataReaderStream (code shamelessly copied from the yauzl source code) that pulls data from the raw file descriptor via fs.read(fd, ...).

The PR commit which introduces this change:

c7f6243

Note that the USE_ALTERNATIVE_STREAM_METHOD constant controls whether or not to activate this bugfix. I am not able to reproduce the bug when I use this alternative method.

Caveat: the raw file descriptor must be manually closed, thus the introduction of a new closeFileDescriptor() "method" on the ZIP directory API (note that yauzl and node-zip-stream expose a similar close() function).

@danielweck danielweck changed the title Test script for issue #104 Test script for issue #104 (with fix/workaround) Oct 17, 2019
@ZJONSSON (Owner)

Awesome @danielweck, thank you so much for doing this!! I'm still keeping backward compatibility on the basis that unzipper is a drop-in replacement for unzip. Would you mind fixing the PR for the old Node versions (mostly renaming const and let to var)?

@danielweck (Author)

I see:
https://travis-ci.org/ZJONSSON/node-unzipper/builds/599077802
I can certainly fix that, but please note that this PR is really just a test bed for investigating the cause of the issue and attempting to reproduce the random crash more consistently.

This wasn't intended to provide a robust, properly thought-out solution. By process of elimination, I came to the conclusion that fs.createReadStream() is the root of the problem, so I tested an alternative strategy for opening streams ... the problem now seems gone, but there are concerning side-effects with this new method.

Most notably, this PR introduces a "breaking change": the unzipper API now requires explicitly invoking a new zip.close() function in order to tear down the zip instance and gracefully release resources (before this change, it was assumed that all created streams would be handled automatically by garbage collection).

This lifecycle pattern is actually aligned with the two other well-known, performant implementations, yauzl and node-zip-stream, but it is a departure from the original unzipper design.

Plus, my implementation is kinda dirty: as you can see, I attach the .close() function to the source object with the expectation that the _fd file descriptor was previously allocated, so I rely on the implicit call chain .size() (a promise) followed by .stream() (not a promise). It all smells a bit like "spaghetti code", if you know what I mean :)
(In fact, this breaks one of the unit tests, probably because .stream() is called directly ... I'm not sure, see 75894cc)

@ZJONSSON (Owner)

Thanks @danielweck, this is very helpful. Of course we could close the fd automatically if it hasn't been used for a small amount of time (and reopen it if/when we need it again, i.e. cache the fd temporarily based on last touch). That would make the behavior backwards compatible (an explicit close would not be required), but slightly more hacky. Will take a closer look this weekend.

@dreamerblue

Is there any progress? @ZJONSSON
