
Conversation

@felixSchl
Contributor

As per the documentation, the `archive.bulk()` call promises to avoid
EMFILE errors (too many open files).
@adam-lynch
Contributor

Class. Have you manually tested this, yeah? I assume we can't realistically create a test for this.

@felixSchl
Contributor Author

Sorry, I should have mentioned that I've only had the chance to test this on Windows so far. I can test on OS X at work tomorrow.

@adam-lynch
Contributor

Cool. So you have a test app on Windows which causes an EMFILE error and it's gone with this change? That's good.

Unfortunately there are still EMFILE errors when we do copying with ncp. I created graceful-ncp (which depends on graceful-fs instead of plain fs) to get around this. It's implemented on the graceful-ncp branch. I'm not sure if this is the right way to go and I haven't tested it thoroughly enough with a test app with a lot of files, yet.

@felixSchl
Contributor Author

Just from skimming the readme, it appears as if ncp.limit could be of value here?

@adam-lynch
Contributor

I saw that but we have no idea how many files the user has. Ideally, we'd love the limit to be infinite 😛.

Once graceful-fs encounters an EMFILE error, it just retries after a timeout. This seems like a better way to do it: it doesn't matter which platform the end user is on or how many files they have. I looked through the ncp issues and it seems the maintainer is really keen on avoiding dependencies, so that's why he won't use graceful-fs.

@felixSchl
Contributor Author

I suppose then it comes down to just testing. I will pull down your branch tomorrow, apply this patch, and see if there are any errors. I can trigger an EMFILE 100% of the time on our project if my glob patterns are generous enough. It should be possible to write a test for it as well, don't you think? Create a bunch of files with junk in them, then zip them and copy them. However, it almost feels like these tests should live in the respective repositories.

@adam-lynch
Contributor

Sounds good. I got all of that except for this bit:

However, it almost feels like these tests should live in the respective repositories.

@felixSchl
Contributor Author

What I meant was that it's probably more useful if the graceful-ncp repo had a test that verifies the expected behaviour, that is, copying a lot of files without hitting EMFILE. Likewise, it would probably be more useful if the archiver repo had a test verifying that the `archive.bulk(...)` call does not produce EMFILE.

@felixSchl
Contributor Author

I have tested this by the way and it works fine.

@adam-lynch
Contributor

Yeah that makes sense.

What exactly did you test so?

@felixSchl
Contributor Author

I tested using your graceful-ncp branch with my patch applied on a project that threw EMFILEs 100% of the time and now doesn't.

@adam-lynch
Contributor

Great. This is what I think needs to be done:

  • Merge this into graceful-ncp branch
  • Manually test by another person (me) and on different platforms
  • Add EMFILE tests to graceful-ncp
  • Add EMFILE tests to archive module if they don't have one already

@felixSchl
Contributor Author

Sounds good. To bridge the interim, we are using my fork's graceful-ncp branch (which has the patch applied) in our package.json. Works fine.

Let me know if I can be of further help to push this along.

@adam-lynch
Contributor

If you could create a PR from your patch into our graceful-ncp branch, then I'll merge it right away and test when I get a chance.

@adam-lynch
Contributor

What was the app like that gave you the EMFILE? Roughly how many files did you have? (That'll also be handy to know for graceful-ncp's own tests.) I tried with a lot of files but in the end resorted to switching from Windows to Mac and running `ulimit -n 120` to reduce how many files could be open at once. I got the EMFILE and it's gone when using the graceful-ncp branch, but now I get the error mentioned in #122 :/

@ctalkington

archiver uses lazystream. this module essentially wraps `fs.createReadStream` so that a file is only opened on the system when the stream is actually first used. with that said, it's prevention, not total protection: it's technically possible that with a bunch of small files, archiver could have several open at once to fill the buffer (1M). i haven't personally seen EMFILE issues reported in some time though. the latest versions of archiver also do `fs.stat` in an async worker fashion (ie 4 at a time) to further avoid such warnings.

lazystream also has its own suite of tests for basic functionality so it's not really up to archiver to double test such things.

@felixSchl
Contributor Author

Sounds like we need a queuing / throttling mechanism that has a max size
which correlates to ulimit minus some arbitrary threshold for files opened
by other applications... There's always a certain degree of uncertainty
when dealing with IO after all.
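One hedged sketch of such a throttling mechanism (the function names and queueing details are invented for illustration; a real version would derive `maxOpen` from the ulimit minus a safety margin):

```javascript
// Counting semaphore: at most maxOpen tasks (e.g. fs.open calls)
// run at once; the rest wait in a queue until a slot frees up.
function makeOpenLimiter(maxOpen) {
  let active = 0;
  const queue = [];
  function next() {
    if (active >= maxOpen || queue.length === 0) return;
    active++;
    const task = queue.shift();
    // Each task receives a release() callback it must call when
    // its file is closed, freeing the slot for the next task.
    task(function release() {
      active--;
      next();
    });
  }
  return function limited(task) {
    queue.push(task);
    next();
  };
}
```

Wrapping every descriptor-consuming operation in `limited(...)` caps concurrent opens regardless of how many files the user throws at it.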

@ctalkington

i may have spoken a bit prematurely. i was looking at it a little differently. zip creation is a serial process, and as such archiver streams source by source. thus at any given time there should only be 6 files open from FS (src, dest, up to 4 stat workers), assuming you're using v0.12. that said, there could be multiple files' contents in the internal buffer at one time waiting to be drained to your dest.

are you sure the EMFILE originates from archiver?

@ctalkington

i should mention that if archiver didn't use lazystream, the number of open files just from creating read streams would be equal to the number of FS-sourced entries passed to archiver.

@adam-lynch
Contributor

Pretty sure we have this in a very good state now. It works reliably for me, even if I reduce my ulimit. I think we should just merge and play it by ear. No harm; we've only made it a bit safer, we haven't changed anything too drastic.

@ctalkington

there is a slight overhead to using bulk, which i don't really think should be required, given that `file` internally uses the same lazystream logic. have you double-checked that your archiver is up to date? in previous versions there was a sync stat check on every bulk and file call; this has been moved to a more async process.

@adam-lynch
Contributor

@ctalkington we're using 0.10. I'll bump to 0.12 when I get a chance.

@adam-lynch
Contributor

Bumped to archiver@0.13.

Closing this in favour of #123

@adam-lynch adam-lynch closed this Nov 30, 2014
