Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

@siu
Copy link
Contributor

@siu siu commented Oct 11, 2017

Bad news, one of the test runs in pull request #290 failed in a call to os.replace(), see https://ci.appveyor.com/project/frerich/clcache/build/1.0.1113/job/88r9w6ar5plwo7uc/tests. I believe that it failed because another process/thread had the file open for reading.

Unfortunately windows does not allow to replace a file that is opened
for reading by another process/thread. Reads and writes need to be
serialized by using the locks.

This reverts the locking of manifests and statistics to the state before
pull request #286.

siu added 2 commits October 11, 2017 07:44
Unfortunately windows does not allow to replace a file that is opened
for reading by another process/thread. Reads and writes need to be
serialized by using the locks.

This reverts the locking of manifests and statistics to the state before
pull request frerich#286.
@siu
Copy link
Contributor Author

siu commented Oct 11, 2017

Hmmm some tests failed in the first commit in a way that I didn't expect. Something seems wrong in processNoDirect().

@frerich
Copy link
Owner

frerich commented Oct 12, 2017

I'm a little confused - the test run in #290 passes. Is there a race condition, i.e. is there an issue or not? :-)

@siu
Copy link
Contributor Author

siu commented Oct 12, 2017

This is all I have found until now:

Reading the files without locking is not a good idea because os.replace(...) writing to a file that is open will yield an error. It is easy to check with two interactive sessions of python. This is a limitation of windows and we cannot do much about it. This is what the PR addresses, I think it is safe to merge.

There is another problem in processNoDirect that I don't understand yet. I created a branch in my fork of the repo and enabled AppVeyor to investigate https://github.com/siu/clcache/commits/revertLockFreeReadsAppveyor. The last commit makes some tests run 100 times in an attempt to trigger the race. These are the results: https://ci.appveyor.com/project/siu/clcache/build/1.0.14

What is surprising is that it only failed on Python 3.3, it could be luck or it could be something else. For the moment I could not see anything wrong in the code.

@frerich frerich merged commit b880939 into frerich:master Oct 12, 2017
@frerich
Copy link
Owner

frerich commented Oct 12, 2017

Merged, thanks a lot for following-up so quickly! I'm somewhat embarrased that our tests did not notice this peculiar behaviour behaviour of os.replace. :-(

@siu
Copy link
Contributor Author

siu commented Oct 12, 2017

Same here, I wish the automatic tests were more intensive and showed synchronization issues early.

I will start a PR with this commit to discuss about the issue in processNoDirect.

siu pushed a commit to siu/clcache that referenced this pull request Oct 16, 2017
The signal handlers were introduced to avoid cache corruption when the
clcache process is stopped in the middle of a write to the cache
(statistics, manifests or objects). See frerich#233. Even if SIGINT and SIGTERM
were ignored the cache still had a chance to be corrupted in the event
of a SIGTERM (which cannot be ignored).

Since frerich#233 the writing of files to the cache has been improved to
replace the files atomically, reducing the risk of storing corrupted
files in the cache. See pull requests frerich#286, frerich#292 and frerich#296. Therefore
ignoring these signals is not needed anymore.
@siu siu mentioned this pull request Oct 16, 2017
siu pushed a commit to siu/clcache that referenced this pull request Oct 30, 2017
The signal handlers were introduced to avoid cache corruption when the
clcache process is stopped in the middle of a write to the cache
(statistics, manifests or objects). See frerich#233. Even if SIGINT and SIGTERM
were ignored the cache still had a chance to be corrupted in the event
of a SIGTERM (which cannot be ignored).

Since frerich#233 the writing of files to the cache has been improved to
replace the files atomically, reducing the risk of storing corrupted
files in the cache. See pull requests frerich#286, frerich#292 and frerich#296. Therefore
ignoring these signals is not needed anymore.
siu pushed a commit to siu/clcache that referenced this pull request Oct 30, 2017
The signal handlers were introduced to avoid cache corruption when the
clcache process is stopped in the middle of a write to the cache
(statistics, manifests or objects). See frerich#233. Even if SIGINT and SIGTERM
were ignored the cache still had a chance to be corrupted in the event
of a SIGTERM (which cannot be ignored).

Since frerich#233 the writing of files to the cache has been improved to
replace the files atomically, reducing the risk of storing corrupted
files in the cache. See pull requests frerich#286, frerich#292 and frerich#296. Therefore
ignoring these signals is not needed anymore.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants