-
Notifications
You must be signed in to change notification settings - Fork 83
Revert reading manifests and statistics without acquiring the lock #292
Conversation
Unfortunately windows does not allow to replace a file that is opened for reading by another process/thread. Reads and writes need to be serialized by using the locks. This reverts the locking of manifests and statistics to the state before pull request frerich#286.
|
Hmmm some tests failed in the first commit in a way that I didn't expect. Something seems wrong in processNoDirect(). |
|
I'm a little confused - the test run in #290 passes. Is there a race condition, i.e. is there an issue or not? :-) |
|
This is all I have found until now: Reading the files without locking is not a good idea because os.replace(...) writing to a file that is open will yield an error. It is easy to check with two interactive sessions of python. This is a limitation of windows and we cannot do much about it. This is what the PR addresses, I think it is safe to merge. There is another problem in processNoDirect that I don't understand yet. I created a branch in my fork of the repo and enabled AppVeyor to investigate https://github.com/siu/clcache/commits/revertLockFreeReadsAppveyor. The last commit makes some tests run 100 times in an attempt to trigger the race. These are the results: https://ci.appveyor.com/project/siu/clcache/build/1.0.14 What is surprising is that it only failed on Python 3.3, it could be luck or it could be something else. For the moment I could not see anything wrong in the code. |
|
Merged, thanks a lot for following-up so quickly! I'm somewhat embarrased that our tests did not notice this peculiar behaviour behaviour of |
|
Same here, I wish the automatic tests were more intensive and showed synchronization issues early. I will start a PR with this commit to discuss about the issue in processNoDirect. |
The signal handlers were introduced to avoid cache corruption when the clcache process is stopped in the middle of a write to the cache (statistics, manifests or objects). See frerich#233. Even if SIGINT and SIGTERM were ignored the cache still had a chance to be corrupted in the event of a SIGTERM (which cannot be ignored). Since frerich#233 the writing of files to the cache has been improved to replace the files atomically, reducing the risk of storing corrupted files in the cache. See pull requests frerich#286, frerich#292 and frerich#296. Therefore ignoring these signals is not needed anymore.
The signal handlers were introduced to avoid cache corruption when the clcache process is stopped in the middle of a write to the cache (statistics, manifests or objects). See frerich#233. Even if SIGINT and SIGTERM were ignored the cache still had a chance to be corrupted in the event of a SIGTERM (which cannot be ignored). Since frerich#233 the writing of files to the cache has been improved to replace the files atomically, reducing the risk of storing corrupted files in the cache. See pull requests frerich#286, frerich#292 and frerich#296. Therefore ignoring these signals is not needed anymore.
The signal handlers were introduced to avoid cache corruption when the clcache process is stopped in the middle of a write to the cache (statistics, manifests or objects). See frerich#233. Even if SIGINT and SIGTERM were ignored the cache still had a chance to be corrupted in the event of a SIGTERM (which cannot be ignored). Since frerich#233 the writing of files to the cache has been improved to replace the files atomically, reducing the risk of storing corrupted files in the cache. See pull requests frerich#286, frerich#292 and frerich#296. Therefore ignoring these signals is not needed anymore.
Bad news, one of the test runs in pull request #290 failed in a call to os.replace(), see https://ci.appveyor.com/project/frerich/clcache/build/1.0.1113/job/88r9w6ar5plwo7uc/tests. I believe that it failed because another process/thread had the file open for reading.
Unfortunately windows does not allow to replace a file that is opened
for reading by another process/thread. Reads and writes need to be
serialized by using the locks.
This reverts the locking of manifests and statistics to the state before
pull request #286.