Per-section locking #217

frerich · 2016-08-24T13:33:52Z

This PR attempts to improve concurrency by making clcache not lock the entire cache but only affected sections of it.

At the same time, it actually increases the duration for which a section is locked in case of a cache miss: we will hold the lock even while the compiler is running, something which seems more correct but which was not viable before because it would greatly impact concurrency.

This PR is meant to supersede #208. It resolves #160 .

codecov-io · 2016-08-24T13:42:46Z

Current coverage is 87.33% (diff: 84.53%)

Merging #217 into master will decrease coverage by 0.69%

@@             master       #217   diff @@
==========================================
  Files             1          1          
  Lines           969       1003    +34   
  Methods           0          0          
  Messages          0          0          
  Branches        160        159     -1   
==========================================
+ Hits            853        876    +23   
- Misses           86         94     +8   
- Partials         30         33     +3

Powered by Codecov. Last update 129561a...0954a47

siu · 2016-08-24T18:35:02Z

I run the performancetests.py with the current PR and I got this:

Compiling 30 source files sequentially, cold cache: 8.87619953641643 seconds
Compiling 30 source files sequentially, hot cache: 1.84543584907858 seconds
Compiling 30 source files concurrently via /MP12, hot cache: 0.6135658206123011 seconds

With a bigger codebase (hardlink=1, 6 real cores, 12 logical cores, 1223 object files) I am experiencing crashes with this stack trace:

Traceback (most recent call last):
File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in
exec(code, m.dict)
File "clcache.py", line 1604, in
File "clcache.py", line 1483, in main
File "clcache.py", line 1514, in processCompileRequest
File "clcache.py", line 1550, in processDirect
File "clcache.py", line 250, in enter
File "clcache.py", line 266, in acquire
main.CacheLockException: Error! WaitForSingleObject returns 258, last error 183

After retrying a few times the whole codebase compiled. I cleaned the working dir and recompiled with the hot cache. The speedup is about 15x compared to 5.6x in previous versions. A very similar speedup to #208.

frerich · 2016-08-24T18:57:26Z

Ah, interesting! That WaitForSingleObject error you get is a timeout: by default, a clcache process will wait for up to 10 seconds to acquire a lock. You can customise this timeout using the CLCACHE_OBJECT_CACHE_TIMEOUT_MS environment variable: setting it to e.g. 20000 will raise the timeout to 20 seconds.

Did you change this variable to a lower value by any chance? It's surprising that ten seconds are not sufficient...

Jimilian · 2016-08-25T09:37:40Z

@frerich I got same error multiple times in row:

           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1596, in processNoDirect
           File "clcache.py", line 1340, in processCacheHit
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183

I used default value (10 seconds), so, I think it's kind of deadlock, because it happened in same moment in several projects, it's not "dead" only because of timeout.

p.s. 16 cores.

siu · 2016-08-25T10:02:07Z

I am using the default timeout, I had to increase it in the past but with #208 it wasn't needed anymore so it is not set in the environment.

Some of our targets take almost a minute to compile (that's one of the reasons why we need clcache). The locks still have the chance of being shared (less often), so if the lock is held during compilation other processes will have to wait more than 10 seconds.

frerich · 2016-08-25T10:14:10Z

@Jimilian That's very interesting, I agree - it looks like a deadlock which is eventually resolved when the timeout hits. It's very interesting that you manage to trigger this even with NODIRECT set -- in that case, the code path is a lot simpler: all the relevant code is in the short processNoDirect function.

There are two locks involved for every NODIRECT invocation:

One lock which synchronises access to one of the sections of the cache (the section is determined by the first two characters of the cached object hash key). All invocations which access the same section are synchronised, but since there are up to 256 sections you hopefully don't hit this scenario very often.
A second lock which synchronises accesses to the statistics file. This is a global lock, i.e. all clcache invocations share the same lock to make sure they don't step on each others toes when writing to the statistics.

My first idea was that maybe the order in which these locks are acquired is mixed up somewhere, i.e. if process A would first acquire 1 and then 2 and process B would first acquire 2 and then 1, then the two might deadlock each other. Alas, I couldn't spot such a bug in the code on first glance...

frerich · 2016-08-25T10:49:31Z

@Jimilian & @siu -- I just extended this PR with a little temporary commit adding some debug output for acquiring/releasing locks. This will generate a lot of output, but it may help us with telling what's going on.

The output will cause clcache to print its PID and the name of the lock everytime it starts to wait for a lock, acquires a lock and releases a lock. I hope that this allows reconstructing the order of events which lead to the deadlock.

It would be much appreciated if you could give it a try and then upload the generated output somewhere.

frerich · 2016-08-25T11:41:10Z

@siu wrote:

Some of our targets take almost a minute to compile (that's one of the reasons why we need clcache). The locks still have the change of being shared (less often), so if the lock is held during compilation other processes will have to wait more than 10 seconds.

That's correct, but other process will only have to wait more than 10 seconds if you a) have a cache miss and b) you only block other processes accessing the same cache section (a 1/256 chance assumingn an even distribution of hash sums).

Jimilian · 2016-08-25T12:06:24Z

It's not related to waiting during compilation. Because I didn't have same problem before, and timeout was same and in similar place. I had broken json files - it's true. But not a deadlocks.

@frerich, I also added information about mutexName to exception locally. So, now it prints:

_main__.CacheLockException: Error! WaitForSingleObject returns 258 for Local\C--clcache-stats.txt, last error 0

I think that deadlock was created on statistics. Also bug happens only on cold cache. I failed to repeat on hot.

Tail (I hope it's enough): https://gist.github.com/frerich/60b424950f66b3e41a4831d2f1e70b56 (edited by @frerich such that the log is a gist instead of shown inline)

Jimilian · 2016-08-25T12:09:23Z

Also it could be related to cleanup, probably. When we need to lock all sections.

frerich · 2016-08-25T13:47:19Z

Thanks for posting those logs - I believe I can explain this issue now: in non-direct mode, the following locks are acquired and released during a cache miss:

affected cache section lock is acquired to test whether cached data is available
global cache statistics lock is acquired, number of cache misses are bumped
object is added to cache; if the cache is too big, this causes all cache sections to get locked in order to shrink the cache
global statistics lock is released
affected cache section lock is released

So what can happen is this: two parallel clcache instances 'A' and 'B' can get executed e.g. like this:

'A' locks cache section 'x'
'A' locks global statistics lock
'B' locks cache section 'y'
'A' needs to lock all sections to do a cleanup

At this point, 'B' cannot proceed because 'A' still holds the statistics lock, but 'A' cannot proceed because 'B' still holds the lock on section 'y'. So they deadlock, and the situation is resolved by 'B' timing out while waiting for the statistics.

Jimilian · 2016-08-25T14:06:05Z

@frerich How are you going to solve it? I see only one option - check cleanup after unlocking/before locking. Probably, then script starts/ends.

frerich · 2016-08-25T14:14:04Z

I think one alternative is to not nest the locks, i.e. only update the statistics after working on the actual cache. This means that there's a small time window in which the statistics don't reflect the actual state visible to other clcache processes, but I think that should not hurt.

siu · 2016-08-25T14:26:59Z

FYI when you are back to direct mode. The problem I reported before vanishes if I configure a big timeout (5 mins) so I don't think it is a deadlock in my case.

frerich · 2016-08-25T16:01:41Z

@Jimilian I now implemented the idea of cleaning the cache before starting a compile job, not after it. This is more like what e.g. Git does (checking whether it needs to GC before actually doing its work). It's a regression for cache misses which now may be a little bit slower than before (because they need to open two JSON files to test whether any cleaning is to be done), but it's conceptually a lot simpler.

@siu @Jimilian Maybe you could give this new version a try?

Jimilian · 2016-08-26T14:50:31Z

So, I failed to reproduce deadlock. I tested it multiple times with cold cache (code was completely new, but cache itself was full) and compilation passed without any issues as expected.

siu · 2016-08-26T16:34:48Z

I got a new stacktrace with the current version:

cold_edited.zip

I hope it helps.

frerich · 2016-08-26T18:43:39Z

@siu Hmm, that's interesting. I see

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
    exec(code, m.__dict__)
  File "clcache.py", line 1606, in <module>
  File "clcache.py", line 1483, in main
  File "clcache.py", line 1516, in processCompileRequest
  File "clcache.py", line 1552, in processDirect
  File "clcache.py", line 249, in __enter__
  File "clcache.py", line 265, in acquire
__main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183

I.e. a timeout while waiting for a cache section lock in processDirect -- but this would mean that it took 10 seconds (by default) to get the lock, so a lot of compile requests would go to the same section. Unlikely, but possible.

In master, the error message given for timeouts is a lot more useful because it tells which lock it wanted to acquired. This might shed some light - I'll rebase this PR onto master, could you then maybe try it again?

frerich · 2016-08-26T18:58:41Z

@siu Ok, rebased now - would be great if you could give it another shot. :-)

siu · 2016-08-28T17:04:20Z

@frerich sure, I will try tomorrow

siu · 2016-08-28T20:52:46Z

I am analyzing the code trying to understand if deadlocks can occur.

Deadlocks cannot occur if all possible executions of the software acquire the locks in the same order. For that reason I wrote down here all functions that take locks and checked if they respect the same order:

Cache.lock
    with [manifestSection, artifactsSection, statistics]


printStatistics
    with [stats]
resetStatistics
    with [stats]
cleanCache
    with [stats]
        with [Cache.lock]
clearCache
    with [stats]
addObjectToCache
    with [artifactsSection]
processCacheHit
    with [artifactsSection, stats]


== Postprocess functions

postprocessObjectEvicted
    with [stats]
        calls addObjectToCache
postprocessHeaderChangedMiss
    with [stats]
        calls addObjectToCache
postprocessNoManifestMiss
    with [stats]
        calls addObjectToCache

updateCacheStatistics
    with [stats]


== Process functions

processDirect
    with [manifestSection]:
        calls postprocessNoManifestMiss
        calls postprocessHeaderChangedMiss
        with [artifactSection]:
            calls processCacheHit
            calls postprocessObjectEvicted

processNoDirect
    with [artifactsSection]:
        calls processCacheHit
        with [stats]
            calls addObjectToCache

With this information we can see that there are code paths that acquire the locks in different order and that will lead to deadlocks. I know, manifestSection and artifactSection are different in the majority of cases, but for sure there will be collisions every once in a while. In addition the operations on the whole cache need to acquire all of them.

For example, let's say one process A has to clean the cache, which involves:

lock stats
lock all manifestSections
lock all artifactsSections
lock stats (again).

Another process B may be running processDirect which may involve:

lock manifestSection
call postprocessNomanifestMiss
lock stats
call addObjectToCache
lock artifactsSection

In this scenario, if A executes until A2 and B executes until B3, they will deadlock, A needs a manifestSection locked by B and B needs the stats locked by A. In reality one of them would timeout and crash and the other process would succeed. Note that increasing the timeout delay would not help in this situation, one of the processes would still timeout.

There are a few fixes that need to be applied here and there but the most problematic one seems to be the lock of the artifactsSection inside addObjectToCache. It could be removed from there and acquired in the postprocessing functions before the stats are locked (in the same with for example).

We can elaborate a full list of things to change and iterate if you wish.

siu · 2016-08-29T07:53:47Z

More logs with stacktrace:

20160829-cold-01-edited.zip
20160829-cold-02-edited.zip

The first one is the original in the repository and the second contains timestamps in addition.

siu · 2016-09-01T09:28:14Z

I've run some preliminary tests and this time the compilation finished properly with cold and hot caches.

I am running clcache with hardlink=1, 12 cores, timeout=300000ms, 1222 targets. The speedup with this PR is 9.7x compared to 5.6x of my last tests of master. I was expecting faster times, I will enable profiling and post the results.

frerich · 2016-09-01T10:01:40Z

@siu Thanks for sharing those numbers! The strongest Windows build machine I ahve access to unfortunately only has two cores, so I very much depend on people with better hardware to try this stuff. I really appreciate it, and am looking forward to the profiling output.

Did your build run all the way to the end without any exceptions this time? If so, that would be encouraging since it suggests we fixed all the locking issues. I agree that 'just' 9.7x is less than expected, but at least this is a good argument to merge the PR and then implement additional improvements (maybe to the statistics locking) in a separate PR.

I'm curious how things are going for @Jimilian with this version. :-)

siu · 2016-09-01T10:16:49Z

Here are the results of the profiling with cold and hot caches:
20160901-pr17-profile.zip

I will keep using this branch and report any issue if there is any.

Did your build run all the way to the end without any exceptions this time? If so, that would be encouraging since it suggests we fixed all the locking issues. I agree that 'just' 9.7x is less than expected, but at least this is a good argument to merge the PR and then implement additional improvements (maybe to the statistics locking) in a separate PR.

Yes, this time it worked without problems, the only thing is that I have a huge timeout. I agree it is looking good for merging, let's see what the others say.

frerich · 2016-09-01T11:05:10Z

@siu Ah, that's interesting! Looking at the profiling for the hot cache, I see it starts with this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1177    0.041    0.000  287.120    0.244 {built-in method exec}
     1177    0.095    0.000  287.079    0.244 <string>:1(<module>)
     1177    0.057    0.000  286.972    0.244 clcache.py:1454(main)
     1177    0.236    0.000  192.237    0.163 clcache.py:1532(processCompileRequest)
     1177    0.133    0.000  178.124    0.151 clcache.py:1588(processDirect)
     1177    5.733    0.005  153.371    0.130 clcache.py:220(getIncludesContentHashForFiles)
  1189217   29.296    0.000  147.018    0.000 clcache.py:739(getFileHash)
    46915   96.204    0.002   96.204    0.002 {method 'write' of '_io.BufferedWriter' objects}
     2354    0.008    0.000   93.095    0.040 clcache.py:69(printBinary)
  1197456   70.514    0.000   70.548    0.000 {built-in method open}
  1236132   28.167    0.000   28.167    0.000 {method 'read' of '_io.BufferedReader' objects}
  1191571   16.949    0.000   16.949    0.000 {method 'update' of '_hashlib.HASH' objects}
     1177    0.047    0.000   16.348    0.014 clcache.py:1357(processCacheHit)

I.e. of the 287s seconds the whole build took, I think a huge portion is spent in getFileHash which probably also accounts for some of the 70 seconds spent in open() (and for a lot of the time spent in _io.BufferedReader.read and _hashlib.HASH.udpate). So maybe if we can make getFileHash faster, that would help the 'hot cache' case the most.

I wonder about the 96 seconds spent in _io.BufferedWriter.write though... I wish I knew where that is coming from. Maybe printBinary? But why would it be so slow...

webmaster128 · 2016-09-01T11:42:36Z

I'm planing to provide additional test results of this PR in a 6-core local SSD, hardlink machine within 2-3 days.

Jimilian · 2016-09-01T12:44:43Z

I have bad news... Now it's significantly slower (13 minutes vs 10 minutes with hot cache).
I will try to profile as well.
NO_DIRECT, hardlink, 16 cores, SSD.

frerich · 2016-09-01T14:41:35Z

@siu In my experiments, simply changing 'HashAlgorithm' at the top of clcache.py to hashlib.sha1 made things about 10% faster in the 'hot cache' scenario (a result which would contradict a5b72cb though that commit was done with Python 2.x).

@Jimilian That's really surprising, looking forward to the profile data.

webmaster128 · 2016-09-01T16:21:13Z

In my experiments, simply changing 'HashAlgorithm' at the top of clcache.py to hashlib.sha1 made things about 10% faster in the 'hot cache' scenario (a result which would contradict a5b72cb though that commit was done with Python 2.x).

This is very implementation dependent. The openssl implementation for Linux64 has a SHA1 faster than MD5

$ openssl speed md5 sha1
[...]
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              38764.20k   121777.26k   298165.85k   471874.90k   563161.77k
sha1             53112.71k   157630.83k   397270.19k   636987.39k   751763.46k

For our purpose, we don't need a cryptographic hash function. So we could switch to something much faster. xxHash looks very interesting to me, especially because there is an implementation for Python in pip, that uses the hashlib interface. This would allow us to easily toggle between md5 and xxHash using an environment variable.

But what I would like to verify first is, if the hashing function has an impact at all. As far as I know, md5 and SHA1 are usually faster than available persistent IO devices.

frerich · 2016-09-01T17:34:42Z

With this PR being at 71 comments now, I think we should -- at least in this thread -- concentrate on getting the per-section locking right and then use separate PRs for other performance improvements so that this doesn't get ouf of hand. :-}

This PR is already blocking some other work from progressing.

This centralises the code for creating a system-wide lock given some path. The function also respects the lock timeout environment variable. I factor this code out into a separate method since I plan to introduce more, finer-grained locks, which will very much use the same logic.

The plan is that these locks synchronise access to an individual section of the cache.

This is a system-wide lock to be used for synchronising accesses to the statistics of a cache.

This reimplements the global 'Cache.lock' lock such that it's defined in terms of the individual section and the statistics locks. This means that acquiring and releasing Cache.lock acquires and releases locks for all sections and for the statistics. This is slower than before (because it requires acquiring and releasing up to 513 locks) but it should be only rarely needed - mostly, when cleaning the cache.

This (big) patch attempts to improve concurrency by avoiding the global cache lock. Instead, we will lock only those manifest and compiler artifact sections which we deal with. The only cases where we need to synchronize all concurrenct processes is when updating the statistics file, because there's only a single statistics file. At least for cache hits this could be avoided by tracking the number of cache hits per section, too. To avoid deadlocks, the locks have to acquired in the same order for all execution paths (the order is defined in Cache.lock, i.e. first manifests, then artifacts, then statistics). Hence, locking of the artifact section had to be pulled out of the addObjectToCache() function since that function was called with the stats locked already - a violation of the locking order. Furthermore, we can no longer perform cache.clean() in addObjectToCache() because cache.clean() acquires the global lock, so e.g. this sequence of steps was possible in non-direct mode: 1. 'A' locks cache section 'x' 2. 'A' locks global statistics lock 3. 'B' locks cache section 'y' 4. 'A' needs to lock all sections to do a cleanup At this point, 'B' cannot proceed because 'A' still holds the statistics lock, but 'A' cannot proceed because 'B' still holds the lock on section 'y'. This issue is caused by -- from B's vie -- the statistics lock being locked before a section lock. This must never happen. At the point addObjectToCache() is called, we already have the statistics locked and we know that it may be that the cache size limit was just exceeded, so it's a good moment to determine that a cleanup is needed. It's not a good moment to *perform* the cleanup though. Instead, let the function return a flag which is propagated all the way back to processCompileRequest(). The flag indicates whether cleanup is needed, and if so, processCompileRequest() will acquire Cache.lock (which acquires all sections and statistics in the correct order) to do the cleanup.

By checking for 'manifest == None' early and returning early, we can reduce the indentation depth for the larger 'manifest != None' branch.

It's really an exceptional issue, let's handle that on the caller side so that we get the 'forward to real compiler' logic for free.

It's just ManifestRepository.getIncludesContentHashForFiles() which can raise IncludeChangedException, so only that needs to be covered by try/except; doing so, moving code out of the 'try', allows reducing the indentation depth.

These functions, just like postprocessObjectEvicted(), don't need to lock the cache section: they are already called while the cache section is locked (in processDirect()).

webmaster128 · 2016-09-02T16:58:55Z

clcache.py

        printTraceStatement("Cannot cache invocation as {}: called for preprocessing".format(cmdLine))
        updateCacheStatistics(cache, Statistics.registerCallForPreprocessing)
+    except IncludeNotFoundException:
+        pass


Could you add a simple tracing message here? I just run into this and did not have a clue, why there are no hits.

Good point, will do!

Actually, I think I rather won't do it right now. The code seems fishy and there might be a patch needed here which is worth its own PR.

This exception can apparently be raised in two situations:

During a cache miss, in case an include file which cl.exe printed (via /showIncludes) was deleted (or otherwise became unreadable) right after the compiler finished. Very unlikely, I guess.

During a cache hit, in case a manifest references a non-existent include file. In this case however, shouldn't the code rather update (i.e. rewrite) the manifest?

webmaster128 · 2016-09-02T18:00:26Z

Okay I guess my 2011 CPU Phenom II X6 1055T is just too slow for this kind of measurements. I get no significant difference in performance (this vs. master) using #226. In the 13 seconds of restoring, all 6 cores are under full load, both on master and using this.

In the performancetests.py, this at brings some 22 %.

However, everything compiles solid as expected. So feel free to merge, if you want to.

frerich · 2016-09-02T18:03:29Z

Two people other confirmed this brings a modest improvement, and does not seem to hurt. So let's merge this and see how it goes.

frerich changed the title ~~[DRAFT} Per section locking~~ Per-section locking Aug 24, 2016

frerich added the wip label Aug 24, 2016

This was referenced Aug 24, 2016

Don't hold the cache lock while computing the manifest hash for the current object #208

Closed

Global cache lock hurts cache hits during heavily concurrent builds #160

Closed

frerich mentioned this pull request Aug 24, 2016

Friendlier error messages in case of cache access timeouts #218

Merged

frerich removed the wip label Aug 26, 2016

siu mentioned this pull request Aug 26, 2016

Atomic manifest updates and multiple entries per manifest #222

Merged

frerich force-pushed the per_section_locking branch from b95c123 to ef688a7 Compare August 26, 2016 18:46

frerich self-assigned this Aug 31, 2016

frerich force-pushed the per_section_locking branch from 64e50fe to 0954a47 Compare September 2, 2016 10:09

Frerich Raabe added 10 commits September 2, 2016 12:11

Introduced (yet unused) locks for artifact/manifest sections

1bc537e

The plan is that these locks synchronise access to an individual section of the cache.

Introduced Statistics.lock field

8b51fdd

This is a system-wide lock to be used for synchronising accesses to the statistics of a cache.

Reorder branches in processDirect()

56b6349

By checking for 'manifest == None' early and returning early, we can reduce the indentation depth for the larger 'manifest != None' branch.

Don't handle IncludeNotFoundException in processDirect()

ab0dde3

It's really an exceptional issue, let's handle that on the caller side so that we get the 'forward to real compiler' logic for free.

Reduced indentation depth in processDirect()

abab8ce

It's just ManifestRepository.getIncludesContentHashForFiles() which can raise IncludeChangedException, so only that needs to be covered by try/except; doing so, moving code out of the 'try', allows reducing the indentation depth.

Avoid unneeded locking in postprocess{NoManifest|HeaderChanged}Miss

fbc9435

These functions, just like postprocessObjectEvicted(), don't need to lock the cache section: they are already called while the cache section is locked (in processDirect()).

Document section-based locking improvements

0954a47

frerich mentioned this pull request Sep 2, 2016

Resolve code duplication #225

Merged

webmaster128 mentioned this pull request Sep 2, 2016

[WIP] Add benchmarking project: Botan #226

Closed

webmaster128 reviewed Sep 2, 2016
View reviewed changes

frerich merged commit d3bc5c0 into master Sep 2, 2016

frerich deleted the per_section_locking branch September 2, 2016 18:03

hubx mentioned this pull request Nov 12, 2016

Switch file hashing to xxhash #243

Closed

frerich mentioned this pull request Oct 4, 2017

Cache corruption protection: Atomic replacement of manifests, statistics and object files #286

Merged

Per-section locking #217

Per-section locking #217

Uh oh!

Conversation

frerich commented Aug 24, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-io commented Aug 24, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current coverage is 87.33% (diff: 84.53%)

Uh oh!

siu commented Aug 24, 2016

Uh oh!

frerich commented Aug 24, 2016

Uh oh!

Jimilian commented Aug 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

siu commented Aug 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

frerich commented Aug 25, 2016

Uh oh!

frerich commented Aug 25, 2016

Uh oh!

frerich commented Aug 25, 2016

Uh oh!

Jimilian commented Aug 25, 2016 • edited by frerich Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jimilian commented Aug 25, 2016

Uh oh!

frerich commented Aug 25, 2016

Uh oh!

Jimilian commented Aug 25, 2016

Uh oh!

frerich commented Aug 25, 2016

Uh oh!

siu commented Aug 25, 2016

Uh oh!

frerich commented Aug 25, 2016

Uh oh!

Jimilian commented Aug 26, 2016

Uh oh!

siu commented Aug 26, 2016

Uh oh!

frerich commented Aug 26, 2016

Uh oh!

frerich commented Aug 26, 2016

Uh oh!

siu commented Aug 28, 2016

Uh oh!

siu commented Aug 28, 2016

Uh oh!

siu commented Aug 29, 2016

Uh oh!

siu commented Sep 1, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

frerich commented Sep 1, 2016

Uh oh!

siu commented Sep 1, 2016

Uh oh!

frerich commented Sep 1, 2016

Uh oh!

webmaster128 commented Sep 1, 2016

Uh oh!

Jimilian commented Sep 1, 2016

Uh oh!

frerich commented Sep 1, 2016

Uh oh!

webmaster128 commented Sep 1, 2016

Uh oh!

frerich commented Sep 1, 2016

Uh oh!

webmaster128 Sep 2, 2016

Choose a reason for hiding this comment

Uh oh!

frerich Sep 2, 2016

Choose a reason for hiding this comment

frerich commented Aug 24, 2016 •

edited

Loading

codecov-io commented Aug 24, 2016 •

edited

Loading

Jimilian commented Aug 25, 2016 •

edited

Loading

siu commented Aug 25, 2016 •

edited

Loading

Jimilian commented Aug 25, 2016 •

edited by frerich

Loading

siu commented Sep 1, 2016 •

edited

Loading