
Conversation

@frerich
Owner

@frerich frerich commented Aug 24, 2016

This PR attempts to improve concurrency by making clcache lock only the affected sections of the cache instead of the entire cache.

At the same time, it actually increases the duration for which a section is locked in the case of a cache miss: we now hold the lock even while the compiler is running, which seems more correct but was not viable before because it would have greatly impacted concurrency.

This PR is meant to supersede #208. It resolves #160.

@codecov-io

codecov-io commented Aug 24, 2016

Current coverage is 87.33% (diff: 84.53%)

Merging #217 into master will decrease coverage by 0.69%

@@             master       #217   diff @@
==========================================
  Files             1          1          
  Lines           969       1003    +34   
  Methods           0          0          
  Messages          0          0          
  Branches        160        159     -1   
==========================================
+ Hits            853        876    +23   
- Misses           86         94     +8   
- Partials         30         33     +3   

Powered by Codecov. Last update 129561a...0954a47

@siu
Contributor

siu commented Aug 24, 2016

I ran performancetests.py with the current PR and got this:

Compiling 30 source files sequentially, cold cache: 8.87619953641643 seconds
Compiling 30 source files sequentially, hot cache: 1.84543584907858 seconds
Compiling 30 source files concurrently via /MP12, hot cache: 0.6135658206123011 seconds

With a bigger codebase (hardlink=1, 6 real cores, 12 logical cores, 1223 object files) I am experiencing crashes with this stack trace:

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
    exec(code, m.__dict__)
  File "clcache.py", line 1604, in <module>
  File "clcache.py", line 1483, in main
  File "clcache.py", line 1514, in processCompileRequest
  File "clcache.py", line 1550, in processDirect
  File "clcache.py", line 250, in __enter__
  File "clcache.py", line 266, in acquire
__main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183

After retrying a few times, the whole codebase compiled. I cleaned the working directory and recompiled with the hot cache. The speedup is about 15x, compared to 5.6x in previous versions. A very similar speedup to #208.

@frerich
Owner Author

frerich commented Aug 24, 2016

Ah, interesting! That WaitForSingleObject error you get is a timeout: by default, a clcache process will wait for up to 10 seconds to acquire a lock. You can customise this timeout using the CLCACHE_OBJECT_CACHE_TIMEOUT_MS environment variable: setting it to e.g. 20000 will raise the timeout to 20 seconds.

Did you change this variable to a lower value by any chance? It's surprising that ten seconds are not sufficient...
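For illustration, here is a minimal sketch (Windows-only, with illustrative names; not clcache's actual code) of the kind of named-mutex acquisition that produces this error: the timeout comes from CLCACHE_OBJECT_CACHE_TIMEOUT_MS, and a WaitForSingleObject result of 258 (WAIT_TIMEOUT) is exactly the "returns 258" in the exception message.

import ctypes
import os

WAIT_TIMEOUT = 258  # winerror.h value reported in the exception above

def acquireNamedMutex(mutexName):
    # default of 10 seconds, overridable via the environment variable
    timeoutMs = int(os.environ.get("CLCACHE_OBJECT_CACHE_TIMEOUT_MS", 10 * 1000))
    handle = ctypes.windll.kernel32.CreateMutexW(None, False, mutexName)
    result = ctypes.windll.kernel32.WaitForSingleObject(handle, timeoutMs)
    if result == WAIT_TIMEOUT:
        raise RuntimeError("timed out waiting for lock " + mutexName)
    return handle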

@Jimilian
Contributor

Jimilian commented Aug 25, 2016

@frerich I got the same error multiple times in a row:

           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1596, in processNoDirect
           File "clcache.py", line 1340, in processCacheHit
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183
         Failed to execute script clcache
         Traceback (most recent call last):
           File "clcache.py", line 1609, in <module>
           File "clcache.py", line 1483, in main
           File "clcache.py", line 1512, in processCompileRequest
           File "clcache.py", line 1594, in processNoDirect
           File "clcache.py", line 250, in __enter__
           File "clcache.py", line 266, in acquire
         __main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183

I used the default value (10 seconds), so I think it's a kind of deadlock: it happened at the same moment in several projects, and it only isn't permanently "dead" because of the timeout.

p.s. 16 cores.

@siu
Contributor

siu commented Aug 25, 2016

I am using the default timeout. I had to increase it in the past, but with #208 it wasn't needed anymore, so it is not set in the environment.

Some of our targets take almost a minute to compile (that's one of the reasons why we need clcache). The locks can still end up being shared (though less often), so if a lock is held during compilation, other processes may have to wait more than 10 seconds.

@frerich
Owner Author

frerich commented Aug 25, 2016

@Jimilian That's very interesting, I agree - it looks like a deadlock which is eventually resolved when the timeout hits. It's also notable that you manage to trigger this even with NODIRECT set -- in that case the code path is a lot simpler: all the relevant code is in the short processNoDirect function.

There are two locks involved for every NODIRECT invocation:

  1. One lock which synchronises access to one of the sections of the cache (the section is determined by the first two characters of the cached object hash key). All invocations which access the same section are synchronised, but since there are up to 256 sections you hopefully don't hit this scenario very often.
  2. A second lock which synchronises accesses to the statistics file. This is a global lock, i.e. all clcache invocations share the same lock to make sure they don't step on each other's toes when writing to the statistics.
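To make the section selection concrete, here is a hedged sketch (illustrative only, not copied from clcache) of how a cache key could map to one of the 256 sections:

import hashlib

def cacheSectionForKey(key):
    return key[:2]   # first two hex characters of the hash pick the section, e.g. "3f"

# hypothetical key material; the real key mixes compiler version, command line and source hashes
key = hashlib.md5(b"cl.exe 19.00 /c /O2 foo.cpp").hexdigest()
print(cacheSectionForKey(key))   # only invocations landing in this section contend for its lock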

My first idea was that maybe the order in which these locks are acquired is mixed up somewhere, i.e. if process A would first acquire 1 and then 2 and process B would first acquire 2 and then 1, then the two might deadlock each other. Alas, I couldn't spot such a bug in the code at first glance...

@frerich
Owner Author

frerich commented Aug 25, 2016

@Jimilian & @siu -- I just extended this PR with a little temporary commit adding some debug output for acquiring/releasing locks. This will generate a lot of output, but it may help us tell what's going on.

The output will cause clcache to print its PID and the name of the lock every time it starts to wait for a lock, acquires a lock or releases a lock. I hope this allows reconstructing the order of events which leads to the deadlock.

It would be much appreciated if you could give it a try and then upload the generated output somewhere.

@frerich
Owner Author

frerich commented Aug 25, 2016

@siu wrote:

Some of our targets take almost a minute to compile (that's one of the reasons why we need clcache). The locks can still end up being shared (though less often), so if a lock is held during compilation, other processes may have to wait more than 10 seconds.

That's correct, but other processes will only have to wait more than 10 seconds if a) there is a cache miss and b) they access the same cache section (a 1/256 chance, assuming an even distribution of hash sums).
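As a quick back-of-the-envelope illustration (not from the PR itself): with an even distribution, the chance that at least one of k other concurrently running jobs touches the same section is 1 - (255/256)**k.

for k in (1, 11, 50):
    print(k, round(1 - (255 / 256) ** k, 3))   # prints 0.004, 0.042, 0.178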

@Jimilian
Contributor

Jimilian commented Aug 25, 2016

It's not related to waiting during compilation, because I didn't have the same problem before, even though the timeout was the same and it happened in a similar place. I did have broken JSON files, it's true, but no deadlocks.

@frerich, I also locally added the mutexName information to the exception. So now it prints:

__main__.CacheLockException: Error! WaitForSingleObject returns 258 for Local\C--clcache-stats.txt, last error 0

I think the deadlock was created on the statistics lock. Also, the bug happens only with a cold cache; I failed to reproduce it with a hot one.

Tail (I hope it's enough): https://gist.github.com/frerich/60b424950f66b3e41a4831d2f1e70b56 (edited by @frerich such that the log is a gist instead of shown inline)

@Jimilian
Contributor

It could also be related to the cleanup, probably, when we need to lock all sections.

@frerich
Owner Author

frerich commented Aug 25, 2016

Thanks for posting those logs - I believe I can explain this issue now: in non-direct mode, the following locks are acquired and released during a cache miss:

  1. the affected cache section lock is acquired to test whether cached data is available
  2. the global cache statistics lock is acquired, and the number of cache misses is bumped
  3. the object is added to the cache; if the cache is too big, this causes all cache sections to get locked in order to shrink the cache
  4. the global statistics lock is released
  5. the affected cache section lock is released

So what can happen is this: two parallel clcache instances 'A' and 'B' can get executed e.g. like this:

  • 'A' locks cache section 'x'
  • 'A' locks global statistics lock
  • 'B' locks cache section 'y'
  • 'A' needs to lock all sections to do a cleanup

At this point, 'B' cannot proceed because 'A' still holds the statistics lock, but 'A' cannot proceed because 'B' still holds the lock on section 'y'. So they deadlock, and the situation is resolved by 'B' timing out while waiting for the statistics.
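The same ordering problem can be reproduced in miniature with plain Python locks; this is only an abstract illustration (regular threading locks, not clcache's named mutexes), with the acquire timeouts standing in for CLCACHE_OBJECT_CACHE_TIMEOUT_MS:

import threading
import time

statsLock = threading.Lock()   # stands in for the global statistics lock
sectionY = threading.Lock()    # stands in for cache section 'y'

def instanceA():
    with statsLock:                            # A holds the statistics lock ...
        time.sleep(0.1)
        gotIt = sectionY.acquire(timeout=1)    # ... and now needs section 'y' for the cleanup
        print("A acquired section y:", gotIt)
        if gotIt:
            sectionY.release()

def instanceB():
    with sectionY:                             # B holds section 'y' ...
        time.sleep(0.1)
        gotIt = statsLock.acquire(timeout=1)   # ... and now needs the statistics lock
        print("B acquired statistics:", gotIt)
        if gotIt:
            statsLock.release()

a = threading.Thread(target=instanceA)
b = threading.Thread(target=instanceB)
a.start(); b.start(); a.join(); b.join()       # both acquires time out, mirroring the error-258 timeouts in the logs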

@Jimilian
Contributor

@frerich How are you going to solve it? I see only one option: check whether a cleanup is needed after unlocking/before locking, probably when the script starts or ends.

@frerich
Owner Author

frerich commented Aug 25, 2016

I think one alternative is to not nest the locks, i.e. only update the statistics after working on the actual cache. This means that there's a small time window in which the statistics don't reflect the actual state visible to other clcache processes, but I think that should not hurt.
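A hedged sketch of that alternative with plain locks and illustrative names (not the real clcache code): the section lock and the statistics lock are taken one after the other instead of nested, so neither is ever held while waiting for the other.

import threading

sectionLock = threading.Lock()
statisticsLock = threading.Lock()
numCacheMisses = 0

def handleCacheMiss(storeObjectInCache):
    global numCacheMisses
    with sectionLock:
        storeObjectInCache()   # cache work happens under the section lock only
    with statisticsLock:
        numCacheMisses += 1    # statistics updated afterwards; being briefly stale is acceptable

handleCacheMiss(lambda: None)  # usage example with a no-op store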

@siu
Contributor

siu commented Aug 25, 2016

FYI for when you get back to direct mode: the problem I reported before vanishes if I configure a big timeout (5 minutes), so I don't think it is a deadlock in my case.

@frerich
Owner Author

frerich commented Aug 25, 2016

@Jimilian I now implemented the idea of cleaning the cache before starting a compile job, not after it. This is more like what e.g. Git does (checking whether it needs to GC before actually doing its work). It's a regression for cache misses, which may now be a little slower than before (because they need to open two JSON files to test whether any cleaning is to be done), but it's conceptually a lot simpler.
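Roughly, the new ordering looks like the following sketch; the function and JSON key names are illustrative, not the actual clcache code. The cheap size check runs first, any shrinking happens under the all-encompassing lock, and only then does the compile job start.

import json
import os

def cleanupNeeded(statsFile, maximumSize):
    # cheap check: read the recorded cache size from the statistics JSON
    if not os.path.exists(statsFile):
        return False
    with open(statsFile) as f:
        return json.load(f).get("CurrentCacheSize", 0) > maximumSize   # key name is hypothetical

def processCompileRequest(statsFile, maximumSize, lockWholeCache, cleanCache, runCompiler):
    if cleanupNeeded(statsFile, maximumSize):
        with lockWholeCache():   # all sections plus statistics, in the defined order
            cleanCache()
    return runCompiler()         # the compiler only runs after any cleanup is done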

@siu @Jimilian Maybe you could give this new version a try?

@frerich frerich removed the wip label Aug 26, 2016
@Jimilian
Contributor

So, I failed to reproduce the deadlock. I tested it multiple times with a cold cache (the code was completely new, but the cache itself was full) and compilation passed without any issues, as expected.

@siu
Contributor

siu commented Aug 26, 2016

I got a new stacktrace with the current version:

cold_edited.zip

I hope it helps.

@frerich
Owner Author

frerich commented Aug 26, 2016

@siu Hmm, that's interesting. I see

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
    exec(code, m.__dict__)
  File "clcache.py", line 1606, in <module>
  File "clcache.py", line 1483, in main
  File "clcache.py", line 1516, in processCompileRequest
  File "clcache.py", line 1552, in processDirect
  File "clcache.py", line 249, in __enter__
  File "clcache.py", line 265, in acquire
__main__.CacheLockException: Error! WaitForSingleObject returns 258, last error 183

I.e. a timeout while waiting for a cache section lock in processDirect -- but this would mean that it took 10 seconds (by default) to get the lock, so a lot of compile requests would go to the same section. Unlikely, but possible.

In master, the error message given for timeouts is a lot more useful because it tells which lock it wanted to acquire. This might shed some light - I'll rebase this PR onto master, could you then maybe try it again?

@frerich frerich force-pushed the per_section_locking branch from b95c123 to ef688a7 Compare August 26, 2016 18:46
@frerich
Owner Author

frerich commented Aug 26, 2016

@siu Ok, rebased now - would be great if you could give it another shot. :-)

@siu
Contributor

siu commented Aug 28, 2016

@frerich sure, I will try tomorrow

@siu
Contributor

siu commented Aug 28, 2016

I am analyzing the code trying to understand if deadlocks can occur.

Deadlocks cannot occur if all possible executions of the software acquire the locks in the same order. For that reason I wrote down all functions that take locks and checked whether they respect the same order:

Cache.lock
    with [manifestSection, artifactsSection, statistics]


printStatistics
    with [stats]
resetStatistics
    with [stats]
cleanCache
    with [stats]
        with [Cache.lock]
clearCache
    with [stats]
addObjectToCache
    with [artifactsSection]
processCacheHit
    with [artifactsSection, stats]


== Postprocess functions

postprocessObjectEvicted
    with [stats]
        calls addObjectToCache
postprocessHeaderChangedMiss
    with [stats]
        calls addObjectToCache
postprocessNoManifestMiss
    with [stats]
        calls addObjectToCache

updateCacheStatistics
    with [stats]


== Process functions

processDirect
    with [manifestSection]:
        calls postprocessNoManifestMiss
        calls postprocessHeaderChangedMiss
        with [artifactSection]:
            calls processCacheHit
            calls postprocessObjectEvicted

processNoDirect
    with [artifactsSection]:
        calls processCacheHit
        with [stats]
            calls addObjectToCache

With this information we can see that there are code paths that acquire the locks in different orders, which can lead to deadlocks. Granted, manifestSection and artifactSection are different in the majority of cases, but there will certainly be collisions every once in a while. In addition, the operations on the whole cache need to acquire all of them.

For example, let's say one process A has to clean the cache, which involves:

  1. lock stats
  2. lock all manifestSections
  3. lock all artifactsSections
  4. lock stats (again).

Another process B may be running processDirect which may involve:

  1. lock manifestSection
  2. call postprocessNoManifestMiss
  3. lock stats
  4. call addObjectToCache
  5. lock artifactsSection

In this scenario, if A executes up to A2 and B executes up to B3, they will deadlock: A needs a manifestSection locked by B, and B needs the stats locked by A. In reality one of them would time out and crash and the other process would succeed. Note that increasing the timeout would not help in this situation; one of the processes would still time out.

There are a few fixes that need to be applied here and there, but the most problematic spot seems to be the locking of the artifactsSection inside addObjectToCache. It could be removed from there and acquired in the postprocessing functions before the stats are locked (in the same 'with' statement, for example).

We can put together a full list of things to change and iterate on it if you wish.
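One way to make the ordering explicit, sketched with plain locks and illustrative names (not a patch against clcache itself), is to funnel every "lock everything" operation through a single helper that always walks the locks in the Cache.lock order: manifest sections, then artifact sections, then statistics.

import contextlib
import threading

manifestSectionLocks = [threading.Lock() for _ in range(4)]   # 256 sections in the real cache
artifactSectionLocks = [threading.Lock() for _ in range(4)]
statisticsLock = threading.Lock()

def lockWholeCache():
    # acquire everything in one fixed global order; as long as every code path
    # nests its locks consistently with this order, the cross-wise waiting
    # described above cannot happen
    stack = contextlib.ExitStack()
    for lock in manifestSectionLocks + artifactSectionLocks + [statisticsLock]:
        stack.enter_context(lock)
    return stack

with lockWholeCache():
    pass   # e.g. a cache cleanup would run here with everything locked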

@siu
Contributor

siu commented Aug 29, 2016

More logs with stacktrace:

20160829-cold-01-edited.zip
20160829-cold-02-edited.zip

The first one is the original in the repository and the second contains timestamps in addition.

@frerich frerich self-assigned this Aug 31, 2016
@siu
Contributor

siu commented Sep 1, 2016

I've run some preliminary tests and this time the compilation finished properly with cold and hot caches.

I am running clcache with hardlink=1, 12 cores, timeout=300000ms, 1222 targets. The speedup with this PR is 9.7x, compared to 5.6x in my last tests of master. I was expecting faster times; I will enable profiling and post the results.

@frerich
Owner Author

frerich commented Sep 1, 2016

@siu Thanks for sharing those numbers! The strongest Windows build machine I have access to unfortunately only has two cores, so I very much depend on people with better hardware to try this stuff. I really appreciate it, and am looking forward to the profiling output.

Did your build run all the way to the end without any exceptions this time? If so, that would be encouraging since it suggests we fixed all the locking issues. I agree that 'just' 9.7x is less than expected, but at least this is a good argument to merge the PR and then implement additional improvements (maybe to the statistics locking) in a separate PR.

I'm curious how things are going for @Jimilian with this version. :-)

@siu
Contributor

siu commented Sep 1, 2016

Here are the results of the profiling with cold and hot caches:
20160901-pr17-profile.zip

I will keep using this branch and report any issue if there is any.

Did your build run all the way to the end without any exceptions this time? If so, that would be encouraging since it suggests we fixed all the locking issues. I agree that 'just' 9.7x is less than expected, but at least this is a good argument to merge the PR and then implement additional improvements (maybe to the statistics locking) in a separate PR.

Yes, this time it worked without problems, the only thing is that I have a huge timeout. I agree it is looking good for merging, let's see what the others say.

@frerich
Owner Author

frerich commented Sep 1, 2016

@siu Ah, that's interesting! Looking at the profiling for the hot cache, I see it starts with this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1177    0.041    0.000  287.120    0.244 {built-in method exec}
     1177    0.095    0.000  287.079    0.244 <string>:1(<module>)
     1177    0.057    0.000  286.972    0.244 clcache.py:1454(main)
     1177    0.236    0.000  192.237    0.163 clcache.py:1532(processCompileRequest)
     1177    0.133    0.000  178.124    0.151 clcache.py:1588(processDirect)
     1177    5.733    0.005  153.371    0.130 clcache.py:220(getIncludesContentHashForFiles)
  1189217   29.296    0.000  147.018    0.000 clcache.py:739(getFileHash)
    46915   96.204    0.002   96.204    0.002 {method 'write' of '_io.BufferedWriter' objects}
     2354    0.008    0.000   93.095    0.040 clcache.py:69(printBinary)
  1197456   70.514    0.000   70.548    0.000 {built-in method open}
  1236132   28.167    0.000   28.167    0.000 {method 'read' of '_io.BufferedReader' objects}
  1191571   16.949    0.000   16.949    0.000 {method 'update' of '_hashlib.HASH' objects}
     1177    0.047    0.000   16.348    0.014 clcache.py:1357(processCacheHit)

I.e. of the 287 seconds the whole build took, I think a huge portion is spent in getFileHash, which probably also accounts for some of the 70 seconds spent in open() (and for a lot of the time spent in _io.BufferedReader.read and _hashlib.HASH.update). So maybe if we can make getFileHash faster, that would help the 'hot cache' case the most.

I wonder about the 96 seconds spent in _io.BufferedWriter.write though... I wish I knew where that is coming from. Maybe printBinary? But why would it be so slow...

@webmaster128
Contributor

I'm planning to provide additional test results for this PR on a 6-core, local-SSD, hardlink machine within 2-3 days.

@Jimilian
Contributor

Jimilian commented Sep 1, 2016

I have bad news... Now it's significantly slower (13 minutes vs 10 minutes with hot cache).
I will try to profile as well.
NO_DIRECT, hardlink, 16 cores, SSD.

@frerich
Owner Author

frerich commented Sep 1, 2016

@siu In my experiments, simply changing 'HashAlgorithm' at the top of clcache.py to hashlib.sha1 made things about 10% faster in the 'hot cache' scenario (a result which would contradict a5b72cb though that commit was done with Python 2.x).

@Jimilian That's really surprising, looking forward to the profile data.

@webmaster128
Contributor

In my experiments, simply changing 'HashAlgorithm' at the top of clcache.py to hashlib.sha1 made things about 10% faster in the 'hot cache' scenario (a result which would contradict a5b72cb though that commit was done with Python 2.x).

This is very implementation-dependent. The OpenSSL implementation for 64-bit Linux has a SHA1 that is faster than MD5:

$ openssl speed md5 sha1
[...]
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              38764.20k   121777.26k   298165.85k   471874.90k   563161.77k
sha1             53112.71k   157630.83k   397270.19k   636987.39k   751763.46k

For our purpose, we don't need a cryptographic hash function. So we could switch to something much faster. xxHash looks very interesting to me, especially because there is a Python implementation on pip that uses the hashlib interface. This would allow us to easily toggle between md5 and xxHash using an environment variable.
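As a sketch of that idea (the environment variable name here is made up, xxhash would be an extra pip dependency, and the getFileHash body below is a simplified stand-in, not the real implementation):

import hashlib
import os

if os.environ.get("CLCACHE_HASH_ALGORITHM") == "xxhash":
    import xxhash
    HashAlgorithm = xxhash.xxh64   # non-cryptographic, same update()/hexdigest() interface as hashlib
else:
    HashAlgorithm = hashlib.md5    # current default

def getFileHash(filePath):
    hasher = HashAlgorithm()
    with open(filePath, "rb") as f:
        hasher.update(f.read())
    return hasher.hexdigest()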

But what I would like to verify first is whether the hashing function has an impact at all. As far as I know, MD5 and SHA1 are usually faster than the available persistent I/O devices.

@frerich
Owner Author

frerich commented Sep 1, 2016

With this PR being at 71 comments now, I think we should -- at least in this thread -- concentrate on getting the per-section locking right and then use separate PRs for other performance improvements so that this doesn't get out of hand. :-}

This PR is already blocking some other work from progressing.

@frerich frerich force-pushed the per_section_locking branch from 64e50fe to 0954a47 Compare September 2, 2016 10:09
Frerich Raabe added 10 commits September 2, 2016 12:11
This centralises the code for creating a system-wide lock given some
path. The function also respects the lock timeout environment variable.
I factor this code out into a separate method since I plan to introduce
more, finer-grained locks, which will very much use the same logic.
The plan is that these locks synchronise access to an individual section
of the cache.
This is a system-wide lock to be used for synchronising accesses to the
statistics of a cache.
This reimplements the global 'Cache.lock' lock such that it's defined in
terms of the individual section and the statistics locks. This means
that acquiring and releasing Cache.lock acquires and releases locks for
all sections and for the statistics. This is slower than before
(because it requires acquiring and releasing up to 513 locks) but it
should be only rarely needed - mostly, when cleaning the cache.
This (big) patch attempts to improve concurrency by avoiding the global
cache lock. Instead, we will lock only those manifest and compiler
artifact sections which we deal with. The only case where we need to
synchronize all concurrent processes is when updating the statistics
file, because there's only a single statistics file. At least for cache
hits this could be avoided by tracking the number of cache hits per
section, too.

To avoid deadlocks, the locks have to be acquired in the same order for all
execution paths (the order is defined in Cache.lock, i.e. first
manifests, then artifacts, then statistics). Hence, locking of the artifact
section had to be pulled out of the addObjectToCache() function since
that function was called with the stats locked already - a violation of
the locking order.

Furthermore, we can no longer perform cache.clean() in
addObjectToCache() because cache.clean() acquires the global lock, so
e.g. this sequence of steps was possible in non-direct mode:

1. 'A' locks cache section 'x'
2. 'A' locks global statistics lock
3. 'B' locks cache section 'y'
4. 'A' needs to lock all sections to do a cleanup

At this point, 'B' cannot proceed because 'A' still holds the statistics
lock, but 'A' cannot proceed because 'B' still holds the lock on section
'y'.

This issue is caused by -- from B's view -- the statistics lock being
locked before a section lock. This must never happen.

At the point addObjectToCache() is called, we already have the
statistics locked and we know that it may be that the cache size limit
was just exceeded, so it's a good moment to determine that a cleanup is
needed. It's not a good moment to *perform* the cleanup though.

Instead, let the function return a flag which is propagated all the way
back to processCompileRequest(). The flag indicates whether cleanup is
needed, and if so, processCompileRequest() will acquire Cache.lock
(which acquires all sections and statistics in the correct order) to do
the cleanup.
By checking for 'manifest == None' early and returning early, we can
reduce the indentation depth for the larger 'manifest != None' branch.
It's really an exceptional issue, let's handle that on the caller side
so that we get the 'forward to real compiler' logic for free.
It's just ManifestRepository.getIncludesContentHashForFiles() which can
raise IncludeChangedException, so only that needs to be covered by
try/except; doing so, moving code out of the 'try', allows reducing the
indentation depth.
These functions, just like postprocessObjectEvicted(), don't need to
lock the cache section: they are already called while the cache section
is locked (in processDirect()).
printTraceStatement("Cannot cache invocation as {}: called for preprocessing".format(cmdLine))
updateCacheStatistics(cache, Statistics.registerCallForPreprocessing)
except IncludeNotFoundException:
pass
Contributor

Could you add a simple tracing message here? I just ran into this and did not have a clue why there were no hits.

Owner Author

Good point, will do!

Owner Author

Actually, I think I'd rather not do it right now. The code seems fishy and there might be a patch needed here which is worth its own PR.

This exception can apparently be raised in two situations:

  1. During a cache miss, in case an include file which cl.exe printed (via /showIncludes) was deleted (or otherwise became unreadable) right after the compiler finished. Very unlikely, I guess.
  2. During a cache hit, in case a manifest references a non-existent include file. In this case however, shouldn't the code rather update (i.e. rewrite) the manifest?

@webmaster128
Contributor

Okay, I guess my 2011 CPU (Phenom II X6 1055T) is just too slow for this kind of measurement. I get no significant difference in performance (this vs. master) using #226. In the 13 seconds of restoring, all 6 cores are under full load, both on master and using this.

In performancetests.py, this brings some 22%.

However, everything compiles solid as expected. So feel free to merge, if you want to.

@frerich frerich merged commit d3bc5c0 into master Sep 2, 2016
@frerich
Owner Author

frerich commented Sep 2, 2016

Two other people confirmed this brings a modest improvement and does not seem to hurt. So let's merge this and see how it goes.


Successfully merging this pull request may close these issues.

Global cache lock hurts cache hits during heavily concurrent builds
