Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

@frerich
Copy link
Owner

@frerich frerich commented Sep 2, 2016

This PR is based on #217 and takes advantage of the refactored code to save about 40 lines of code (Git says ' 1 file changed, 27 insertions(+), 70 deletions(-)') in comparison to #217 . This PR should be merged only after #217 was merged.

Frerich Raabe added 21 commits September 2, 2016 12:11
This centralises the code for creating a system-wide lock given some
path. The function also respects the lock timeout environment variable.
I factor this code out into a separate method since I plan to introduce
more, finer-grained locks, which will very much use the same logic.
The plan is that these locks synchronise access to an individual section
of the cache.
This is a system-wide lock to be used for synchronising accesses to the
statistics of a cache.
This reimplements the global 'Cache.lock' lock such that it's defined in
terms of the individual section and the statistics locks. This means
that acquiring and releasing Cache.lock acquires and releases locks for
all sections and for the statistics. This is slower than before
(because it requires acquiring and releasing up to 513 locks) but it
should be only rarely needed - mostly, when cleaning the cache.
This (big) patch attempts to improve concurrency by avoiding the global
cache lock. Instead, we will lock only those manifest and compiler
artifact sections which we deal with. The only cases where we need to
synchronize all concurrenct processes is when updating the statistics
file, because there's only a single statistics file. At least for cache
hits this could be avoided by tracking the number of cache hits per
section, too.

To avoid deadlocks, the locks have to acquired in the same order for all
execution paths (the order is defined in Cache.lock, i.e. first
manifests, then artifacts, then statistics). Hence, locking of the artifact
section had to be pulled out of the addObjectToCache() function since
that function was called with the stats locked already - a violation of
the locking order.

Furthermore, we can no longer perform cache.clean() in
addObjectToCache() because cache.clean() acquires the global lock, so
e.g. this sequence of steps was possible in non-direct mode:

1. 'A' locks cache section 'x'
2. 'A' locks global statistics lock
3. 'B' locks cache section 'y'
4. 'A' needs to lock all sections to do a cleanup

At this point, 'B' cannot proceed because 'A' still holds the statistics
lock, but 'A' cannot proceed because 'B' still holds the lock on section
'y'.

This issue is caused by -- from B's vie -- the statistics lock being
locked before a section lock. This must never happen.

At the point addObjectToCache() is called, we already have the
statistics locked and we know that it may be that the cache size limit
was just exceeded, so it's a good moment to determine that a cleanup is
needed. It's not a good moment to *perform* the cleanup though.

Instead, let the function return a flag which is propagated all the way
back to processCompileRequest(). The flag indicates whether cleanup is
needed, and if so, processCompileRequest() will acquire Cache.lock
(which acquires all sections and statistics in the correct order) to do
the cleanup.
By checking for 'manifest == None' early and returning early, we can
reduce the indentation depth for the larger 'manifest != None' branch.
It's really an exceptional issue, let's handle that on the caller side
so that we get the 'forward to real compiler' logic for free.
It's just ManifestRepository.getIncludesContentHashForFiles() which can
raise IncludeChangedException, so only that needs to be covered by
try/except; doing so, moving code out of the 'try', allows reducing the
indentation depth.
These functions, just like postprocessObjectEvicted(), don't need to
lock the cache section: they are already called while the cache section
is locked (in processDirect()).
The majority of this code appears to be repeated in
postprocessNoManifestMiss(). The only difference is the statistics field
which is updated. So in order to consolidate this code, let's introduce
a new postprocessHeaderChangedMiss() function which is parametrised on
the statistics field to update.
The logic for getting the includes used by a source file and invoking
the compiler in case a manifest is not usable was duplicated in
processDirect(). Centralize this in postprocessUnusableManifestMiss().
These two were really just forwarding to
postprocessUnusableManifestMiss(), they didn't pull they own weight. So
let's remove one level of indirection by inlining these functions.
Inlining this function shows that there is a strong similarity between
processDirect() and processNoDirect(), a duplication I'd like to capture
in a shared function in a subsequent commit.
As inlining processObjectEvictedMiss() shows, we already locked the
artifact section here. We don't need to do it again.
...by enlarging the scope of the cleanupRequired variable in
processDirect() and by renaming 'section' to 'artifactSection' in
processNoDirect().

By making the code more similar, fixing the code duplication between
these two should be safer to do, i.e. mistakes are hopefully easier to
spot.
Not so minor actually, it makes processDirect() and processNoDirect()
more similar.
The version in processNoDirect() could actually be simplified by adding
two tuples. De-denting the return statement in processDirect() makes the
code more similar to what processNoDirect() now does.
...by factoring it out into a getOrSetArtifacts() function (I'm not very
happy with that name...). The function is much like processNoDirect()
except that it expects to be given an artifact section cache key and a
function which bumps the statistics in case of a miss.

I suspect that this function can also be reused in processDirect().
Saves some code at the expense of a trace statement which is now lost. A
small price to pay for the code savings, I think.
@codecov-io
Copy link

codecov-io commented Sep 2, 2016

Current coverage is 88.53% (diff: 93.82%)

Merging #225 into master will increase coverage by 0.50%

@@             master       #225   diff @@
==========================================
  Files             1          1          
  Lines           969        968     -1   
  Methods           0          0          
  Messages          0          0          
  Branches        160        154     -6   
==========================================
+ Hits            853        857     +4   
+ Misses           86         82     -4   
+ Partials         30         29     -1   

Powered by Codecov. Last update 129561a...077b357

@frerich frerich merged commit bdf41a6 into master Sep 2, 2016
@frerich frerich deleted the resolve_code_duplication branch September 2, 2016 18:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants