Don't hold the cache lock while computing the manifest hash for the current object #208
Conversation
Current coverage is 78.40% (diff: 69.23%)

@@            master    #208   diff @@
======================================
  Files            1       1
  Lines          974     968     -6
  Methods          0       0
  Messages         0       0
  Branches       162     162
======================================
- Hits           856     759    -97
- Misses          86     172    +86
- Partials        32      37     +5
I think this is safe to do, i.e. after looking at the diff I did not find an issue. Looks like a great catch to me. Could you help us by reducing the diff? If you restore the original structure, the diff should get much smaller and the logic is easier to follow. The additional method assignment is irrelevant for the locking time.

Sure, done.

Hi @siu, thanks a lot for your analysis and your proposed patch! I suspect this introduces a race condition, though. What happens when two concurrent clcache invocations both compute the same manifest hash and then look for the same manifest - but it does not exist? In that case, won't they both end up trying to add the same manifest file, potentially at the same time? Does that work? For what it's worth, an alternative solution I was toying with is to not have a cache-global lock but rather have one lock per cache (and manifest) 'section'. That way, the chance of two concurrent invocations blocking each other is reduced from 100% to, uh, a lot less -- they only block if the manifestHash of the two invocations starts with the same two characters.

The manifest content is generated during the compiler call (no lock anyway). Storing the manifest happens in a postprocess method where the cache is locked. This would lead to one new manifest overwriting the other new manifest, which isn't an issue, I think.

@frerich, I came to the same analysis as @webmaster128: the cache is locked for any write, so it should be safe even if two processes generate the same manifest/object files. This PR attempts to reduce contention while still using the same lock; having per-section locking will further reduce the chance of contending for the same lock.
Giving this more thought, I think you are right: there are some race conditions in the handling of the manifest files, but they are not a problem introduced by this PR. postprocess functions should not assume that the object or manifest cache is in the same state as before. I.e. postprocessNoManifestMiss is run because the manifest didn't exist, but by the time it runs the manifest file may already exist. The same applies to postprocessHeaderChangedMiss.

That means that the read-update-write of the manifest files needs to be atomic and retried by the postprocess functions for each change in the manifests. A writeOrUpdateManifest function could be used for this. I have the feeling that things are simpler for the object cache, as object files don't aggregate data from several processes.
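For illustration, a minimal sketch of what such a helper might look like, assuming a JSON manifest and a lock object with context-manager semantics (the name writeOrUpdateManifest is taken from the comment above; everything else is made up):

```python
import json
import os

def writeOrUpdateManifest(lock, manifestPath, newEntry):
    # Perform the read-update-write cycle as one unit under the lock, so
    # that a concurrent postprocess function cannot interleave between the
    # read and the write and have its entry silently dropped.
    with lock:
        entries = []
        if os.path.exists(manifestPath):
            with open(manifestPath, 'r') as f:
                entries = json.load(f)
        if newEntry not in entries:
            entries.insert(0, newEntry)  # newest entry first
        with open(manifestPath, 'w') as f:
            json.dump(entries, f)
```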
Indeed, there is a general conflict between correctly coordinating concurrent accesses to the cache and avoiding that concurrent invocations block each other, dragging down performance.

I'm currently contemplating a different, more 'transactional' way of managing concurrency. The downside of using locks is that it's a very pessimistic model: it pays a price (creating, acquiring, releasing locks) to handle the exceptional case of two concurrent processes accessing the same cache entry. Right now, it does so very naively by just managing a single lock -- that's reasonably easy to do, but greatly impacts performance (see #160).

A while ago, we started deescalating this issue by not using a single lock but rather dividing the cache into multiple (up to 256) sections, based on the hash key computed for an invocation. That way, only invocations of clcache hitting the same cache section would block. However, this work is not finished yet (we did introduce the notion of sections, but they have no dedicated locks yet), and it's also not trivial, since some operations (e.g. cleaning the cache) affect all sections.

In other software projects (notably the ones written in Haskell), I got to enjoy STM (software transactional memory) for managing concurrency a lot. It is an optimistic model in that a 'transaction' (a modification of some sort) only becomes expensive if there actually is a conflict. I'm not sure this can be pulled off here, though.

Yet another idea might be to use dedicated readers-writer locks, such that e.g. concurrent reads of the cache are perfectly fine and can happen 100% concurrently, but write accesses are synchronised. This could maybe be combined with the idea of dividing the cache into sections.
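As a rough sketch of the section idea (names are illustrative, and a real implementation would need an inter-process lock, e.g. a lock file or a named mutex, rather than anything in-process):

```python
import os

def sectionLockFilePath(cacheDir, manifestHash):
    # Derive the section from the first two hex characters of the hash,
    # yielding up to 256 sections. Two invocations only contend for the
    # same lock file if their hashes share this prefix.
    section = manifestHash[:2]
    return os.path.join(cacheDir, "manifests", section, "lock")
```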
Anyway, I saw broken manifest and statistics files several times (that's why I introduced #199). It means that we already had some race conditions before this PR.
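Independently of the locking scheme, such broken files could be avoided by never writing them in place. A sketch, assuming JSON content (atomicWriteJson is a hypothetical helper, not existing clcache code):

```python
import json
import os
import tempfile

def atomicWriteJson(path, data):
    # Write to a temporary file in the target directory, then atomically
    # swap it into place; readers either see the old file or the new one,
    # never a partially written file.
    fd, tmpPath = tempfile.mkstemp(dir=os.path.dirname(path))
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(data, f)
        os.replace(tmpPath, path)  # atomic on both POSIX and Windows
    except Exception:
        os.remove(tmpPath)
        raise
```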
By now, I'm really curious how well a good database (e.g. PostgreSQL) would work. It might take care of a couple of issues which came up in the past (or still exist).

My main concern really is performance: I wonder how long it would take to query a local database (or a database on a LAN, i.e. with reasonably low latency)... I suspect the average size of object files (a couple of MB in debug builds) would permit storing the object files straight away in the database, but to keep the possibility of hardlinking (i.e. reducing the number of bytes to shovel around), it's probably better to keep a file name and some meta information (file size, date of last access, number of hits) in the database.
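A sketch of that metadata-only layout, assuming psycopg2 and made-up table and column names (the object file itself stays on disk so it can still be hardlinked):

```python
import psycopg2

conn = psycopg2.connect("dbname=clcache")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS cache_entries (
            object_hash  TEXT PRIMARY KEY,             -- key computed per invocation
            object_path  TEXT NOT NULL,                -- on-disk file, hardlink source
            file_size    BIGINT NOT NULL,
            last_access  TIMESTAMP NOT NULL DEFAULT now(),
            hit_count    INTEGER NOT NULL DEFAULT 0    -- number of cache hits
        )
    """)
```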
When working on the lock-file-based locking with the possibility in mind to use the cache across several machines (I promise I will get back to work on it soon), I also thought about a server application taking care of synchronization and statistics. I looked a bit into DBMSs with regard to storing blobs but couldn't find much. I also searched a bit for existing NoSQL key-value stores for this purpose but could not find anything fitting what I had in mind. And lastly, I thought about writing a custom Python app acting as a server explicitly for clcache. I think that should not be too hard, at least to judge whether it might be a reasonable solution for sharing a cache between multiple machines. But first I will get back to the lock-file-based solution.
How does this relate to #78? I never worked with memcached, but it might solve some of the problems. Can those storage and locking interfaces be generalized in a way that supports different backends? I guess we don't want every user to install huge amounts of dependencies that only pay off with octa cores, a ramdisk and InfiniBand.
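Such a generalization could look roughly like this (a hypothetical interface, not existing clcache code): the default file-system backend stays dependency-free, while memcached or a database server would be opt-in implementations of the same interface.

```python
from abc import ABCMeta, abstractmethod

class CacheBackend(metaclass=ABCMeta):
    @abstractmethod
    def getObject(self, key):
        """Return the cached object data for key, or None on a miss."""

    @abstractmethod
    def setObject(self, key, data):
        """Store object data under key."""

    @abstractmethod
    def lockFor(self, key):
        """Return a context manager guarding concurrent access to key."""
```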
@akleber One of (in fact: the) earliest clcache contributors, @vchigrin, already implemented some sort of daemon (called 'cacodaemon'). I never tried it myself, but AFAIK he uses it with great success. That was a couple of years ago though, but I see that his fork is still active, so there might be something to reuse.

@webmaster128 indeed, I'd like to keep clcache's benefit of being reasonably easy to deploy. However, it's (as always) a matter of striking the right balance. I'd be willing to make things a 'little' (maybe even optionally) more complex to set up if we get more correctness, speed - or less code. At this point, I think it's too early to really exclude any alternatives. I guess we'll just have to try it out. In one of my local branches, I'm experimenting with something which uses PostgreSQL as the backend. So far, the setup was quite straightforward; let's see where this goes.
@webmaster128 As for how this relates to memcached support: I have no idea, but it's a good point! I have no experience with memcached myself.
I don't think the current approach is too far off from a correctness point of view; it should be possible to fix the remaining issues before moving to more complex implementations. The plan to add locks per section seems solid, and I think it will bring the biggest improvement with the fewest changes. I still think this PR is worth merging; I will update it later today to sync it with the latest changes in master. I will also start working on a PR to fix the issues described in #208 (comment).
Patch updated.
@siu I now had a first shot at implementing finer-grained locks in order to improve concurrency. The code is available in #217 -- it seems to work, but I only have a dual-core system at hand, where it doesn't seem to make much of a difference. Somebody with more cores would need to give it a try -- can you maybe see whether it helps for you? The code is still a bit rough, but I think it's conceptually sound.
Testing #217. Where do you prefer me to report? I just ran performancetests.py and got this:

Let me try with a bigger code base.
@siu any place is equally good for reporting; I'm happy that volunteers are willing to try things for me. :-) Those numbers are encouraging and disappointing at the same time.

In any case (I'm biased, of course), I tend to prefer #217 over #208 since it achieves a speedup in concurrent scenarios without moving code out of the locked section (in fact, more code is executed under the lock, something which was not viable before because there was just a single lock). So if you agree, I'd like to eventually close #208 in favor of #217.
#217 was just merged, which, to the best of my knowledge, obsoletes this PR - hence, I'll close it.
First of all, thanks for developing and maintaining clcache. We are checking the feasibility of using clcache for everyday use and on our Jenkins server, and while the performance boost is great, we noticed that it is still far from ccache.
We are profiling clcache (#207) (hardlink=on, directmode=on) and we found that runs that have everything in cache spend a lot of time in processDirect. More revealing is that a lot of time is spent trying to acquire the lock at the beginning of the function, which probably means that other processes are holding the lock for too long. This change is an attempt to reduce the time spent in the critical section by releasing the lock while computing the manifest hash. Access to the cache is still protected by the lock, but I am not sure if the computation of the hash is still 100% safe.
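A self-contained sketch of the idea; computeManifestHash and lookUpManifest are stand-ins, not the real clcache functions:

```python
import hashlib
import threading

cacheLock = threading.Lock()  # stand-in for clcache's cache-wide lock

def computeManifestHash(sourceFile, includeFiles):
    # The expensive part: reading and hashing the source file and every
    # header it includes. Nothing here touches shared cache state, so it
    # does not need the lock.
    hasher = hashlib.md5()
    for path in [sourceFile] + includeFiles:
        with open(path, 'rb') as f:
            hasher.update(f.read())
    return hasher.hexdigest()

def lookUpManifest(manifestHash):
    return None  # stub: the real code consults the on-disk cache here

def processDirect(sourceFile, includeFiles):
    # Hash first, without holding the lock...
    manifestHash = computeManifestHash(sourceFile, includeFiles)
    # ...then serialize only the actual cache access.
    with cacheLock:
        return lookUpManifest(manifestHash)
```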
With this change the performance test goes from:
to
This has a big impact on our main build times, making them a lot faster (14.8x faster with a hot cache, compared to 5.6x with the version in master).
Could you help review whether these changes are safe?