Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

@siu
Copy link
Contributor

@siu siu commented Aug 26, 2016

A bit late to the party now that #217 is ongoing and will conflict in a few places, nevertheless this fixes a few bugs and concurrency issues.

This PR tries to solve the following issues:

  • Handle alternating include files Deal with alternating headers #212.
  • Fix race condition updating manifest files in postprocess functions (from From Don't hold the cache lock while computing the manifest hash for the current object #208 (comment)):

    postprocess functions should not assume that the object or manifest cache are in the same state as before. I.e. postprocessNoManifestMiss is run because the manifest didn't exist but by the time it is run the manifest file may exist already. The same applies to postprocessHeaderChangedMiss. That means that the read-update-write of the manifest files needs to be atomic and re-tried by the postprocess functions for each change in the manifests.

If the whole operation of clcache is performed under a lock as proposed in #217 the race condition described above disappears and this PR makes some extra operations (re-reading the manifest file) but still fix the problems with alternating headers.

There are still a few things to consider:

  • In the general case the order of includes in C/C++ is important. There are no tests checking if the file is re-compiled when the order of includes change. For the same reason if a file is included more than once the includesContentHash should be affected.
  • I did not push the release of the lock while computing the current hash of the include files as in Don't hold the cache lock while computing the manifest hash for the current object #208 because all this is going to change with Per-section locking #217 but it is easy to apply.
  • Reduce the integration tests to the minimum subset (help?).
  • When a matching ManifestEntry is found the entry can be pushed to the top of the Manifest entries to implement LRU.

@webmaster128
Copy link
Contributor

In the general case the order of includes in C/C++ is important. There are no tests checking if the file is re-compiled when the order of includes change. For the same reason if a file is included more than once the includesContentHash should be affected.

Since the source code as well as all include contents are hashed, the oder is implicitly fixed.

@codecov-io
Copy link

codecov-io commented Aug 26, 2016

Current coverage is 88.82% (diff: 100%)

Merging #222 into master will increase coverage by 0.28%

@@             master       #222   diff @@
==========================================
  Files             1          1          
  Lines           968        984    +16   
  Methods           0          0          
  Messages          0          0          
  Branches        154        156     +2   
==========================================
+ Hits            857        874    +17   
- Misses           82         83     +1   
+ Partials         29         27     -2   

Powered by Codecov. Last update bdf41a6...66fa5e8

@siu
Copy link
Contributor Author

siu commented Aug 26, 2016

@webmaster128 oh yes, that even works for transitive includes, still it would be good to add tests.

@frerich
Copy link
Owner

frerich commented Aug 26, 2016

@siu Good stuff! Indeed, I'm afraid half of this will conflict once #217 is in, and the 'alternating headers' stuff kind of overlaps with #212 . My understanding is that @webmaster128 has something in the pipe for #212 (which is why I'm not merging #216 yet), so I can't quite tell what to do with this PR just yet.

My hope is that we can get #217 ready real soon, so that this PR can be rebased and then we can compare what's left with #212.

All these PR numbers make me dizzy. %-)

@webmaster128
Copy link
Contributor

@frerich I did not start working on anything apart the test in #212 and am happy to see you both working on it. I think no matter how #212 is solved, #216 becomes obsolete.

I personally would not touch #212 before #155 is solved, but that might be a matter of taste or use case.

@frerich
Copy link
Owner

frerich commented Aug 26, 2016

@webmaster128 Ah, thanks for the clarification. Since @siu already picked the test from #212 -- doesn't that render #212 obsolete in favour of this PR?

@siu
Copy link
Contributor Author

siu commented Aug 28, 2016

That works for me, let's wait until #217 is merged and then rebase this one. In the meanwhile I will probably add more tests. You can ignore the changes until then, I am also learning how clcache works with this PR.

As @webmaster128 said, #216 will be obsolete in any case.

@webmaster128 In addition to the two reasons that you point in #155 not updating the Manifest file but creating a new one was also triggering #155. It was the case in your integration test and that is why it is passing now. You pointed that one way to trigger #155 is by cleaning the manifests but not removing the cached objects, I haven't looked at how cleaning the cache works so I am not able to create a test that triggers that. Maybe adding such a test would be a good way to proceed with #155.

@siu siu force-pushed the atomicAndMultipleEntriesPerManifest branch from cac060c to 387e773 Compare August 30, 2016 09:24
@frerich
Copy link
Owner

frerich commented Sep 2, 2016

#217 was now merged, so I think this one could be rebased.

@siu siu force-pushed the atomicAndMultipleEntriesPerManifest branch from 387e773 to d3b0ba8 Compare September 5, 2016 10:21
@siu
Copy link
Contributor Author

siu commented Sep 5, 2016

Updated, I'll wait for your reviews.

I think it can still be improved by pushing the manifest entry to the top of the list when there is a hit. I will add more tests too.

listOfHashes.append(getFileHash(path))
except FileNotFoundError:
includeMissing = True
raise IncludeNotFoundException
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does this change mean that in case the statistics are counted slightly differently in case some included file has a different hash, and another included does not exist anymore?

@frerich
Copy link
Owner

frerich commented Sep 5, 2016

Looks promising! I hope it wasn't too much of a pain to update this PR after #217 was merged.

I only did a first pass through the code now, I think I'll need to do a second pass after reading the tests very carefully since I must admit that I'm not quite sure which issue this PR tries to fix exactly (and how). :-]

@siu siu force-pushed the atomicAndMultipleEntriesPerManifest branch from d3b0ba8 to 59dc154 Compare September 5, 2016 16:19
@siu
Copy link
Contributor Author

siu commented Sep 5, 2016

I updated the pull request with (most of) your suggestions. I switched to lists in createManifestEntry because I was worried about the ordering and duplicity of includes but as @webmaster128 was pointing it is already handled by the hashing of the contents.

@siu
Copy link
Contributor Author

siu commented Sep 5, 2016

I also added the code for reordering the manifest entries when there is a hit in b7cac4c

@siu siu force-pushed the atomicAndMultipleEntriesPerManifest branch 2 times, most recently from 8455bb4 to db79a52 Compare September 5, 2016 20:34
@siu siu force-pushed the atomicAndMultipleEntriesPerManifest branch from db79a52 to c3ba9b2 Compare September 5, 2016 20:45
clcache.py Outdated
"""Moves entry in entryIndex position to the top of entries()"""
entry = self._entries[entryIndex]
self._entries.pop(entryIndex)
self._entries.insert(0, entry)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pop() already returns the entry which was popped, so you could simplify to

self._entries.insert(0, self._entries.pop(entryIndex))

@frerich
Copy link
Owner

frerich commented Sep 7, 2016

To me, it looks like it's good to go! There are just smaller stylistic things which came to my mind, but nothing really which warrants blocking this improvement from being incorporated.

The main piece missing, to me: an entry for the ChangeLog file. However, even after reading the test code and the description of this PR as well as the description of #212 I'm not totally sure what "alternating headers" are. So I don't even know what to write. :-)

What I gathered is that this PR improves things such that a cache hit is performed in some cases where we'd previously unnecessarily resulted in a cache miss, and this is somehow related to two compilations using... 'alternating headers'?

@siu
Copy link
Contributor Author

siu commented Sep 7, 2016

I tried my best to update the changelog.

Just to rephrase it: these changes try to improve the hit rate when changing between branches.

Let's say I am in branch1 which contains:

  • A.cpp
  • B.h (version 1)

Compiling with clcache (as in master) will create a manifest file for A.cpp with the contents hash of the current includes. Changing then to branch2 with the following contents:

  • A.cpp
  • B.h (version 2)

clcache will find the previous manifest but the contents hash of the include files will not match. This produces a cache miss and the contents of the manifest will be overwritten with the new includesContentHash -> objectHash.

Switching back to branch1 will generate another cache miss as the manifest does not match the content of the includes. This could be avoided as the object is already stored in the cache.

This PR solves that by having multiple entries per manifest file. Each manifest entry contains the previous contents of the manifest: the path to the includes, the hash of the contents of the includes and the hash of the cached object. Note that having the full list of includes in the manifest entries is needed because the full list may have changed (through transitive includes) and will be needed during the validation phase.

@frerich frerich merged commit c99332f into frerich:master Sep 7, 2016
@frerich
Copy link
Owner

frerich commented Sep 7, 2016

Aaah, thanks for the explanation! Makes sense. Thanks a lot for the contribution, merged it now!

@siu siu deleted the atomicAndMultipleEntriesPerManifest branch September 7, 2016 20:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants