Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

@siu
Copy link
Contributor

@siu siu commented Oct 24, 2016

I was doing some preliminary work trying to understand #155 and I found that createManifestEntry does not give consistent results between runs. This is probably broken since #222. In this PR I added the unit tests to show the (old) bad behavior and fixed it. As the order of the includes changes in the manifest files the MANIFEST_FILE_FORMAT_VERSION is bumped too.

@codecov-io
Copy link

codecov-io commented Oct 24, 2016

Current coverage is 88.86% (diff: 100%)

Merging #235 into master will increase coverage by 0.01%

@@             master       #235   diff @@
==========================================
  Files             1          1          
  Lines           996        997     +1   
  Methods           0          0          
  Messages          0          0          
  Branches        158        158          
==========================================
+ Hits            885        886     +1   
  Misses           83         83          
  Partials         28         28          

Powered by Codecov. Last update 02be13a...2479ad6

@siu siu force-pushed the deterministicCreateManifestEntry branch 2 times, most recently from b7d2513 to 087e1c6 Compare October 24, 2016 21:46
@frerich
Copy link
Owner

frerich commented Oct 25, 2016

It might be too early for me, but is the manifest creation really broken right now? My understanding is that the order of the include files certainly should influence the manifest hash, since the order of includes is relevant for what code the preprocessor generates. Consider e.g. the two header files

// a.h
void f()
{
#ifdef SOMEDEFINE
    std::cout << 'f(): 1\n';
#else
    std::cout << 'f(): 2\n';
#endif
}
// b.h
#define SOMEDEFINE

If a.h is included before b.h, the function f will print f(): 2. If the header files are included in the reverse order, otherwise it will print f(): 1. Hence, different object files should be restored from the cache depending on the order of the #include statements.

Or am I missing something, and this isn't only about the order of the include files?

@siu
Copy link
Contributor Author

siu commented Oct 25, 2016

The example scenario should still be handled after the modifications. @webmaster128 explained somewhere else that the order and duplicity of the includes is taken care of by the hashing of the includes and the cpp file. In the specific case of direct mode, the manifesthash contains the contents of the cpp file and the objectHash stored per ManifestEntry also uses the manifesthash. There are a few integration tests that show this behavior in integrationTests.py TestHits.

I have another question: I had some problems with line endings

  • the default configuration of git for windows sets autocrlf to true, which converts all line endings to crlf on checkout
  • the Appveyor configuration sets it to input which means that it won't convert the line endings to crlf on checkout (search for "autocrlf" in https://www.appveyor.com/docs/appveyor-yml/).

I wrote the md5sum of some files in the repository assuming crlf line endings which didn't match when run in Appveyor. For the moment I changed the appveyor configuration with autocrlf=true which fixed the problem but I would like to know your opinion.

@siu
Copy link
Contributor Author

siu commented Oct 25, 2016

I am also planning to do some cleanup on this branch today.

@siu siu force-pushed the deterministicCreateManifestEntry branch 2 times, most recently from 7e6ed04 to f03606f Compare October 25, 2016 09:39
@webmaster128
Copy link
Contributor

I wrote the md5sum of some files in the repository assuming crlf line endings which didn't match when run in Appveyor. For the moment I changed the appveyor configuration with autocrlf=true which fixed the problem but I would like to know your opinion.

I think we cannot assume that any specific option is set in the user's environment. But if clcache really needs to rely on a specific line break value, we can use a per-repo setting like this: https://help.github.com/articles/dealing-with-line-endings/#per-repository-settings

@webmaster128
Copy link
Contributor

I think the include paths should be represented as a set whenever possible because this is the proper data structure: unsorted, no duplicates.

There are two interfaces where the set needs to be converted:

  • Set to JSON: there is no set type in JSON, so we need to store to list. I always sorted the list when exporting the set to JSON for convenience when debugging
  • Includes set to hash: When generating the hash of a set, we must guarantee that that two equal sets return the same hash. One way to implement this is to convert to a sorted list first.

This I would pass a set into ManifestRepository.getIncludesContentHashForHashes and and implement it such that two equal sets return the same hash.

@siu siu mentioned this pull request Oct 25, 2016
@webmaster128
Copy link
Contributor

One other thing: Could we avoid testing concrete hashes? Those values just change too often, e.g. when file format version changes or we add some salt to the hash. Instead I would test something like: If we put in the same thing in, do we get the same hash out?

@siu
Copy link
Contributor Author

siu commented Oct 25, 2016

@webmaster128 I agree that testing concrete hashes is not a good idea, it is harder to maintain. I will change the tests, remove the include files and the problem with the line endings will disappear. I will look at your other suggestions too.

@siu siu force-pushed the deterministicCreateManifestEntry branch from 3c81de3 to 2479ad6 Compare October 25, 2016 13:21
@siu siu force-pushed the deterministicCreateManifestEntry branch from 2479ad6 to fc41196 Compare October 25, 2016 13:45
@frerich frerich merged commit c852c14 into frerich:master Oct 25, 2016
@frerich
Copy link
Owner

frerich commented Oct 25, 2016

I get it now - I think this makes sense. Thanks for cleaning up the commits. Merged!

@siu
Copy link
Contributor Author

siu commented Oct 25, 2016

@frerich thanks for merging, I was about to report the latest changes:

  • Check if the hash is consistent between calls to createManifesEntry instead of expecting a concrete hash
  • The files are not included in the repository but generated by code in the test.
  • @webmaster128 while the logic of cleaning the includes can be displaced to ManifestRepository.getIncludesContentHashForHashes I think it makes more sense to keep it in createManifestEntry to save computing the hashes of duplicated includes and storing the sorted includes in the ManifestEntry. We can revisit this later.

Sorry for all the rebases ;)

@siu siu deleted the deterministicCreateManifestEntry branch October 25, 2016 13:59
@frerich
Copy link
Owner

frerich commented Oct 25, 2016

@siu The rebases are greatly appreciated, it was only at the very end, with three commits, that I somehow managed to connect all the dots in my mind. :-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants