-
Notifications
You must be signed in to change notification settings - Fork 83
Handle base dir in get manifest hash #238
Conversation
Current coverage is 88.96% (diff: 100%)@@ master #238 diff @@
==========================================
Files 1 1
Lines 997 1006 +9
Methods 0 0
Messages 0 0
Branches 158 164 +6
==========================================
+ Hits 886 895 +9
Misses 83 83
Partials 28 28
|
frerich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks basically perfect to me, except for a missed opportunity to reuse the existing collapseBasedirToPlaceholder function.
This is the first time I use the new GitHub code review feature, let's see how it goes. :-)
clcache.py
Outdated
| commandLine = [arg for arg in commandLine if not arg.startswith("/MP")] | ||
| baseDir = os.environ.get('CLCACHE_BASEDIR') | ||
| if baseDir: | ||
| commandLine = [arg.replace(baseDir, BASEDIR_REPLACEMENT) for arg in commandLine] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could reuse the existing collapseBasedirToPlaceholder here and use
commandLine = [collapseBasedirToPlaceholder(arg) for arg in commandLine if not arg.startswith("/MP")]in line 234.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't work with the current implementation of collapseBasedirToPlaceholder as it assumes that the BASEDIR appears in the first position of the argument and that the argument is in normcase.
Here arg can be of the form -IC:\basedir... which doesn't satisfy any of those conditions. We shouldn't change the case either.
collapseBasedirToPlaceholder could be changed like this to make it work:
def collapseBasedirToPlaceholder(path):
baseDir = normalizeBaseDir(os.environ.get('CLCACHE_BASEDIR'))
- if baseDir is None:
- return path
- else:
- assert path == os.path.normcase(path)
- assert baseDir == os.path.normcase(baseDir)
- if path.startswith(baseDir):
- return path.replace(baseDir, BASEDIR_REPLACEMENT, 1)
- else:
- return path
+ if baseDir:
+ replacement = re.compile(re.escape(baseDir), re.IGNORECASE)
+ return replacement.sub(BASEDIR_REPLACEMENT, path)
+ return path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, true - as it is, collapseBasedirToPlaceholder is not applicable. While thinking about whether it makes sense to adjust it as you suggested, it occurred to me that maybe clcache shouldn't always blindly replace the basedir value.
For instance, should it really replace the basedir in preprocessor definitions passed via -D on the command line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siu -- ping? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't think it is correct to apply the replacement blindly. I can instead add a list of parameters and apply the replacement only to those.
In that case, which arguments should be in the white list? I had a look here (https://msdn.microsoft.com/en-us/library/fwkeyyhe.aspx), these are the arguments that seem good candidates:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the first three sound like good candidates. I suggest the replacement should be applied to their argument -- we can extend (or shorten) that list later as we learn about new cases.
/Yu doesn't seem important right now since clcache doesn't support invocations making use of precompiled headers anyway.
d9b4ba3 to
cc20435
Compare
|
Summary of the changes in this PR:
I also noticed that the integration tests have a lot of duplicated code, this could be simplified if all of them inherited from a common base class. Anyway that is for another PR. Note: github shows the commits in a different order than they were applied. |
frerich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's looking promising - but I must admit that with some of the changes, the Why isn't clear to me. It would be great if you could be a bit more verbose in your commit logs such that they don't just consist of one sentence saying What was changed (the What is usually evident from the diff, the commits are very small) but also Why.
Thanks a lot though for not only improving the basedir feature but also adding tests!
| if not baseDir.endswith(os.path.sep): | ||
| baseDir += os.path.sep | ||
| if baseDir.endswith(os.path.sep): | ||
| baseDir = baseDir[0:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was needed to make the integration test TestBasedir.testBasedirIncludeArg pass. There the include path is generated by os.getcwd() which doesn't include the trailing separator. As normalizedBasedir is used in collapseBasedirToPlaceholder the version with the trailing slash would not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first wanted to insist on rather adjusting the test such that it appends a path separator (if needed) to the os.getcwd() return value instead of adjusting this function, but I guess if os.getcwd() does not return a trailing slash then that's a good indicator that we shouldn't do so either. So I'm okay with this. 👍
clcache.py
Outdated
| if baseDir: | ||
| replacement = re.compile(re.escape(baseDir), re.IGNORECASE) | ||
| return replacement.sub(BASEDIR_REPLACEMENT, path) | ||
| return path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you dropped the assert statements? They seem to express that the function expects that both the given path as well as the baseDir are normalized, i.e. have not only the same upper/lowercase but also forward slashes converted to backslashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I applied this change as a quick test and didn't give it too much thought. With the proposed implementation the assert path == os.path.normcase(path) does not hold. I can add back the other assert assert baseDir == os.path.normcase(baseDir)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would the assert path == os.path.normcase(path) not hold anymore? Ah, I guess it's because this function is not given paths anymore, but command line arguments, e.g. /IC:\Foo\Bar. That suggests that either the argument should be renamed (path does not fit anymore), or maybe the newly introduced collapseBasedirInCmd should not pass arguments to collapseBasedirToPlaceholder but rather the actual paths.
To extract the paths from the arguments, the code in CommandLineAnalyzer.parseArgumentsAndFiles could maybe get refactored to be more reusable (it already solves the issue of getting the values for command line arguments).
I guess for now, simply renaming the path argument to e.g. argument or maybe even just s would be fine, at least it's not misleading anymore.
| PYTHON_BINARY = sys.executable | ||
| CLCACHE_SCRIPT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clcache.py") | ||
| ASSETS_DIR = os.path.join("tests", "integrationtests") | ||
| ASSETS_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "tests", "integrationtests") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a suspicion, but I'm not quite sure - why did you change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was changed only to simplify the code in TestBasedir. The cwd is changed with cd() and the relative path to the assets wouldn't work. If I rearrange the code there a bit this change is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense! 👍
|
That is true, some changes look pretty random without an explanation. I will try to explain myself better. I can reply you in the PR this time, but just to be clear, in the future do you prefer to have the explanation in each of the commits? |
|
Thanks - yes, it's very much appreciated if the commits themselves explain the reasoning behind the change. This is very useful when debugging things later on, where you typically check the history to see why some changes were done -- and it might be in a time or place where GitHub is not even available, so all the things discussed here are inaccessible. I very much recommend this page for some good guidelines on how to write commit logs, all future contributors (and your future self :-)) will be grateful! :-) For what it's worth, it's very easy to update your commits with logs. With the command line client, you can run |
|
Thanks for the clarification. I will amend the commits once the review is done. |
This change makes easier to write integration tests that change the cmd.
cc20435 to
982b8f9
Compare
|
@frerich I rewrote the PR with your suggestions.
|
Show a few scenarios where using CLCACHE_BASEDIR should produce hits and misses.
In direct mode the command line arguments are hashed to compute the manifest hash. CLCACHE_BASEDIR should be used to unify arguments in order to produce hits between different invocations. This should only be applied to arguments that specify where to find the source code to the compiler or we risk creating hits for compiler tasks that generate different object files. For example, these two invocations should reuse the object in the cache: "cl.exe /IC:\Project1\include\lib1 main.cpp" with CLCACHE_BASEDIR=C:\Project1 "cl.exe /IC:\Project2\include\lib1 main.cpp" with CLCACHE_BASEDIR=C:\Project2 But these two should not reuse it: "cl.exe /IC:\Project1\include\lib1 /DFOLDER=C:\Project1\ main.cpp" with CLCACHE_BASEDIR=C:\Project1 "cl.exe /IC:\Project2\include\lib1 /DFOLDER=C:\Project2\ main.cpp" with CLCACHE_BASEDIR=C:\Project2 In this example the /D argument does not specify the compiler where to find the source code but it may specify a default path (for example).
The include paths in the command line may not include a trailing separator. normalizedBaseDir should not include a trailing separator in order to match.
982b8f9 to
ea01aef
Compare
|
|
||
| commandLineArgs = [] | ||
| projectSpecificArgs = ("AI", "I", "FU") | ||
| for k in sorted(arguments.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use sorted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the sorted is very important to have repeatable manifest hashes. As dictionaries in python are unordered the only way to guarantee that we generate the same hash every invocation is to iterate over the keys in the same order. It is the same case that #235 solved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so this basically improves the manifest cache hit rate, i.e. it's an improvement but not directly related to the BASEDIR feature? Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is going to have that side effect but at same time it is needed for correctness since the implementation changed in this PR to use a dictionary.
|
Thanks for being so patient and explaining all the changes to me, I don't see any reason not to merge this nice improvement - thanks a lot! :-) |
|
I am happy to iterate and give explanations if things are not clear, the end result is better ;) Thanks for merging. |
CLCACHE_BASEDIR is not used to cleanup the command line when computing the manifest hash. This is a problem because invocations of clcache from different basedirs will create different manifests for equivalent object files.