BUG: fix source file duplication in command line #290
Previously, if the /Tp or /Tc option was passed, clcache would duplicate the source file in the command line, making MSVC think that it was compiling two separate files (/Tcexample.c and example.c) when it was really compiling one. This is a problem because MSVC allows some options only when it is compiling a single file, so some invocations of clcache failed. Closes #289.
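A minimal sketch of how the duplication arises, modelled on the pre-fix lines quoted later in this review (drop arguments that equal a source file or start with /MP, then append each source file to its own job). The function name `oldJobCmdLines` is hypothetical and the command line is made-up sample data.

```python
from typing import List


def oldJobCmdLines(cmdLine, sourceFiles):
    # type: (List[str], List[str]) -> List[List[str]]
    """Hypothetical reconstruction of the pre-fix job command lines."""
    setOfSources = set(sourceFiles)
    baseCmdLine = []
    for arg in cmdLine:
        # Only exact matches are removed, so '/Tcexample.c' is kept.
        if not (arg in setOfSources or arg.startswith('/MP')):
            baseCmdLine.append(arg)
    # Each job then re-appends its own source file.
    return [baseCmdLine + [srcFile] for srcFile in sourceFiles]


jobs = oldJobCmdLines(['/nologo', '/c', '/Tcexample.c'], ['example.c'])
print(jobs)  # [['/nologo', '/c', '/Tcexample.c', 'example.c']]
```

The single job names example.c twice, once through /Tc and once as a plain argument, which is what makes MSVC count two input files.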
Test failure seems unrelated.
clcache.py
    sourceFile = realCmdline[-1]
    if '/Tc' + sourceFile in realCmdline or '/Tp' + sourceFile in realCmdline:
        printTraceStatement("Removing last argument becuase of /Tc (Issue #289)")
Small typo here (becuase) and I guess /Tc is imprecise because it seems this could also be because of /Tp.
I'm going to ignore this for now because it's not relevant to the discussion.
clcache.py
    sourceFile = realCmdline[-1]
    if '/Tc' + sourceFile in realCmdline or '/Tp' + sourceFile in realCmdline:
        printTraceStatement("Removing last argument becuase of /Tc (Issue #289)")
        realCmdline = realCmdline[:-1]
Is it really safe to assume that the last argument is always a source file argument? I wonder whether this fix should instead be done on the caller side, but at first glance it's not clear to me where clcache would implicitly add a source file argument to the command line.
Is it really safe to assume that the last argument is always a source file argument?
No. This is an example that fixes the problem but I had hoped that you would have a suggestion about a better fix.
I wonder whether maybe this fix should be done on the caller side
Yes. Do you have any pointers for where to look?
@frerich Found it (I think): https://github.com/xoviat/clcache/blob/f0227bb98a16a9804e2b71ef2a6d100c1c7b2719/clcache.py#L1633 Would it be okay to move the
@xoviat Good catch, that place indeed looks like it might end up constructing incorrect command lines like the one you noticed. I think an However, maybe it would be even nicer to not use an
Test failure is due to "too many local variables." Not sure what to do about that.
The typical fix for the "too many local variables" error is, well, to not use as many local variables. :-) Instead of removing variables, though, it's usually better to introduce a new function (i.e. the check is a kind of code complexity measure). However, I must admit that I like the older version with the
My hope was that it might be viable to change
@frerich Before I fix up the unit tests, does that seem okay?
frerich left a comment
I think your latest version is a lot better already! However, the formBaseCmdLine seems tricky enough to justify a few unit tests in unittests.py.
clcache.py
    def formBaseCommandLine(cmdLine, sourceFiles):
        # type: (List[str], List[Tuple[str, str]]) -> List[str]
        setOfSources = set([sourceFile for sourceFile, _ in sourceFiles])
Minor cosmetic: you could shorten this by using a generator expression:
setOfSources = set(sourceFile for sourceFile, _ in sourceFiles)
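A quick check that the two spellings build the same set, using made-up (sourceFile, language) pairs in the format this PR introduces. The set comprehension is a third equivalent form, available since Python 2.7; it is shown only for comparison.

```python
# Hypothetical (sourceFile, language) pairs; 'main.cpp' appears twice on purpose.
sourceFiles = [('main.cpp', ''), ('legacy.c', '/Tc'), ('main.cpp', '')]

# List comprehension: builds a temporary list, then the set.
viaList = set([sourceFile for sourceFile, _ in sourceFiles])

# Generator expression: set() consumes the names one at a time.
viaGenerator = set(sourceFile for sourceFile, _ in sourceFiles)

# Set comprehension: same result, no call to set() needed.
viaSetComprehension = {sourceFile for sourceFile, _ in sourceFiles}

print(viaList == viaGenerator == viaSetComprehension)  # True
```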
clcache.py
    def formBaseCommandLine(cmdLine, sourceFiles):
        # type: (List[str], List[Tuple[str, str]]) -> List[str]
        setOfSources = set([sourceFile for sourceFile, _ in sourceFiles])
        skippedArgs = ('/MP', '/Tc', '/Tp')
I think that -Tc and -Tp work, too. At least we support either form in CommandLineAnalyzer.analyze, and with Visual Studio 2013 (the only compiler I have at hand right now) you can indeed use either form.
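Since `str.startswith` accepts a tuple of prefixes, supporting both forms only means listing them. A minimal sketch; `LANGUAGE_OPTION_PREFIXES` and `isLanguageOption` are hypothetical names, not part of clcache.

```python
from typing import Tuple

# Hypothetical constant: the slash forms from the PR plus the dash forms
# mentioned in the review.
LANGUAGE_OPTION_PREFIXES = ('/Tc', '/Tp', '-Tc', '-Tp')  # type: Tuple[str, ...]


def isLanguageOption(arg):
    # type: (str) -> bool
    """True for /Tc, /Tp, -Tc and -Tp, with or without a fused file name."""
    # startswith returns True if any prefix in the tuple matches.
    return arg.startswith(LANGUAGE_OPTION_PREFIXES)


for arg in ['/Tcfoo.c', '-Tpbar.cpp', '-Tc', '/TC', '/c', 'foo.c']:
    print(arg, isLanguageOption(arg))
```

The check is case-sensitive, which fits cl.exe: the uppercase /TC and /TP are different options (treat all files as C or C++) and take no file name.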
clcache.py
        baseCmdLine = [
            arg for arg in cmdLine
            if not (arg in setOfSources or arg.startswith(skippedArgs))
        ]
This list comprehension works for cases like /Tcfoo.c but it's also possible to have an optional(!) space, i.e. /Tc foo.c. In this case both elements should get removed (since just /Tc on its own is a command line error).
Ahh, but in the second case, it would be in setOfSources.
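To check the point about the separated form, here is the list comprehension from formBaseCommandLine applied to a made-up command line mixing /Tc foo.c (separated), /Tpbar.cpp (fused) and a plain source file. It assumes, as the reply says, that the analyzer has reported foo.c as a source file.

```python
cmdLine = ['/nologo', '/c', '/Tc', 'foo.c', '/Tpbar.cpp', 'baz.cpp']
sourceFiles = [('foo.c', '/Tc'), ('bar.cpp', '/Tp'), ('baz.cpp', '')]

setOfSources = set(sourceFile for sourceFile, _ in sourceFiles)
skippedArgs = ('/MP', '/Tc', '/Tp')

baseCmdLine = [
    arg for arg in cmdLine
    # '/Tc' alone and '/Tpbar.cpp' match a prefix; 'foo.c' and 'baz.cpp'
    # are removed because they are members of setOfSources.
    if not (arg in setOfSources or arg.startswith(skippedArgs))
]
print(baseCmdLine)  # ['/nologo', '/c']
```

Both tokens of the separated form disappear, so no lone /Tc is left behind. This does rely on the analyzer listing foo.c in sourceFiles; a file name it failed to recognise would survive the filter.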
clcache.py
    jobs = []
    for srcFile, objFile in zip(sourceFiles, objectFiles):
        jobCmdLine = baseCmdLine + [srcFile]
    for srcFile, srcLanguage, objFile in zip(zip(*sourceFiles), objectFiles):
This double-zip is confusing to me. How about just
    for (srcFile, srcLanguage), objFile in zip(sourceFiles, objectFiles):
Will do.
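The difference between the two loop headers is easiest to see on a tiny, made-up input in the (sourceFile, language) format of this PR.

```python
# Hypothetical sample data in the (sourceFile, language) format of this PR.
sourceFiles = [('a.c', '/Tc'), ('b.cpp', '')]
objectFiles = ['a.obj', 'b.obj']

# Nested zip: zip(*sourceFiles) transposes the pairs into the columns
# ('a.c', 'b.cpp') and ('/Tc', ''), which then get paired with object files.
nested = list(zip(zip(*sourceFiles), objectFiles))
print(nested)  # [(('a.c', 'b.cpp'), 'a.obj'), (('/Tc', ''), 'b.obj')]

# Suggested form: unpack each (srcFile, srcLanguage) pair directly.
jobs = []
for (srcFile, srcLanguage), objFile in zip(sourceFiles, objectFiles):
    jobs.append((srcFile, srcLanguage, objFile))
print(jobs)  # [('a.c', '/Tc', 'a.obj'), ('b.cpp', '', 'b.obj')]
```

Every item of `nested` has only two elements, so unpacking it into `srcFile, srcLanguage, objFile` raises `ValueError`; the tuple-unpacking header avoids the transpose altogether.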
clcache.py
    for srcFile, objFile in zip(sourceFiles, objectFiles):
        jobCmdLine = baseCmdLine + [srcFile]
    for srcFile, srcLanguage, objFile in zip(zip(*sourceFiles), objectFiles):
        jobCmdLine = baseCmdLine + list(filter(None, [srcLanguage, srcFile]))
Since srcLanguage is either an empty string or one of /Tp or /Tc, couldn't you drop the composition of list and filter and just use
    jobCmdLine = baseCmdLine + [srcLanguage + srcFile]
Will do.
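A side-by-side sketch of the two variants; the function names are hypothetical. For a plain source file both give the same argument list. For /Tc or /Tp the concatenated version produces the fused form, which cl.exe accepts because the space after /Tc and /Tp is optional, as noted earlier in the review.

```python
from typing import List


def jobCmdLineViaFilter(baseCmdLine, srcLanguage, srcFile):
    # type: (List[str], str, str) -> List[str]
    # filter(None, ...) drops the empty language string for plain sources.
    return baseCmdLine + list(filter(None, [srcLanguage, srcFile]))


def jobCmdLineViaConcat(baseCmdLine, srcLanguage, srcFile):
    # type: (List[str], str, str) -> List[str]
    # '' + 'main.cpp' is just 'main.cpp'; '/Tc' + 'legacy.c' is the fused form.
    return baseCmdLine + [srcLanguage + srcFile]


base = ['/nologo', '/c']
print(jobCmdLineViaFilter(base, '/Tc', 'legacy.c'))  # ['/nologo', '/c', '/Tc', 'legacy.c']
print(jobCmdLineViaConcat(base, '/Tc', 'legacy.c'))  # ['/nologo', '/c', '/Tclegacy.c']
print(jobCmdLineViaConcat(base, '', 'main.cpp'))     # ['/nologo', '/c', 'main.cpp']
```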
    if not (arg in setOfSources or arg.startswith("/MP")):
        baseCmdLine.append(arg)
    # type: (???, str, List[str], ???, List[Tuple[str, str]], List[str]) -> int
    # Filter out all source files from the command line to form baseCmdLine
I think this comment is misleading: the formBaseCommandLine function does not only filter out all source files, it also filters out /MP. However, I agree that the function should filter out just the source files because then it could get the more descriptive name filterSourceFiles and the /MP could be filtered on the caller side:
baseCmdLine = [arg for arg in filterSourceFiles(cmdLine, sourceFiles) if arg != '/MP']
Will do.
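A sketch of the split the reviewer proposes: `filterSourceFiles` only removes source-file arguments and the caller removes /MP. Including the dash forms, and using `startswith('/MP')` on the caller side so that /MP2 is removed as well, are assumptions of this sketch, not the PR's code.

```python
from typing import List, Tuple


def filterSourceFiles(cmdLine, sourceFiles):
    # type: (List[str], List[Tuple[str, str]]) -> List[str]
    """Remove source-file arguments, fused (/Tcfoo.c) or separate (/Tc foo.c)."""
    setOfSources = set(sourceFile for sourceFile, _ in sourceFiles)
    # Dash forms included per the review; an assumption of this sketch.
    languageOptions = ('/Tc', '/Tp', '-Tc', '-Tp')
    return [
        arg for arg in cmdLine
        if not (arg in setOfSources or arg.startswith(languageOptions))
    ]


cmdLine = ['/nologo', '/EHsc', '/c', '/MP2', '/Tcfoo.c', 'bar.cpp']
sourceFiles = [('foo.c', '/Tc'), ('bar.cpp', '')]

filtered = filterSourceFiles(cmdLine, sourceFiles)
# /MP handling lives on the caller side; startswith also catches /MP2.
baseCmdLine = [arg for arg in filtered if not arg.startswith('/MP')]

print(filtered)     # ['/nologo', '/EHsc', '/c', '/MP2']
print(baseCmdLine)  # ['/nologo', '/EHsc', '/c']
```

Keeping /MP out of `filterSourceFiles` means the name describes exactly what the function does, which was the point of the review comment.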
@frerich Third round of concerns addressed.
Thanks for your persistence! 👍 I think by now, the patch looks plausible and conceptually consistent. I believe that after fixing up the unit tests (and maybe adding four, five small tests for
unittests.py
    def _testFull(self, cmdLine, expectedSourceFiles, expectedOutputFile):
        sourceFiles, outputFile = CommandLineAnalyzer.analyze(cmdLine)
        self.assertEqual(sourceFiles, expectedSourceFiles)
        self.assertEqual([s for s, _ in sourceFiles], expectedSourceFiles)
Wouldn't it be nicer to adjust the callers of _testFull instead, such that expectedSourceFiles is actually a list of tuples specifying the expected language (if any) for every source file?
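A sketch of what tuple-based expectations could look like. `analyzeStub` is a hypothetical stand-in for `CommandLineAnalyzer.analyze` that understands only plain source files and fused /Tc or /Tp options, and the helper drops the output-file check; both simplifications are assumptions made to keep the example self-contained.

```python
import unittest
from typing import List, Optional, Tuple


def analyzeStub(cmdLine):
    # type: (List[str]) -> Tuple[List[Tuple[str, str]], Optional[str]]
    """Hypothetical stand-in for CommandLineAnalyzer.analyze.

    Handles only plain source files and fused /Tc or /Tp options and never
    reports an output file; enough to exercise the helper below.
    """
    sourceFiles = []  # type: List[Tuple[str, str]]
    for arg in cmdLine:
        if arg.startswith(('/Tc', '/Tp')):
            sourceFiles.append((arg[3:], arg[:3]))
        elif not arg.startswith('/'):
            sourceFiles.append((arg, ''))
    return sourceFiles, None


class TestAnalyzeSketch(unittest.TestCase):
    def _testFull(self, cmdLine, expectedSourceFiles):
        sourceFiles, _ = analyzeStub(cmdLine)
        # Compare (file, language) tuples, so a lost /Tc or /Tp is caught too.
        self.assertEqual(sourceFiles, expectedSourceFiles)

    def testPlainSource(self):
        self._testFull(['/c', 'main.cpp'], [('main.cpp', '')])

    def testFusedLanguageOptions(self):
        self._testFull(['/c', '/Tcfoo.c', '/Tpbar.cc'],
                       [('foo.c', '/Tc'), ('bar.cc', '/Tp')])
```

Comparing whole tuples means a regression that keeps the file name but drops its language option still fails the test.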
Unfortunately, one of the integration tests failed. I'm not sure I understand this failure, but the cache statistics may be omitting the fully specified source file arguments. I looked at
In fact clcache appears not to reuse the cache files. I looked at this some more and unfortunately it appears to be a bit too dense for me.
    compl = True
    # Now collect the inputFiles into the return format
    inputFiles = list(inputFiles.items())
Could it be that the test failure is introduced here because inputFiles.items() does not always return the items in the same order? Either sorting them or using an OrderedDict in line 1253 could help.
I will try that.
Even if it didn't fix the tests, I think it still makes sense to keep the OrderedDict. analyze() should return its output in the same order across executions, 1) to keep the tests passing and 2) to generate the same hash for the compilation task (if the hash is computed afterwards; I am not sure about this). I don't know enough about this part of the code; this should be checked with @frerich.
The test failures have nothing to do with the order, and if I understand correctly, the hash is calculated per-file ("This causes clcache to reinvoke itself recursively for each of the source files"), making the order irrelevant. There is no reason to tell Python to preserve the order when it doesn't matter.
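For reference, the two options suggested above, sketched on made-up data. Before Python 3.7, plain dict iteration order is not a language guarantee, and with string hash randomization (on by default since Python 3.3) it can differ between runs; whether that matters here depends on whether the order feeds into the hash, which is exactly what the thread disputes.

```python
from collections import OrderedDict

# Hypothetical source-file -> language pairs in command-line order.
pairs = [('zeta.cpp', ''), ('alpha.c', '/Tc'), ('mid.cpp', '')]

# Option 1: OrderedDict remembers insertion order on every Python version.
inOrder = list(OrderedDict(pairs).items())

# Option 2: sorting gives a fixed order no matter how the dict iterates.
bySortedName = sorted(dict(pairs).items())

print(inOrder)       # [('zeta.cpp', ''), ('alpha.c', '/Tc'), ('mid.cpp', '')]
print(bySortedName)  # [('alpha.c', '/Tc'), ('mid.cpp', ''), ('zeta.cpp', '')]
```

OrderedDict preserves the order the user wrote on the command line, while sorting changes it but needs no extra import.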
@siu That still did not fix the problem. There are still zero cache hits on the concurrent test (ignore the stdout tests, they're not relevant).
clcache.py
        baseCmdLine.append(arg)
    # type: (???, str, List[str], ???, List[Tuple[str, str]], List[str]) -> int
    # Filter out all source files from the command line to form baseCmdLine
    baseCmdLine = [arg for arg in filterSourceFiles(cmdLine, sourceFiles) if arg != '/MP']
The arg != '/MP' test is what breaks the TestRunParallel.testHitsViaMpConcurrent test.
The test first compiles two source files individually using the command lines
    /nologo /EHsc /c fibonacci01.cpp
    /nologo /EHsc /c fibonacci02.cpp
The test then tries to compile both files again via
    /nologo /EHsc /c /MP2 fibonacci01.cpp fibonacci02.cpp
...expecting that this triggers two cache hits. This causes clcache to reinvoke itself recursively for each of the source files. However, because of the arg != '/MP' test, the recursive invocations still include the /MP2 flag, i.e. the two nested instances use the command lines
    /nologo /EHsc /c /MP2 fibonacci01.cpp
    /nologo /EHsc /c /MP2 fibonacci02.cpp
Thus, the command lines differ from the ones used when the files were originally compiled, so the hash sum for the cache entry is different and there is no cache hit.
It seems this can be fixed by using not arg.startswith('/MP') instead of arg != '/MP'.
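The cache miss can be reproduced in miniature. `cacheKey` is a hypothetical stand-in that just digests the argument list; clcache's actual hashing is not reproduced here, only the property that different command lines give different keys.

```python
import hashlib
from typing import List


def cacheKey(cmdLine):
    # type: (List[str]) -> str
    """Hypothetical stand-in for clcache's hashing: a digest of the arguments."""
    return hashlib.sha1(' '.join(cmdLine).encode('utf-8')).hexdigest()


original = ['/nologo', '/EHsc', '/c', 'fibonacci01.cpp']
# What remains of '/nologo /EHsc /c /MP2 fibonacci01.cpp fibonacci02.cpp'
# once the source files have been filtered out.
baseCmdLine = ['/nologo', '/EHsc', '/c', '/MP2']

withInequality = [arg for arg in baseCmdLine if arg != '/MP'] + ['fibonacci01.cpp']
withPrefix = [arg for arg in baseCmdLine if not arg.startswith('/MP')] + ['fibonacci01.cpp']

print(withInequality)                                  # ['/nologo', '/EHsc', '/c', '/MP2', 'fibonacci01.cpp']
print(cacheKey(withInequality) == cacheKey(original))  # False
print(cacheKey(withPrefix) == cacheKey(original))      # True
```

`'/MP2' != '/MP'` is true, so the equality test keeps the flag; the prefix test removes /MP with any process count and restores the original single-file command line.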
The lists are modified to use the new tuple format
Okay, this should finally be ready to merge.
Looks good, thanks! Just two questions which came to my mind during a last review:
Done.
First, these type specifiers are the pet project of the BDFL himself (see mypy). I won't provide an in-depth explanation here, but the short version is that most developers spend their time reading code to understand what is happening (as I had to do here to repair the bug, and you had to do to review my changes), and the biggest problem with Python is that you have to work out the types of the function arguments to understand what is happening. Static typing solves that problem: it's not for computers, it's for people.
I'm going to disagree here. Of course, at the end of the day, it's your project, so if you insist, I will remove them. But what I will say is that the typing is done per-file, and it won't be useful until most of the file has it. That means that you need to build the typing over time (for example, the question marks
As someone who greatly enjoys writing programs in Haskell in his free time and C++ for a living, I very much agree: the lack of static typing is one of my main gripes with Python. Thanks for mentioning mypy, I wasn't even aware of that effort; it sounds exciting! I'd love to learn how expressive it is. After looking at the mypy homepage, it seems to me that it's built on top of type hints as described in PEP 483 and PEP 484, and on variable annotations as described in PEP 526. The typing module seems to implement this as of Python 3.5. Does that sound accurate? If so, what do you think about omitting the comments you wrote from the PR for now? I'd instead raise the minimum supported Python version to 3.5 so that we can use 'real' type annotations as understood by mypy (and we can then also introduce mypy to the CI builds). My impression is that the comments are a bit of a kludge due to the fact that we currently have to support Python 3.3. Or is there maybe a way to get type hints prior to Python 3.5? I greatly appreciate you contributing this type information; it just seems to me that 1.) the type hints are a bit 'out of scope' for this particular PR, which attempts to address a very specific bug (#289), and 2.) my current understanding is that we could have 'real' type hints (not in a comment), which would require dropping support for Python < 3.5. What do you think?
There's no need to drop support for Python < 3.5, as Python 3 in general supports static typing. There is a need to "fix" the package format (#288) before switching to the nicer inline type annotations, but removing the comments will just mean that someone has to re-discover the types the next time around. And the comments are real type annotations for Python 2 that are fully supported by mypy (just not the question marks:
Ah, I just skimmed the mypy examples and noticed that mypy seems to recognise both comments as in
...so that would kinda answer my question: this
The Fundamental building blocks section says that
...so instead of
Will do.
For the record, there's a dedicated discussion of type comments as one way to articulate type hints.
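Both spellings of the same signature side by side, as a sketch. `splitLanguageOption` is a hypothetical helper chosen only for illustration. The comment form keeps the `def` line valid on interpreters without annotation syntax, while mypy reads both forms; note that `from typing import ...` still needs Python 3.5 or newer, or the `typing` backport from PyPI on older interpreters.

```python
from typing import Tuple


def splitLanguageOption(arg):
    # type: (str) -> Tuple[str, str]
    """Type-comment style: the def line itself uses no annotation syntax."""
    if arg.startswith(('/Tc', '/Tp')):
        return arg[:3], arg[3:]
    return '', arg


def splitLanguageOptionAnnotated(arg: str) -> Tuple[str, str]:
    """Inline-annotation style: needs Python 3 function annotation syntax."""
    if arg.startswith(('/Tc', '/Tp')):
        return arg[:3], arg[3:]
    return '', arg


print(splitLanguageOption('/Tcfoo.c'))  # ('/Tc', 'foo.c')
# Only the inline style is visible at runtime; the comment is ignored by Python.
print(splitLanguageOption.__annotations__)                   # {}
print(sorted(splitLanguageOptionAnnotated.__annotations__))  # ['arg', 'return']
```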
Updated.
Test failure appears unrelated.