-
Notifications
You must be signed in to change notification settings - Fork 83
[WIP] Investigate why os.replace fails in processNoDirect #294
Conversation
|
Thanks for your effort! I'm curious to see how this turns out in the end... |
|
It failed. Once again only in Python 3.3. It could be related to https://bugs.python.org/issue19575, more: https://www.python.org/dev/peps/pep-0446/. I don't know what to think, neither what to test next. I don't see anything wrong in the code. |
|
Based on PRB: Child Inherits Unintended Handles During CreateProcess Call my reading is that this is an issue with how Python 3.3 launches processes. The desired sequence of events when calling
...and then clcache waits for all subprocesses in parallel. However, there doesn't seem to be a lock around the section which creates the pipes and then creates the process. As a consequence, this sequence is possible:
The consequence of this is that the process launched by 'Thread 1' will inherit the pipe handles which were meant to be used by the new process launched in 'Thread 2'. Hence, when the process launched by 'Thread 2' terminates but the process launched by 'Thread 1' is still running, clcache cannot close the handles (or delete the files) related to 'Thread 2' since 'Thread 1' still has them open. Based on my reading of the Popen constructor, I suspect it might help to
A comment mentioned that these two steps solve the problem with Python 3.3 (though the proposed way to clear the 'inheritable' flag on the file descriptors depends on |
|
Maybe something like this works (untested, I don't have a Windows machine at hand right now): diff --git a/clcache.py b/clcache.py
index 860c28d..333c406 100644
--- a/clcache.py
+++ b/clcache.py
@@ -17,6 +17,7 @@ import errno
import hashlib
import json
import multiprocessing
+import msvcrt
import os
import pickle
import re
@@ -31,6 +32,7 @@ VERSION = "4.1.0-dev"
HashAlgorithm = hashlib.md5
OUTPUT_LOCK = threading.Lock()
+PROCESS_LOCK = threading.Lock()
# try to use os.scandir or scandir.scandir
# fall back to os.listdir if not found
@@ -67,6 +69,7 @@ BASEDIR_REPLACEMENT = '?'
# Define some Win32 API constants here to avoid dependency on win32pipe
NMPWAIT_WAIT_FOREVER = wintypes.DWORD(0xFFFFFFFF)
ERROR_PIPE_BUSY = 231
+HANDLE_FLAG_INHERIT = 1
# ManifestEntry: an entry in a manifest file
# `includeFiles`: list of paths to include files, which this source file uses
@@ -1324,13 +1327,25 @@ def invokeRealCompiler(compilerBinary, cmdLine, captureOutput=False, outputAsStr
if captureOutput:
# Don't use subprocess.communicate() here, it's slow due to internal
# threading.
- with TemporaryFile() as stdoutFile, TemporaryFile() as stderrFile:
+ with PROCESS_LOCK:
+ stdoutFile = TemporaryFile()
+ stderrFile = TemporaryFile()
+
+ # Make temporary file handles non-inheritable
+ windll.kernel32.SetHandleInformation(msvcrt.get_osfhandle(stdoutFile.fileno()), HANDLE_FLAG_INHERIT, 0)
+ windll.kernel32.SetHandleInformation(msvcrt.get_osfhandle(stderrFile.fileno()), HANDLE_FLAG_INHERIT, 0)
+
compilerProcess = subprocess.Popen(realCmdline, stdout=stdoutFile, stderr=stderrFile, env=environment)
+
+ try:
returnCode = compilerProcess.wait()
stdoutFile.seek(0)
stdout = stdoutFile.read()
stderrFile.seek(0)
stderr = stderrFile.read()
+ finally:
+ stdoutFile.close()
+ stderrFile.close()
else:
returnCode = subprocess.call(realCmdline, env=environment)
|
|
Great, thanks for deciphering the problem ;) Before going down this route, would you consider rising the minimum python version to 3.4? |
|
Yes, indeed - I briefly considered dropping support for Python 3.4. It's not an easy call to make though, so I hoped that the workaround for Python 3.3 is straightforward enough to allow me to weasel out of making that decision. :-] If a patch along the lines of what I wrote doesn't help, i.e. we can't make this work with reasonable effort in a reasonable amount of time, I guess I'll just throw the towel and drop support for Python 3.3. The alternative would be to revert all the changes related to {{os.replace}}, i.e. have more invasive locking again - but I think that would be a shame, and it keeps a risk which is just waiting to manifest as a problem in the future. |
clcache.py
Outdated
| stderrFile.seek(0) | ||
| stderr = stderrFile.read() | ||
|
|
||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation of this try defeats the purpose of compiling in parallel: as it is, waiting for the real compiler is done while holding PROCESS_LOCK, so the compilers will be launched one after the other. I think you mean to de-indent this by one level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, that was a mistake.
This code should not run under PROCESS_LOCK.
|
It failed with this stack trace: |
|
I fear this means that the proposed workaround didn't actually work around the issue. So maybe we're left with dropping support for Python 3.3. :-/ |
|
If I understood well, all file descriptors need to be closed in the child process, there is an argument close_fds in Popen, but it still has this problem https://docs.python.org/3/library/subprocess.html#popen-constructor:
I think it would be more reasonable to drop support for python 3.3, if possible. |
|
I agree; there does not seem to be an easy way to make this work correctly with Python 3.3. With Python 3.6 being out, we would still support the three most recent minor versions, which is not too bad. So let's close this PR and rather file a new one which removes the Python 3.3 testing in AppVeyor (and mentions the change in backwards compatibility in the ChangeLog). |
There's a pending patch for clcache which improves behaviour when clcache is run concurrently; however, this currently often (!) breaks the CI tests with Python 3.3 due to a known bug in how the Python subprocess module handles reading stdout/stderr. The Python bug report https://bugs.python.org/issue19575 has more on this topic. Alas, an inital workaround as posted on #294 (comment) didn't help. Since we already support Python 3.6 (i.e. we support the three most recent minor versions), let's just drop Python 3.3 support.
Since #286 was merged the test integrationtests.TestNoDirectCalls.testOutput has failed randomly on Python 3.3.
This is a WIP branch to investigate the issue.