Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

@siu
Copy link
Contributor

@siu siu commented Oct 12, 2017

Since #286 was merged the test integrationtests.TestNoDirectCalls.testOutput has failed randomly on Python 3.3.

This is a WIP branch to investigate the issue.

@frerich
Copy link
Owner

frerich commented Oct 12, 2017

Thanks for your effort! I'm curious to see how this turns out in the end...

@siu
Copy link
Contributor Author

siu commented Oct 12, 2017

It failed. Once again only in Python 3.3.

It could be related to https://bugs.python.org/issue19575, more: https://www.python.org/dev/peps/pep-0446/.

I don't know what to think, neither what to test next. I don't see anything wrong in the code.

@frerich
Copy link
Owner

frerich commented Oct 13, 2017

Based on PRB: Child Inherits Unintended Handles During CreateProcess Call my reading is that this is an issue with how Python 3.3 launches processes. The desired sequence of events when calling invokeRealCompiler is this:

  1. Thread 1 creates pipes for stdout/stderr
  2. Thread 1 creates a new subprocess
  3. Thread 2 creates pipes for stdout/stderr
  4. Thread 2 creates a new subprocess

...and then clcache waits for all subprocesses in parallel. However, there doesn't seem to be a lock around the section which creates the pipes and then creates the process. As a consequence, this sequence is possible:

  1. Thread 1 creates pipes for stdout/stderr
  2. Thread 2 creates pipes for stdout/stderr
  3. Thread 1 creates a new subprocess
  4. Thread 2 creates a new subprocess

The consequence of this is that the process launched by 'Thread 1' will inherit the pipe handles which were meant to be used by the new process launched in 'Thread 2'. Hence, when the process launched by 'Thread 2' terminates but the process launched by 'Thread 1' is still running, clcache cannot close the handles (or delete the files) related to 'Thread 2' since 'Thread 1' still has them open.

Based on my reading of the Popen constructor, I suspect it might help to

  1. Create the files to pipe into such that the handles are not inheritable (there's some suggested code which should do this) and
  2. Synchronise all calls to the Popen constructor call using a common lock

A comment mentioned that these two steps solve the problem with Python 3.3 (though the proposed way to clear the 'inheritable' flag on the file descriptors depends on win32api).

@frerich
Copy link
Owner

frerich commented Oct 13, 2017

Maybe something like this works (untested, I don't have a Windows machine at hand right now):

diff --git a/clcache.py b/clcache.py
index 860c28d..333c406 100644
--- a/clcache.py
+++ b/clcache.py
@@ -17,6 +17,7 @@ import errno
 import hashlib
 import json
 import multiprocessing
+import msvcrt
 import os
 import pickle
 import re
@@ -31,6 +32,7 @@ VERSION = "4.1.0-dev"
 HashAlgorithm = hashlib.md5
 
 OUTPUT_LOCK = threading.Lock()
+PROCESS_LOCK = threading.Lock()
 
 # try to use os.scandir or scandir.scandir
 # fall back to os.listdir if not found
@@ -67,6 +69,7 @@ BASEDIR_REPLACEMENT = '?'
 # Define some Win32 API constants here to avoid dependency on win32pipe
 NMPWAIT_WAIT_FOREVER = wintypes.DWORD(0xFFFFFFFF)
 ERROR_PIPE_BUSY = 231
+HANDLE_FLAG_INHERIT = 1
 
 # ManifestEntry: an entry in a manifest file
 # `includeFiles`: list of paths to include files, which this source file uses
@@ -1324,13 +1327,25 @@ def invokeRealCompiler(compilerBinary, cmdLine, captureOutput=False, outputAsStr
     if captureOutput:
         # Don't use subprocess.communicate() here, it's slow due to internal
         # threading.
-        with TemporaryFile() as stdoutFile, TemporaryFile() as stderrFile:
+        with PROCESS_LOCK:
+            stdoutFile = TemporaryFile()
+            stderrFile = TemporaryFile()
+
+            # Make temporary file handles non-inheritable
+            windll.kernel32.SetHandleInformation(msvcrt.get_osfhandle(stdoutFile.fileno()), HANDLE_FLAG_INHERIT, 0)
+            windll.kernel32.SetHandleInformation(msvcrt.get_osfhandle(stderrFile.fileno()), HANDLE_FLAG_INHERIT, 0)
+
             compilerProcess = subprocess.Popen(realCmdline, stdout=stdoutFile, stderr=stderrFile, env=environment)
+
+        try:
             returnCode = compilerProcess.wait()
             stdoutFile.seek(0)
             stdout = stdoutFile.read()
             stderrFile.seek(0)
             stderr = stderrFile.read()
+        finally:
+            stdoutFile.close()
+            stderrFile.close()
     else:
         returnCode = subprocess.call(realCmdline, env=environment)
 

@siu
Copy link
Contributor Author

siu commented Oct 13, 2017

Great, thanks for deciphering the problem ;)

Before going down this route, would you consider rising the minimum python version to 3.4?

@frerich
Copy link
Owner

frerich commented Oct 13, 2017

Yes, indeed - I briefly considered dropping support for Python 3.4. It's not an easy call to make though, so I hoped that the workaround for Python 3.3 is straightforward enough to allow me to weasel out of making that decision. :-]

If a patch along the lines of what I wrote doesn't help, i.e. we can't make this work with reasonable effort in a reasonable amount of time, I guess I'll just throw the towel and drop support for Python 3.3. The alternative would be to revert all the changes related to {{os.replace}}, i.e. have more invasive locking again - but I think that would be a shame, and it keeps a risk which is just waiting to manifest as a problem in the future.

clcache.py Outdated
stderrFile.seek(0)
stderr = stderrFile.read()

try:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation of this try defeats the purpose of compiling in parallel: as it is, waiting for the real compiler is done while holding PROCESS_LOCK, so the compilers will be launched one after the other. I think you mean to de-indent this by one level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, that was a mistake.

This code should not run under PROCESS_LOCK.
@siu
Copy link
Contributor Author

siu commented Oct 16, 2017

It failed with this stack trace:

Traceback (most recent call last):
  File "C:\projects\clcache\clcache.py", line 1790, in ensureArtifactsExist
    cleanupRequired = addObjectToCache(stats, cache, cachekey, artifacts)
  File "C:\projects\clcache\clcache.py", line 1498, in addObjectToCache
    cache.setEntry(cachekey, artifacts)
  File "C:\projects\clcache\clcache.py", line 638, in setEntry
    self.strategy.setEntry(key, value)
  File "C:\projects\clcache\clcache.py", line 549, in setEntry
    self.compilerArtifactsRepository.section(key).setEntry(key, value)
  File "C:\projects\clcache\clcache.py", line 400, in setEntry
    os.path.join(tempEntryDir, CompilerArtifactsSection.OBJECT_FILE))
  File "C:\projects\clcache\clcache.py", line 995, in copyOrLink
    os.replace(tempDst, dstFilePath)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'c:\\users\\appveyor\\appdata\\local\\temp\\1\\tmpvtzqn0\\objects\\77\\772a2f872fde8867e15356381d42bf1e.new\\object'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\projects\clcache\clcache.py", line 1801, in <module>
    sys.exit(main())
  File "C:\projects\clcache\clcache.py", line 1600, in main
    return processCompileRequest(cache, compiler, sys.argv)
  File "C:\projects\clcache\clcache.py", line 1627, in processCompileRequest
    return scheduleJobs(cache, compiler, cmdLine, environment, sourceFiles, objectFiles)
  File "C:\projects\clcache\clcache.py", line 1680, in scheduleJobs
    exitCode, out, err, doCleanup = future.result()
  File "c:\python33\lib\concurrent\futures\_base.py", line 394, in result
    return self.__get_result()
  File "c:\python33\lib\concurrent\futures\_base.py", line 353, in __get_result
    raise self._exception
  File "c:\python33\lib\concurrent\futures\thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\projects\clcache\clcache.py", line 1699, in processSingleSource
    return processNoDirect(cache, objectFile, compiler, cmdLine, environment)
  File "C:\projects\clcache\clcache.py", line 1777, in processNoDirect
    objectFile, compilerResult)
  File "C:\projects\clcache\clcache.py", line 1790, in ensureArtifactsExist
    cleanupRequired = addObjectToCache(stats, cache, cachekey, artifacts)
  File "C:\projects\clcache\clcache.py", line 756, in __exit__
    self._stats.save()
  File "C:\projects\clcache\clcache.py", line 666, in save
    json.dump(self._dict, f, sort_keys=True, indent=4)
  File "c:\python33\lib\contextlib.py", line 55, in __exit__
    next(self.gen)
  File "C:\projects\clcache\clcache.py", line 127, in atomicWrite
    os.replace(tempFileName, fileName)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'c:\\users\\appveyor\\appdata\\local\\temp\\1\\tmpvtzqn0\\stats.txt'

@frerich
Copy link
Owner

frerich commented Oct 16, 2017

I fear this means that the proposed workaround didn't actually work around the issue. So maybe we're left with dropping support for Python 3.3. :-/

@siu
Copy link
Contributor Author

siu commented Oct 16, 2017

If I understood well, all file descriptors need to be closed in the child process, there is an argument close_fds in Popen, but it still has this problem https://docs.python.org/3/library/subprocess.html#popen-constructor:

If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed. (POSIX only). The default varies by platform: Always true on POSIX. On Windows it is true when stdin/stdout/stderr are None, false otherwise. On Windows, if close_fds is true then no handles will be inherited by the child process. Note that on Windows, you cannot set close_fds to true and also redirect the standard handles by setting stdin, stdout or stderr.

Changed in version 3.2: The default for close_fds was changed from False to what is described above.

I think it would be more reasonable to drop support for python 3.3, if possible.

@frerich
Copy link
Owner

frerich commented Oct 29, 2017

I agree; there does not seem to be an easy way to make this work correctly with Python 3.3. With Python 3.6 being out, we would still support the three most recent minor versions, which is not too bad.

So let's close this PR and rather file a new one which removes the Python 3.3 testing in AppVeyor (and mentions the change in backwards compatibility in the ChangeLog).

@frerich frerich closed this Oct 29, 2017
frerich pushed a commit that referenced this pull request Oct 29, 2017
There's a pending patch for clcache which improves behaviour when
clcache is run concurrently; however, this currently often (!) breaks
the CI tests with Python 3.3 due to a known bug in how the Python
subprocess module handles reading stdout/stderr. The Python bug report

  https://bugs.python.org/issue19575

has more on this topic. Alas, an inital workaround as posted on

  #294 (comment)

didn't help.

Since we already support Python 3.6 (i.e. we support the three most
recent minor versions), let's just drop Python 3.3 support.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants