Debugpy silently crashes because of Broken pipe or FileNotFoundError #1828

@jubueche

Environment data

  • debugpy version: 1.8.12
  • OS and version: Red Hat Enterprise Linux, 9.5 (Plow)
  • Python version (& distribution if applicable, e.g. Anaconda): 3.10.16, Anaconda
  • Using VS Code or Visual Studio: VS Code

Actual behavior

I am on a compute cluster that uses LSF. I launch an interactive job to get a shell on a compute node, and on that node I start debugpy with:
python -m debugpy --listen 0.0.0.0:1326 --wait-for-client -c "print('hello')"

The server then waits for the client. However, when I try to connect to it from VS Code, I get ECONNREFUSED. Running the same command with logging enabled (python -m debugpy --log-to logs --listen 0.0.0.0:1326 --wait-for-client -c "print('hello')") produces the following logs:

debugpy.pydevd.2718046.log

0.32s - pydevd: Use libraries filter: False

0.00s - IDE_PROJECT_ROOTS []

0.00s - Collecting default library roots.
0.00s - LIBRARY_ROOTS ['/u/jub/.local/lib/python3.10/site-packages', '/u/jub/miniconda3/envs/torch/lib/python3.10', '/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages']

0.00s - Apply debug mode: debugpy-dap
0.00s - Preimport: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages;debugpy._vendored.force_pydevd
0.00s - Connecting to 127.0.0.1:47939
0.00s - Connected to: <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 43098), raddr=('127.0.0.1', 47939)>.
0.00s - Applying patching to hide pydevd threads (Py3 version).
0.01s - ReaderThread: empty contents received (len(line) == 0).
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (called from: File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 324, in _terminate_on_socket_close)
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (first call)
0.00s - PyDB.dispose_and_kill_all_pydevd_threads no commands being processed.
0.00s - PyDB.dispose_and_kill_all_pydevd_threads killing thread: <ReaderThread(pydevd.Reader, started daemon 22788828034624)>
0.00s - pydevd.Reader received kill signal
0.00s - PyDB.dispose_and_kill_all_pydevd_threads killing thread: <WriterThread(pydevd.Writer, started daemon 22788898952768)>
0.00s - sending cmd (http_json) -->             CMD_EXIT {"type": "event", "event": "terminated", "seq": 2, "body": {}, "pydevd_cmd_id": 129}

0.00s - pydevd.Writer received kill signal
0.00s - PyDB.dispose_and_kill_all_pydevd_threads waiting for pydb daemon threads to finish
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (called from: File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 432, in _on_run)
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (already disposed - wait)
0.10s - Successfully Loaded helper lib to set tracing to all threads.
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - __wait_for_threads_to_finish
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - dispose_and_kill_all_pydevd_threads
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py - _terminate_on_socket_close
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - __wait_for_threads_to_finish
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - dispose_and_kill_all_pydevd_threads
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - set_tracing_for_untraced_contexts
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - _locked_settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - _settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - listen
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - debug
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/public_api.py - wrapper
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - start_debugging
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - run_code
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - main
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py - <module>
0.00s - Set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py - _run_code
0.00s - Set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py - _run_module_as_main
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - _settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - listen
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - debug
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/public_api.py - wrapper
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - start_debugging
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - run_code
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - main
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py - <module>
0.40s - PyDB.dispose_and_kill_all_pydevd_threads: finished
0.00s - The following pydb threads may not have finished correctly: pydevd.CommandThread, pydevd.Writer
0.00s - PyDB.dispose_and_kill_all_pydevd_threads: finished
0.00s - ReaderThread: exit
Traceback (most recent call last):
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 422, in _on_run
    cmd.send(self.sock)
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command.py", line 109, in send
    sock.sendall(as_bytes)
BrokenPipeError: [Errno 32] Broken pipe
0.00s - WriterThread: exit

debugpy.server-2718046.log

I+00000.013: Linux-5.14.0-427.42.1.el9_4.x86_64-x86_64-with-glibc2.34 x86_64
             CPython 3.10.16 (64-bit)
             debugpy 1.8.12

I+00000.113: Initial environment:
             
             System paths:
                 sys.executable: /u/jub/miniconda3/envs/torch/bin/python(/u/jub/miniconda3/envs/torch/bin/python3.10)
                 sys.prefix: /u/jub/miniconda3/envs/torch
                 sys.base_prefix: /u/jub/miniconda3/envs/torch
                 sys.real_prefix: <missing>
                 site.getsitepackages(): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 site.getusersitepackages(): /u/jub/.local/lib/python3.10/site-packages
                 sys.path (site-packages): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('stdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('platstdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('purelib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('platlib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('include'): /u/jub/miniconda3/envs/torch/include/python3.10
                 sysconfig.get_path('scripts'): /u/jub/miniconda3/envs/torch/bin
                 sysconfig.get_path('data'): /u/jub/miniconda3/envs/torch
                 os.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/os.py
                 threading.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py
                 debugpy.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__init__.py
             
             Installed packages:
                 kiwisolver==1.4.7
                 tzdata==2024.2
                 Jinja2==3.1.4
                 torch==2.5.1
                 py-cpuinfo==9.0.0
                 filelock==3.16.1
                 multidict==6.1.0
                 pip==24.2
                 pluggy==0.13.1
                 cmake==3.31.2
                 tomlkit==0.13.2
                 pybind11==2.13.6
                 packaging==24.2
                 click==8.1.7
                 huggingface_hub==0.26.5
                 huggingface-hub==0.27.0
                 safetensors==0.4.5
                 peft==0.14.0
                 Brotli==1.0.9
                 networkx==3.2
                 sentry-sdk==2.19.2
                 PyYAML==6.0
                 mypy==0.991
                 mccabe==0.7.0
                 triton-nightly==3.0.0.post20240716052845
                 transformers==4.47.1
                 multiprocess==0.70.16
                 xformers==0.0.29.post1
                 MarkupSafe==2.1.1
                 pylint==2.15.7
                 torchaudio==2.5.1
                 async-timeout==5.0.1
                 annotated-types==0.7.0
                 Pillow==9.2.0
                 pyarrow==18.1.0
                 fast_hadamard_transform==1.0.4.post1
                 PySocks==1.7.1
                 mypy-extensions==1.0.0
                 aiosignal==1.3.2
                 hjson==3.1.0
                 fsspec==2024.9.0
                 fsspec==2024.12.0
                 setuptools==75.1.0
                 datasets==3.2.0
                 six==1.17.0
                 tqdm==4.67.1
                 typing_extensions==4.12.2
                 threadpoolctl==3.5.0
                 debugpy==1.8.12
                 smmap==5.0.1
                 ninja==1.11.1.3
                 frozenlist==1.5.0
                 scipy==1.14.1
                 scipy==1.8.1
                 gmpy2==2.1.2
                 pydantic==2.10.3
                 docker-pycreds==0.4.0
                 protobuf==5.29.2
                 pytest==6.2.4
                 aiohttp==3.11.10
                 gitdb==4.0.11
                 yarl==1.18.3
                 py==1.11.0
                 mpi4py==4.0.1
                 urllib3==2.2.3
                 propcache==0.2.1
                 wrapt==1.17.0
                 lazy-object-proxy==1.10.0
                 scikit-build==0.18.1
                 pycparser==2.22
                 cycler==0.12.1
                 distro==1.9.0
                 iniconfig==2.0.0
                 idna==3.10
                 h2==4.1.0
                 hyperframe==6.0.1
                 triton==3.1.0
                 tomli==2.2.1
                 cffi==1.15.0
                 types-dataclasses==0.6.6
                 wandb==0.19.1
                 fonttools==4.55.3
                 pycodestyle==2.10.0
                 wheel==0.44.0
                 accelerate==1.2.1
                 scikit-learn==1.6.0
                 attrs==24.3.0
                 psutil==6.1.0
                 zstandard==0.19.0
                 dill==0.3.8
                 setproctitle==1.3.4
                 black==24.3.0
                 requests==2.32.3
                 isort==5.13.2
                 mpmath==1.3.0
                 certifi==2024.12.14
                 pyparsing==3.2.0
                 hpack==4.0.0
                 pandas==2.2.3
                 tokenizers==0.21.0
                 regex==2024.11.6
                 pytz==2024.2
                 contourpy==1.3.1
                 pydantic_core==2.27.1
                 aiohappyeyeballs==2.4.4
                 pathspec==0.12.1
                 torchvision==0.20.1
                 astroid==2.13.5
                 GitPython==3.1.43
                 types-requests==2.26.3
                 matplotlib==3.10.0
                 platformdirs==4.3.6
                 parameterized==0.8.1
                 msgpack==1.1.0
                 python-dateutil==2.9.0.post0
                 toml==0.10.2
                 numpy==1.26.4
                 xxhash==3.5.0
                 joblib==1.4.2
                 charset-normalizer==3.4.0
                 colorama==0.4.6
                 sympy==1.13.1
                 aihwkit_lightning==0.0.1
                 sigmamoe==0.0
                 deepspeed==0.15.4+unknown
                 analoglora==0.0
                 analogmoe==0.0

I+00000.113: sys.argv before parsing: ['/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py', '--log-to', 'logs', '--listen', '0.0.0.0:1326', '--wait-for-client', '-c', "print('hello')"]
                      after parsing:  ['/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py']

D+00000.114: sys.argv after patching: ['-c']

D+00000.114: configure({'qt': 'none', 'subProcess': True}, {})

D+00000.114: listen(('0.0.0.0', 1326), **{})

I+00000.114: Initial debug configuration: {
                 "qt": "none",
                 "subProcess": true,
                 "python": "/u/jub/miniconda3/envs/torch/bin/python",
                 "pythonEnv": {}
             }

I+00000.114: Waiting for adapter endpoints on 127.0.0.1:37193...

I+00000.114: debugpy.listen() spawning adapter: [
                 "/u/jub/miniconda3/envs/torch/bin/python",
                 "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter",
                 "--for-server",
                 "37193",
                 "--host",
                 "0.0.0.0",
                 "--port",
                 "1326",
                 "--server-access-token",
                 "04ac658025d99f968fb21846b183284a1501cb1b3c8e54537b6a4bdd24772ce2",
                 "--log-dir",
                 "logs"
             ]

I+00000.283: Endpoints received from adapter: {
                 "client": {
                     "host": "0.0.0.0",
                     "port": 1326
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 47939
                 }
             }

I+00000.283: Adapter is accepting incoming client connections on 0.0.0.0:1326

D+00000.283: pydevd.settrace(*(), **{'host': '127.0.0.1', 'port': 47939, 'wait_for_ready_to_run': False, 'block_until_connected': True, 'access_token': '04ac658025d99f968fb21846b183284a1501cb1b3c8e54537b6a4bdd24772ce2', 'suspend': False, 'patch_multiprocessing': True, 'dont_trace_start_patterns': ('/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy',), 'dont_trace_end_patterns': ('debugpy_launcher.py',)})

I+00000.395: pydevd is connected to adapter at 127.0.0.1:47939

D+00000.395: wait_for_client()

debugpy.adapter-2718050.log

I+00000.013: Linux-5.14.0-427.42.1.el9_4.x86_64-x86_64-with-glibc2.34 x86_64
             CPython 3.10.16 (64-bit)
             debugpy 1.8.12

I+00000.127: debugpy.adapter startup environment:
             
             System paths:
                 sys.executable: /u/jub/miniconda3/envs/torch/bin/python(/u/jub/miniconda3/envs/torch/bin/python3.10)
                 sys.prefix: /u/jub/miniconda3/envs/torch
                 sys.base_prefix: /u/jub/miniconda3/envs/torch
                 sys.real_prefix: <missing>
                 site.getsitepackages(): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 site.getusersitepackages(): /u/jub/.local/lib/python3.10/site-packages
                 sys.path (site-packages): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('stdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('platstdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('purelib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('platlib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('include'): /u/jub/miniconda3/envs/torch/include/python3.10
                 sysconfig.get_path('scripts'): /u/jub/miniconda3/envs/torch/bin
                 sysconfig.get_path('data'): /u/jub/miniconda3/envs/torch
                 os.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/os.py
                 threading.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py
                 debugpy.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/../../debugpy/__init__.py(/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__init__.py)
             
             Installed packages:
                 kiwisolver==1.4.7
                 tzdata==2024.2
                 Jinja2==3.1.4
                 torch==2.5.1
                 py-cpuinfo==9.0.0
                 filelock==3.16.1
                 multidict==6.1.0
                 pip==24.2
                 pluggy==0.13.1
                 cmake==3.31.2
                 tomlkit==0.13.2
                 pybind11==2.13.6
                 packaging==24.2
                 click==8.1.7
                 huggingface_hub==0.26.5
                 huggingface-hub==0.27.0
                 safetensors==0.4.5
                 peft==0.14.0
                 Brotli==1.0.9
                 networkx==3.2
                 sentry-sdk==2.19.2
                 PyYAML==6.0
                 mypy==0.991
                 mccabe==0.7.0
                 triton-nightly==3.0.0.post20240716052845
                 transformers==4.47.1
                 multiprocess==0.70.16
                 xformers==0.0.29.post1
                 MarkupSafe==2.1.1
                 pylint==2.15.7
                 torchaudio==2.5.1
                 async-timeout==5.0.1
                 annotated-types==0.7.0
                 Pillow==9.2.0
                 pyarrow==18.1.0
                 fast_hadamard_transform==1.0.4.post1
                 PySocks==1.7.1
                 mypy-extensions==1.0.0
                 aiosignal==1.3.2
                 hjson==3.1.0
                 fsspec==2024.9.0
                 fsspec==2024.12.0
                 setuptools==75.1.0
                 datasets==3.2.0
                 six==1.17.0
                 tqdm==4.67.1
                 typing_extensions==4.12.2
                 threadpoolctl==3.5.0
                 debugpy==1.8.12
                 smmap==5.0.1
                 ninja==1.11.1.3
                 frozenlist==1.5.0
                 scipy==1.14.1
                 scipy==1.8.1
                 gmpy2==2.1.2
                 pydantic==2.10.3
                 docker-pycreds==0.4.0
                 protobuf==5.29.2
                 pytest==6.2.4
                 aiohttp==3.11.10
                 gitdb==4.0.11
                 yarl==1.18.3
                 py==1.11.0
                 mpi4py==4.0.1
                 urllib3==2.2.3
                 propcache==0.2.1
                 wrapt==1.17.0
                 lazy-object-proxy==1.10.0
                 scikit-build==0.18.1
                 pycparser==2.22
                 cycler==0.12.1
                 distro==1.9.0
                 iniconfig==2.0.0
                 idna==3.10
                 h2==4.1.0
                 hyperframe==6.0.1
                 triton==3.1.0
                 tomli==2.2.1
                 cffi==1.15.0
                 types-dataclasses==0.6.6
                 wandb==0.19.1
                 fonttools==4.55.3
                 pycodestyle==2.10.0
                 wheel==0.44.0
                 accelerate==1.2.1
                 scikit-learn==1.6.0
                 attrs==24.3.0
                 psutil==6.1.0
                 zstandard==0.19.0
                 dill==0.3.8
                 setproctitle==1.3.4
                 black==24.3.0
                 requests==2.32.3
                 isort==5.13.2
                 mpmath==1.3.0
                 certifi==2024.12.14
                 pyparsing==3.2.0
                 hpack==4.0.0
                 pandas==2.2.3
                 tokenizers==0.21.0
                 regex==2024.11.6
                 pytz==2024.2
                 contourpy==1.3.1
                 pydantic_core==2.27.1
                 aiohappyeyeballs==2.4.4
                 pathspec==0.12.1
                 torchvision==0.20.1
                 astroid==2.13.5
                 GitPython==3.1.43
                 types-requests==2.26.3
                 matplotlib==3.10.0
                 platformdirs==4.3.6
                 parameterized==0.8.1
                 msgpack==1.1.0
                 python-dateutil==2.9.0.post0
                 toml==0.10.2
                 numpy==1.26.4
                 xxhash==3.5.0
                 joblib==1.4.2
                 charset-normalizer==3.4.0
                 colorama==0.4.6
                 sympy==1.13.1
                 aihwkit_lightning==0.0.1
                 sigmamoe==0.0
                 deepspeed==0.15.4+unknown
                 analoglora==0.0
                 analogmoe==0.0

I+00000.128: Listening for incoming Client connections on 0.0.0.0:1326...

I+00000.128: Listening for incoming Server connections on 127.0.0.1:47939...

I+00000.129: Sending endpoints info to debug server at localhost:37193:
             {
                 "client": {
                     "host": "0.0.0.0",
                     "port": 1326
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 47939
                 }
             }

I+00000.129: Writing endpoints info to '/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/debuggerAdapterEndpoint.txt':
             {
                 "client": {
                     "host": "0.0.0.0",
                     "port": 1326
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 47939
                 }
             }

E+00000.129: Error writing endpoints info to file:
             
             Traceback (most recent call last):
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 115, in main
                 with open(listener_file, "w") as f:
             FileNotFoundError: [Errno 2] No such file or directory: '/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/debuggerAdapterEndpoint.txt'
             
             Stack where logged:
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
                 return _run_code(code, main_globals, None,
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 86, in _run_code
                 exec(code, run_globals)
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 233, in <module>
                 main(_parse_argv(sys.argv))
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 119, in main
                 log.reraise_exception("Error writing endpoints info to file:")
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/../../debugpy/common/log.py", line 222, in reraise_exception
                 _exception(format_string, *args, **kwargs)
             

I+00000.129: Not logging to "<stderr>" anymore.


Notably, I see two errors in these logs, and I don't know which one causes the other or which one occurs first:

E+00000.129: Error writing endpoints info to file:
             
             Traceback (most recent call last):
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 115, in main
                 with open(listener_file, "w") as f:
             FileNotFoundError: [Errno 2] No such file or directory: '/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/debuggerAdapterEndpoint.txt'
             
             Stack where logged:
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
                 return _run_code(code, main_globals, None,
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 86, in _run_code
                 exec(code, run_globals)
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 233, in <module>
                 main(_parse_argv(sys.argv))
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 119, in main
                 log.reraise_exception("Error writing endpoints info to file:")
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/../../debugpy/common/log.py", line 222, in reraise_exception
                 _exception(format_string, *args, **kwargs)

and

Traceback (most recent call last):
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 422, in _on_run
    cmd.send(self.sock)
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command.py", line 109, in send
    sock.sendall(as_bytes)
BrokenPipeError: [Errno 32] Broken pipe
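
On the ordering of the two errors: the pydevd log shows "ReaderThread: empty contents received" shortly before the BrokenPipeError, which is the pattern a process sees when the other end of its socket has gone away. The sketch below reproduces that pattern with a local socket pair. It only illustrates the symptom and is not debugpy's code; `observe_peer_close` is a hypothetical helper name. That the adapter exited after the FileNotFoundError and thereby closed the connection is an assumption the logs suggest but do not prove.

```python
import socket


def observe_peer_close():
    """Return what the surviving end of a socket pair sees after its peer closes."""
    ours, peer = socket.socketpair()
    try:
        # Stand-in for the other process (assumed here: the adapter) going away.
        peer.close()

        # Reading from a socket whose peer has closed returns an empty result,
        # which matches the "empty contents received" line in the pydevd log.
        received = ours.recv(1024)

        # Writing to the same socket afterwards fails with a pipe/reset error.
        try:
            ours.sendall(b'{"type": "event", "event": "terminated"}')
            send_error = None
        except (BrokenPipeError, ConnectionResetError) as exc:
            send_error = type(exc).__name__
        return received, send_error
    finally:
        ours.close()


if __name__ == "__main__":
    print(observe_peer_close())
```

If this reading is right, the BrokenPipeError would be a follow-on symptom of the connection closing rather than an independent failure.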

Expected behavior

This setup used to work on the cluster and no longer does. Something in the cluster configuration has probably changed, but I would appreciate guidance on how to fix this, or at least some understanding of what could be going on.
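
One thing that could be checked on the compute node is where the /tmp/noConfigDebugAdapterEndpoints-... path comes from and whether its directory exists there. The sketch below assumes the adapter picks the path up from an environment variable inherited by the job's shell (the logs do not confirm this), so it scans the environment for any value that mentions the directory name seen in the traceback. `find_endpoint_file_settings` and its `marker` parameter are hypothetical names used for illustration.

```python
import os


def find_endpoint_file_settings(environ=None, marker="noConfigDebugAdapterEndpoints"):
    """List environment variables whose value mentions `marker`.

    Each result is (variable name, value, whether the value's parent directory exists).
    """
    environ = os.environ if environ is None else environ
    findings = []
    for name, value in sorted(environ.items()):
        if marker in value:
            parent = os.path.dirname(value)
            findings.append((name, value, os.path.isdir(parent)))
    return findings


if __name__ == "__main__":
    results = find_endpoint_file_settings()
    if not results:
        print("No environment variable mentions the endpoints directory.")
    for name, value, parent_exists in results:
        print(f"{name}={value} (parent directory exists: {parent_exists})")
```

If a variable shows up with a parent directory that does not exist on the compute node, unsetting it in the job's shell before starting debugpy would be one experiment to try.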

Steps to reproduce:

I am trying to reproduce on a different cluster right now, but it might take a while as it is very busy.
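
Until a full reproduction is available, a quick check can separate "nothing is listening on the port anymore" from a network or firewall problem between the client and the compute node. The sketch below is a plain TCP connection test and does not speak the debug protocol; `can_connect` is a hypothetical helper, and the host name in the usage example is a placeholder, not a value from the logs.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return (True, None) if a TCP connection succeeds, else (False, error name)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution failures.
        return False, type(exc).__name__


if __name__ == "__main__":
    # "compute-node" is a placeholder; 1326 is the port passed to --listen above.
    print(can_connect("compute-node", 1326))
```

Running it once on the compute node itself against 127.0.0.1 and once from the client machine would show whether the refusal is local to the node.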
