dependabot bot commented on behalf of github Apr 10, 2021

Bumps pytorch-lightning from 1.0.3 to 1.2.7.

Release notes

Sourced from pytorch-lightning's releases.

Standard weekly patch release

[1.2.7] - 2021-04-06

Fixed

  • Fixed a bug with omegaconf and xm.save (#6741)
  • Fixed an issue with IterableDataset when len is not defined (#6828)
  • Sanitize None params during pruning (#6836)
  • Enforce an epoch scheduler interval when using SWA (#6588)
  • Fixed TPU Colab hang issue, post training (#6816)
  • Fixed a bug where TensorBoardLogger would give a warning and not log correctly to a symbolic link save_dir (#6730)

Contributors

@awaelchli, @ethanwharris, @karthikprasad, @kaushikb11, @mibaumgartner, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

[1.2.6] - 2021-03-30

Changed

  • Changed the behavior of on_epoch_start to run at the beginning of validation & test epoch (#6498)

Removed

  • Removed legacy code to include step dictionary returns in callback_metrics. Use self.log_dict instead. (#6682)

Fixed

  • Fixed DummyLogger.log_hyperparams raising a TypeError when running with fast_dev_run=True (#6398)
  • Fixed error on TPUs when there was no ModelCheckpoint (#6654)
  • Fixed trainer.test freeze on TPUs (#6654)
  • Fixed a bug where gradients were disabled after calling Trainer.predict (#6657)
  • Fixed bug where no TPUs were detected in a TPU pod env (#6719)

Contributors

@awaelchli, @carmocca, @ethanwharris, @kaushikb11, @rohitgr7, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Weekly patch release - torchmetrics compatibility

[1.2.5] - 2021-03-23

Changed

... (truncated)

Changelog

Sourced from pytorch-lightning's changelog.

[1.2.7] - 2021-04-06

Fixed

  • Fixed a bug with omegaconf and xm.save (#6741)
  • Fixed an issue with IterableDataset when len is not defined (#6828)
  • Sanitize None params during pruning (#6836)
  • Enforce an epoch scheduler interval when using SWA (#6588)
  • Fixed TPU Colab hang issue, post training (#6816)
  • Fixed a bug where TensorBoardLogger would give a warning and not log correctly to a symbolic link save_dir (#6730)

[1.2.6] - 2021-03-30

Changed

  • Changed the behavior of on_epoch_start to run at the beginning of validation & test epoch (#6498)

Removed

  • Removed legacy code to include step dictionary returns in callback_metrics. Use self.log_dict instead. (#6682)

Fixed

  • Fixed DummyLogger.log_hyperparams raising a TypeError when running with fast_dev_run=True (#6398)
  • Fixed error on TPUs when there was no ModelCheckpoint (#6654)
  • Fixed trainer.test freeze on TPUs (#6654)
  • Fixed a bug where gradients were disabled after calling Trainer.predict (#6657)
  • Fixed bug where no TPUs were detected in a TPU pod env (#6719)

[1.2.5] - 2021-03-23

Changed

  • Update Gradient Clipping for the TPU Accelerator (#6576)
  • Refactored setup to be typing-friendly (#6590)

Fixed

  • Fixed a bug where all_gather would not work correctly with tpu_cores=8 (#6587)
  • Fixed comparing required versions (#6434)
  • Fixed duplicate logs appearing in console when using the python logging module (#6275)
  • Added Autocast in validation, test and predict modes for Native AMP (#6565)

[1.2.4] - 2021-03-16

Changed

... (truncated)

Commits
  • f5f4f03 Fix TPU tests for checkpoint
  • 123e20d Update Changelog & version
  • e6733ad Fixed missing arguments in lr_find call (#6784)
  • b0ab0bf Fix support for symlink save_dir in TensorBoardLogger (#6730)
  • c7422d4 [Fix] TPU Training Type Plugin (#6816)
  • edf6289 Fix DPP + SyncBN (#6838)
  • 9e5d84d Enforce an epoch scheduler interval when using SWA (#6588)
  • bb4fd7e Fix unfreeze_and_add_param_group expects modules rather than module (#6...
  • 52007b6 Sanitize None params during pruning (#6836)
  • b023d74 fix boolean check on iterable dataset when len not defined (#6828)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot bot added the dependencies label (Pull requests that update a dependency file) Apr 10, 2021

dependabot bot commented on behalf of github Apr 16, 2021

Superseded by #16.

dependabot bot closed this Apr 16, 2021
dependabot bot deleted the dependabot/pip/python/requirements/pytorch-lightning-1.2.7 branch April 16, 2021 22:46
ijrsvt pushed a commit that referenced this pull request Aug 16, 2022
We encountered a SIGSEGV when running the Python test `python/ray/tests/test_failure_2.py::test_list_named_actors_timeout`. The stack is:

```
#0  0x00007fffed30f393 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
   from /lib64/libstdc++.so.6
#1  0x00007fffee707649 in ray::RayLog::GetLoggerName() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#2  0x00007fffee70aa90 in ray::SpdLogMessage::Flush() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#3  0x00007fffee70af28 in ray::RayLog::~RayLog() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#4  0x00007fffee2b570d in ray::asio::testing::(anonymous namespace)::DelayManager::Init() [clone .constprop.0] ()
   from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#5  0x00007fffedd0d95a in _GLOBAL__sub_I_asio_chaos.cc () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#6  0x00007ffff7fe282a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7fe2931 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7fe674c in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7b82e79 in _dl_catch_exception () from /lib64/libc.so.6
#10 0x00007ffff7fe5ffe in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff7d5f39c in dlopen_doit () from /lib64/libdl.so.2
#12 0x00007ffff7b82e79 in _dl_catch_exception () from /lib64/libc.so.6
#13 0x00007ffff7b82f13 in _dl_catch_error () from /lib64/libc.so.6
#14 0x00007ffff7d5fb09 in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff7d5f42a in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#16 0x00007fffef04d330 in py_dl_open (self=<optimized out>, args=<optimized out>)
    at /tmp/python-build.20220507135524.257789/Python-3.7.11/Modules/_ctypes/callproc.c:1369
```

The root cause is that when `_raylet.so` is loaded, `static DelayManager _delay_manager` is initialized and `RAY_LOG(ERROR) << "RAY_testing_asio_delay_us is set to " << delay_env;` is executed. However, the static variables defined in `logging.cc` (in this case, `std::string RayLog::logger_name_ = "ray_log_sink"`) have not been initialized yet, so `RayLog::GetLoggerName()` copies a `std::string` that was never constructed and crashes (frames #0 and #1 above).

The initialization order of static variables in different translation units is not guaranteed, so it's better not to rely on it. I propose changing all `RAY_LOG`s in `DelayManager::Init()` to `std::cerr`.
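A minimal sketch of the proposed direction, under simplifying assumptions: `sketch::DelayManager`, `raw_config()` and the stored string are hypothetical and do not reproduce Ray's actual `DelayManager`, which also parses the variable's value. It only shows an initializer that runs during static initialization and reports through `std::cerr` instead of a logger that depends on its own static state. A single file cannot reproduce the cross-translation-unit crash itself.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

namespace sketch {

// Hypothetical stand-in for Ray's DelayManager; not Ray's actual code.
class DelayManager {
 public:
  DelayManager() { Init(); }

  const std::string &raw_config() const { return raw_config_; }

 private:
  void Init() {
    const char *delay_env = std::getenv("RAY_testing_asio_delay_us");
    if (delay_env == nullptr) {
      return;  // Delay injection disabled; nothing to report.
    }
    raw_config_ = delay_env;
    // Report via std::cerr rather than a logger whose statics (such as a
    // logger name string) may live in a not-yet-initialized translation unit.
    std::cerr << "RAY_testing_asio_delay_us is set to " << delay_env << std::endl;
  }

  std::string raw_config_;
};

// Namespace-scope object: constructed during dynamic initialization,
// i.e. while the shared library is being loaded.
static DelayManager delay_manager;

}  // namespace sketch
```

Within one translation unit, statics are initialized in definition order, and including `<iostream>` behaves as if it defined a static `std::ios_base::Init` object ahead of them, which is why `std::cerr` is usable from this constructor.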

The crash happens in Ant's internal codebase. I'm not sure why this test case passes in the community version.

BTW, I tried two other approaches:

1. Using a static local variable in `get_delay_us` and removing the global variable. This doesn't work because `init()` needs to access the variable as well.
2. Defining the global variable as `std::unique_ptr<DelayManager>` and initializing it in `get_delay_us`. This works, but it requires a lock to be thread-safe.
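Approach 2 can be sketched roughly as below, assuming C++14 (`std::make_unique`). `lazy_sketch::DelayManager`, `GetDelayManager()` and the two globals are hypothetical names, not Ray's code. The point is that the object is built on first use, after static initialization has finished, and a mutex guards that first construction.

```cpp
#include <cstdlib>
#include <memory>
#include <mutex>
#include <string>

namespace lazy_sketch {

// Hypothetical minimal stand-in; the real DelayManager holds more state.
struct DelayManager {
  std::string raw_config;  // value of RAY_testing_asio_delay_us, if set
};

// Both types have constexpr default constructors, so these globals are
// constant-initialized and never depend on dynamic initialization order.
std::unique_ptr<DelayManager> g_delay_manager;
std::mutex g_delay_manager_mu;

// Hypothetical accessor in the spirit of "initialize it in get_delay_us".
DelayManager &GetDelayManager() {
  // The lock makes concurrent first calls safe; it is taken on every call.
  std::lock_guard<std::mutex> lock(g_delay_manager_mu);
  if (!g_delay_manager) {
    g_delay_manager = std::make_unique<DelayManager>();
    if (const char *env = std::getenv("RAY_testing_asio_delay_us")) {
      g_delay_manager->raw_config = env;
    }
  }
  return *g_delay_manager;
}

}  // namespace lazy_sketch
```

`std::call_once` with a `std::once_flag` is a standard-library alternative to the explicit mutex for this kind of one-time construction.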