Fix overflow caused by default spin timeout #1563
To give a bit more context, in a nutshell it seems that passing
fujitatomoya
left a comment
lgtm with green CI.
Pulls: #1563
I think most of the CI failures are unrelated, right? Regarding the failing test on Windows: Also, while the test is related to the changes in this PR, these changes should not modify the behavior when an explicit timeout is given for the executor, which is the case here. So it is kind of interesting that this fails.
A quick minimal test without ROS on Windows 11 and Python 3.12 shows that the logic used in the test is flawed on Windows: but it is not strictly an issue of the
Slightly unrelated, but the 10/16 ms scheduler resolution on Windows can be avoided by calling
But this is only possible in the underlying C implementation, right?
Using
That is perhaps where it starts being off-topic, but one can write a small class with a constructor and destructor like: and wrap it in Python bindings, so that it can be called from Python tests. Alternatively, something similar can also be done directly via ctypes. Again, this is probably out of scope here; I just want to mention it so that search engines and LLMs can find it. :)
Cool, we should make an issue for that. For now I made an additional PR that uses
@Mergifyio rebase |
Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the condition variable in the events queue implementation. Consequently, this led to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution. Signed-off-by: Florian Vahl <florian.vahl@dlr.de>
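The change described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual rclpy code: `SketchQueue`, `pop`, and `demo_unconditional_wait` are hypothetical names, and the real events queue does not store plain integers. The point is only that a missing deadline maps to `cv_.wait`, so no time point at all reaches the timed-wait path.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>

// Hypothetical stand-in for the events queue; not the rclpy class.
class SketchQueue {
public:
  using Clock = std::chrono::steady_clock;

  void push(int item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(item);
    }
    cv_.notify_one();
  }

  // Returns the next item, or std::nullopt if the deadline passes first.
  // Without a deadline this blocks in cv_.wait(), so no far-future
  // time point is ever handed to the timed-wait primitive.
  std::optional<int> pop(std::optional<Clock::time_point> deadline = std::nullopt) {
    std::unique_lock<std::mutex> lock(mutex_);
    const auto ready = [this] { return !items_.empty(); };
    if (deadline) {
      if (!cv_.wait_until(lock, *deadline, ready)) {
        return std::nullopt;  // Timed out with an empty queue.
      }
    } else {
      cv_.wait(lock, ready);  // Unconditional wait: no deadline involved.
    }
    assert(!items_.empty());
    const int item = items_.front();
    items_.pop_front();
    return item;
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<int> items_;
};

// Usage sketch: a producer thread pushes one item while the caller
// blocks without any deadline.
int demo_unconditional_wait() {
  SketchQueue queue;
  std::thread producer([&queue] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    queue.push(7);
  });
  const std::optional<int> item = queue.pop();
  producer.join();
  return item.value_or(-1);
}
```

Using `std::optional` for the deadline is a design choice of this sketch: it makes "no timeout" a distinct state instead of encoding it as the largest representable time point.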
✅ Branch has been successfully rebased
88f355e to 7fe657d
Is it possible to backport this change to jazzy and kilted as well?
@Flova ah yeah, sorry. I have one question about the backport to jazzy and kilted before doing that. I am almost sure that this is the C++ implementation behind python
Alright. We had the same discussion in PR RoboStack/ros-jazzy#137 for the RoboStack conda packages, which already include this bugfix as a patch file. There were also some concerns regarding ABI compatibility, and we came to the conclusion that it is a private header, which is not installed, so it should be fine. In addition, there is also the possibility of fixing this issue in a less clean way, by just using a "magic value" that doesn't overflow as the time point. This has its own drawbacks, but would preserve the ABI. I would still favor backporting this PR over a custom solution for these distros.
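For comparison, the "magic value" alternative mentioned above might look something like the sketch below. The constant `kEffectivelyForever`, its value of one year, and the helper `default_deadline` are assumptions made for illustration only; the thread does not say which value was tried.

```cpp
#include <cassert>
#include <chrono>

using SteadyClock = std::chrono::steady_clock;

// Hypothetical "magic value": long enough to behave like "no timeout" in
// practice, yet small enough that now() + value stays far away from
// SteadyClock::time_point::max().
constexpr std::chrono::hours kEffectivelyForever{24 * 365};

// Default deadline used when the caller gives no timeout (sketch only).
SteadyClock::time_point default_deadline(
    SteadyClock::time_point now = SteadyClock::now()) {
  return now + kEffectivelyForever;
}
```

One drawback of this approach is that the wait remains technically finite, so the caller still has to handle a timeout that is not supposed to happen.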
As nobody links against rclpy, ABI compatibility is not an issue.
@Flova @traversaro @jmachowinski thanks for the feedback! Let's do that then!
@Mergifyio backport kilted jazzy |
✅ Backports have been created
Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the condition variable in the events queue implementation. Consequently, this led to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution. Signed-off-by: Florian Vahl <florian.vahl@dlr.de> (cherry picked from commit eddfdb6)
Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the condition variable in the events queue implementation. Consequently, this led to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution. (cherry picked from commit eddfdb6) Signed-off-by: Florian Vahl <florian.vahl@dlr.de> Co-authored-by: Florian Vahl <git@flova.de>
Description
Previously, the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying `pthread_cond_timedwait` call, which waits for the condition variable in the events queue implementation. Consequently, this led to undefined behavior and freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution.
Did you use Generative AI?
No
Additional Information
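As a rough illustration of the overflow mentioned in the description: the largest steady-clock time point leaves no headroom, so any step that adds a positive offset to it (for example a conversion to another clock's epoch, which is an assumption here, since the description does not say where exactly the overflow occurs) cannot be represented. The helper name `add_would_overflow` is hypothetical, and the sketch assumes the clock's tick type is a built-in signed integer, as it is on common implementations.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

using SteadyClock = std::chrono::steady_clock;

// Returns true if `tp + offset` would not fit in the clock's tick type.
// The headroom is checked before adding, because signed integer overflow
// itself is undefined behavior.
bool add_would_overflow(SteadyClock::time_point tp, SteadyClock::duration offset) {
  using Rep = SteadyClock::rep;
  const Rep ticks = tp.time_since_epoch().count();
  const Rep delta = offset.count();
  if (delta > 0) {
    return ticks > std::numeric_limits<Rep>::max() - delta;
  }
  return ticks < std::numeric_limits<Rep>::min() - delta;
}
```

With this helper, `add_would_overflow(SteadyClock::time_point::max(), SteadyClock::duration(1))` reports an overflow, while the same check on `SteadyClock::now()` plus one hour does not.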