-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store the subscriber, client, service and timer #431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pull request, but I think it needs a slight change of approach to work best. I've made some comments inline to that effect.
Whether or not we use the shared_ptr
to a C++ object or the rcl_*
equivalent type, we still are adding the overhead of creating shared_ptr
's each time we loop over wait, which will be quite a bit more expensive. There might be a better solution, which is more complicated but also more performant, which involves preventing the destructor from running until the executor is woken up and in between wait calls. Not doing that correctly will result in deadlocks, and might be just as bad as the current solution due to lock contention, but we wouldn't know which is better until we tested it in a few different scenarios.
VectorRebind<std::pair<const rcl_subscription_t *, SubscriptionBase::SharedPtr>> sub_pair_ptrs_; | ||
VectorRebind<rclcpp::ServiceBase::SharedPtr> service_ptrs_; | ||
VectorRebind<rclcpp::ClientBase::SharedPtr> client_ptrs_; | ||
VectorRebind<rclcpp::TimerBase::SharedPtr> timer_ptrs_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should instead store shared_ptr
's to the rcl_*
types. Here's my rationale why, consider this case (current behavior):
- user creates pub and sub which talk
- user spins
- asynchronously user reset's the sub pointer
- when a message is received for the deleted subscription:
- a segfault occurs in
rcl_wait()
for using "finalized" pointer to arcl_subscription_t
- a segfault occurs in
The existing behavior, described in #349, this resulted in segfault's when rcl_wait
was still using an rcl_*
object, because that object had had it's _fini()
function called on it from the destructor of one of our C++ classes.
With the implementation in this pr it goes like this:
- user creates pub and sub which talk
- user spins
- asynchronously user reset's the sub pointer
- when a message is received for the deleted subscription:
- a callback is called for the subscription the user has already deleted
With the proposal from #349, i.e. using shared_ptr
to the rcl_*
types, it would go like this (I think):
- user creates pub and sub which talk
- user spins
- asynchronously user reset's the sub pointer
- when a message is received for the deleted subscription:
- the message is "dropped" because the
weak_ptr
to the subscription the executor has fails to materialize into ashared_ptr
because it no longer exists
- the message is "dropped" because the
It would also be great to have an automated test for this case, so we don't regress on this point. Maybe someone on our team has time to do that if @deng02 doesn't. |
@wjwwood Thanks very much for the feedback. I just want to confirm my understanding of the changes you want. The PR in its current state artificially delays the destruction of the ROS2 subscriber to ensure that the 'finalized' pointer doesn't become invalidated until we're done with it, which somewhat violates the idea that the user is the owner and is managing the lifecycle of that subscriber. Instead if we replace the raw pointers with The one thing I want to clarify is about having the |
Also I definitely agree with the need for an automated test around this. I will try to write one if the team does not have time to create one. |
Hmm, I think we're not on the same page, let me try to clarify what I mean, but first let me respond to your latest feedback inline.
Right, except we need to differentiate in the C++ class and the C But you're right to summarize the issue as violating the contract with the user about the lifecycle of the
I would say no, I would word it as:
Well, So basically I'm proposing that we decouple the C++ class and the
So the documentation for This is obviously not ideal, as the user would like to have some more determinism in how the shutdown of the subscription works. However, this is the simplest solution for the errors/segfaults (and what is implicitly happening right now). A separate, but more completely (and more complicated) solution is to make it so that the destructors of our C++ classes force the executor(s) to give up shared ownership of the
This would change the documentation of the classes, e.g. However, it is not a trivial thing to do (in my opinion), since you need to avoid deadlocks and starvation (usually using a pair of mutex, of which one is a "barrier" mutex, e.g. I use these in the rclcpp/rclcpp/include/rclcpp/graph_listener.hpp Lines 162 to 163 in b81f55e
Hope that clears up what I mean. |
@wjwwood Yes that clears things up, thanks. I'll move forwards with the changes as described above. |
87d147c
to
13ea24b
Compare
@wjwwood I've re-worked the patch to convert the Note: I've tested this change quite a bit with topic subscriptions and also used valgrind to track memory leaks. What is the best way for me to test the other components? |
@guillaumeautran awesome, I'll have a look at it as soon as I can, though I can't promise an immediate turn around right now. |
Ping @wjwwood :) |
Sorry @guillaumeautran, I'll get to it as soon as I can, I just have a lot of other stuff on my plate at the moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The substance of the change lgtm, but I had a few stylistic comments.
rclcpp/src/rclcpp/client.cpp
Outdated
@@ -38,7 +40,21 @@ ClientBase::ClientBase( | |||
: node_graph_(node_graph), | |||
node_handle_(node_base->get_shared_rcl_node_handle()), | |||
service_name_(service_name) | |||
{} | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: remove unnecessary leading blank line
rclcpp/include/rclcpp/service.hpp
Outdated
@@ -33,6 +35,7 @@ | |||
#include "rmw/error_handling.h" | |||
#include "rmw/rmw.h" | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: minimize vertical whitespace and also avoid pure whitespace changes in pr's
rclcpp/include/rclcpp/service.hpp
Outdated
{ | ||
if (rcl_service_fini(service, node_handle_.get()) != RCL_RET_OK) { | ||
RCUTILS_LOG_ERROR_NAMED( | ||
"RCLCPP", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logger names ought to be lowercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this might be a good candidate for using a rclcpp
sub logger of the node's logger so that it appears as my_node.rclcpp
rather than just rclcpp
. Imagine this error comes out on a process with several nodes in it. Having the node name as context would help you narrow it down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere in the pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've fixed the casing for RCLCPP => rclcpp.
Regarding the node sublogger, shouldn't this be a rcutils concerns (as opposed to encoding the node name when writing the log)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well not if you use the RCLCPP_* logging macros, then it’s rclcpp’s issue. Also how would rcutils know anything about node subloggers. It seems pretty obvious to me that the name needs to be injected by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it for this message only. But the code looks ugly to me. I wonder if there is a smarter way to simplify that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the service had a handle to the C++ node, then it could call rclcpp::Node::get_logger()
, which would be slightly cleaner looking, but it does exactly the same thing. I'll try to re-review this asap.
rclcpp/include/rclcpp/service.hpp
Outdated
// check if service handle was initialized | ||
// TODO(karsten1987): Take this verification | ||
// directly in rcl_*_t | ||
// see: https://github.com/ros2/rcl/issues/81 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like ros2/rcl#81 is closed, can you try replacing this TODO and code snippet with rcl_service_is_valid()
? http://docs.ros2.org/ardent/api/rcl/service_8h.html#ae3f8159e4c6c43f9f2cc10dd4c5f5f3f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Karsten1987 FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
rclcpp/include/rclcpp/service.hpp
Outdated
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: minimize vertical whitespace (for reference, this is part of the Google Style guide which our style is based on, see: https://google.github.io/styleguide/cppguide.html#Vertical_Whitespace)
rclcpp/src/rclcpp/subscription.cpp
Outdated
|
||
auto custom_deletor = [=](rcl_subscription_t *rcl_subs) | ||
{ | ||
std::cout << "Deleting subscription handle\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like vestigial debug output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops...
rclcpp/src/rclcpp/timer.cpp
Outdated
@@ -13,6 +13,7 @@ | |||
// limitations under the License. | |||
|
|||
#include "rclcpp/timer.hpp" | |||
#include "rcutils/logging_macros.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be included after the system headers.
rclcpp/src/rclcpp/timer.cpp
Outdated
@@ -21,11 +22,29 @@ using rclcpp::TimerBase; | |||
|
|||
TimerBase::TimerBase(std::chrono::nanoseconds period) | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: unnecessary vertical whitespace
rclcpp/src/rclcpp/timer.cpp
Outdated
std::string("Timer could not get time until next call: ") + | ||
rcl_get_error_string_safe()); | ||
std::string("Timer could not get time until next call: ") + | ||
rcl_get_error_string_safe()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run ament_uncrustify
or the test suite on this? This is a known oddity of how uncrustify makes use format the code. I would actually expect it to fail the tests with this diff.
rclcpp/include/rclcpp/service.hpp
Outdated
new rcl_service_t, [=](rcl_service_t *service) | ||
{ | ||
if (rcl_service_fini(service, node_handle_.get()) != RCL_RET_OK) { | ||
RCUTILS_LOG_ERROR_NAMED( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the RCLCPP version of these macros now, see:
http://docs.ros2.org/ardent/api/rclcpp/logging_8hpp.html
This also removes the need for (and therefore implicitly address several other style comments I made) the new rcutils includes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into it now...
13ea24b
to
60c910d
Compare
Converts all rcl_*_t types in the memory allocation strategy to shared pointers to prevent crash happening when a subscriber is reset. Issue: ros2#349
e9fb761
to
bcb49c1
Compare
We always put the { on a new line for function definitions and class declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Sorry for the delay in reviewing, I needed a block of time to wrap my head around the whole thing.
I fixed one style thing, which surprising uncrustify didn't complain about, I'll have to look into that later.
Looks good on CI too, thanks for your contribution and patience @guillaumeautran and @deng02! |
@guillaumeautran if you have time to contribute a test for this that would be great too. Realistically, I'll not have time to do it, nor will anyone on our team most likely. |
* Revert "Revert "Store the subscriber, client, service and timer (#431)" (#448)" This reverts commit 168d75c. * Convert all rcl_*_t types to shared pointers Converts all rcl_*_t types in the memory allocation strategy to shared pointers to prevent crash happening when a subscriber is reset. Issue: #349 * fixups
objects as shared pointers (#349)
To prevent an object from being deleted while the rcl_wait_set is
using raw pointers to internal members, we store them as shared
pointers.
The subscriptions are stored as a pair because a single
subscription handle can have both an intra and non-intra
process handle being used by the wait set. In order to
remove objects based on null wait set handles, we need both.