A sample Orleans application that demonstrates grain calls being dropped or deadlocked when a grain publishes to itself over SMS (Simple Message Streams).
This is a simplification of some production code; we've pulled out the relevant parts to demonstrate the issue.
Our grains are event-raising models that form part of a reliable event-sourced system. When they are called, they do some work and then emit events on a "background" queue. In this sample the queue is represented by a Timer that executes its delegate in an interleaving fashion, which frees the grain to accept and process more work while publishing occurs.
In our solution we have seen messages like the following:
Dropping expired message "NewPlacement Request ..." at phase Send
This indicates that messages to the grain are being dropped. The caller is held until the timeout.
This seems to occur because background publishing work is still queued when the grain is told to call DeactivateOnIdle. We suspect the grain enters a tear-down phase and stops taking new work, then tries to perform the queued publish tasks. Because the grain observes itself and SMS leads to call-chain reentrancy, it ends up trying to deliver the events to itself, and deadlocks because the grain can't accept new jobs while it is deactivating. Any new grain calls that arrive at this point are blocked waiting for the blocked deactivation.
This is currently occurring in our integration tests rather than in production environments, because of our (seemingly strange) use of DeactivateOnIdle. We've written our own persistence layer for grain storage and need to test that activation and state rehydration are consistent and working in different test scenarios. As part of those tests we've written a grain extension that allows us to shut down any grain to force a reactivation. Currently, the next test step fails because the call to the grain hangs and times out. Adding some artificial delay before telling the grain to deactivate does patch the issue.
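To make the suspected sequence easier to reason about, here is a minimal toy model in Python asyncio. It is not Orleans code and makes no claim about how the runtime is implemented. The names ToyGrain, scenario, delay_before_deactivate and call_timeout are hypothetical. The model assumes three things taken from our suspicion above: deactivation stops the grain admitting calls, deactivation waits for the queued publish work, and publishing delivers the event back to the same grain.

```python
import asyncio


class ToyGrain:
    """Toy stand-in for a grain activation. This is NOT Orleans code."""

    def __init__(self):
        self.accepting = asyncio.Event()  # set while new calls are admitted
        self.accepting.set()
        self.pending = []                 # queued background publish tasks
        self.log = []

    async def _deliver(self, message):
        # Assumption: self-delivery needs admission like any other call.
        await self.accepting.wait()
        self.log.append(f"Message: {message}")

    async def _publish(self, thing):
        await asyncio.sleep(0.01)         # stands in for the timer due time
        self.log.append(f"Publishing event for {thing}")
        await self._deliver(f"Hi, from {thing}")  # grain observes itself
        self.log.append(f"Published event for {thing}")

    async def do_thing(self, thing):
        await self.accepting.wait()
        self.log.append(f"Done {thing}")
        # Publishing is queued in the background so the call returns at once.
        self.pending.append(asyncio.create_task(self._publish(thing)))

    async def deactivate(self):
        self.accepting.clear()            # assumption: no new calls admitted
        self.log.append("Deactivating")
        await asyncio.gather(*self.pending)  # assumption: drain queued work
        self.pending.clear()
        self.accepting.set()              # a fresh activation takes over


async def scenario(delay_before_deactivate, call_timeout=0.2):
    grain = ToyGrain()
    await grain.do_thing("thing 1")
    await grain.do_thing("thing 2")
    await asyncio.sleep(delay_before_deactivate)
    deactivation = asyncio.create_task(grain.deactivate())
    await asyncio.sleep(0)                # let the deactivation start
    try:
        await asyncio.wait_for(grain.do_thing("another thing"), call_timeout)
        await deactivation
        return "completed", grain.log
    except asyncio.TimeoutError:
        return "timed out", grain.log
    finally:
        # Clean up whatever is still running so the event loop exits quietly.
        tasks = [deactivation, *grain.pending]
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(scenario(0.0))[0])   # deactivate immediately
    print(asyncio.run(scenario(0.05))[0])  # deactivate after a short delay
```

In this model, deactivating immediately leaves the publish tasks waiting for admission while the deactivation waits for them, so the next call times out. With the short delay the publish tasks finish first and the next call goes through, which matches the effect of the artificial delay in our tests. Whether the real runtime behaves this way is exactly what this sample is meant to establish.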
Console output from a run that completes:
Doing things
Done thing 1
Done thing 2
Shutting down
Publishing event for thing 2
Publishing event for thing 1
Message: Hi, from thing 2
Message: Hi, from thing 1
Published event for thing 1
Published event for thing 2
Calling DeactivateOnIdle
Doing another thing
Deactivating
Done another thing
Done
Deactivating
Console output from a run that hangs until the call times out:
Doing things
Done thing 1
Done thing 2
Shutting down
Publishing event for thing 1
Publishing event for thing 2
Calling DeactivateOnIdle
Doing another thing
Unhandled exception. System.TimeoutException: ...