-
Notifications
You must be signed in to change notification settings - Fork 83
Fix a possible deadlock on Lunatone/Tridonic USB dongles #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Since the f-string defaults to calling __repr__ when the {foo=} form is
used, and because Frame is an object, this would print the object
address by default, which is why we have to ask for the __str__ version
to get usable data.
Fixes: c72374c ("Update the async Tridonic DALI-USB driver to work with 24-bit frames")
Both _bus_watch and _send_raw methods are "waiting for something", so let's ensure that at least one of them has a nice log prefix. This should make the debug log more readable and hopefully make me help debug issue sde1000#124.
Before this patch, the comment suggested (to me, anyway) that it applies to the first branch. That's not the case, the "sequence number is among the outstandings" is the common code path which is taken when we do not hit that bug.
… our packet When we issue a command via the Lunatone DALI-to-USB dongle, the first USB packet which we get back has a copy of what we sent, like this: 20:13:24 D: tridonic: _raw_write: 12D6000300002A01000000 20:13:24 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None 20:13:24 D: tridonic: _handle_read 127300002A0100F5D6 [...] 20:13:24 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=00002a01 interval=00f5 seq=d6 20:13:24 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None When stuff works as expected, this is also accompanied by a USB packet which tells us what the USB adapter read back from DALI in response to our command. It looks like this: 20:13:24 D: tridonic: _handle_read 127100000000008AD6 20:13:24 D: tridonic: _send_raw got message: mode=12 rtype=71 frame=00000000 interval=008a seq=d6 20:13:24 D: tridonic: _send_raw finished seq=d6 Sometimes this second USB packet, however, never arrives. When that happens, the USB dongle appears to continue working properly, but the code would get stuck waiting for a response which will never arrive (and holding a lock, preventing other commands form being sent). Solve this by treating that as an erroneous response, but continue working. In my experience, this is sufficient and quicker than a full reconnect. fixes sde1000#124
|
In the circumstances where the device fails to provide a response in this way, are you expecting a response from your bus or are you expecting a DALI "NO"? (Or does it sometimes fail in both cases?) Just wondering about the choice of "there was a framing error" as the default response in this case — would "NO" be better? |
|
(I'm marking this PR as ready because I verified that it works.)
I don't know. The majority of commands which are sent do not require any reply AFAICT. When I review my recent logs, I get:
My first version (internal, unpushed) of this patch implemented a "NO" as well. But then I thought -- hey, if the dongle usually responds with a "there was no response for this command", then not sending any USB-level reply sounds like an error to me, and that's why I decided to encode that as a packet which indicates a possible failure. This packet currently gets ignored by the upper layers anyway, so the end result is the same, except that we are not masking a potential failure within the driver. Anyway, I can change this to a NO if you wish (please let me know). I was AFK yesterday, but I left this debug logging running, and it caught this event: |
Please see the individual commits; the first eight patches are just some debugging output which made my life easier, and which fixed some small bugs in the debugging code. The real fix for #124 is in the last commit -- at least I hope so. I was running with a previous version of this patch which was simply throwing an exception. I'll leave this as a draft for a few days, and hopefully I'll be able to see if I manage to hit the issue again. Sometimes it happens once every few months (!), sometimes it's also during the initial
dev_inst_mapenumeration already...When we issue a command via the Lunatone DALI-to-USB dongle, the first USB packet which we get back has a copy of what we sent, like this:
When stuff works as expected, this is also accompanied by a USB packet which tells us what the USB adapter read back from DALI in response to our command. It looks like this:
Sometimes this second USB packet, however, never arrives. When that happens, the USB dongle appears to continue working properly, but the code would get stuck waiting for a response which will never arrive (and holding a lock, preventing other commands form being sent).
Solve this by treating that as an erroneous response, but continue working. In my experience, this is sufficient and quicker than a full reconnect.