Skip to content

Conversation

@jktjkt
Copy link
Contributor

@jktjkt jktjkt commented May 16, 2025

Please see the individual commits; the first eight patches are just some debugging output which made my life easier, and which fixed some small bugs in the debugging code. The real fix for #124 is in the last commit -- at least I hope so. I was running with a previous version of this patch which was simply throwing an exception. I'll leave this as a draft for a few days, and hopefully I'll be able to see if I manage to hit the issue again. Sometimes it happens once every few months (!), sometimes it's also during the initial dev_inst_map enumeration already...

When we issue a command via the Lunatone DALI-to-USB dongle, the first USB packet which we get back has a copy of what we sent, like this:

20:13:24 D: tridonic: _raw_write: 12D6000300002A01000000
20:13:24 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None
20:13:24 D: tridonic: _handle_read 127300002A0100F5D6
[...]
20:13:24 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=00002a01 interval=00f5 seq=d6
20:13:24 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None

When stuff works as expected, this is also accompanied by a USB packet which tells us what the USB adapter read back from DALI in response to our command. It looks like this:

20:13:24 D: tridonic: _handle_read 127100000000008AD6
20:13:24 D: tridonic: _send_raw got message: mode=12 rtype=71 frame=00000000 interval=008a seq=d6
20:13:24 D: tridonic: _send_raw finished seq=d6

Sometimes this second USB packet, however, never arrives. When that happens, the USB dongle appears to continue working properly, but the code would get stuck waiting for a response which will never arrive (and holding a lock, preventing other commands form being sent).

Solve this by treating that as an erroneous response, but continue working. In my experience, this is sufficient and quicker than a full reconnect.

jktjkt added 9 commits May 15, 2025 10:08
Since the f-string defaults to calling __repr__ when the {foo=} form is
used, and because Frame is an object, this would print the object
address by default, which is why we have to ask for the __str__ version
to get usable data.

Fixes: c72374c ("Update the async Tridonic DALI-USB driver to work with 24-bit frames")
Both _bus_watch and _send_raw methods are "waiting for something", so
let's ensure that at least one of them has a nice log prefix. This
should make the debug log more readable and hopefully make me help debug
issue sde1000#124.
Before this patch, the comment suggested (to me, anyway) that it applies
to the first branch. That's not the case, the "sequence number is among
the outstandings" is the common code path which is taken when we do not
hit that bug.
… our packet

When we issue a command via the Lunatone DALI-to-USB dongle, the first
USB packet which we get back has a copy of what we sent, like this:

 20:13:24 D: tridonic: _raw_write: 12D6000300002A01000000
 20:13:24 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None
 20:13:24 D: tridonic: _handle_read 127300002A0100F5D6
 [...]
 20:13:24 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=00002a01 interval=00f5 seq=d6
 20:13:24 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None

When stuff works as expected, this is also accompanied by a USB packet
which tells us what the USB adapter read back from DALI in response to
our command. It looks like this:

 20:13:24 D: tridonic: _handle_read 127100000000008AD6
 20:13:24 D: tridonic: _send_raw got message: mode=12 rtype=71 frame=00000000 interval=008a seq=d6
 20:13:24 D: tridonic: _send_raw finished seq=d6

Sometimes this second USB packet, however, never arrives. When that
happens, the USB dongle appears to continue working properly, but the
code would get stuck waiting for a response which will never arrive (and
holding a lock, preventing other commands form being sent).

Solve this by treating that as an erroneous response, but continue
working. In my experience, this is sufficient and quicker than a
full reconnect.

fixes sde1000#124
@sde1000
Copy link
Owner

sde1000 commented May 17, 2025

In the circumstances where the device fails to provide a response in this way, are you expecting a response from your bus or are you expecting a DALI "NO"? (Or does it sometimes fail in both cases?)

Just wondering about the choice of "there was a framing error" as the default response in this case ­— would "NO" be better?

@jktjkt
Copy link
Contributor Author

jktjkt commented May 19, 2025

(I'm marking this PR as ready because I verified that it works.)

In the circumstances where the device fails to provide a response in this way, are you expecting a response from your bus or are you expecting a DALI "NO"? (Or does it sometimes fail in both cases?)

I don't know. The majority of commands which are sent do not require any reply AFAICT. When I review my recent logs, I get:

Just wondering about the choice of "there was a framing error" as the default response in this case ­— would "NO" be better?

My first version (internal, unpushed) of this patch implemented a "NO" as well. But then I thought -- hey, if the dongle usually responds with a "there was no response for this command", then not sending any USB-level reply sounds like an error to me, and that's why I decided to encode that as a packet which indicates a possible failure. This packet currently gets ignored by the upper layers anyway, so the end result is the same, except that we are not masking a potential failure within the driver.

Anyway, I can change this to a NO if you wish (please let me know).

I was AFK yesterday, but I left this debug logging running, and it caught this event:

06:04:12 D: tridonic: Sending with seq=f1
06:04:12 D: tridonic: _raw_write: 12F100030000A313000000
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None
06:04:12 D: tridonic: _handle_read 12730000A31300F8F1
06:04:12 D: tridonic: _handle_read: signaling for seq=f1
06:04:12 D: tridonic: bus_watch message 12730000A31300F8F1
06:04:12 D: tridonic: Command DTR0(19), immediate
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=0000A313 interval=00f8 seq=f1
06:04:12 D: tridonic: _send_raw rtype DALI16 or DALI24, new outstanding_transmissions=0
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None
06:04:12 D: tridonic: _handle_read 127100000000008AF1
06:04:12 D: tridonic: _handle_read: signaling for seq=f1
06:04:12 D: tridonic: bus_watch message 127100000000008AF1
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=71 frame=00000000 interval=008a seq=f1
06:04:12 D: tridonic: _send_raw finished seq=f1
06:04:12 D: tridonic: Sending with seq=f2
06:04:12 D: tridonic: _raw_write: 12F2200300009130000000
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=2 response=None
06:04:12 D: tridonic: _handle_read 12730000913000E6F2
06:04:12 D: tridonic: _handle_read: signaling for seq=f2
06:04:12 D: tridonic: bus_watch message 12730000913000E6F2
06:04:12 D: tridonic: bus_watch waiting with timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=00009130 interval=00e6 seq=f2
06:04:12 D: tridonic: _send_raw rtype DALI16 or DALI24, new outstanding_transmissions=1
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None
06:04:12 D: tridonic: _handle_read 1273000091300091F2
06:04:12 D: tridonic: _handle_read: signaling for seq=f2
06:04:12 D: tridonic: bus_watch message 1273000091300091F2
06:04:12 D: tridonic: Config command: SetExtendedFadeTime(<group (control gear) 8>)
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=00009130 interval=0091 seq=f2
06:04:12 D: tridonic: _send_raw rtype DALI16 or DALI24, new outstanding_transmissions=0
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None
06:04:12 E: tridonic: _send_raw timed out seq=f2, faking an error response
06:04:12 D: tridonic: _send_raw finished seq=f2
06:04:12 D: tridonic: Sending with seq=f3
06:04:12 D: tridonic: _raw_write: 12F300030000A300000000
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None
06:04:12 D: tridonic: _handle_read 12730000A3000A30F3
06:04:12 D: tridonic: _handle_read: signaling for seq=f3
06:04:12 D: tridonic: bus_watch message 12730000A3000A30F3
06:04:12 D: tridonic: Command DTR0(0), immediate
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=0000A300 interval=0a30 seq=f3
06:04:12 D: tridonic: _send_raw rtype DALI16 or DALI24, new outstanding_transmissions=0
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None
06:04:12 D: tridonic: _handle_read 127100000000008AF3
06:04:12 D: tridonic: _handle_read: signaling for seq=f3
06:04:12 D: tridonic: bus_watch message 127100000000008AF3
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=71 frame=00000000 interval=008a seq=f3
06:04:12 D: tridonic: _send_raw finished seq=f3
06:04:12 D: tridonic: Sending with seq=f4
06:04:12 D: tridonic: _raw_write: 12F420030000912E000000
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=2 response=None
06:04:12 D: tridonic: _handle_read 12730000912E00EDF4
06:04:12 D: tridonic: _handle_read: signaling for seq=f4
06:04:12 D: tridonic: bus_watch message 12730000912E00EDF4
06:04:12 D: tridonic: bus_watch waiting with timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=0000912E interval=00ed seq=f4
06:04:12 D: tridonic: _send_raw rtype DALI16 or DALI24, new outstanding_transmissions=1
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=1 response=None
06:04:12 D: tridonic: _handle_read 12730000912E0091F4
06:04:12 D: tridonic: _handle_read: signaling for seq=f4
06:04:12 D: tridonic: bus_watch message 12730000912E0091F4
06:04:12 D: tridonic: Config command: SetFadeTime(<group (control gear) 8>)
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=73 frame=0000912E interval=0091 seq=f4
06:04:12 D: tridonic: _send_raw rtype DALI16 or DALI24, new outstanding_transmissions=0
06:04:12 D: tridonic: _send_raw waiting for outstanding_transmissions=0 response=None
06:04:12 D: tridonic: _handle_read 127100000000008AF4
06:04:12 D: tridonic: _handle_read: signaling for seq=f4
06:04:12 D: tridonic: bus_watch message 127100000000008AF4
06:04:12 D: tridonic: bus_watch waiting for data, no timeout
06:04:12 D: tridonic: _send_raw got message: mode=12 rtype=71 frame=00000000 interval=008a seq=f4
06:04:12 D: tridonic: _send_raw finished seq=f4

@jktjkt jktjkt marked this pull request as ready for review May 19, 2025 06:24
@sde1000 sde1000 merged commit 9ba8e26 into sde1000:master May 22, 2025
0 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants