Conversation

@jkroonza
Contributor

$ strace -p 3701
strace: Process 3701 attached
read(14^C^Cstrace: Process 3701 detached

$ ls -lah /proc/3701/fd/14
lrwx------ 1 root root 64 Oct 22 02:52 /proc/3701/fd/14 -> 'socket:[2478479691]'

$ grep 2478479691 /proc/net/{udp,tcp,raw}*
/proc/net/raw6: 118: 000002FF000000000000000002000100:003A 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 2478479691 2 00000000c0af204a 0

So it's blocking, trying to receive an RS. This seems bogus: the only raw socket is dhcpv6relay_sock_rsra, and it only ever gets read from dhcpv6relay_router_solicitation, which should only ever be called if the event handler determines there's data available.

All sockets are datagram-based (and by implication already deemed "unreliable"), so making them all non-blocking should have no adverse effect; a blocked pppd process, however, most certainly has undesired effects.
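As a minimal sketch (not pppd's actual code) of why non-blocking mode is safe here: once a datagram socket has O_NONBLOCK set, a read when nothing is queued fails immediately with EAGAIN/EWOULDBLOCK instead of hanging the process. The helper names below are illustrative only.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put an already-open fd into non-blocking mode; returns 0 or -1. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read an empty non-blocking UDP socket and report the resulting errno.
 * Expected: EAGAIN/EWOULDBLOCK rather than blocking the process. */
static int empty_read_errno(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    char buf[16];
    int e = 0;

    if (fd < 0)
        return errno;
    make_nonblocking(fd);
    if (read(fd, buf, sizeof(buf)) < 0)
        e = errno;
    close(fd);
    return e;
}
```

A read like the one seen blocking in the strace above would, with this change, return immediately and let the caller log the spurious wakeup.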

@jkroonza
Contributor Author

jkroonza commented Oct 22, 2025

@paulusmack perhaps you'd like to see if you can spot something else. I note explicitly that there are other raw sockets used in pppd, but I can also confirm that socket fds 12 and 13 are dhcpv6proxy udp6 sockets (based on their DHCP port numbers), and the rsra socket is created directly after those in the code ... so I'm 99% sure this is the one.

We only seem to get this with one specific brand of remote router, though ... which does make me wonder about a few things.

Not yet in production; I'd prefer another eyeball on this before taking it to production. The affected setup (based on the specific remote router and configuration that is quite unique to the brand - not the one with the IA_NA problem) is the only relevant place I can reliably test this.

@enaess definitely a 2.5.3 problem in my opinion :).

@Neustradamus
Member

@paulusmack: Have you seen this @jkroonza PR?

@paulusmack
Collaborator

With these sockets in non-blocking mode, are we still guaranteed to get complete packets returned by recvfrom()? Or could we get partial packets and need to reassemble them?

@paulusmack
Collaborator

BTW, I agree with the need for all sockets used in pppd to be non-blocking.

@jkroonza
Contributor Author

With these sockets in non-blocking mode, are we still guaranteed to get complete packets returned by recvfrom()? Or could we get partial packets and need to reassemble them?

These are all either UDP or RAW sockets, so to the best of my understanding, yes, I believe so. The read will either succeed or fail. In the case of a buffer that's too small, the question really is: will we receive a partial packet, or will we get -1 with some appropriate error?

Based on experience with DNS work, I believe we end up with a truncated packet in the case of read().

In the case of ppp we have two cases:

  1. Router Solicitation messages, in which case we don't care: the content of the RS doesn't change the content of the Router Advertisement we respond with. We're using read() here, and a truncated packet is fine.

  2. In the case of DHCP messages it gets a LOT more tricky. Based on the description of MSG_TRUNC, without it we will silently end up with a truncated packet; passing MSG_TRUNC would at least allow us to detect that.
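For the DHCP case above, the MSG_TRUNC detection works because, on Linux, passing MSG_TRUNC in the recvfrom() flags makes the return value the real length of the datagram even when the buffer was too small. A rough illustration over a plain loopback UDP socket (illustrative helper name and sizes, Linux semantics assumed):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `payload` zero bytes to ourselves over loopback UDP, then receive
 * into a `bufsz`-byte buffer with MSG_TRUNC.  On Linux the return value
 * is the full datagram length, so a return > bufsz signals truncation. */
static ssize_t recv_len_with_trunc(size_t payload, size_t bufsz)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char out[512] = {0}, in[512];
    ssize_t n = -1;

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    /* port 0: kernel picks an ephemeral port, recovered via getsockname */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &alen) == 0 &&
        sendto(fd, out, payload, 0,
               (struct sockaddr *)&addr, sizeof(addr)) == (ssize_t)payload)
        n = recvfrom(fd, in, bufsz, MSG_TRUNC, NULL, NULL);
    close(fd);
    return n;
}
```

A 100-byte datagram received into a 16-byte buffer still reports 100, so the caller can compare the return value against the buffer size and know the tail was dropped.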

@jkroonza
Contributor Author

Read up a bit more: yes, any given recvfrom() and/or read() will always return exactly one (potentially truncated) packet. In the case where we read a truncated packet, that's it, the rest is gone. In the current code the recvfrom() will log if the return is >= the buffer size; on reading up a bit more, it will return == the buffer size whether the packet fit exactly or overflowed, so I really ought to add MSG_TRUNC and then change the test to > sizeof(buffer). I find it highly unlikely that a buffer would ever need to be >1KB, but I can't preclude it either. Fixing that, IMHO, should be a separate PR, and I would need suggestions on how best to handle it - obviously allocating a 64KB buffer to ensure it can never overflow is impractical. Fixing the boundary issue that's currently there seems fairly easy at least.

It seems fixing the truncation issue requires two recvfrom() calls for every receive: first with MSG_PEEK | MSG_TRUNC to determine the required buffer length, then, after allocating the buffer (alloca in this case would work just fine to put it on the stack), a second recvfrom() to read into that buffer.
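The peek-then-read pattern described above can be sketched as follows; this is a hypothetical helper (the names `recv_exact` and `demo_recv_exact` are mine, not pppd's), using malloc rather than alloca so the sketch stays self-contained:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Two-step receive: peek with MSG_PEEK | MSG_TRUNC to learn the pending
 * datagram's size without consuming it, allocate exactly that much, then
 * read it for real.  Returns bytes received (caller frees *out), or -1. */
static ssize_t recv_exact(int fd, char **out)
{
    char probe;
    ssize_t need = recvfrom(fd, &probe, 1, MSG_PEEK | MSG_TRUNC, NULL, NULL);
    char *buf;
    ssize_t n;

    if (need < 0)
        return -1;
    buf = malloc(need ? (size_t)need : 1);   /* alloca() would also work */
    if (!buf)
        return -1;
    n = recvfrom(fd, buf, (size_t)need, 0, NULL, NULL);
    *out = buf;
    return n;
}

/* Self-test: send `payload` bytes to ourselves over loopback UDP and
 * receive them with a buffer sized by the peek. */
static ssize_t demo_recv_exact(size_t payload)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char out[1024] = {0}, *in = NULL;
    ssize_t n = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (fd >= 0 &&
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &alen) == 0 &&
        sendto(fd, out, payload, 0,
               (struct sockaddr *)&addr, sizeof(addr)) == (ssize_t)payload)
        n = recv_exact(fd, &in);
    free(in);
    if (fd >= 0)
        close(fd);
    return n;
}
```

The cost is one extra syscall per receive, which is why deferring this to a separate PR seems reasonable.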

As mentioned, that's a separate issue though; this PR will just make sure we don't block if we somehow attempt a read when there is nothing to read. I still think the fact that we're seeing this is indicative of a bug in the event code ... so I'll take a proper read through that as well. Having made a quick pass through it before making these sockets non-blocking, I didn't spot anything obvious (but as usual, something is only non-obvious until it is).

@chipitsine
Contributor

If we don't block the read when "there's nothing to read", does that mean that in such a case we'll get an empty response? Which is not handled now, right?

@jkroonza
Contributor Author

If we don't block the read when "there's nothing to read", does that mean that in such a case we'll get an empty response? Which is not handled now, right?

We'll get a return of -1 with EAGAIN / EWOULDBLOCK. This will log an error, since even reaching that point already indicates a bug.

I note that the two uses of recvfrom() already had MSG_DONTWAIT, thus the only possible blocking operation would be the read() on line 750.

Looking at it that way, I think that read() should simply become a recvfrom() with MSG_DONTWAIT as well. Let me rework the patch quickly. I'll make it a double commit and also enable MSG_TRUNC to fix the off-by-one buffer overflow issue.
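Switching from read() to recvfrom() has a useful side effect: it also yields the sender's address, which is exactly what a unicast reply needs. A hypothetical sketch of the pattern over plain loopback UDP (not the actual ICMPv6 RS/RA code; all helper names are mine):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Receive one pending datagram without blocking and send `reply` back to
 * its sender (unicast).  Returns bytes sent, or -1 with errno set to
 * EAGAIN/EWOULDBLOCK when no datagram is pending. */
static ssize_t reply_to_sender(int fd, const char *reply, size_t rlen)
{
    char buf[1500];
    struct sockaddr_storage src;
    socklen_t srclen = sizeof(src);
    ssize_t n = recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT,
                         (struct sockaddr *)&src, &srclen);
    if (n < 0)
        return -1;
    return sendto(fd, reply, rlen, 0, (struct sockaddr *)&src, srclen);
}

/* Self-test: a "client" socket solicits, a "server" socket replies
 * unicast; returns 1 if the client got the reply back, else 0. */
static int demo_unicast_reply(void)
{
    int c = socket(AF_INET, SOCK_DGRAM, 0);
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in sa;
    socklen_t alen = sizeof(sa);
    struct pollfd pfd;
    char got[8] = {0};
    int ok = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    pfd.fd = s;
    pfd.events = POLLIN;
    if (c >= 0 && s >= 0 &&
        bind(s, (struct sockaddr *)&sa, sizeof(sa)) == 0 &&
        getsockname(s, (struct sockaddr *)&sa, &alen) == 0 &&
        sendto(c, "RS", 2, 0, (struct sockaddr *)&sa, sizeof(sa)) == 2 &&
        poll(&pfd, 1, 1000) == 1 &&     /* event loop says data is ready */
        reply_to_sender(s, "RA", 2) == 2 &&
        recv(c, got, sizeof(got), 0) == 2)
        ok = (memcmp(got, "RA", 2) == 0);
    if (c >= 0) close(c);
    if (s >= 0) close(s);
    return ok;
}
```

The poll() before the non-blocking receive mirrors pppd's event-handler flow: the receive only runs once readiness has been signalled, and MSG_DONTWAIT guards against a spurious wakeup.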

unicast.

https://datatracker.ietf.org/doc/html/rfc4861#section-4.2 states:

Destination Address
    Typically the Source Address of an invoking Router
    Solicitation or the all-nodes multicast address.

This implies we should respond *to* the requesting device only using
unicast, thus make it so.

Use recvfrom to achieve this, also making the socket non-blocking.

If we fail to receive due to EWOULDBLOCK or EAGAIN, assume we "lost" an
RS and just send an RA to multicast anyway.

Signed-off-by: Jaco Kroon <jaco@uls.co.za>
When calling recvfrom we can pass MSG_TRUNC to always get the number of
bytes that were in the original packet (for non-connection-oriented
protocols such as UDP) rather than the number of bytes placed into the
buffer.  This allows us to compare the number of bytes in the packet to
the size of the buffer rather than just the number of bytes placed in
the buffer, allowing us to receive and process a full buffer rather than
just 1 byte less than the buffer size.

Technically this fixes an off-by-one error, but there should be no
operational impact, and it is highly unlikely to occur in the wild.

Signed-off-by: Jaco Kroon <jaco@uls.co.za>
@jkroonza
Contributor Author

The latest push is still intended to fix the blocking situation; however, I also noticed from the RFC that the response to a router solicitation should be unicast, so the non-blocking behaviour is addressed as a side effect of switching to recvfrom() to achieve that, plus adding MSG_DONTWAIT.

This means writes will still be blocking - but given the nature of unconnected packets, an extended block is extremely unlikely (maybe a millisecond or so), which I believe is preferable here for reliability.

I've tacked on a second commit to fix the off-by-one buffer overrun checks in other uses of recvfrom() as well whilst I'm busy with this code.

This code has been tested but not yet deployed to production; I will action that after a meeting starting in 10.

@paulusmack paulusmack merged commit 1affa97 into ppp-project:master Dec 5, 2025
31 checks passed
@Neustradamus Neustradamus removed the request for review from paulusmack December 6, 2025 07:36
@Neustradamus
Member

@jkroonza: Thanks for your work and thanks to @paulusmack for merging!
