Skip to content

Conversation

@taspelund
Copy link
Contributor

Initial implementation of MP-BGP, supporting IPv4 Unicast and IPv6 Unicast address-families.

Adds MP_REACH_NLRI and MP_UNREACH_NLRI types and handling, capabilities negotiation and handling, updated message parsing framework, enhanced UPDATE error handling (RFC 7606), lots of tests, and more flexible BGP next-hop encoding support (in preparation for extended next-hop and BGP unnumbered).

Implement MP_REACH_NLRI and MP_UNREACH_NLRI (RFC 4760) for IPv6 route
exchange.

Wire format (RFC 7606 compliance):
- Always encode MP-BGP attributes first in UPDATE
- Never encode both traditional and MP-BGP encodings in the same UPDATE
- De-duplicate received non-MP attributes
- Reject duplicate MP-BGP attributes with NOTIFICATION
- Handle received UPDATE carrying traditional and MP-BGP encodings
- Handle received UPDATE carrying both MP_REACH_NLRI and MP_UNREACH_NLRI

Next-hop handling:
- Add BgpNexthop enum: Ipv4, Ipv6Single, Ipv6Double (link-local + GUA)
- Add UpdateMessage::nexthop() to centralize next-hop extraction logic
- Preserve raw bytes in MpReach for capability-aware parsing

Fanout refactor:
- Split into Fanout4/Fanout6 using zero-sized marker types
- Add RouteUpdate enum carrying typed prefix Vecs
- Replace FSM event passing to use AFI/SAFI-specific announcements

Session layer updates:
- Process MP_REACH_NLRI/MP_UNREACH_NLRI in received UPDATEs
- Extract IPv4/IPv6 prefixes separately for RIB updates
- Encode IPv4 Unicast in traditional nlri/withdrawn fields 
- Encode IPv6 Unicast in MP-BGP attributes

Message parsing:
- Add MpReach/MpUnreach structs
- Add prefixes4_from_wire()/prefixes6_from_wire() helpers
- Restrict UPDATE nlri/withdrawn fields to Prefix4
- Restrict NEXT_HOP attribute to Ipv4Addr
- Add Display impl for Aggregator/As4Aggregator

Error handling:
- Add UnsupportedAddressFamily error variant
- Add MalformedAttributeList error variant

Testing:
- Add property-based tests for codec round-trips
- Cover BgpNexthop, MpReach, MpUnreach, UpdateMessage variants
- Test RFC 7606 encoding order and deduplication behavior

Closes: #397
Continue MP-BGP (RFC 4760) implementation with MpReachNlri and
MpUnreachNlri path attribute types. Add comprehensive framework for
parse error handling for all BGP message types, with revised UPDATE
error handling per RFC 7606.

Message layer changes:
- Rewrite MpReachNlri/MpUnreachNlri path attribute types to use parsed
  types in the struct, moving structural parsing into connection layer
  and treating unsupported AFI/SAFIs as parsing errors (if we don't
  support it, then we didn't advertise it, and you sending it to us is
  an error).
- Introduce MessageParseError enum covering all message types (Open,
  Update, Notification, RouteRefresh) with structured error reasons
- Add UpdateParseError with RFC 7606 error actions and reason types
  preserving section context (withdrawn/attributes/nlri)
- Add OpenParseError, NotificationParseError, RouteRefreshParseError
  for non-UPDATE message failures
- Implement mandatory attribute validation (ORIGIN, AS_PATH, NEXT_HOP)
  with treat-as-withdraw semantics per RFC 7606
- Track parse errors in UpdateMessage for session-layer processing

Session layer changes:
- Add AfiSafiState enum to track capability negotiation outcome per
  address family (Unconfigured/Advertised/Negotiated)
- Add StopReason::ParseError variant carrying error codes for proper
  NOTIFICATION message generation
- Handle ConnectionEvent::ParseError in all FSM states
- Add TcpConnectionFails event for recv loop IO errors

Connection layer changes:
- Distinguish parse errors from IO errors via RecvError enum
- Send ParseError events to FSM for all message types instead of
  converting to generic IO errors, preserving error codes for
  NOTIFICATION generation

Testing:
- Add unit tests for attribute error action selection per RFC 7606
- Add unit tests for attribute flag validation
- Add unit tests for error collection during UPDATE parsing
- Add unit tests for mandatory attribute validation
- Update proptest strategies for new MpReachNlri/MpUnreachNlri types
Splits RouteUpdate into V4(RouteUpdate4) and V6(RouteUpdate6) variants,
each of which have an IP-version specific Announce/Withdraw variant.
This uses the type system to ensure that any sender of a RouteUpdate
will be forced to use multiple events if they want to trigger both an
announcement and a withdrawal, upholding the RFC 7606 requirement that
Updates cannot be encoded with more than one of the following populated:
- Traditional Withdrawn field (IPv4)
- Traditional NLRI field (IPv4)
- MP_REACH_NLRI (IPv4 or IPv6)
- MP_UNREACH_NLRI (IPv4 or IPv6)

Additionally, this cleans up some unhelpful tests (like verifying length
or emptiness of nlri/withdrawn fields) and adds several more:
- round-trip codec tests for route-refresh and notification
- simultaneous traditional/mp-bgp encoding of ipv4 unicast
- making sure tests covering mp-bgp enums cover all variants
- making sure tests covering BgpNexthop enum cover all variants
Split ImportExportPolicy into typed IPv4 and IPv6 variants
(ImportExportPolicy4, ImportExportPolicy6) to enable proper
per-AF filtering. This adds IPv6 import filtering.

Key changes:
- Add ImportExportPolicy4/6 types for compile-time type safety
- Add ImportExportPolicyV1 for API backward compatibility
- Update SessionInfo to use per-AF policy fields
- Add per-AF export policy change handling
- Add per-AF route refresh (used for import policy changes)
- Add mark_bgp_peer_stale4/6 for per-AF stale marking

The Neighbor struct retains legacy allow_import/allow_export
fields for API compatibility, with conversion helpers to
combine per-AF policies back to the legacy format.

Closes: #211
- Adds length checks for message and path attribute parsing
- Adds AtomicAggregate mandatory path attribute (basic support)
- Updates Aggregator/As4Aggregator path attribute types to use parsed
  fields instead of raw bytes and have a real Display impl
- Adds logging of non-zero reserved field in MP_REACH_NLRI
- Adds unit tests to ensure length validation works properly
Removes the filtering of originated routes from a received UPDATE.

Scenario:
1. Peer advertises us a route that we are originating and we discard it.
2. We stop advertising that route.
3. Us withdrawing our advertisement doesn't cause the peer to choose a
   different bestpath for that route.
4. We no longer have a path for that route from this peer.

This could also be addressed by:
A) Sending a route-refresh to the peer when we stop originating routes.
B) Inserting a Local path into the RIB for routes we are originating
   which is preferred over BGP paths learned from peers during bestpath.

(A) is not a good idea because we don't know how large the peer's RIB
will be and we'd have to re-process their entire RIB every time we stop
advertising even a single route.

(B) is fine conceptually, as most routing stacks use this approach:
When using a "network" statement to pull routes from other protocols
into the BGP topology or creating an aggregate, a locally originated
path generally goes into the BGP LOC-RIB that is preferred over any
peer-learned path. If an existing routing table entry doesn't exist,
then this generally creates a blackhole route for the originated prefix
(in order to prevent routing loops from occurring as a result of missing
entries for the component routes).
In the case of maghemite, well, really in the case of dendrite, this
routing loop prevention is done without using blackhole routes: packets
arriving on a front panel port that enocounter a NAT "miss" are dropped
rather than submitted to an LPM lookup in the External routing table
(avoiding us becoming a transit router + hairpinning packets sent to us
from outside the rack). All that to say: we don't need local routes for
any kind of forwarding (or blackholing) data plane behavior.

This brings us to the crux of the issue: Inter-VPC routing.
Since we don't currently have a native inter-VPC data plane within the
rack, then we need to have installed an External route that covers our
originated prefix regardless of what mask is used. If we do an exact
match check and filtering, then we just make it harder for our peer to
advertise us a route we need. We effectively require the peer to
advertise us reachability to our own prefix, so long as the mask doesn't
match what we're using (e.g. if we send a /24, they have to send us any
mask other than a /24 that covers the original IP range).

So while removing this filter doesn't address the issues inherent to
us needing the peer to provide us a route back to our own address space,
it does make it simpler for us to learn that route (no longer needing
a route with anything but our own mask) and to avoid extra logic to
ensure we can recover routes that we filtered.
Adds a new API version for MP-BGP.
Updates mgadm to use the new API version.
Adds backwards compatible type and conversion for peers created with old
API versions.
@taspelund taspelund self-assigned this Dec 15, 2025
@taspelund taspelund added needs testing bgp Border Gateway Protocol mgd Maghemite daemon customer For any bug reports or feature requests tied to customer requests rust Pull requests that update rust code labels Dec 15, 2025
@rcgoodfellow rcgoodfellow added this to the 18 milestone Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bgp Border Gateway Protocol customer For any bug reports or feature requests tied to customer requests mgd Maghemite daemon needs testing rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants