Description
Hi everyone,
I'm currently working on a network topology using 100 containers running Serf. The setup is a 2-spine, 4-leaf topology, with each leaf hosting 25 nodes. All nodes are in the same subnet, and the configuration is quite simple.
However, I'm facing an issue when trying to join nodes to the Serf cluster:
Up to 40 nodes, the serf join commands work seamlessly, and all nodes can see each other in the cluster.
Beyond 40 nodes, joins start to fail with errors such as I/O timeout and No route to host.
I've also noticed a significant increase in ARP broadcast traffic as more nodes are added.
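To put a number on the ARP growth, a small script along these lines could be run inside each container. This is only a sketch: it assumes Linux containers that expose the kernel ARP table at /proc/net/arp, and the function name summarize_arp is made up for illustration (it is not part of Serf or Containerlab).

```python
from collections import Counter
from pathlib import Path


def summarize_arp(table_text: str) -> Counter:
    """Count ARP entries by state from /proc/net/arp-style text.

    Assumes the usual column order: IP, HW type, Flags, HW address, Mask, Device.
    """
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        flags = int(fields[2], 16)
        # The 0x2 bit marks a resolved entry; without it the entry is unresolved.
        state = "complete" if flags & 0x2 else "incomplete"
        counts[state] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: run inside a container to inspect its own ARP table.
    print(dict(summarize_arp(Path("/proc/net/arp").read_text())))
```

A growing number of incomplete entries as nodes are added would suggest that ARP requests are going unanswered, which would be consistent with the No route to host errors.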
I suspect this may be a network-related issue within the Containerlab setup or Serf's handling of ARP broadcasts. Has anyone encountered similar issues, or does anyone have suggestions on how to mitigate the ARP broadcast or join failures?
I changed the gossip interval to 5 seconds to see whether that would resolve the issue, but it made no difference.
Each container uses a simple JSON file to initialize Serf:

Thanks in advance!
I have added images for review:

