Serf Node Join Issues  #745

@anjumm


Hi everyone,

I'm currently working on a network topology using 100 containers running Serf. The setup is a 2-spine, 4-leaf topology, with each leaf hosting 25 nodes. All nodes are in the same subnet, and the configuration is quite simple.

However, I'm facing an issue when trying to join nodes to the Serf cluster:

Up to 40 nodes, the serf join commands work without problems, and every node can see all the others in the cluster.
Beyond 40 nodes, joins start to fail with errors such as "i/o timeout" and "no route to host".
I've also noticed a significant increase in ARP broadcast traffic as more nodes are added.
I suspect this may be a network-related issue within the Containerlab setup or Serf's handling of ARP broadcasts. Has anyone encountered similar issues, or does anyone have suggestions on how to mitigate the ARP broadcast or join failures?
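To put a rough number on the ARP growth described above, here is a minimal back-of-the-envelope sketch. It assumes a flat layer-2 segment where, in the worst case, every node resolves every other node once per ARP cache lifetime and each request is broadcast to all other nodes. The function name `arp_load` is hypothetical, and the result is an upper bound, not a measurement from this setup.

```python
def arp_load(nodes: int) -> tuple[int, int]:
    """Worst-case ARP load for a full mesh of peers in one subnet.

    Returns (requests, deliveries):
      requests   - broadcast ARP requests if each node resolves each peer once
      deliveries - frames received, since each broadcast reaches every other node
    """
    requests = nodes * (nodes - 1)
    deliveries = requests * (nodes - 1)
    return requests, deliveries


for n in (40, 100):
    requests, deliveries = arp_load(n)
    print(f"{n} nodes: {requests} requests, {deliveries} broadcast deliveries")
```

Under these assumptions, going from 40 to 100 nodes multiplies the number of requests by about 6 and the number of delivered broadcast frames by about 16, which would be consistent with the sharp increase in ARP traffic observed as nodes are added.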

I changed the gossip interval to 5 seconds to see whether that would resolve the issue, but it made no difference.
Each container is initialized with a simple JSON configuration file for Serf:
image
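The actual configuration is only visible in the screenshot, so the following is a hypothetical sketch of how such a per-container file might be generated. The node names, the 10.0.0.x addresses, the seed address, and the retry values are all assumptions for illustration; `node_name`, `bind`, `retry_join`, and `retry_interval` are Serf agent configuration keys, but whether the original file uses them is not known.

```python
import json


def serf_config(index: int, seed: str = "10.0.0.1") -> str:
    """Build a Serf agent config for container number `index`.

    All values are illustrative assumptions, not the poster's real settings.
    """
    cfg = {
        "node_name": f"node{index}",   # hypothetical naming scheme
        "bind": f"10.0.0.{index}",     # hypothetical address plan
        "retry_join": [seed],          # keep retrying the seed instead of a one-shot join
        "retry_interval": "15s",       # assumed value
    }
    return json.dumps(cfg, indent=2)


print(serf_config(7))
```

Using `retry_join` rather than a single `serf join` call is one possible design choice here: a join that fails during a burst of traffic is retried automatically instead of leaving the node outside the cluster.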

Thanks in advance!

I have added images for review:

image

One of the container's interfaces:
image
