
VPP-dataplane Setup Failure, CoreDNS and calico-kube-controllers fail to reach kubeAPI Server #685

@umarfarooq-git

Description


Environment

  • Calico/VPP version: v3.27.0
  • All pods in the calico-system namespace: v3.27.2
  • tigera-operator: v1.32.5
  • Kubernetes version:
    Client Version: v1.28.8
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.8
  • Deployment type: Vagrant VM with the following details.

IP_NW = "192.168.56."
MASTER_IP_START = 1
NUM_MASTER_NODE = 1  # assumed value; this constant is used below but was missing from the paste

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/bionic64"
  config.vm.box_check_update = false

  # Provision master nodes
  (1..NUM_MASTER_NODE).each do |i|
    config.vm.define "kubemaster" do |node|
      # Name shown in the GUI
      node.vm.provider "virtualbox" do |vb|
        vb.name = "kubemaster"
        vb.memory = 4096
        vb.cpus = 4
      end
      node.vm.hostname = "kubemaster"
      node.vm.network :private_network, ip: IP_NW + "#{MASTER_IP_START + i}"
      node.vm.network "forwarded_port", guest: 22, host: "#{2710 + i}"
      node.vm.network "private_network", ip: "192.168.56.10", virtualbox__hostonly: true
      node.vm.provision "setup-hosts", :type => "shell", :path => "ubuntu/vagrant/setup-hosts.sh" do |s|
        s.args = ["enp0s8"]
      end
      node.vm.provision "setup-dns", type: "shell", :path => "ubuntu/update-dns.sh"
    end
  end
end
  • Network configuration:
[screenshot: network configuration]
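One thing worth confirming with this Vagrant layout: ubuntu/bionic64 boxes come up with a NAT interface (typically enp0s3) holding the default route, while the 192.168.56.x host-only network sits on enp0s8, the interface handed to VPP. A quick sanity check (interface names assumed from the config above):

    # The host-only address (192.168.56.x) should be on enp0s8, the VPP uplink
    ip -4 addr show enp0s8
    # The default route usually sits on the NAT interface (enp0s3)
    ip route show default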

Issue description
CoreDNS and calico-kube-controllers are unable to run and sit in CrashLoopBackOff. Both pods are probably trying to connect to the API server at the wrong IP. Logs are provided below.
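For context, in-cluster pods like CoreDNS and calico-kube-controllers reach the API server through the kubernetes Service ClusterIP (10.96.0.1 with the default service CIDR), not through the advertise address directly, so a useful first check is whether that Service maps to the right endpoint. A diagnostic sketch with standard kubectl:

    # ClusterIP the pods dial; should be 10.96.0.1 with the default 10.96.0.0/12 range
    kubectl get svc kubernetes -n default
    # Backend it maps to; should be the advertise address, 192.168.56.2:6443
    kubectl get endpoints kubernetes -n default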

To Reproduce
Steps to reproduce the behavior:

  1. Brought up the VM according to the Vagrant settings above.
  2. Disabled the firewall on all machines with ufw disable (run as root).
  3. Disabled swap:
    sudo swapoff -a
  4. Enabled IPv4 forwarding and let iptables see bridged traffic:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
  5. Installed the containerd runtime and configured systemd as the cgroup driver by putting the following in /etc/containerd/config.toml (see the containerd sketch after this list):
version = 2
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
  6. Installed kubeadm, kubelet and kubectl using the apt repository.
  7. Initialized the K8s cluster as follows (see the node-ip note after this list):
    kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=192.168.56.2
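For step 5, a common variant is to generate the full default config first and flip SystemdCgroup in place, then restart containerd so the cgroup driver change actually takes effect. A sketch, not necessarily the exact commands used here:

    # Write the complete default config, then enable the systemd cgroup driver
    containerd config default | sudo tee /etc/containerd/config.toml
    sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
    # Restart so kubelet and containerd agree on the cgroup driver
    sudo systemctl restart containerd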
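Also relevant to step 7: on Vagrant boxes kubelet often auto-detects its node IP on the NAT interface rather than on 192.168.56.x, which produces exactly this kind of wrong-IP symptom. Pinning the node IP is one way to rule that out (a sketch; adjust the address per node):

    # /etc/default/kubelet (Debian/Ubuntu) -- pin kubelet to the host-only address
    KUBELET_EXTRA_ARGS=--node-ip=192.168.56.2
    # then restart kubelet
    sudo systemctl restart kubelet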

To Install Calico with the VPP dataplane

Followed the instructions here: https://docs.tigera.io/calico/latest/getting-started/kubernetes/vpp/getting-started

  1. Assigned the hugepages and loaded the vfio_pci module (a verification check follows this list):
    echo "vfio_pci" > /etc/modules-load.d/95-vpp.conf
    modprobe vfio_pci
    echo "vm.nr_hugepages = 512" >> /etc/sysctl.conf
    sysctl -p
    systemctl restart kubelet

  2. kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.2/manifests/tigera-operator.yaml

  3. kubectl apply -f https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.27.0/yaml/calico/installation-default.yaml

  4. curl -o calico-vpp.yaml https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.27.0/yaml/generated/calico-vpp.yaml
    Kept all settings unchanged except for vpp_dataplane_interface; the rest matches my cluster, with service_prefix retaining its default value of 10.96.0.0/12 (the apply-and-check sketch after this list covers the next step):
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: calico-vpp-dataplane
    data:
      service_prefix: 10.96.0.0/12
      vpp_dataplane_interface: enp0s8
      vpp_uplink_driver: ""
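To confirm step 1 took effect before VPP starts, the standard proc/lsmod checks are enough (a verification sketch):

    # Should report HugePages_Total: 512 after sysctl -p
    grep HugePages_Total /proc/meminfo
    # Should list vfio_pci (and its vfio dependencies)
    lsmod | grep vfio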
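After editing calico-vpp.yaml, the getting-started doc applies the manifest and checks the dataplane pods; roughly like this (namespace, DaemonSet, and container names assumed from the v3.27 generated manifest):

    kubectl create -f calico-vpp.yaml
    # Both containers (vpp and agent) should reach Running
    kubectl -n calico-vpp-dataplane get pods -o wide
    # A bad vpp_dataplane_interface usually shows up first in VPP's own log
    kubectl -n calico-vpp-dataplane logs ds/calico-vpp-node -c vpp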

Have spent days on it 😞
Want to find out if there is an issue with this version of Calico or if I am doing something wrong, because it works like a charm if I simply configure the Calico CNI from the following manifest and don't set up the VPP dataplane at all:
kubectl create -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml

Expected behavior
I want to set up the Calico VPP dataplane to test the different available VPP uplink drivers, such as DPDK, for traffic acceleration.
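For reference, once the dataplane is healthy, the driver under test is selected through the same ConfigMap edited in step 4 (ConfigMap name taken from the paste above; DaemonSet name assumed); a DPDK sketch:

    # Pin the uplink driver (dpdk as an example); "" leaves VPP to auto-detect
    kubectl -n calico-vpp-dataplane patch configmap calico-config \
      --type merge -p '{"data":{"vpp_uplink_driver":"dpdk"}}'
    # Restart the dataplane pods so the change takes effect
    kubectl -n calico-vpp-dataplane rollout restart ds/calico-vpp-node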

Additional context
Logs of the pods causing the issue.

k get pods -A -o wide
[screenshot: pod list output]

calico-kube-controllers
[screenshot: calico-kube-controllers logs]
[screenshot: additional log output]

Similar issues
#217
projectcalico/calico#6227

@lwr20 @AloysAugustin @Josh-Tigera Would be grateful for your help, everyone.
