Open
Description
kube-apiserver sends two requests, one second apart, to etcd every 10 seconds [0]. When the etcd certs change on disk (e.g. after `etcdadm reset` and `etcdadm init` are invoked), etcd rejects these requests [1]. This appears to have some impact on etcd performance; still investigating.
When cctl recovers an etcd cluster, it first brings down the existing (potentially degraded) cluster and then brings it back up. This changes the CA certs on disk. When etcd performance is impacted, adding a third member can fail (though it typically succeeds on a retry).
The workaround may require
Action Items:
- Investigate whether the rejected requests impact etcd performance. If they do, cctl can stop all kube-apiserver instances before, not after, recovering the etcd cluster.
- Consider adding retries to etcdadm's etcd API calls.
- Modify recovery test to use three instead of two masters, and also use the etcd benchmark tool to increase the size of the database.
[0] etcd-io/etcd#9285.
[1]
Oct 19 21:07:41 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33182" (error "EOF", ServerName "")
Oct 19 21:07:42 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33186" (error "EOF", ServerName "")
Oct 19 21:07:51 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33202" (error "EOF", ServerName "")
Oct 19 21:07:52 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33206" (error "EOF", ServerName "")
Oct 19 21:08:01 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33222" (error "EOF", ServerName "")
Oct 19 21:08:02 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33226" (error "EOF", ServerName "")
Oct 19 21:08:11 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33242" (error "EOF", ServerName "")
Oct 19 21:08:12 coreos-daniel-478-10-105-16-132platform9.sys etcd[15649]: rejected connection from "127.0.0.1:33246" (error "EOF", ServerName "")