-
Notifications
You must be signed in to change notification settings - Fork 86
Adds instrumentation for kube client-go #2316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds instrumentation for kube client-go #2316
Conversation
|
Issue: #2315 |
0aa720f to
1558e7f
Compare
| Name: "http_request_duration_seconds", | ||
| Help: "Latency of kubernetes client requests in seconds by endpoint.", | ||
| }, []string{"endpoint"}), | ||
| results: prometheus.NewCounterVec(prometheus.CounterOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a results counter for "path" and "method" as well to help identifying operations happening during the reconciliation of a given resource ytpe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense! I did consider that.
The client-go metrics API doesn't give us all of that context in the ResultMetric interface, and I think that's intentional. It gives us the LatencyMetric (http_request_duration_seconds) that does have method and URL path. I think the best we can do is something along these lines:
- Add method to http_request_duration_seconds (and maybe rate_limiter_duration_seconds.)
- Monitor "sum by (method, endpoint) (rate(http_request_duration_seconds_count[d])) " for request rate by (normalized) url path and method.
- Monitor "sum by (status_code) (rate(http_requests_total{status_code!=200}[d])) " for http error rate in general.
I suspect the intent of the ResultMetric is mostly for clients working with multiple cluster API servers as a lower cardinality indicator of api server trouble. For our single-cluster context that is less interesting, but I included it anyways since the latency metric is absent of the status code field.
Exposes a new http server from the skupper controller and kube-adaptor processes for exposing prometheus metrics. Adds instrumentation around kubernetes API http and client rate limiting. - skupper_kubernetes_client_http_request_duration_seconds: request latency by endpoint. - skupper_kubernetes_client_http_requests_total: total requests by status code. - skupper_kubernetes_client_http_retries_total: total request retries by status code. - skupper_kubernetes_client_rate_limiter_duration_seconds: client side rate limiting latency by endpoint Signed-off-by: Christian Kruse <christian@c-kruse.com>
Signed-off-by: Christian Kruse <christian@c-kruse.com>
Signed-off-by: Christian Kruse <christian@c-kruse.com>
8fcd980 to
d8ccfd5
Compare
internal/kube/metrics/client.go
Outdated
| Subsystem: "kubernetes_client", | ||
| Name: "http_requests_total", | ||
| Help: "Total number of kubernetes client requests by status code.", | ||
| }, []string{"status_code"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion just to add the method to this and to the http_retries_total as well.
Signed-off-by: Christian Kruse <christian@c-kruse.com>
Exposes a new http server from the skupper controller and kube-adaptor processes for exposing prometheus metrics. Adds instrumentation around kubernetes API http and client rate limiting.