[Bug] Autoscaler doesn't support TLS#1119
Thank you for your contribution! Would you be able to provide a sample YAML file in this directory? Additionally, I have added you to the Slack channel for KubeRay contributors.
@kevin85421 - Thank you! I've added the sample YAML file now. Also, it looks like one check is failing due to the same issue as on PR #1096.
kevin85421
left a comment
LGTM. I also tested this PR manually.
# Step 0: Build operator image, create Kind cluster, and load into the Kind cluster
# (path: ray-operator)
make docker-image
kind create cluster --image=kindest/node:v1.23.0
kind load docker-image controller:latest
# Step 1: Install the operator and CRD (path: helm-chart/kuberay-operator)
helm install kuberay-operator . --set image.repository=controller,image.tag=latest
# Step 2: Create an autoscaling RayCluster with TLS support.
# (path: ray-operator/config/samples)
kubectl apply -f ray-cluster.autoscaler.tls.yaml
# Step 3: Test autoscaler
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
# 2 worker Pods will be created if the cluster has enough CPUs.
kubectl exec $HEAD_POD -it -c ray-head -- python -c "import ray;ray.init();ray.autoscaler.sdk.request_resources(num_cpus=4)"
expectedContainer.Resources = customResources
expectedContainer.EnvFrom = customEnvFrom
expectedContainer.Env = append(expectedContainer.Env, customEnv...)
expectedContainer.VolumeMounts = append(customVolumeMounts, expectedContainer.VolumeMounts...)
Can you add a comment to explain why VolumeMount uses append(customVolumeMounts, expectedContainer.VolumeMounts...) instead of append(expectedContainer.VolumeMounts, customVolumeMounts...)? It may seem confusing because Env uses append(expectedContainer.Env, customEnv...).
I traced the following functions to understand the order of the VolumeMounts slice.
- The function mergeAutoscalerOverrides: append AutoscalerOptions.VolumeMounts to the autoscaler container.
- Use addEmptyDir to append the ray-log to the autoscaler container's VolumeMounts.
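The ordering traced above can be illustrated with a small, self-contained sketch. Everything in it is a simplified assumption: VolumeMount is a stand-in for the Kubernetes corev1.VolumeMount type, mergeOverrides and addLogMount are hypothetical helpers that only mimic the two steps, and the ca-tls name and both mount paths are made up for illustration.

```go
package main

import "fmt"

// VolumeMount is a simplified, hypothetical stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
}

// mergeOverrides mimics step 1: the user-supplied mounts are appended to
// the autoscaler container's (initially empty) mount list.
func mergeOverrides(container, custom []VolumeMount) []VolumeMount {
	return append(container, custom...)
}

// addLogMount mimics step 2: the ray-log mount is appended afterwards,
// so it ends up after the custom mounts. The path is an assumption.
func addLogMount(container []VolumeMount) []VolumeMount {
	return append(container, VolumeMount{Name: "ray-log", MountPath: "/tmp/ray"})
}

// mountNames returns only the names, which makes the order easy to compare.
func mountNames(mounts []VolumeMount) []string {
	names := make([]string, 0, len(mounts))
	for _, m := range mounts {
		names = append(names, m.Name)
	}
	return names
}

func main() {
	custom := []VolumeMount{{Name: "ca-tls", MountPath: "/etc/ca/tls"}}

	// Order produced by the two steps: custom mounts first, ray-log last.
	actual := addLogMount(mergeOverrides(nil, custom))

	// In the test, the expected container already holds ray-log, so the
	// custom mounts have to be prepended to reproduce the same order.
	expected := []VolumeMount{{Name: "ray-log", MountPath: "/tmp/ray"}}
	expected = append(custom, expected...)

	fmt.Println(mountNames(actual))
	fmt.Println(mountNames(expected))
}
```

Under these assumptions both slices list the custom mount before ray-log, which is why the test prepends customVolumeMounts while Env, whose overrides are added last, uses the usual append order.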
# autoscalerOptions is an OPTIONAL field specifying configuration overrides for the Ray autoscaler.
# The example configuration shown below represents the DEFAULT values.
# (You may delete autoscalerOptions if the defaults are suitable.)
autoscalerOptions:
Is autoscalerOptions the only difference compared to ray-cluster.tls.yaml? If so, we can add the autoscalerOptions section to ray-cluster.tls.yaml. I apologize for missing that. I didn't realize the YAML file would be that long.
I will draft a follow-up PR to address my comments (#1119 (comment), #1119 (comment)). Merge this PR.
Autoscaler doesn't support TLS
Why are these changes needed?
This PR adds the ability to mount volumes in AutoscalerOptions to enable TLS in the autoscaler. At the moment, if you enable both TLS and the autoscaler, the autoscaler is unable to connect to the Ray head because the certificates don't exist in the autoscaler container, and there is no way to mount the volumes where the certs reside, as you can in the other containers, for example.
With volume mounts available in AutoscalerOptions, the TLS volumes can be mounted into the autoscaler container.
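A minimal sketch of the idea, not the PR's actual code: the types below are reduced, hypothetical stand-ins for the KubeRay and Kubernetes API types, applyVolumeMountOverrides is an invented helper name, and the volume names and paths are illustrative assumptions.

```go
package main

import "fmt"

// VolumeMount is a reduced, hypothetical stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

// AutoscalerOptions is a reduced stand-in that carries only the
// volume-mount overrides discussed in this PR.
type AutoscalerOptions struct {
	VolumeMounts []VolumeMount
}

// Container is a reduced stand-in for the autoscaler sidecar container.
type Container struct {
	Name         string
	VolumeMounts []VolumeMount
}

// applyVolumeMountOverrides appends the user-supplied mounts to the
// container. It is a no-op when no options or no mounts are given.
func applyVolumeMountOverrides(c *Container, opts *AutoscalerOptions) {
	if opts == nil || len(opts.VolumeMounts) == 0 {
		return
	}
	c.VolumeMounts = append(c.VolumeMounts, opts.VolumeMounts...)
}

func main() {
	autoscaler := Container{Name: "autoscaler"}
	opts := &AutoscalerOptions{
		VolumeMounts: []VolumeMount{
			{Name: "ca-tls", MountPath: "/etc/ca/tls", ReadOnly: true},
			{Name: "ray-tls", MountPath: "/etc/ray/tls"},
		},
	}

	applyVolumeMountOverrides(&autoscaler, opts)
	for _, m := range autoscaler.VolumeMounts {
		fmt.Printf("%s -> %s\n", m.Name, m.MountPath)
	}
}
```

With the certificate volumes mounted this way, the autoscaler container would see the same cert files as the other containers, assuming the Pod spec defines volumes with matching names.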
Related issue number
No GitHub issue, but here's the Slack chat for more context: https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1684511397156609
Checks