A Rust-based tool for deploying Talos Linux Kubernetes clusters with the Cilium CNI. Currently supports Hetzner Cloud, with more cloud providers coming soon. Similar to terraform-hcloud-talos but built entirely in Rust without Terraform dependencies.
Warning
This project is under active development and is considered experimental. Features may change, and not all functionality is production-ready yet. If you encounter bugs or have feature requests, please open an issue on GitHub.
- Automated Cluster Deployment: Create production-ready Kubernetes clusters on Hetzner Cloud
- Talos Linux: Immutable, minimal, and secure Kubernetes operating system
- Cilium CNI: High-performance networking with eBPF
- Web Dashboard: Modern web UI for cluster management, monitoring, and operations
- LoadBalancer Support: Cilium Node IPAM for LoadBalancer services using node IPs
- Prometheus Monitoring: Built-in support for Prometheus stack (Prometheus, Grafana, AlertManager)
- Metrics Server: Kubernetes resource metrics for HPA and kubectl top commands
- Cluster Autoscaler: Automatic worker node scaling based on pod resource demands (official Kubernetes autoscaler with Hetzner support)
- Private Networking: Automatic setup of Hetzner Cloud private networks
- Security First:
- Firewall with Talos/Kubernetes API ports pre-configured
- IP allowlisting (restricts access to your IP only)
- Flexible Configuration: YAML-based cluster configuration
- Multiple Node Types: Support for control plane and worker nodes with different specifications
- Health Checks: Built-in validation and cluster readiness checks
Before using this tool, you need to install the following CLI tools:
- talosctl - Talos Linux CLI tool (installation guide)
- kubectl - Kubernetes CLI tool (installation guide)
- helm - Kubernetes package manager (installation guide)
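If you do not have these yet, the following is a rough sketch of one way to install all three on a Linux x86_64 machine; the linked installation guides remain the authoritative reference for other platforms and pinned versions.

```bash
# Sketch only -- verify against each project's installation guide.

# talosctl (official install script)
curl -sL https://talos.dev/install | sh

# kubectl (latest stable release for linux/amd64)
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# helm (official install script)
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```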
Download the latest release from the GitHub Releases page:
# Replace PLATFORM with: linux-x86_64, linux-aarch64, macos-x86_64, or macos-aarch64
curl -LO https://github.com/dihmeetree/oxide/releases/latest/download/oxide-PLATFORM.tar.gz
tar xzf oxide-PLATFORM.tar.gz
sudo mv oxide /usr/local/bin/

Or build from source:

git clone https://github.com/dihmeetree/oxide
cd oxide
cargo build --release
cargo install --path .

The binary will be available as oxide.
Before deploying clusters, you need to create a Hetzner Cloud snapshot containing the Talos image. Choose one of the following methods:
Note: Check the latest Talos version at https://github.com/siderolabs/talos/releases and update the version in the commands below accordingly.
# 1. Create a temporary server
hcloud server create --type cx11 --name talos-snapshot --image ubuntu-22.04 --location nbg1
# 2. Enable rescue mode and reboot
hcloud server enable-rescue talos-snapshot
hcloud server reboot talos-snapshot
# 3. SSH into the rescue system
ssh root@<server-ip>
# 4. Download and write the Talos image
cd /tmp
wget -O /tmp/talos.raw.xz https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.11.0/hcloud-amd64.raw.xz
xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync
# 5. Shutdown the server
shutdown -h now
# 6. Wait a moment, then create snapshot from Hetzner Console or CLI
hcloud server create-image --type snapshot --description "Talos v1.11.0" talos-snapshot
# 7. Note the snapshot ID (you'll need this for configuration)
hcloud image list
# 8. Delete the temporary server
hcloud server delete talos-snapshot

For automated image creation, use HashiCorp Packer:
# See the official Talos documentation for Packer configuration:
# https://www.talos.dev/v1.11/talos-guides/install/cloud-platforms/hetzner/
# Example: Use the terraform-hcloud-talos/_packer directory in this repo
cd terraform-hcloud-talos/_packer
export HCLOUD_TOKEN=your-token-here
packer init .
packer build .

Create an example configuration file:
oxide init

This creates a cluster.yaml file with default settings that you can customize.
Edit the cluster.yaml file to match your requirements:
cluster_name: my-talos-cluster

hcloud:
  # Get your token from https://console.hetzner.cloud/
  # Or set HCLOUD_TOKEN environment variable
  location: nbg1
  network:
    cidr: 10.0.0.0/16
    subnet_cidr: 10.0.1.0/24
    zone: eu-central

talos:
  version: v1.11.3
  kubernetes_version: 1.34.1
  hcloud_snapshot_id: "123456789" # Your snapshot ID from step 1

cilium:
  version: 1.17.8
  enable_hubble: true
  enable_ipv6: false

prometheus:
  version: 77.13.0
  enabled: true
  namespace: monitoring
  enable_grafana: true
  enable_alertmanager: true
  retention: 30d
  storage_size: 50Gi
  enable_persistent_storage: false

metrics_server:
  enabled: true

control_planes:
  - name: control-plane
    server_type: cpx21 # 3 vCPUs, 4GB RAM
    count: 3

workers:
  - name: worker
    server_type: cpx31 # 4 vCPUs, 8GB RAM
    count: 3

Set your Hetzner Cloud API token, then create the cluster:

export HCLOUD_TOKEN=your-hetzner-cloud-api-token
oxide create

This will:
- Detect your public IP and create firewall rules
- Create a private network
- Provision control plane and worker servers with firewall applied
- Generate and apply Talos configurations
- Bootstrap the Kubernetes cluster
- Install Cilium CNI
- Generate kubeconfig file
- Install optional components (Metrics Server, Prometheus, Autoscaler) based on configuration
Security Notes:
- Firewall restricts Talos and Kubernetes API access to your current IP address only
- All inter-cluster communication uses private network
- Talos provides secure API-only access (no SSH)
The process typically takes 5-10 minutes.
export KUBECONFIG=./output/kubeconfig
kubectl get nodes

Oxide includes a comprehensive web dashboard for managing and monitoring your Kubernetes clusters through a modern, responsive UI:
# Start the dashboard server (default port: 3000)
oxide dashboard
# Use a custom port
oxide dashboard --port 8080
# Use a custom configuration file
oxide --config my-cluster.yaml dashboard

Once started, open your browser to http://localhost:3000 to access the dashboard.
- Home Page: Overview with total clusters, nodes, system status, and alert counts
- Cluster List: View all clusters with their status, node count, and Talos version
- Cluster Details: Detailed view with nodes, control plane/worker counts, and metrics charts
- Create Cluster: Web form to deploy new clusters without CLI
- Cluster Operations:
- Scale worker or control plane nodes
- Upgrade Talos version
- Delete clusters with confirmation
- Nodes List: View all nodes across clusters with CPU/Memory usage and status
- Node Details:
- Resource metrics (CPU/Memory usage with historical charts)
- All pods running on the node
- Node specifications and role
- Real-time status monitoring
- Pods List: View all pods across the cluster with filtering and sorting
- Sort by CPU usage (highest to lowest)
- Status indicators (Running, Pending, Failed)
- Resource usage metrics
- Pod Details:
- CPU and Memory usage with percentage and limits
- Container information (image, resources, restart counts)
- Pod labels and configuration
- Status and restart history
- Pod Logs:
- Real-time log viewing with syntax highlighting
- Log level detection (Error, Warning, Info, Debug)
- Container selection for multi-container pods
- Configurable tail lines
- Services List: View all Kubernetes services
- Service types (ClusterIP, NodePort, LoadBalancer)
- Port configurations
- Endpoint counts
- Service Details:
- Cluster and External IPs
- Port mappings with protocols
- Selector labels and session affinity
- Active endpoints
- Deployments List: View all deployments with replica status
- Available, progressing, and unavailable counts
- Update strategy information
- Deployment Details:
- Replica status and scaling information
- Update strategy (RollingUpdate, Recreate)
- Deployment conditions
- Pod list with status
- Labels and selectors
- Events Page: Real-time Kubernetes events
- Warning and Normal event counts
- Event timeline with filtering
- Object details (Pod, Node, Service, etc.)
- Event messages and reasons
- Occurrence counts and timestamps
Production-ready cluster insights with actionable recommendations:
Resource Management:
- Pods without resource limits or requests
- Over-provisioned pods (using <20% of requests)
- Under-provisioned pods (using >90% of limits)
- Pods with high restart counts
Reliability:
- Deployments with single replicas (no HA)
- Pods missing liveness or readiness probes
- Frequent pod restarts indicating instability
Security:
- Pods running as root user
- Privileged containers
- Containers using hostPath volumes
- Pods using 'latest' image tags
Configuration:
- Services without endpoints (no backing pods)
- Namespaces without resource quotas
Each insight includes:
- Severity level (High, Medium, Low)
- Clear description of the issue
- Actionable recommendations
- List of affected resources
- Category classification
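For reference, a workload that would pass most of the checks above might look like the sketch below. It is not something Oxide generates for you; the name, image, and resource numbers are purely illustrative.

```bash
kubectl apply --kubeconfig=./output/kubeconfig -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical workload
spec:
  replicas: 2                    # more than one replica for HA
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      securityContext:
        runAsNonRoot: true       # avoid running as root
      containers:
        - name: app
          image: nginxinc/nginx-unprivileged:1.27-alpine  # pinned tag, not 'latest'
          ports:
            - containerPort: 8080
          resources:             # explicit requests and limits
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /
              port: 8080
          livenessProbe:
            httpGet:
              path: /
              port: 8080
EOF
```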
Cilium Page:
- Cilium version and configuration status
- Hubble and IPv6 enablement status
- Per-pod CPU and Memory usage charts
- Cilium pod details with resource metrics
- Historical performance data
Envoy Page (if using Envoy with Cilium):
- Envoy version and pod status
- CPU and Memory usage per pod
- RPS (Requests Per Second) metrics
- HTTP status code distribution (2xx, 3xx, 4xx, 5xx)
- Historical metrics and trends
- Prometheus Integration: Real-time metrics collection and visualization
- Historical Charts: CPU and Memory usage over time
- Multi-Pod Views: Compare metrics across pods
- Interactive Legends: Click to show/hide specific pods
- Automatic Refresh: Data updates every 30 seconds
- Alert Integration: Shows firing alerts count in navigation
- Modern Design: Clean, responsive interface with dark mode
- Fast Navigation: Quick access to all resources
- Search & Filter: Find resources quickly
- Status Indicators: Color-coded status badges
- Real-time Updates: Automatic data refresh
- Responsive Layout: Works on desktop, tablet, and mobile
- Breadcrumb Navigation: Easy navigation through resource hierarchy
# Using default cluster.yaml
oxide create
# Using a custom configuration file
oxide --config my-cluster.yaml create

# Using default cluster.yaml
oxide status
# Using a custom configuration file
oxide --config my-cluster.yaml status

Shows information about all servers organized by node pools, including current node counts and server specifications.
Scale the number of nodes in your cluster up or down:
# Scale workers to 5 nodes (uses first worker pool by default)
oxide scale worker --count 5
# Scale control plane nodes to 3
oxide scale control-plane --count 3
# Scale a specific node pool
oxide scale worker --count 10 --pool worker-large

Scaling Behavior:
- Scale Up: Creates new nodes with the same configuration as the existing pool, automatically configures them with Talos, and applies firewall rules
- Scale Down: Removes the newest nodes first (highest index numbers)
- Pool-specific: Can target specific node pools if you have multiple worker or control plane pools configured
Example Use Cases:
# Increase workers for higher workload
oxide scale worker --count 10
# Scale down to save costs during low-usage periods
oxide scale worker --count 2
# Add more control plane nodes for HA
oxide scale control-plane --count 3

Important Notes:
- Scaling is idempotent - if already at target count, no changes are made
- New nodes are automatically joined to the cluster
- When scaling down, ensure your workloads can handle node removals
- Control plane scaling: maintaining odd numbers (1, 3, 5) is recommended for etcd quorum
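Because scale-down removes the newest nodes first, it can help to cordon and drain those nodes before reducing the count. A rough sketch (the node name is hypothetical; check `kubectl get nodes` for the real names):

```bash
export KUBECONFIG=./output/kubeconfig

# Hypothetical highest-index worker -- substitute a real node name
kubectl cordon my-talos-cluster-worker-5
kubectl drain my-talos-cluster-worker-5 --ignore-daemonsets --delete-emptydir-data

# Then let Oxide remove the node
oxide scale worker --count 4
```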
# Using default cluster.yaml
oxide destroy
# Using a custom configuration file
oxide --config my-cluster.yaml destroy

Warning: This permanently deletes all servers, networks, and SSH keys.
Upgrade your cluster nodes to a new Talos version:
# Upgrade control plane nodes only
oxide upgrade --version v1.11.3 --control-plane
# Upgrade worker nodes only
oxide upgrade --version v1.11.3 --workers
# Upgrade both control plane and worker nodes
oxide upgrade --version v1.11.3 --control-plane --workers
# Upgrade without preserving node data (default is to preserve)
oxide upgrade --version v1.11.3 --control-plane --workers --preserve false
# Wait and observe each node upgrade (shows live progress)
oxide upgrade --version v1.11.3 --control-plane --wait
# Stage the upgrade (applies on next reboot, useful if upgrade fails due to open files)
oxide upgrade --version v1.11.3 --workers --stage

Upgrade Behavior:
- Sequential Upgrade: Nodes are upgraded one at a time to maintain cluster availability
- Automatic Image Selection: Installer image is automatically constructed from version (e.g., ghcr.io/siderolabs/installer:v1.11.3)
- Data Preservation: By default, node data is preserved during upgrade (`--preserve true`)
- Granular Control: Can upgrade control plane and workers independently
- Progress Logging: Shows detailed progress for each node upgrade
- etcd Quorum Protection: Talos automatically refuses control plane upgrades that would break etcd quorum
Upgrade Options:
- `--version`: Talos version to upgrade to (required)
- `--control-plane`: Upgrade control plane nodes
- `--workers`: Upgrade worker nodes
- `--preserve`: Preserve node data (default: true)
- `--wait`: Wait and observe the upgrade process for each node (shows live output)
- `--stage`: Stage the upgrade to apply on next reboot (useful if upgrade fails due to open files)
Example Upgrade Workflow:
# 1. Upgrade control plane nodes first
oxide upgrade --version v1.11.3 --control-plane
# 2. Wait for control plane to stabilize
kubectl get nodes
# 3. Upgrade worker nodes
oxide upgrade --version v1.11.3 --workers

Important Notes:
- At least one of `--control-plane` or `--workers` must be specified
- Upgrade Path: Always upgrade through adjacent minor releases sequentially (e.g., 1.10 → 1.11 → 1.12)
- Control Plane First: Recommended to upgrade control plane nodes before worker nodes
- One at a Time: Nodes are upgraded sequentially to maintain cluster availability - avoid upgrading all nodes simultaneously
- Kubernetes Version: Talos upgrade does NOT automatically upgrade Kubernetes version
- Automatic Rollback: If new version fails to boot, Talos will automatically rollback
- Version Compatibility: Check Talos upgrade documentation for version compatibility
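Since a Talos upgrade leaves the Kubernetes version untouched, upgrading Kubernetes is a separate step performed with talosctl. A hedged sketch, with the node IP and target version as placeholders:

```bash
# Run against one control plane node; Talos rolls the new Kubernetes version
# out across the whole cluster.
talosctl --talosconfig ./output/talosconfig \
  --nodes <control-plane-ip> \
  upgrade-k8s --to 1.34.1
```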
Install the Prometheus monitoring stack (Prometheus, Grafana, AlertManager):
oxide install-prometheus

This installs the kube-prometheus-stack Helm chart with:
- Prometheus server with persistent storage
- Grafana dashboards (default login: admin/admin)
- AlertManager for notifications
- Service monitors for Cilium and Kubernetes components
oxide prometheus-status

Shows the status of all Prometheus components and provides Grafana access instructions.
To access Grafana locally, use port-forwarding:
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 --kubeconfig=./output/kubeconfig

Then open http://localhost:3000 in your browser:
- Username: admin
- Password: admin (change after first login)
To access Prometheus UI locally:
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 --kubeconfig=./output/kubeconfig

Then open http://localhost:9090 in your browser.
To access AlertManager UI locally:
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 --kubeconfig=./output/kubeconfig

Then open http://localhost:9093 in your browser.
oxide uninstall-prometheus

Install the Kubernetes Cluster Autoscaler with Hetzner support to automatically scale worker nodes based on pod resource requests:
oxide install-autoscaler

This installs the official Kubernetes Cluster Autoscaler configured for the Hetzner Cloud provider. The autoscaler will:
- Automatically add worker nodes when pods cannot be scheduled due to insufficient resources
- Remove underutilized worker nodes to save costs
- Respect min/max node limits configured per worker pool
Configuration Example:
autoscaler:
  enabled: true
  worker_pools:
    - name: worker-pool
      server_type: cpx11 # Hetzner server type
      location: fsn1     # Hetzner location
      min_nodes: 1
      max_nodes: 10

Monitor Autoscaler Logs:
kubectl logs -n oxide-system -l app=cluster-autoscaler -f --kubeconfig=./output/kubeconfig

Important Notes:
- The autoscaler only scales worker nodes, not control plane nodes
- Scaling decisions are based on pod resource requests (CPU/memory), not actual usage
- Nodes are created with the same Talos configuration as your initial worker nodes
- The autoscaler respects PodDisruptionBudgets when scaling down
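One way to watch the autoscaler react (a sketch only; the deployment name and resource numbers are illustrative and should exceed the free capacity of your current workers):

```bash
export KUBECONFIG=./output/kubeconfig

# Request more capacity than the existing workers can hold
kubectl create deployment scale-test --image=registry.k8s.io/pause:3.9
kubectl set resources deployment scale-test --requests=cpu=1,memory=1Gi
kubectl scale deployment scale-test --replicas=20

# Pending pods should trigger a scale-up; watch new nodes join
kubectl get pods -l app=scale-test
kubectl get nodes -w

# Clean up; underutilized autoscaled nodes are removed again later
kubectl delete deployment scale-test
```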
oxide uninstall-autoscaler

Install the Kubernetes Metrics Server for resource metrics and HPA support:
oxide install-metrics-server

The Metrics Server enables:
- `kubectl top nodes` and `kubectl top pods` commands
- HorizontalPodAutoscaler (HPA) to scale pods based on CPU/memory usage
- Resource-based autoscaling decisions
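As an illustration, once the Metrics Server is running you can attach an HPA to one of your own deployments (`my-app` below is a placeholder):

```bash
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10 \
  --kubeconfig=./output/kubeconfig

# Inspect current utilization and scaling targets
kubectl get hpa my-app --kubeconfig=./output/kubeconfig
```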
Verify Installation:
kubectl top nodes --kubeconfig=./output/kubeconfig

Note: Metrics Server is automatically installed during cluster creation if enabled in the configuration.
oxide uninstall-metrics-server

Cluster Configuration Fields:

| Field | Description | Required |
|---|---|---|
| `cluster_name` | Unique name for your cluster | Yes |
| `hcloud` | Hetzner Cloud settings | Yes |
| `talos` | Talos Linux configuration | Yes |
| `cilium` | Cilium CNI settings | Yes |
| `prometheus` | Prometheus monitoring settings | No |
| `metrics_server` | Metrics Server settings | No |
| `autoscaler` | Cluster autoscaler settings | No |
| `control_planes` | Control plane node specs | Yes |
| `workers` | Worker node specs | No |
Hetzner Cloud Configuration (`hcloud`):

| Field | Description | Default |
|---|---|---|
| `token` | API token (or use `HCLOUD_TOKEN` env var) | - |
| `location` | Data center location (nbg1, fsn1, hel1, etc.) | nbg1 |
| `network.cidr` | Private network CIDR | 10.0.0.0/16 |
| `network.subnet_cidr` | Subnet CIDR | 10.0.1.0/24 |
| `network.zone` | Network zone | eu-central |
Prometheus Configuration (`prometheus`):

| Field | Description | Default |
|---|---|---|
| `version` | kube-prometheus-stack chart version | 77.13.0 |
| `enabled` | Enable Prometheus installation | true |
| `namespace` | Kubernetes namespace for Prometheus | monitoring |
| `enable_grafana` | Enable Grafana dashboards | true |
| `enable_alertmanager` | Enable AlertManager | true |
| `retention` | Prometheus data retention period | 30d |
| `storage_size` | Prometheus persistent storage size | 50Gi |
| `enable_persistent_storage` | Enable persistent storage for Prometheus | false |
Metrics Server Configuration (`metrics_server`):

| Field | Description | Default |
|---|---|---|
| `enabled` | Enable metrics server | true |
Note: Metrics Server is automatically installed during cluster creation when enabled.
Autoscaler Configuration (`autoscaler`):

| Field | Description | Required |
|---|---|---|
| `enabled` | Enable cluster autoscaler | Yes |
| `worker_pools` | List of worker pools to autoscale | Yes |
Worker Pool Configuration:
| Field | Description | Required | Default |
|---|---|---|---|
| `name` | Worker pool name | Yes | - |
| `server_type` | Hetzner server type (cpx11, cpx21...) | Yes | - |
| `location` | Hetzner location (fsn1, nbg1...) | Yes | - |
| `min_nodes` | Minimum autoscaled nodes (set to 0 to preserve initial worker nodes) | No | 0 |
| `max_nodes` | Maximum autoscaled nodes | Yes | - |
Important: Set `min_nodes: 0` to ensure the autoscaler only manages nodes it creates dynamically, leaving your initial worker nodes (defined in `workers.count`) untouched. This way:
- Your base worker nodes always remain in the cluster
- The autoscaler only creates/deletes additional nodes above this baseline
- Pods will be consolidated back to original nodes when autoscaled nodes are no longer needed
Node Pool Configuration (`control_planes`, `workers`):

| Field | Description | Default |
|---|---|---|
| `name` | Node name prefix | - |
| `server_type` | Hetzner server type (cx21, cpx31, etc.) | - |
| `count` | Number of nodes to create | 1 |
| `labels` | Additional Kubernetes labels | {} |
Shared vCPU (AMD EPYC):
| Type | vCPUs | RAM | Storage | Price/Month |
|---|---|---|---|---|
| cpx11 | 2 | 2GB | 40GB | ~€4.49 |
| cpx21 | 3 | 4GB | 80GB | ~€8.99 |
| cpx31 | 4 | 8GB | 160GB | ~€15.99 |
| cpx41 | 8 | 16GB | 240GB | ~€29.99 |
| cpx51 | 16 | 32GB | 360GB | ~€59.99 |
Dedicated vCPU (AMD EPYC):
| Type | vCPUs | RAM | Storage | Price/Month |
|---|---|---|---|---|
| ccx13 | 2 | 8GB | 80GB | ~€12.99 |
| ccx23 | 4 | 16GB | 160GB | ~€25.99 |
| ccx33 | 8 | 32GB | 240GB | ~€49.99 |
| ccx43 | 16 | 64GB | 360GB | ~€99.99 |
| ccx53 | 32 | 128GB | 600GB | ~€199.99 |
| ccx63 | 48 | 192GB | 960GB | ~€299.99 |
See Hetzner Cloud pricing for all available types.
The tool creates:
- Firewall: Hetzner Cloud firewall with restricted access to Talos and Kubernetes APIs
- Private Network: A Hetzner Cloud private network for inter-node communication
- Control Plane Nodes: Run the Kubernetes control plane (etcd, API server, scheduler, controller manager)
- Worker Nodes: Run your application workloads
- Cilium: Provides networking, load balancing, and network policies
Your IP (Firewall Allowed)
↓
┌──────────────────────────────────────────────┐
│ Hetzner Cloud Firewall │
│ - Talos API (50000): Your IP only │
│ - Kubernetes API (6443): Your IP only │
│ - HTTP (80): Public access │
│ - HTTPS (443): Public access │
└──────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────┐
│ Hetzner Cloud Private Network │
│ 10.0.0.0/16 │
│ Node Subnet: 10.0.1.0/24 │
│ Pod CIDR: 10.0.16.0/20 │
│ Service CIDR: 10.0.8.0/21 │
│ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Control │ │ Control │ │
│ │ Plane 1 │ │ Plane 2 │ ... │
│ └────────────┘ └────────────┘ │
│ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Worker 1 │ │ Worker 2 │ ... │
│ └────────────┘ └────────────┘ │
└──────────────────────────────────────────────┘
The automatically configured firewall includes:
| Port | Protocol | Source | Purpose |
|---|---|---|---|
| 6443 | TCP | Your IP | Kubernetes API |
| 50000 | TCP | Your IP | Talos API |
| 80 | TCP | 0.0.0.0/0 | HTTP Traffic |
| 443 | TCP | 0.0.0.0/0 | HTTPS Traffic |
Note: Internal cluster communication on the private network (10.0.0.0/16) is not restricted by Hetzner Cloud firewalls.
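If your public IP changes later, the allowlisted rules may no longer match your connection; the generated rules can be reviewed with the hcloud CLI (the firewall name below is a placeholder):

```bash
hcloud firewall list
hcloud firewall describe <firewall-name>
```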
After cluster creation, the following files are generated in the output/ directory:
- `controlplane.yaml` - Talos configuration for control plane nodes
- `worker.yaml` - Talos configuration for worker nodes
- `talosconfig` - Talos client configuration
- `kubeconfig` - Kubernetes client configuration
- `secrets.yaml` - Talos secrets (keep secure!)
Important: The secrets.yaml file contains sensitive information. Keep it secure and never commit to version control.
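If the output/ directory lives inside a git repository, a simple guard (a sketch, assuming a standard .gitignore setup) keeps these files out of version control:

```bash
echo "output/" >> .gitignore
git check-ignore -v output/secrets.yaml   # should print the matching rule
```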
Troubleshooting:

- Check API token: Ensure `HCLOUD_TOKEN` is set correctly
- Verify prerequisites: Make sure talosctl, kubectl, and helm are installed
- Check logs: Run with the `--verbose` flag for detailed output
- Resource limits: Verify your Hetzner account has sufficient resources
# Check Talos node status
talosctl --talosconfig ./output/talosconfig --nodes <node-ip> health
# Check Kubernetes pods
kubectl get pods -A

# Check Cilium status
kubectl get pods -n kube-system -l k8s-app=cilium
# View Cilium logs
kubectl logs -n kube-system -l k8s-app=cilium

Example monthly costs for a 3 control plane + 3 worker cluster:
- Control Planes (3x cpx21): ~€27/month (3 × €8.99)
- Workers (3x cpx31): ~€48/month (3 × €15.99)
- IPv4 Addresses (6 servers): ~€3/month (6 × €0.50)
- Network: Free
- Snapshot: ~€0.50/month
- Traffic: 1-5TB free per server (depending on type)
Total: ~€78.50/month
Costs are approximate. See Hetzner pricing for exact rates.
Compared to terraform-hcloud-talos, Oxide offers:

- Single Binary: No Terraform or provider management
- Type Safety: Rust's type system catches errors at compile time
- Performance: Fast Rust implementation
- Native Integration: Direct API calls, no intermediate layers
Consider terraform-hcloud-talos instead if:

- You need to manage other infrastructure beyond Hetzner
- Your team has existing Terraform expertise
- You require Terraform's extensive module ecosystem
Build and test locally:

cargo build
cargo test --release
cargo clippy -- -D warnings
cargo fmt

Contributions are welcome! Please ensure your code:
- Compiles without warnings
- Passes all tests
- Follows Rust formatting conventions
- Includes documentation for public APIs
[Add your license here]
- Talos Linux - Secure Kubernetes OS
- Cilium - eBPF-based networking
- Hetzner Cloud - Affordable cloud hosting
- terraform-hcloud-talos - Inspiration for this project
- Never commit your `HCLOUD_TOKEN` or API credentials
- Store kubeconfig files securely
- Use private networks for inter-node communication
- Enable Cilium network policies for pod-to-pod security
- Regularly update Talos and Kubernetes versions
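As a starting point for the network-policy recommendation, a minimal CiliumNetworkPolicy might look like the sketch below; the namespace, labels, and port are illustrative.

```bash
kubectl apply --kubeconfig=./output/kubeconfig -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: backend               # policy applies to backend pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend        # only frontend pods may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
EOF
```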
For issues and questions:
- Check the Troubleshooting section
- Review Talos documentation
- Check Cilium documentation
- Open an issue on GitHub


