From 8f1870d996f78939da2d5b2190190fc6c137e4c9 Mon Sep 17 00:00:00 2001 From: Rin Oliver Date: Tue, 3 Feb 2026 15:29:49 -0600 Subject: [PATCH 1/5] docs(gitlab): add comprehensive troubleshooting guide Add troubleshooting.md covering common issues and solutions: - Authentication issues (token expiry, endpoint config) - VM deployment failures (config validation, resource availability) - SSH connection issues (key setup, timeouts, permissions) - Environment variable configuration - Network and connectivity troubleshooting - Job execution problems - Orphaned VM cleanup procedures Also updates README.md to link to the new troubleshooting guide. Addresses DI-342 requirement for troubleshooting documentation. Verified: - All variable names match scripts exactly - Error messages match actual script output - CLI commands verified against Orka3 CLI docs - All reference links validated Co-Authored-By: Claude Opus 4.5 --- GitLab/README.md | 2 + GitLab/troubleshooting.md | 430 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 432 insertions(+) create mode 100644 GitLab/troubleshooting.md diff --git a/GitLab/README.md b/GitLab/README.md index c4cd115..b68ac00 100644 --- a/GitLab/README.md +++ b/GitLab/README.md @@ -3,4 +3,6 @@ The Orka GitLab integration enables you to use [Orka by MacStadium][orka] in your GitLab CI/CI pipelines. Learn how to configure the [GitLab Shell executor](shell-executor.md) or the [GitLab Custom executor](custom-executor.md). +For common issues and solutions, see the [Troubleshooting guide](troubleshooting.md). + [orka]: https://www.macstadium.com/orka diff --git a/GitLab/troubleshooting.md b/GitLab/troubleshooting.md new file mode 100644 index 0000000..c9e4447 --- /dev/null +++ b/GitLab/troubleshooting.md @@ -0,0 +1,430 @@ +# Troubleshooting the GitLab Orka Integration + +This guide covers common issues and solutions when using the GitLab [Custom executor][custom] with [Orka][orka]. + +## Quick diagnostics + +Before diving into specific issues, run these checks: + +```bash +# Verify Orka CLI is installed and accessible +orka3 version + +# Test Orka authentication +orka3 config set --api-url "$ORKA_ENDPOINT" +orka3 user set-token "$ORKA_TOKEN" +orka3 vm list + +# Verify jq is installed (required by scripts) +jq --version + +# Test SSH key validity +ssh-keygen -l -f ~/.ssh/orka_deployment_key +``` + +## Authentication issues + +### Error: "unauthorized" or "401" + +**Symptoms:** +- VM deployment fails with authentication errors +- `orka3` commands return "unauthorized" + +**Causes:** +- `ORKA_TOKEN` is invalid, expired, or not set +- `ORKA_ENDPOINT` is incorrect + +**Solutions:** + +1. Verify the token is set correctly: + ```bash + echo "$ORKA_TOKEN" | head -c 20 + ``` + +2. Generate a new token: + ```bash + # For user authentication + orka3 login + orka3 user get-token + + # For service accounts (CI/CD recommended) + orka3 serviceaccount token + ``` + +3. Verify the endpoint is correct: + ```bash + curl -s "$ORKA_ENDPOINT/api/v1/health" | jq . + ``` + +**Note:** Tokens expire after 1 hour. For CI/CD pipelines, use [service accounts][serviceaccount] which provide longer-lived tokens. + +### Error: "config not found" or "no such host" + +**Symptoms:** +- CLI commands fail before authentication +- "dial tcp: lookup" errors + +**Causes:** +- `ORKA_ENDPOINT` is not set or malformed +- Network connectivity issues to Orka API + +**Solutions:** + +1. Verify the endpoint format (include protocol, no trailing slash): + ```bash + # Correct + export ORKA_ENDPOINT="http://10.221.188.20" + + # Incorrect + export ORKA_ENDPOINT="10.221.188.20" + export ORKA_ENDPOINT="http://10.221.188.20/" + ``` + +2. Test network connectivity: + ```bash + curl -v "$ORKA_ENDPOINT/api/v1/health" + ``` + +3. If using VPN, verify your connection. See your [IP plan][ip-plan] for connection details. + +## VM deployment failures + +### Error: "VM deployment failed" + +**Symptoms:** +- prepare.sh exits with "VM deployment failed" +- Deployment attempts exhausted + +**Causes:** +- `ORKA_CONFIG_NAME` doesn't exist or is misspelled +- No available nodes with sufficient resources +- Base image not found + +**Solutions:** + +1. Verify the VM config exists: + ```bash + orka3 vm-config list | grep "$ORKA_CONFIG_NAME" + ``` + +2. Check available node resources: + ```bash + orka3 node list -o wide + ``` + +3. Verify the base image exists: + ```bash + orka3 image list + ``` + +4. Increase deployment attempts by setting the environment variable: + ```yaml + # In .gitlab-ci.yml + variables: + VM_DEPLOYMENT_ATTEMPTS: "3" + ``` + +### Error: "Invalid ip" or "Invalid port" + +**Symptoms:** +- VM deploys but connection info extraction fails +- "Invalid ip: null" in logs + +**Causes:** +- VM deployment returned unexpected JSON format +- jq parsing error +- VM is in a failed state + +**Solutions:** + +1. Manually test deployment and inspect output: + ```bash + orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json | jq . + ``` + +2. Verify jq is correctly installed: + ```bash + echo '{"ip":"10.0.0.1"}' | jq -r '.ip' + ``` + +3. Check VM status after deployment: + ```bash + orka3 vm list -o wide + ``` + +## SSH connection issues + +### Error: "Waited 30 seconds for sshd to start" + +**Symptoms:** +- VM deploys successfully +- SSH connection times out after 30 seconds + +**Causes:** +- SSH is not enabled on the base image +- SSH key mismatch +- Network/firewall blocking SSH port +- VM is still booting + +**Solutions:** + +1. Verify SSH is enabled on your base image: + - Connect to a VM via VNC + - Check System Preferences > Sharing > Remote Login + +2. Verify the SSH key matches: + ```bash + # On Runner: get public key fingerprint + ssh-keygen -l -f ~/.ssh/orka_deployment_key + + # On VM: check authorized_keys + cat ~/.ssh/authorized_keys + ``` + +3. Test SSH connectivity manually: + ```bash + ssh -i ~/.ssh/orka_deployment_key -p admin@ "echo ok" + ``` + +4. Increase the wait time by modifying prepare.sh (line 66) if VMs need more boot time. + +### Error: "Permission denied (publickey)" + +**Symptoms:** +- SSH connection is refused +- "Permission denied" in logs + +**Causes:** +- SSH key has a passphrase (not supported) +- Wrong SSH user +- SSH key not in VM's authorized_keys + +**Solutions:** + +1. Verify the SSH key has no passphrase: + ```bash + # This should NOT prompt for a passphrase + ssh-keygen -y -f ~/.ssh/orka_deployment_key + ``` + +2. If the key has a passphrase, generate a new one without: + ```bash + ssh-keygen -t rsa -b 4096 -f ~/.ssh/orka_key -N "" + ``` + +3. Verify the `ORKA_VM_USER` matches the user on the VM (default: `admin`): + ```yaml + variables: + ORKA_VM_USER: "admin" + ``` + +4. Ensure the public key is in the VM's `~/.ssh/authorized_keys`. + +### Error: "Host key verification failed" + +**Symptoms:** +- SSH fails with host key errors +- "REMOTE HOST IDENTIFICATION HAS CHANGED" warnings + +**Causes:** +- Known hosts file has stale entries +- Strict host key checking enabled + +**Solutions:** + +The scripts handle this automatically by updating known_hosts, but if issues persist: + +1. Clear the known hosts for the problematic IP: + ```bash + ssh-keygen -R "[]:" + ``` + +2. The scripts use `StrictHostKeyChecking=no` during initial connection, so this should not block ephemeral VMs. + +## Environment variable issues + +### Error: "unbound variable" or blank values + +**Symptoms:** +- Script fails immediately +- Variables are empty + +**Causes:** +- Required environment variables not set +- Variables not exported correctly in GitLab CI/CD + +**Solutions:** + +1. Verify all required variables are set in your GitLab CI/CD settings or `.gitlab-ci.yml`: + +| Variable | Required | Description | +|----------|----------|-------------| +| `ORKA_TOKEN` | Yes | Authentication token | +| `ORKA_ENDPOINT` | Yes | Orka API URL | +| `ORKA_CONFIG_NAME` | Yes | VM config template name | +| `ORKA_SSH_KEY_FILE` | Yes | Private SSH key contents | +| `ORKA_VM_USER` | No | SSH user (default: `admin`) | +| `ORKA_VM_NAME_PREFIX` | No | VM name prefix (default: `gl-runner`) | +| `VM_DEPLOYMENT_ATTEMPTS` | No | Retry count (default: `1`) | + +2. For sensitive variables, use GitLab CI/CD [masked variables][masked-variables]: + - Go to Settings > CI/CD > Variables + - Add variables with "Masked" option enabled + +3. Verify variables are accessible in your job: + ```yaml + test_variables: + script: + - echo "Endpoint: $ORKA_ENDPOINT" + - echo "Config: $ORKA_CONFIG_NAME" + ``` + +## Network and connectivity issues + +### Runner cannot reach Orka endpoint + +**Symptoms:** +- "Connection refused" or "Connection timed out" +- curl to endpoint fails + +**Causes:** +- Runner is not on the same network as Orka +- VPN not connected +- Firewall blocking traffic + +**Solutions:** + +1. Verify network connectivity: + ```bash + ping -c 3 $(echo "$ORKA_ENDPOINT" | sed 's|http://||') + curl -v "$ORKA_ENDPOINT/api/v1/health" + ``` + +2. If using VPN, verify your connection using your [IP plan][ip-plan] details. + +3. For Docker-based runners, ensure the container has network access: + ```bash + docker run --rm orka-gitlab curl -v "$ORKA_ENDPOINT/api/v1/health" + ``` + +### IP mapping issues + +**Symptoms:** +- VM deploys but SSH connects to wrong IP +- "No route to host" errors + +**Causes:** +- Private/public IP mismatch +- settings.json not configured for IP mapping + +**Solutions:** + +1. If your network requires IP mapping, create `/var/custom-executor/settings.json`: + ```json + { + "mappings": [ + { + "private_host": "10.221.188.100", + "public_host": "203.0.113.100" + } + ] + } + ``` + +2. See [template-settings.md](template-settings.md) for configuration details. + +## Job execution issues + +### Error: Build script fails but not a system failure + +**Symptoms:** +- Job fails during run.sh +- Error is from your CI/CD script, not the integration + +**Causes:** +- Your build script has errors +- Missing dependencies on the VM +- Path or environment issues on the VM + +**Solutions:** + +1. The integration correctly distinguishes between: + - **Build failures**: Your script failed (exit code from script) + - **System failures**: Infrastructure failed (exit code 1) + +2. Check your build script runs correctly on a standalone Orka VM. + +3. Ensure required tools are installed on your base image. + +### Error: Job hangs or times out + +**Symptoms:** +- Job runs but never completes +- GitLab times out the job + +**Causes:** +- Long-running process without output +- SSH connection dropped +- VM became unresponsive + +**Solutions:** + +1. The scripts use SSH keep-alive (60-second intervals for 60 minutes): + ``` + ServerAliveInterval=60 + ServerAliveCountMax=60 + ``` + +2. For very long jobs, consider: + - Breaking into smaller jobs + - Adding periodic output to prevent timeout + - Increasing GitLab job timeout in project settings + +## Cleanup issues + +### Orphaned VMs + +**Symptoms:** +- VMs remain after job completion +- `orka3 vm list` shows old runner VMs + +**Causes:** +- Runner crashed before cleanup +- cleanup.sh failed +- Network issue during cleanup + +**Solutions:** + +1. Manually delete orphaned VMs: + ```bash + # List VMs with runner prefix + orka3 vm list | grep "gl-runner" + + # Delete specific VM + orka3 vm delete + + # Delete all runner VMs (use with caution) + orka3 vm list -o json | jq -r '.[].name' | grep "gl-runner" | xargs -I {} orka3 vm delete {} + ``` + +2. Consider setting up a periodic cleanup job to remove stale VMs. + +## Getting help + +If you're still experiencing issues: + +1. Check the [Orka documentation][orka-docs] for platform-specific guidance +2. Review GitLab Runner [logs][runner-logs]: `gitlab-runner --debug run` +3. Contact [MacStadium Support][support] with: + - Error messages and logs + - Environment details (Runner version, Orka version) + - Steps to reproduce + +[custom]: https://docs.gitlab.com/runner/executors/custom.html +[orka]: https://support.macstadium.com/hc/en-us/articles/29904434271387-Orka-Overview +[orka-docs]: https://support.macstadium.com/hc/en-us +[ip-plan]: https://support.macstadium.com/hc/en-us/articles/28230867289883-IP-Plan +[serviceaccount]: https://support.macstadium.com/hc/en-us/articles/28347450648987-Orka3-Service-Accounts +[masked-variables]: https://docs.gitlab.com/ee/ci/variables/#mask-a-cicd-variable +[runner-logs]: https://docs.gitlab.com/runner/faq/#how-can-i-get-a-debug-log +[support]: https://support.macstadium.com/ From 6d9ffb64fa44ba669e200724110bdb1926292716 Mon Sep 17 00:00:00 2001 From: Rin Oliver Date: Wed, 4 Feb 2026 09:31:27 -0600 Subject: [PATCH 2/5] Fix connectivity test commands in troubleshooting guide Replace non-existent /api/v1/health endpoint references with working CLI-based verification methods (orka3 version). Co-Authored-By: Claude Opus 4.5 --- GitLab/troubleshooting.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/GitLab/troubleshooting.md b/GitLab/troubleshooting.md index c9e4447..63cbca3 100644 --- a/GitLab/troubleshooting.md +++ b/GitLab/troubleshooting.md @@ -51,9 +51,10 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key orka3 serviceaccount token ``` -3. Verify the endpoint is correct: +3. Verify the endpoint is reachable: ```bash - curl -s "$ORKA_ENDPOINT/api/v1/health" | jq . + # This shows both client and server versions if connected + orka3 version ``` **Note:** Tokens expire after 1 hour. For CI/CD pipelines, use [service accounts][serviceaccount] which provide longer-lived tokens. @@ -82,7 +83,9 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key 2. Test network connectivity: ```bash - curl -v "$ORKA_ENDPOINT/api/v1/health" + # Check if the endpoint is reachable + curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/version" + # 200 = reachable, 000 = network issue ``` 3. If using VPN, verify your connection. See your [IP plan][ip-plan] for connection details. @@ -297,14 +300,14 @@ The scripts handle this automatically by updating known_hosts, but if issues per 1. Verify network connectivity: ```bash ping -c 3 $(echo "$ORKA_ENDPOINT" | sed 's|http://||') - curl -v "$ORKA_ENDPOINT/api/v1/health" + orka3 version # Should show server version if connected ``` 2. If using VPN, verify your connection using your [IP plan][ip-plan] details. 3. For Docker-based runners, ensure the container has network access: ```bash - docker run --rm orka-gitlab curl -v "$ORKA_ENDPOINT/api/v1/health" + docker run --rm --entrypoint orka3 orka-gitlab version ``` ### IP mapping issues From f2f9bd1355d58dc9a72887f8523041d00d92d847 Mon Sep 17 00:00:00 2001 From: Rin Oliver Date: Thu, 5 Feb 2026 15:23:07 -0600 Subject: [PATCH 3/5] docs(gitlab): address PR review feedback Changes based on ispasov's review: - Remove redundant diagnostics section (Docker image validates these) - Remove orka3 login/user get-token (CI/CD must use service accounts) - Remove token verification steps (not available in container context) - Remove grep piping (use CLI's built-in argument filtering) - Remove export suggestions (vars must be in container/GitLab context) - Consolidate network connectivity to single curl approach - Remove "verify exists" steps (trust CLI error messages) - Add guidance to deploy VMs manually for SSH troubleshooting - Simplify cleanup section (remove stale VM detection complexity) - Remove duplicate ping-based connectivity checks Co-Authored-By: Claude Opus 4.5 --- GitLab/troubleshooting.md | 261 ++++++++++++-------------------------- 1 file changed, 79 insertions(+), 182 deletions(-) diff --git a/GitLab/troubleshooting.md b/GitLab/troubleshooting.md index 63cbca3..3684427 100644 --- a/GitLab/troubleshooting.md +++ b/GitLab/troubleshooting.md @@ -2,26 +2,6 @@ This guide covers common issues and solutions when using the GitLab [Custom executor][custom] with [Orka][orka]. -## Quick diagnostics - -Before diving into specific issues, run these checks: - -```bash -# Verify Orka CLI is installed and accessible -orka3 version - -# Test Orka authentication -orka3 config set --api-url "$ORKA_ENDPOINT" -orka3 user set-token "$ORKA_TOKEN" -orka3 vm list - -# Verify jq is installed (required by scripts) -jq --version - -# Test SSH key validity -ssh-keygen -l -f ~/.ssh/orka_deployment_key -``` - ## Authentication issues ### Error: "unauthorized" or "401" @@ -31,33 +11,21 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key - `orka3` commands return "unauthorized" **Causes:** -- `ORKA_TOKEN` is invalid, expired, or not set +- `ORKA_TOKEN` is invalid or expired - `ORKA_ENDPOINT` is incorrect **Solutions:** -1. Verify the token is set correctly: +1. Generate a new service account token: ```bash - echo "$ORKA_TOKEN" | head -c 20 - ``` - -2. Generate a new token: - ```bash - # For user authentication - orka3 login - orka3 user get-token - - # For service accounts (CI/CD recommended) orka3 serviceaccount token ``` -3. Verify the endpoint is reachable: - ```bash - # This shows both client and server versions if connected - orka3 version - ``` +2. Update the token in GitLab CI/CD settings: + - Go to Settings > CI/CD > Variables + - Update `ORKA_TOKEN` with the new token -**Note:** Tokens expire after 1 hour. For CI/CD pipelines, use [service accounts][serviceaccount] which provide longer-lived tokens. +**Note:** Service account tokens are valid for 1 year by default. For custom duration, use `--duration` flag. ### Error: "config not found" or "no such host" @@ -71,24 +39,23 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key **Solutions:** -1. Verify the endpoint format (include protocol, no trailing slash): - ```bash - # Correct - export ORKA_ENDPOINT="http://10.221.188.20" +1. Verify the endpoint format in GitLab CI/CD Variables (include protocol, no trailing slash): + ``` + # Correct format + http://10.221.188.20 - # Incorrect - export ORKA_ENDPOINT="10.221.188.20" - export ORKA_ENDPOINT="http://10.221.188.20/" + # Incorrect formats + 10.221.188.20 # Missing protocol + http://10.221.188.20/ # Trailing slash ``` -2. Test network connectivity: +2. Test network connectivity from your runner: ```bash - # Check if the endpoint is reachable - curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/version" + curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info" # 200 = reachable, 000 = network issue ``` -3. If using VPN, verify your connection. See your [IP plan][ip-plan] for connection details. +3. If using VPN, verify your connection. See your [IP plan][ip-plan] for details. ## VM deployment failures @@ -101,13 +68,12 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key **Causes:** - `ORKA_CONFIG_NAME` doesn't exist or is misspelled - No available nodes with sufficient resources -- Base image not found **Solutions:** -1. Verify the VM config exists: +1. If the error says "config does not exist", check the spelling of `ORKA_CONFIG_NAME` in your GitLab CI/CD Variables. Create the config if needed: ```bash - orka3 vm-config list | grep "$ORKA_CONFIG_NAME" + orka3 vm-config create --image --cpu ``` 2. Check available node resources: @@ -115,14 +81,8 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key orka3 node list -o wide ``` -3. Verify the base image exists: - ```bash - orka3 image list - ``` - -4. Increase deployment attempts by setting the environment variable: +3. Increase deployment attempts by setting the environment variable in `.gitlab-ci.yml`: ```yaml - # In .gitlab-ci.yml variables: VM_DEPLOYMENT_ATTEMPTS: "3" ``` @@ -135,24 +95,23 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key **Causes:** - VM deployment returned unexpected JSON format -- jq parsing error - VM is in a failed state **Solutions:** -1. Manually test deployment and inspect output: +1. Deploy a VM manually and inspect the output: ```bash - orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json | jq . + orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json ``` -2. Verify jq is correctly installed: +2. Check VM status: ```bash - echo '{"ip":"10.0.0.1"}' | jq -r '.ip' + orka3 vm list test-vm -o wide ``` -3. Check VM status after deployment: +3. Delete the test VM after inspection: ```bash - orka3 vm list -o wide + orka3 vm delete test-vm ``` ## SSH connection issues @@ -165,31 +124,36 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key **Causes:** - SSH is not enabled on the base image -- SSH key mismatch -- Network/firewall blocking SSH port +- SSH key not configured on the VM - VM is still booting **Solutions:** -1. Verify SSH is enabled on your base image: - - Connect to a VM via VNC - - Check System Preferences > Sharing > Remote Login +Since the runner automatically deletes failed VMs, deploy a VM manually to troubleshoot: -2. Verify the SSH key matches: +1. Deploy a test VM: ```bash - # On Runner: get public key fingerprint - ssh-keygen -l -f ~/.ssh/orka_deployment_key + orka3 vm deploy test-debug --config "$ORKA_CONFIG_NAME" + ``` - # On VM: check authorized_keys - cat ~/.ssh/authorized_keys +2. Get connection details: + ```bash + orka3 vm list test-debug ``` -3. Test SSH connectivity manually: +3. Connect via Screen Sharing (VNC) to check: + - System Preferences > Sharing > Remote Login is enabled + - Your public key is in `~/.ssh/authorized_keys` + +4. Test SSH manually: ```bash ssh -i ~/.ssh/orka_deployment_key -p admin@ "echo ok" ``` -4. Increase the wait time by modifying prepare.sh (line 66) if VMs need more boot time. +5. Clean up: + ```bash + orka3 vm delete test-debug + ``` ### Error: "Permission denied (publickey)" @@ -207,42 +171,17 @@ ssh-keygen -l -f ~/.ssh/orka_deployment_key 1. Verify the SSH key has no passphrase: ```bash # This should NOT prompt for a passphrase - ssh-keygen -y -f ~/.ssh/orka_deployment_key + ssh-keygen -y -f /path/to/key ``` 2. If the key has a passphrase, generate a new one without: ```bash - ssh-keygen -t rsa -b 4096 -f ~/.ssh/orka_key -N "" - ``` - -3. Verify the `ORKA_VM_USER` matches the user on the VM (default: `admin`): - ```yaml - variables: - ORKA_VM_USER: "admin" + ssh-keygen -t ed25519 -f ~/.ssh/orka_key -N "" ``` -4. Ensure the public key is in the VM's `~/.ssh/authorized_keys`. - -### Error: "Host key verification failed" +3. Verify `ORKA_VM_USER` in GitLab CI/CD Variables matches the user on the VM (default: `admin`). -**Symptoms:** -- SSH fails with host key errors -- "REMOTE HOST IDENTIFICATION HAS CHANGED" warnings - -**Causes:** -- Known hosts file has stale entries -- Strict host key checking enabled - -**Solutions:** - -The scripts handle this automatically by updating known_hosts, but if issues persist: - -1. Clear the known hosts for the problematic IP: - ```bash - ssh-keygen -R "[]:" - ``` - -2. The scripts use `StrictHostKeyChecking=no` during initial connection, so this should not block ephemeral VMs. +4. Deploy a test VM and verify the public key is in `~/.ssh/authorized_keys`. ## Environment variable issues @@ -253,16 +192,15 @@ The scripts handle this automatically by updating known_hosts, but if issues per - Variables are empty **Causes:** -- Required environment variables not set -- Variables not exported correctly in GitLab CI/CD +- Required environment variables not configured in GitLab **Solutions:** -1. Verify all required variables are set in your GitLab CI/CD settings or `.gitlab-ci.yml`: +Verify all required variables are set in GitLab CI/CD settings (Settings > CI/CD > Variables): | Variable | Required | Description | |----------|----------|-------------| -| `ORKA_TOKEN` | Yes | Authentication token | +| `ORKA_TOKEN` | Yes | Service account token | | `ORKA_ENDPOINT` | Yes | Orka API URL | | `ORKA_CONFIG_NAME` | Yes | VM config template name | | `ORKA_SSH_KEY_FILE` | Yes | Private SSH key contents | @@ -270,17 +208,7 @@ The scripts handle this automatically by updating known_hosts, but if issues per | `ORKA_VM_NAME_PREFIX` | No | VM name prefix (default: `gl-runner`) | | `VM_DEPLOYMENT_ATTEMPTS` | No | Retry count (default: `1`) | -2. For sensitive variables, use GitLab CI/CD [masked variables][masked-variables]: - - Go to Settings > CI/CD > Variables - - Add variables with "Masked" option enabled - -3. Verify variables are accessible in your job: - ```yaml - test_variables: - script: - - echo "Endpoint: $ORKA_ENDPOINT" - - echo "Config: $ORKA_CONFIG_NAME" - ``` +For sensitive variables like `ORKA_TOKEN` and `ORKA_SSH_KEY_FILE`, enable the "Masked" option. ## Network and connectivity issues @@ -297,18 +225,14 @@ The scripts handle this automatically by updating known_hosts, but if issues per **Solutions:** -1. Verify network connectivity: +1. Test connectivity from the runner environment: ```bash - ping -c 3 $(echo "$ORKA_ENDPOINT" | sed 's|http://||') - orka3 version # Should show server version if connected + curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info" ``` 2. If using VPN, verify your connection using your [IP plan][ip-plan] details. -3. For Docker-based runners, ensure the container has network access: - ```bash - docker run --rm --entrypoint orka3 orka-gitlab version - ``` +3. For Docker-based runners, ensure the container has network access to the Orka endpoint. ### IP mapping issues @@ -322,44 +246,35 @@ The scripts handle this automatically by updating known_hosts, but if issues per **Solutions:** -1. If your network requires IP mapping, create `/var/custom-executor/settings.json`: - ```json - { - "mappings": [ - { - "private_host": "10.221.188.100", - "public_host": "203.0.113.100" - } - ] - } - ``` +If your network requires IP mapping, create `/var/custom-executor/settings.json`: +```json +{ + "mappings": [ + { + "private_host": "10.221.188.100", + "public_host": "203.0.113.100" + } + ] +} +``` -2. See [template-settings.md](template-settings.md) for configuration details. +See [template-settings.md](template-settings.md) for configuration details. ## Job execution issues -### Error: Build script fails but not a system failure +### Build script fails **Symptoms:** - Job fails during run.sh - Error is from your CI/CD script, not the integration -**Causes:** -- Your build script has errors -- Missing dependencies on the VM -- Path or environment issues on the VM - -**Solutions:** - -1. The integration correctly distinguishes between: - - **Build failures**: Your script failed (exit code from script) - - **System failures**: Infrastructure failed (exit code 1) +**Note:** The integration distinguishes between: +- **Build failures**: Your script failed (returns script exit code) +- **System failures**: Infrastructure failed (returns exit code 1) -2. Check your build script runs correctly on a standalone Orka VM. +If your build script fails, the issue is in your script, not the integration. Test your script on a standalone Orka VM. -3. Ensure required tools are installed on your base image. - -### Error: Job hangs or times out +### Job hangs or times out **Symptoms:** - Job runs but never completes @@ -368,20 +283,14 @@ The scripts handle this automatically by updating known_hosts, but if issues per **Causes:** - Long-running process without output - SSH connection dropped -- VM became unresponsive **Solutions:** -1. The scripts use SSH keep-alive (60-second intervals for 60 minutes): - ``` - ServerAliveInterval=60 - ServerAliveCountMax=60 - ``` +1. For long jobs, add periodic output to prevent GitLab timeout. + +2. Consider breaking long jobs into smaller stages. -2. For very long jobs, consider: - - Breaking into smaller jobs - - Adding periodic output to prevent timeout - - Increasing GitLab job timeout in project settings +3. Increase GitLab job timeout in project settings if needed. ## Cleanup issues @@ -389,28 +298,17 @@ The scripts handle this automatically by updating known_hosts, but if issues per **Symptoms:** - VMs remain after job completion -- `orka3 vm list` shows old runner VMs **Causes:** - Runner crashed before cleanup -- cleanup.sh failed - Network issue during cleanup **Solutions:** -1. Manually delete orphaned VMs: - ```bash - # List VMs with runner prefix - orka3 vm list | grep "gl-runner" - - # Delete specific VM - orka3 vm delete - - # Delete all runner VMs (use with caution) - orka3 vm list -o json | jq -r '.[].name' | grep "gl-runner" | xargs -I {} orka3 vm delete {} - ``` - -2. Consider setting up a periodic cleanup job to remove stale VMs. +Delete orphaned VMs manually: +```bash +orka3 vm delete +``` ## Getting help @@ -427,7 +325,6 @@ If you're still experiencing issues: [orka]: https://support.macstadium.com/hc/en-us/articles/29904434271387-Orka-Overview [orka-docs]: https://support.macstadium.com/hc/en-us [ip-plan]: https://support.macstadium.com/hc/en-us/articles/28230867289883-IP-Plan -[serviceaccount]: https://support.macstadium.com/hc/en-us/articles/28347450648987-Orka3-Service-Accounts [masked-variables]: https://docs.gitlab.com/ee/ci/variables/#mask-a-cicd-variable [runner-logs]: https://docs.gitlab.com/runner/faq/#how-can-i-get-a-debug-log [support]: https://support.macstadium.com/ From b007022190d20685c0ef5489779d5572279f456b Mon Sep 17 00:00:00 2001 From: Rin Oliver Date: Tue, 17 Feb 2026 09:04:34 -0600 Subject: [PATCH 4/5] fix: remove incorrect endpoint cause from 401 error section An incorrect endpoint produces timeout errors, not 401s. Endpoint troubleshooting is already covered in the connectivity section. Co-Authored-By: Claude Opus 4.6 --- GitLab/troubleshooting.md | 1 - 1 file changed, 1 deletion(-) diff --git a/GitLab/troubleshooting.md b/GitLab/troubleshooting.md index 3684427..2f08f6f 100644 --- a/GitLab/troubleshooting.md +++ b/GitLab/troubleshooting.md @@ -12,7 +12,6 @@ This guide covers common issues and solutions when using the GitLab [Custom exec **Causes:** - `ORKA_TOKEN` is invalid or expired -- `ORKA_ENDPOINT` is incorrect **Solutions:** From d42df7025ebd903338ecd4fae6f489aa7d6eaf7c Mon Sep 17 00:00:00 2001 From: Rin Oliver Date: Tue, 17 Feb 2026 09:08:22 -0600 Subject: [PATCH 5/5] fix: deduplicate connectivity check in auth section Reference the network section instead of repeating the curl check. Co-Authored-By: Claude Opus 4.6 --- GitLab/troubleshooting.md | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/GitLab/troubleshooting.md b/GitLab/troubleshooting.md index 2f08f6f..682bc6e 100644 --- a/GitLab/troubleshooting.md +++ b/GitLab/troubleshooting.md @@ -48,13 +48,7 @@ This guide covers common issues and solutions when using the GitLab [Custom exec http://10.221.188.20/ # Trailing slash ``` -2. Test network connectivity from your runner: - ```bash - curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info" - # 200 = reachable, 000 = network issue - ``` - -3. If using VPN, verify your connection. See your [IP plan][ip-plan] for details. +2. If the endpoint is correct but commands still fail, see [Runner cannot reach Orka endpoint](#runner-cannot-reach-orka-endpoint) for connectivity troubleshooting. ## VM deployment failures