Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions GitLab/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,6 @@
The Orka GitLab integration enables you to use [Orka by MacStadium][orka] in your GitLab CI/CI pipelines.
Learn how to configure the [GitLab Shell executor](shell-executor.md) or the [GitLab Custom executor](custom-executor.md).

For common issues and solutions, see the [Troubleshooting guide](troubleshooting.md).

[orka]: https://www.macstadium.com/orka
323 changes: 323 additions & 0 deletions GitLab/troubleshooting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,323 @@
# Troubleshooting the GitLab Orka Integration

This guide covers common issues and solutions when using the GitLab [Custom executor][custom] with [Orka][orka].

## Authentication issues

### Error: "unauthorized" or "401"

**Symptoms:**
- VM deployment fails with authentication errors
- `orka3` commands return "unauthorized"

**Causes:**
- `ORKA_TOKEN` is invalid or expired

**Solutions:**

1. Generate a new service account token:
```bash
orka3 serviceaccount token <service-account-name>
```

2. Update the token in GitLab CI/CD settings:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned there are two ways of passing there variables:

  1. Either though the Gitlab UI
  2. Or on container startup (docker run...)

My suggestion would be to extract a section that tells people how vars are set and in all the places where we say "Make sure this variable is correctly set" to point them to the section so they know how exactly to set the variable.

WDYT

- Go to Settings > CI/CD > Variables
- Update `ORKA_TOKEN` with the new token

**Note:** Service account tokens are valid for 1 year by default. For custom duration, use `--duration` flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the control plane. EKS does not allow you to have such long tokens. So there you would have to use the no-expiration flag.


### Error: "config not found" or "no such host"

**Symptoms:**
- CLI commands fail before authentication
- "dial tcp: lookup" errors

**Causes:**
- `ORKA_ENDPOINT` is not set or malformed
- Network connectivity issues to Orka API

**Solutions:**

1. Verify the endpoint format in GitLab CI/CD Variables (include protocol, no trailing slash):
```
# Correct format
http://10.221.188.20

# Incorrect formats
10.221.188.20 # Missing protocol
http://10.221.188.20/ # Trailing slash
```

2. If the endpoint is correct but commands still fail, see [Runner cannot reach Orka endpoint](#runner-cannot-reach-orka-endpoint) for connectivity troubleshooting.

## VM deployment failures

### Error: "VM deployment failed"

**Symptoms:**
- prepare.sh exits with "VM deployment failed"
- Deployment attempts exhausted

**Causes:**
- `ORKA_CONFIG_NAME` doesn't exist or is misspelled
- No available nodes with sufficient resources

**Solutions:**

1. If the error says "config does not exist", check the spelling of `ORKA_CONFIG_NAME` in your GitLab CI/CD Variables. Create the config if needed:
```bash
orka3 vm-config create <config-name> --image <image-name> --cpu <count>
```

2. Check available node resources:
```bash
orka3 node list -o wide
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need the wide flag here to see the available resources.

```

3. Increase deployment attempts by setting the environment variable in `.gitlab-ci.yml`:
```yaml
variables:
VM_DEPLOYMENT_ATTEMPTS: "3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why in the yml file? And not in the Gitlab UI? I would suggest an unified approach when setting variables. As I mentioned above we can have a section that talks about how vars are set so we can let users pick whatever works for them

```

### Error: "Invalid ip" or "Invalid port"

**Symptoms:**
- VM deploys but connection info extraction fails
- "Invalid ip: null" in logs

**Causes:**
- VM deployment returned unexpected JSON format
- VM is in a failed state

**Solutions:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't really a solution for this as it is just a symptom. So one needs to be able to find the root cause and fix it.
Usually if the VM information cannot be extracted, this means there are bigger issues.
It is a good suggestion to ask people to try to deploy a VM manually, but we do not give them anything actionable that can fix the issue.


1. Deploy a VM manually and inspect the output:
```bash
orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json
```

2. Check VM status:
```bash
orka3 vm list test-vm -o wide
```

3. Delete the test VM after inspection:
```bash
orka3 vm delete test-vm
```

## SSH connection issues

### Error: "Waited 30 seconds for sshd to start"

**Symptoms:**
- VM deploys successfully
- SSH connection times out after 30 seconds

**Causes:**
- SSH is not enabled on the base image
- SSH key not configured on the VM
- VM is still booting

**Solutions:**

Since the runner automatically deletes failed VMs, deploy a VM manually to troubleshoot:

1. Deploy a test VM:
```bash
orka3 vm deploy test-debug --config "$ORKA_CONFIG_NAME"
```

2. Get connection details:
```bash
orka3 vm list test-debug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You get the connection details from the output of the deploy command. There is no need for a separate command.

```

3. Connect via Screen Sharing (VNC) to check:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScreenSharing and VNC are two different things.
So maybe ScreenSharing or VNC.

- System Preferences > Sharing > Remote Login is enabled
- Your public key is in `~/.ssh/authorized_keys`

4. Test SSH manually:
```bash
ssh -i ~/.ssh/orka_deployment_key -p <PORT> admin@<VM_IP> "echo ok"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runner deletes the VMs that fail.
So here we need to suggest deploying a VM manually and trying to connect to it.

```

5. Clean up:
```bash
orka3 vm delete test-debug
```

### Error: "Permission denied (publickey)"

**Symptoms:**
- SSH connection is refused
- "Permission denied" in logs

**Causes:**
- SSH key has a passphrase (not supported)
- Wrong SSH user
- SSH key not in VM's authorized_keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the key is wrong


**Solutions:**

1. Verify the SSH key has no passphrase:
```bash
# This should NOT prompt for a passphrase
ssh-keygen -y -f /path/to/key
```

2. If the key has a passphrase, generate a new one without:
```bash
ssh-keygen -t ed25519 -f ~/.ssh/orka_key -N ""
```

3. Verify `ORKA_VM_USER` in GitLab CI/CD Variables matches the user on the VM (default: `admin`).

4. Deploy a test VM and verify the public key is in `~/.ssh/authorized_keys`.

## Environment variable issues

### Error: "unbound variable" or blank values

**Symptoms:**
- Script fails immediately
- Variables are empty

**Causes:**
- Required environment variables not configured in GitLab

**Solutions:**

Verify all required variables are set in GitLab CI/CD settings (Settings > CI/CD > Variables):

| Variable | Required | Description |
|----------|----------|-------------|
| `ORKA_TOKEN` | Yes | Service account token |
| `ORKA_ENDPOINT` | Yes | Orka API URL |
| `ORKA_CONFIG_NAME` | Yes | VM config template name |
| `ORKA_SSH_KEY_FILE` | Yes | Private SSH key contents |
| `ORKA_VM_USER` | No | SSH user (default: `admin`) |
| `ORKA_VM_NAME_PREFIX` | No | VM name prefix (default: `gl-runner`) |
| `VM_DEPLOYMENT_ATTEMPTS` | No | Retry count (default: `1`) |

For sensitive variables like `ORKA_TOKEN` and `ORKA_SSH_KEY_FILE`, enable the "Masked" option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not recommend passing the whole file via the the Gitlab UI, but rather mounting the file inside the container.


## Network and connectivity issues

### Runner cannot reach Orka endpoint

**Symptoms:**
- "Connection refused" or "Connection timed out"
- curl to endpoint fails

**Causes:**
- Runner is not on the same network as Orka
- VPN not connected
- Firewall blocking traffic

**Solutions:**

1. Test connectivity from the runner environment:
```bash
curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info"
```

2. If using VPN, verify your connection using your [IP plan][ip-plan] details.

3. For Docker-based runners, ensure the container has network access to the Orka endpoint.

### IP mapping issues

**Symptoms:**
- VM deploys but SSH connects to wrong IP
- "No route to host" errors

**Causes:**
- Private/public IP mismatch
- settings.json not configured for IP mapping

**Solutions:**

If your network requires IP mapping, create `/var/custom-executor/settings.json`:
```json
{
"mappings": [
{
"private_host": "10.221.188.100",
"public_host": "203.0.113.100"
}
]
}
```

See [template-settings.md](template-settings.md) for configuration details.

## Job execution issues

### Build script fails

**Symptoms:**
- Job fails during run.sh
- Error is from your CI/CD script, not the integration

**Note:** The integration distinguishes between:
- **Build failures**: Your script failed (returns script exit code)
- **System failures**: Infrastructure failed (returns exit code 1)

If your build script fails, the issue is in your script, not the integration. Test your script on a standalone Orka VM.

### Job hangs or times out

**Symptoms:**
- Job runs but never completes
- GitLab times out the job

**Causes:**
- Long-running process without output
- SSH connection dropped

**Solutions:**

1. For long jobs, add periodic output to prevent GitLab timeout.

2. Consider breaking long jobs into smaller stages.

3. Increase GitLab job timeout in project settings if needed.

## Cleanup issues

### Orphaned VMs

**Symptoms:**
- VMs remain after job completion

**Causes:**
- Runner crashed before cleanup
- Network issue during cleanup

**Solutions:**

Delete orphaned VMs manually:
```bash
orka3 vm delete <vm-name>
```

## Getting help

If you're still experiencing issues:

1. Check the [Orka documentation][orka-docs] for platform-specific guidance
2. Review GitLab Runner [logs][runner-logs]: `gitlab-runner --debug run`
3. Contact [MacStadium Support][support] with:
- Error messages and logs
- Environment details (Runner version, Orka version)
- Steps to reproduce

[custom]: https://docs.gitlab.com/runner/executors/custom.html
[orka]: https://support.macstadium.com/hc/en-us/articles/29904434271387-Orka-Overview
[orka-docs]: https://support.macstadium.com/hc/en-us
[ip-plan]: https://support.macstadium.com/hc/en-us/articles/28230867289883-IP-Plan
[masked-variables]: https://docs.gitlab.com/ee/ci/variables/#mask-a-cicd-variable
[runner-logs]: https://docs.gitlab.com/runner/faq/#how-can-i-get-a-debug-log
[support]: https://support.macstadium.com/