Commit ff267d5

Switch preflights from Depot to our own BuildKit
1 parent 8702b3b commit ff267d5

15 files changed: +398 -123 lines changed

.github/PREFLIGHT_SETUP.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# Preflight Test CI Setup
+
+## Required GitHub Secrets
+
+The preflight tests require the following GitHub repository secrets to be configured:
+
+### `FLYCTL_PREFLIGHT_CI_USER_TOKEN` (Required for deploy token tests)
+
+This must be a **user token** (not a limited access token) with permissions to:
+- Create apps in the `flyctl-ci-preflight` organization
+- Create deploy tokens (requires user-level permissions)
+- Manage machines, volumes, and other resources
+
+**How to create:**
+1. Log in to Fly.io with a user account that has access to the `flyctl-ci-preflight` org
+2. Run: `flyctl auth token`
+3. Copy the token (it should NOT end with `@tokens.fly.io`)
+4. Add it to GitHub Secrets as `FLYCTL_PREFLIGHT_CI_USER_TOKEN`
+
+### `FLYCTL_PREFLIGHT_CI_FLY_API_TOKEN` (Fallback)
+
+This is the fallback token used when `FLYCTL_PREFLIGHT_CI_USER_TOKEN` is not available. It can be either a user token or a limited access token.
+
+**Current behavior:**
+- If `FLYCTL_PREFLIGHT_CI_USER_TOKEN` exists, it will be used (preferred)
+- If not, falls back to `FLYCTL_PREFLIGHT_CI_FLY_API_TOKEN`
+- Deploy token tests will fail if neither is a user token
+
+## Why Two Tokens?
+
+We use two separate tokens for security reasons:
+
+1. **Most tests** can run with limited access tokens (more secure, limited blast radius)
+2. **Deploy token tests** require user tokens (can create other tokens)
+
+By having both, we can use the least privileged token for most operations while still supporting the full test suite.
+
+## Verifying Token Type
+
+To check if a token is a user token vs limited access token:
+
+```bash
+# Set the token
+export FLY_API_TOKEN="your-token-here"
+
+# Check the user
+flyctl auth whoami
+```
+
+**User token output:** `user@example.com` or `uuid@some-domain.com`
+**Limited access token output:** `uuid@tokens.fly.io` (ends with `@tokens.fly.io`)
+
+Deploy token tests **require** a token that does NOT end with `@tokens.fly.io`.
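
The token-type check in PREFLIGHT_SETUP.md comes down to a suffix test on the identity printed by `flyctl auth whoami`. A minimal Go sketch of that rule, assuming the identity string has already been captured; `isUserToken` and `limitedTokenSuffix` are hypothetical names used for illustration and are not part of flyctl:

```go
package main

import (
	"fmt"
	"strings"
)

// limitedTokenSuffix is the identity suffix that the setup doc associates
// with limited access tokens.
const limitedTokenSuffix = "@tokens.fly.io"

// isUserToken is a hypothetical helper: it reports whether a whoami identity
// looks like a user token, meaning it is non-empty and lacks the suffix.
func isUserToken(identity string) bool {
	identity = strings.TrimSpace(identity)
	return identity != "" && !strings.HasSuffix(identity, limitedTokenSuffix)
}

func main() {
	// Both identities are made-up examples in the shapes the doc describes.
	for _, id := range []string{"user@example.com", "1234@tokens.fly.io"} {
		fmt.Printf("%s -> user token: %t\n", id, isUserToken(id))
	}
}
```

A check like this could run before the deploy token tests so that a misconfigured secret fails early with a clear message instead of partway through the suite.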

.github/workflows/preflight.yml

Lines changed: 19 additions & 26 deletions
@@ -12,14 +12,25 @@ on:
 
 jobs:
   preflight-tests:
+    name: "preflight-tests (${{ matrix.group }})"
     if: ${{ github.repository == 'superfly/flyctl' }}
     runs-on: ubuntu-latest
     strategy:
       fail-fast: false
       matrix:
-        parallelism: [20]
-        index:
-          [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
+        group:
+          - apps
+          - deploy
+          - launch
+          - scale
+          - volume
+          - console
+          - logs
+          - machine
+          - postgres
+          - tokens
+          - wireguard
+          - misc
     steps:
       - uses: actions/checkout@v6
       - uses: actions/setup-go@v6
@@ -53,37 +64,19 @@ jobs:
       - name: Run preflight tests
         id: preflight
         env:
-          FLY_PREFLIGHT_TEST_ACCESS_TOKEN: ${{ secrets.FLYCTL_PREFLIGHT_CI_FLY_API_TOKEN }}
+          # Use user token if available (required for deploy token tests), otherwise fall back to limited token
+          FLY_PREFLIGHT_TEST_ACCESS_TOKEN: ${{ secrets.FLYCTL_PREFLIGHT_CI_USER_TOKEN || secrets.FLYCTL_PREFLIGHT_CI_FLY_API_TOKEN }}
           FLY_PREFLIGHT_TEST_FLY_ORG: flyctl-ci-preflight
           FLY_PREFLIGHT_TEST_FLY_REGIONS: ${{ inputs.region }}
           FLY_PREFLIGHT_TEST_NO_PRINT_HISTORY_ON_FAIL: 'true'
           FLY_FORCE_TRACE: 'true'
         run: |
           mkdir -p bin
-          if [ -e master-build/flyctl ]; then
-            mv master-build/flyctl bin/flyctl
-          fi
-          if [ -e bin/flyctl ]; then
-            chmod +x bin/flyctl
-          fi
+          (test -e master-build/flyctl) && mv master-build/flyctl bin/flyctl
+          chmod +x bin/flyctl
           export PATH=$PWD/bin:$PATH
-          test_opts=""
-          if [[ "${{ github.ref }}" != "refs/heads/master" ]]; then
-            test_opts="-short"
-          fi
-          test_log="$(mktemp)"
-          function finish {
-            rm "$test_log"
-          }
-          trap finish EXIT
-          set +e
-          go test ./test/preflight/... --tags=integration -v -timeout=15m $test_opts -run "${{ steps.test_split.outputs.run }}" | tee "$test_log"
-          test_status=$?
-          set -e
           echo -n failed= >> $GITHUB_OUTPUT
-          awk '/^--- FAIL:/{ printf("%s ", $3) }' "$test_log" >> $GITHUB_OUTPUT
-          echo >> $GITHUB_OUTPUT
-          exit $test_status
+          ./scripts/preflight.sh -r "${{ github.ref }}" -g "${{ matrix.group }}" -o $GITHUB_OUTPUT
       - name: Post failure to slack
         if: ${{ github.ref == 'refs/heads/master' && failure() }}
         uses: slackapi/slack-github-action@91efab103c0de0a537f72a35f6b8cda0ee76bf0a

internal/build/imgsrc/docker.go

Lines changed: 24 additions & 3 deletions
@@ -301,6 +301,7 @@ func newRemoteDockerClient(ctx context.Context, apiClient flyutil.Client, flapsC
 
 	if !connectOverWireguard && !wglessCompatible {
 		client := &http.Client{
+			Timeout: 30 * time.Second, // Add timeout for each request
 			Transport: &http.Transport{
 				DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
 					return tls.Dial("tcp", fmt.Sprintf("%s.fly.dev:443", app.Name), &tls.Config{})
@@ -322,9 +323,29 @@ func newRemoteDockerClient(ctx context.Context, apiClient flyutil.Client, flapsC
 		fmt.Fprintln(streams.Out, streams.ColorScheme().Yellow("👀 checking remote builder compatibility with wireguardless deploys ..."))
 		span.AddEvent("checking remote builder compatibility with wireguardless deploys")
 
-		res, err := client.Do(req)
+		// Retry with backoff to allow DNS propagation time
+		var res *http.Response
+		b := &backoff.Backoff{
+			Min:    2 * time.Second,
+			Max:    30 * time.Second,
+			Factor: 2,
+			Jitter: true,
+		}
+		maxRetries := 10 // Up to ~5 minutes total with backoff
+		for attempt := 0; attempt < maxRetries; attempt++ {
+			res, err = client.Do(req)
+			if err == nil {
+				break
+			}
+
+			if attempt < maxRetries-1 {
+				dur := b.Duration()
+				terminal.Debugf("Remote builder compatibility check failed (attempt %d/%d), retrying in %s (err: %v)\n", attempt+1, maxRetries, dur, err)
+				pause.For(ctx, dur)
+			}
+		}
 		if err != nil {
-			tracing.RecordError(span, err, "failed to get remote builder settings")
+			tracing.RecordError(span, err, "failed to get remote builder settings after retries")
 			return nil, err
 		}
 
@@ -594,7 +615,7 @@ func buildRemoteClientOpts(ctx context.Context, apiClient flyutil.Client, appNam
 }
 
 func waitForDaemon(parent context.Context, client *dockerclient.Client) (up bool, err error) {
-	ctx, cancel := context.WithTimeout(parent, 2*time.Minute)
+	ctx, cancel := context.WithTimeout(parent, 5*time.Minute) // 5 minutes for daemon to become responsive (includes DNS propagation time)
 	defer cancel()
 
 	b := &backoff.Backoff{

internal/build/imgsrc/ensure_builder.go

Lines changed: 2 additions & 2 deletions
@@ -531,7 +531,7 @@ func (p *Provisioner) createBuilder(ctx context.Context, region, builderName str
 		return nil, nil, retErr
 	}
 
-	retErr = flapsClient.Wait(ctx, builderName, mach, "started", 60*time.Second)
+	retErr = flapsClient.Wait(ctx, builderName, mach, "started", 180*time.Second) // 3 minutes for machine start + DNS propagation
 	if retErr != nil {
 		tracing.RecordError(span, retErr, "error waiting for builder machine to start")
 		return nil, nil, retErr
@@ -582,7 +582,7 @@ func restartBuilderMachine(ctx context.Context, appName string, builderMachine *
 		return err
 	}
 
-	if err := flapsClient.Wait(ctx, appName, builderMachine, "started", time.Second*60); err != nil {
+	if err := flapsClient.Wait(ctx, appName, builderMachine, "started", time.Second*180); err != nil { // 3 minutes for restart + DNS propagation
 		tracing.RecordError(span, err, "error waiting for builder machine to start")
 		return err
 	}

internal/command/console/console.go

Lines changed: 5 additions & 1 deletion
@@ -231,7 +231,11 @@ func runConsole(ctx context.Context) error {
 		consoleCommand = flag.GetString(ctx, "command")
 	}
 
-	return ssh.Console(ctx, sshClient, consoleCommand, true, params.Container)
+	// Allocate PTY only when no command is specified or when explicitly requested
+	// This matches the behavior of `fly ssh console`
+	allocPTY := consoleCommand == "" || flag.GetBool(ctx, "pty")
+
+	return ssh.Console(ctx, sshClient, consoleCommand, allocPTY, params.Container)
 }
 
 func selectMachine(ctx context.Context, app *fly.AppCompact, appConfig *appconfig.Config) (*fly.Machine, func(), error) {
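
The PTY change in console.go reduces to one boolean rule: allocate a PTY for an interactive session (no command) or when the `pty` flag is set. A small sketch of that rule in isolation, where `shouldAllocPTY` is a hypothetical function name used only for illustration:

```go
package main

import "fmt"

// shouldAllocPTY mirrors the condition added in the diff: an empty command
// means an interactive console, and ptyFlag stands in for the "pty" flag.
func shouldAllocPTY(command string, ptyFlag bool) bool {
	return command == "" || ptyFlag
}

func main() {
	cases := []struct {
		command string
		pty     bool
	}{
		{"", false},          // interactive session
		{"uname -a", false},  // one-off command, output likely piped
		{"uname -a", true},   // one-off command with PTY explicitly requested
	}
	for _, c := range cases {
		fmt.Printf("command=%q pty=%t -> allocate PTY: %t\n", c.command, c.pty, shouldAllocPTY(c.command, c.pty))
	}
}
```

Skipping the PTY for one-off commands matters mostly for scripted callers such as the preflight tests, since a PTY typically merges stderr into stdout and rewrites line endings.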

internal/command/deploy/machines_deploymachinesapp.go

Lines changed: 3 additions & 1 deletion
@@ -107,7 +107,9 @@ func (md *machineDeployment) DeployMachinesApp(ctx context.Context) error {
 
 	if updateErr := md.updateReleaseInBackend(ctx, status, metadata); updateErr != nil {
 		if err == nil {
-			err = fmt.Errorf("failed to set final release status: %w", updateErr)
+			// Deployment succeeded, but we couldn't update the release status
+			// This is not critical enough to fail the entire deployment
+			terminal.Warnf("failed to set final release status after successful deployment: %v\n", updateErr)
 		} else {
 			terminal.Warnf("failed to set final release status after deployment failure: %v\n", updateErr)
 		}

scanner/rails_dockerfile_test.go

Lines changed: 5 additions & 26 deletions
@@ -46,13 +46,8 @@ CMD ["rails", "server"]
 	err = os.WriteFile(filepath.Join(dir, "Dockerfile"), []byte(customDockerfile), 0644)
 	require.NoError(t, err)
 
-	// Change to test directory
-	originalDir, _ := os.Getwd()
-	defer os.Chdir(originalDir)
-	err = os.Chdir(dir)
-	require.NoError(t, err)
-
 	// Run the scanner - it should detect the Rails app
+	// No need to change directories, configureRails accepts a directory path
 	si, err := configureRails(dir, &ScannerConfig{SkipHealthcheck: true})
 	drainHealthcheckChannel() // Wait for goroutine to complete before cleanup
 
@@ -89,11 +84,7 @@ CMD ["rails", "server"]`
 	err = os.WriteFile(filepath.Join(dir, "Dockerfile"), []byte(customDockerfile), 0644)
 	require.NoError(t, err)
 
-	originalDir, _ := os.Getwd()
-	defer os.Chdir(originalDir)
-	err = os.Chdir(dir)
-	require.NoError(t, err)
-
+	// No need to change directories, configureRails accepts a directory path
 	si, err := configureRails(dir, &ScannerConfig{SkipHealthcheck: true})
 	drainHealthcheckChannel() // Wait for goroutine to complete before cleanup
 	require.NoError(t, err)
@@ -123,11 +114,7 @@ CMD ["rails", "server"]`
 	err = os.WriteFile(filepath.Join(dir, "Dockerfile"), []byte(customDockerfile), 0644)
 	require.NoError(t, err)
 
-	originalDir, _ := os.Getwd()
-	defer os.Chdir(originalDir)
-	err = os.Chdir(dir)
-	require.NoError(t, err)
-
+	// No need to change directories, configureRails accepts a directory path
 	si, err := configureRails(dir, &ScannerConfig{SkipHealthcheck: true})
 	drainHealthcheckChannel() // Wait for goroutine to complete before cleanup
 	require.NoError(t, err)
@@ -150,12 +137,8 @@ CMD ["rails", "server"]`
 
 	// Note: No Dockerfile created
 
-	originalDir, _ := os.Getwd()
-	defer os.Chdir(originalDir)
-	err = os.Chdir(dir)
-	require.NoError(t, err)
-
 	// This test would need bundle to not be available, which is hard to simulate
+	// No need to change directories, configureRails accepts a directory path
 	// The scanner will either find bundle (and try to use it) or not find it
 	// If bundle is not found and no Dockerfile exists, it should fail
 
@@ -199,11 +182,7 @@ EXPOSE 3000`
 	err = os.WriteFile(filepath.Join(dir, "Dockerfile"), []byte(customDockerfile), 0644)
 	require.NoError(t, err)
 
-	originalDir, _ := os.Getwd()
-	defer os.Chdir(originalDir)
-	err = os.Chdir(dir)
-	require.NoError(t, err)
-
+	// No need to change directories, configureRails accepts a directory path
 	si, err := configureRails(dir, &ScannerConfig{SkipHealthcheck: true})
 	drainHealthcheckChannel() // Wait for goroutine to complete before cleanup
 	require.NoError(t, err)

scripts/preflight.sh

Lines changed: 68 additions & 9 deletions
@@ -1,17 +1,22 @@
-#! /bin/bash
+#!/bin/bash
 set -euo pipefail
 
 ref=
+group=
+# Legacy support for numeric sharding (deprecated)
 total=
 index=
 out=
 
-while getopts r:t:i:o: name
+while getopts r:g:t:i:o: name
 do
     case "$name" in
         r)
             ref="$OPTARG"
             ;;
+        g)
+            group="$OPTARG"
+            ;;
         t)
             total="$OPTARG"
             ;;
@@ -22,7 +27,7 @@ do
             out="$OPTARG"
             ;;
         ?)
-            printf "Usage: %s: [-r REF] [-t TOTAL] [-i INDEX] [-o FILE]\n" $0
+            printf "Usage: %s: [-r REF] [-g GROUP] [-t TOTAL] [-i INDEX] [-o FILE]\n" $0
             exit 2
             ;;
     esac
@@ -43,12 +48,66 @@ trap finish EXIT
 
 set +e
 
-gotesplit \
-    -total "$total" \
-    -index "$index" \
-    github.com/superfly/flyctl/test/preflight/... \
-    -- --tags=integration -v -timeout=15m $test_opts | tee "$test_log"
-test_status=$?
+# Define test groups based on logical groupings
+if [[ -n "$group" ]]; then
+    case "$group" in
+        apps)
+            test_pattern="^TestAppsV2"
+            ;;
+        deploy)
+            test_pattern="^Test(FlyDeploy|Deploy)"
+            ;;
+        launch)
+            test_pattern="^Test(FlyLaunch|Launch)"
+            ;;
+        scale)
+            test_pattern="^TestFlyScale"
+            ;;
+        volume)
+            test_pattern="^TestVolume"
+            ;;
+        console)
+            test_pattern="^TestFlyConsole"
+            ;;
+        logs)
+            test_pattern="^TestFlyLogs"
+            ;;
+        machine)
+            test_pattern="^TestFlyMachine"
+            ;;
+        postgres)
+            test_pattern="^TestPostgres"
+            ;;
+        tokens)
+            test_pattern="^TestTokens"
+            ;;
+        wireguard)
+            test_pattern="^TestFlyWireguard"
+            ;;
+        misc)
+            test_pattern="^Test(ErrOutput|ImageLabel|NoPublicIP)"
+            ;;
+        *)
+            echo "Unknown test group: $group"
+            echo "Available groups: apps, deploy, launch, scale, volume, console, logs, machine, postgres, tokens, wireguard, misc"
+            exit 1
+            ;;
+    esac
+
+    go test -tags=integration -v -timeout=15m $test_opts -run "$test_pattern" github.com/superfly/flyctl/test/preflight/... | tee "$test_log"
+    test_status=$?
+# Legacy numeric sharding using gotesplit (deprecated)
+elif [[ -n "$total" && -n "$index" ]]; then
+    gotesplit \
+        -total "$total" \
+        -index "$index" \
+        github.com/superfly/flyctl/test/preflight/... \
+        -- --tags=integration -v -timeout=15m $test_opts | tee "$test_log"
+    test_status=$?
+else
+    echo "Error: Must specify either -g GROUP or both -t TOTAL and -i INDEX"
+    exit 1
+fi
 
 set -e
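
Because each group now selects tests with a `-run` regular expression, a test whose name matches none of the patterns would not run in any matrix job. The sketch below copies the patterns from `scripts/preflight.sh` into a Go map and reports which groups would pick up a given test name. `groupsFor` is a hypothetical helper and the test names in `main` are invented for illustration, not taken from the repository.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// groupPatterns mirrors the case statement in scripts/preflight.sh.
var groupPatterns = map[string]string{
	"apps":      "^TestAppsV2",
	"deploy":    "^Test(FlyDeploy|Deploy)",
	"launch":    "^Test(FlyLaunch|Launch)",
	"scale":     "^TestFlyScale",
	"volume":    "^TestVolume",
	"console":   "^TestFlyConsole",
	"logs":      "^TestFlyLogs",
	"machine":   "^TestFlyMachine",
	"postgres":  "^TestPostgres",
	"tokens":    "^TestTokens",
	"wireguard": "^TestFlyWireguard",
	"misc":      "^Test(ErrOutput|ImageLabel|NoPublicIP)",
}

// groupsFor returns, in sorted order, every group whose pattern matches the
// given test name. An empty result means no group would run that test.
func groupsFor(testName string) []string {
	var groups []string
	for group, pattern := range groupPatterns {
		if regexp.MustCompile(pattern).MatchString(testName) {
			groups = append(groups, group)
		}
	}
	sort.Strings(groups)
	return groups
}

func main() {
	// Hypothetical test names, chosen only to exercise the patterns.
	for _, name := range []string{"TestFlyDeployBasic", "TestVolumeExtend", "TestSomethingNew"} {
		fmt.Printf("%s -> %v\n", name, groupsFor(name))
	}
}
```

A check along these lines, fed with the real test names from `go test -list`, could be used to flag tests that fall outside every group or into more than one.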