Skip to content

inject preview deployments into nested stacks#70

Merged
xh3b4sd merged 14 commits intomainfrom
preview
Sep 19, 2025
Merged

inject preview deployments into nested stacks#70
xh3b4sd merged 14 commits intomainfrom
preview

Conversation

@xh3b4sd
Copy link
Contributor

@xh3b4sd xh3b4sd commented Sep 11, 2025

Fixes https://linear.app/splits/issue/PE-4885/emit-debug-logs-for-every-single-release-that-causes-state-drift-in.
Towards https://linear.app/splits/issue/PE-4788/support-vercel-like-preview-deployments.

{
    "time": "2025-09-17 15:38:12",
    "level": "info",
    "message": "continuing reconciliation loop",
    "domain": "1D0FD508.lite.testing.splits.org",
    "reason": "detected state drift",
    "release": "splits-lite",
    "version": "e307253e62c2da119579e2505db0b6185642ab9b",
    "caller": "/Users/xh3b4sd/project/0xSplits/kayron/pkg/operator/policy/ensure.go:68"
}

@xh3b4sd xh3b4sd self-assigned this Sep 11, 2025
Comment on lines 13 to 16
if obj.kin == Service {
c.ser[obj.ind].Artifact = c.ser[obj.ind].Artifact.Merge(obj.Artifact)
c.ser[obj.ind].Release.Deploy.Preview = obj.Release.Deploy.Preview
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly, but this enables us to reuse some information in one place, which we already fetched from the Github API in another place. E.g. the preview worker handler resets the preview flag of the release artifact of any main release that defines preview releases for its entire repository. I don't know how to make this better at this point in time.

Comment on lines +42 to +47
return Hash{
dsh: []byte("-" + upp),
low: []byte(low),
upp: []byte(upp),
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are hashing branch names for identifying preview deployments inside all of AWS. We also have to create the preview deployment domain using some kind of deterministic identifier based on the preview branch. Turned out we cannot just use branch names in preview domains because branch names have a far wider character set than domain names allow for. E.g. the domain feature/branch.lite.splits.org is invalid.

Comment on lines +3 to +6
// Active defines this worker handler to always be executed.
func (c *CloudFormation) Active() bool {
return true
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are marking almost all worker handlers in Kayron as always active. The alternative would be to wrap every worker handler in a proxy worker handler, but I thought this approach here is more explicit.

var tag string
{
tag = curTag(ima, x.Release.Docker.String())
tag = curTag(ima, x.Release.Labels.Hash.Upper(), x.Release.Docker.String())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point we have potentially many ECS Services using the same service tag. So we have to differentiate via the preview tags, which contain the branch based preview hashes.

Comment on lines +87 to +92
{
byt, err = i.renPre(nam, byt)
if err != nil {
return tracer.Mask(err)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we potentially inject the preview deployments into the CloudFormation templates before uploading to S3.

Comment on lines +310 to +314
#
# AUTO GENERATED PREVIEW DEPLOYMENT 1D0FD508
#

Service1D0FD508:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can see how the CloudFormation templates get extended.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it just keeps adding to the end as new previews are created? and then once a preview is done (i.e. PR is merged/closed) it drops the updates in here related to that PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we only append. If we have outputs in those appended templates we can make sure to put the expanded resources into the Resources section properly.

Once the pull requests are not open anymore the respective resources do not get added anymore to the templates, because when we look for open pull requests in pkg/preview/expand.go, then we will not get that merged/closed result anymore.

	opt := &github.PullRequestListOptions{
		State: "open",
		ListOptions: github.ListOptions{
			PerPage: 100,
		},
	}

	var pul []*github.PullRequest
	{
		pul, _, err = p.git.PullRequests.List(context.Background(), p.own, rel.Github.String(), opt)
		if err != nil {
			return nil, tracer.Mask(err)
		}
	}

Comment on lines +18 to +23
// filter is a collection of branch name prefixes that we want to ignore when
// expanding a service release into preview releases. E.g. we do not want to
// deploy preview releases for dependabot branches.
filter = []string{
"dependabot/",
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not deploy previews for e.g. dependabot PRs. We can extend this list any time.

Comment on lines +102 to +105
{
pre.Labels.Hash = hash.New(bra)
pre.Labels.Head = ref
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where the ref comes from that we already fetched when searching for pull requests. If we have this ref, then we do not need to lookup those refs for every single branch in the reference worker handler.

Comment on lines +17 to +23
res = Resource{
Ser: p.sca.Search([]byte(" Service:")),
Tas: p.sca.Search([]byte(" TaskDefinition:")),
Dom: p.sca.Search([]byte(" DomainRecord:")),
Tar: p.sca.Search([]byte(" TargetGroup:")),
Lis: p.sca.Search([]byte(" ListenerRule:")),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here is all super ugly, but it works.

Comment on lines +379 to +384
"level": "info",
"message": "continuing reconciliation loop",
"domain": "1d0fd508.lite.testing.splits.org",
"reason": "detected state drift",
"release": "splits-lite",
"version": "e307253e62c2da119579e2505db0b6185642ab9b",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should see something like this in the logs

@xh3b4sd xh3b4sd marked this pull request as ready for review September 18, 2025 15:19
@xh3b4sd xh3b4sd requested a review from r0ohafza September 18, 2025 15:19
Copy link

@mihoward21 mihoward21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, went through it all. just a few comments/questions. is this wired up on splits lite right now? can i go create some branches/pr's and play around with it there?


var out []byte
{
out = bytes.ReplaceAll(lin, sub[1], tag)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly is this doing? can't tell what the regex is looking for / replacing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok i see below in the test how this works.

// deployments.

{
pri++

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do we need to "reserve" a certain number of priorities per service? i.e. lite.splits.org is allotted 50 - 99, explorer.splits.org is allotted 100 - 149, teams.splits.org is allotted 150 - 199?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah well i guess just for the testing environment. but yea i think the same general idea.

Comment on lines +310 to +314
#
# AUTO GENERATED PREVIEW DEPLOYMENT 1D0FD508
#

Service1D0FD508:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it just keeps adding to the end as new previews are created? and then once a preview is done (i.e. PR is merged/closed) it drops the updates in here related to that PR?

@xh3b4sd
Copy link
Contributor Author

xh3b4sd commented Sep 19, 2025

can i go create some branches/pr's and play around with it there?

This should work for all deployed services that we define releases for. Saying that, I am realizing we can shoot ourselves potentially in the foot by defining previews for e.g. Kayron itself. That wouldn't work because Kayron is not exposed to the internet, and therefore doesn't have e.g. the ListenerRule to copy. And so the stack update would fail etc. Maybe we need a whitelist/blacklist approach here or document those details some more. Having said that, removing the preview config from the release definition should bring the CloudFormation stack back to health again, so the failed state is not critical.

For Splits Lite this works already. See 0xSplits/splits-lite#44.

@xh3b4sd xh3b4sd merged commit 74fd411 into main Sep 19, 2025
3 checks passed
@xh3b4sd xh3b4sd deleted the preview branch September 19, 2025 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants