[CI] Add Design Doc for Premerge Advisor #599
Conversation
This patch adds a design document for the premerge advisor that we plan on implementing.
premerge/premerge-advisor.md (Outdated)
> failures. It is also not reasonable to expect PR authors to spend time
> familiarizing themselves with all known flaky tests and dig through postcommit
> testing logs to see if the failures in their premerge run also occur in main.
nit/maybe off-topic: is it really unreasonable to look at the failure logs and eyeball whether we are responsible? Knowing which tests are flaky, yes, but we should expect contributors to at least look at the logs if something fails.
The majority of people already look at the logs. But you also need to be able to correlate the failures in your PR with postcommit failures, which is the missing piece. The heuristics I've seen most people use to determine if their PR is the culprit don't take that correlation into account, and thus are pretty error prone.
> We plan on implementing the web server in python with the `flask` library. All
> contributors to the premerge infrastructure are already familiar with Python,
> building web servers in Python with `flask` is relatively easy, and we do not
> need high performance or advanced features. For storage, we plan on using SQLite
> as it has support built in to Python, does not require any additional complexity
> in terms of infrastructure setup, and is reliable.
Could we consider going without the DB?
A DB, even SQLite, means we have a schema, and if we change it, we need a migration path.
What we need is history for the last N commit runs on main. Could this be stored in memory in the flask process (assuming memory is shared across the N workers if we use waitress)?
It means that if we restart the cluster, the first commit won't have any history, but that shouldn't happen often, and it would greatly simplify incremental work on this advisor.
Whether we keep things in memory or in a DB, we still need a schema. The schema we need is also not complicated: just the file, the workflow run number/commit SHA, the failure message, and maybe a timestamp or other unique ID. Keeping the data for all of eternity also isn't necessary. We can just dump the existing DB and start fresh, just like we would if we were storing the info in memory and the server restarted.
Storing the information in memory makes it much harder to keep track of flaky tests over long periods of time, and we lose all information on flaky tests every time the server restarts, which isn't ideal. Server restarts are normal and expected with k8s, so the design needs to take them into account.
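For illustration, a minimal SQLite schema along the lines discussed above; the table and column names are hypothetical, not taken from the design doc:

```python
import sqlite3

# Hypothetical schema sketch: one row per observed test failure, roughly the
# fields discussed above (file, commit SHA / run, failure message, timestamp).
SCHEMA = """
CREATE TABLE IF NOT EXISTS failures (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    source      TEXT NOT NULL,   -- 'premerge' or 'postcommit'
    commit_sha  TEXT NOT NULL,   -- commit (or workflow run) the failure was seen on
    file        TEXT NOT NULL,   -- test file that failed
    message     TEXT NOT NULL,   -- extracted failure log
    seen_at     TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def open_db(path: str = "advisor.sqlite3") -> sqlite3.Connection:
    """Open (and lazily create) the advisor database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```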
premerge/premerge-advisor.md (Outdated)
> Given we have two identical clusters that need to be able to function
> independently of each other, we also need to duplicate the web server for the
> premerge advisor. Queries and failure information uploads will go to the
> cluster-local premerge advisor web server. Periodically (eg once every thirty
By not polling, and instead requiring each job to upload data to the service/DB, while also having clustering, we now need to handle data inconsistencies.
What about:
- each job uploads an artifact to the GitHub Actions storage (meaning retention policies etc. are handled by GitHub).
- one instance of this service polls (as the metric container does) and fetches those artifacts as they are published.
- the service keeps the test state in memory (no DB schema to migrate as we iterate on this service).
- the service then posts a message to each PR, based on the stored state, as workflows are polled.
We could also run a single instance of this service since it is not a critical piece: a failure would just prevent sending helpful comments, but wouldn't back up the queue.
We don't really need to handle data inconsistencies. The system is designed for eventual consistency on the order of minutes because having slightly inconsistent results doesn't really matter.
We can't use GitHub artifacts because we also need to support uploading data from the postcommit builders running on buildbot.
I think this is reasonably critical to the UX of the premerge system, and not having two instances means we would probably need to push maintenance to off-hours more often, but ack on looking more into keeping a single instance.
You should explicitly state in the design doc why the inconsistencies won't be a problem. It would also be a good idea to outline the basic DB schema. Try to design it so that it's easily extensible if/when we decide we need to extend it.
Also, if we're planning on using the DB to help identify flaky tests over time, which your comments imply, then that should be stated in the design doc as well.
What was the upshot of "looking more into keeping a single instance"?
I've added some text around why inconsistencies won't be a problem and also documented the current state (we have premerge write to both clusters rather than trying to have them sync between each other).
Also added some text on the database design.
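For illustration, a best-effort upload helper matching the "premerge writes to both clusters" approach described above; the endpoint URLs and payload shape are hypothetical:

```python
import requests

# Hypothetical cluster-local advisor endpoints; the real URLs would come from
# the premerge infrastructure configuration.
ADVISOR_UPLOAD_URLS = [
    "http://advisor.cluster-a.internal/upload",
    "http://advisor.cluster-b.internal/upload",
]

def upload_failures(payload: dict) -> None:
    """Upload failure information to every cluster's advisor, best effort.

    A failed upload to one cluster is logged and ignored: the data only needs
    to become consistent on the order of minutes, not atomically.
    """
    for url in ADVISOR_UPLOAD_URLS:
        try:
            requests.post(url, json=payload, timeout=10).raise_for_status()
        except requests.RequestException as err:
            print(f"warning: failed to upload to {url}: {err}")
```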
premerge/premerge-advisor.md (Outdated)
> While the premerge infrastructure is generally reliable, tests that are broken
> at HEAD, or tests that are flaky can significantly degrade the user experience.
> Failures unrelated to the PR being worked on can lead to warning fatigue which
> can compound this problem when people land failing tests into main that they
The entire phrase starting with "which can compound" needs to be reworked -- it is not clear to me what you are trying to say.
Added some text to describe what I'm trying to get at. Please take a look and let me know what you think.
premerge/premerge-advisor.md (Outdated)
> occur regularly whether due to flaky tests, or due to main being broken. Efforts
> to fix these issues at the source and keep main always green and deflake tests
> are laudable, but not a scalable solution to the problem of false positive
> failures. It is also not reasonable to expect PR authors to spend time
suggested changes: "laudable, but not a scalable solution to" => "ongoing, but are not sufficient to prevent" and "failures." => "failure reports."
Changed "laudable" to "ongoing".
I think it's still important to mention that this is a scalability problem. And fixing flaky tests fixes the failures (which is ideal) rather than just making them go unreported.
premerge/premerge-advisor.md (Outdated)
> Information on how the premerge jobs will query the server and display results
> about flaky/already broken tests is in
> [the section below](#informing-the-user). We plan on having both the premerge
> and postcommit jobs upload failure information to the web server. This enables
It would be appropriate to include a link to the postsubmit testing docs.
Also, I assume the data being uploaded from postcommit and PR jobs will be tagged and treated differently somehow (i.e. we want to store one but not the other). More details about that would be good.
We want to store both. But they will be tagged appropriately.
Added some text on how exactly we'll tag them. I don't think a design doc is the best place to get into exactly how the database schemas will be set up, but I can add that information if desired.
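For illustration, a possible upload payload showing how the source tag could distinguish premerge from postcommit data; the field names, values, and file path are hypothetical, not the final wire format:

```python
# Hypothetical payload uploaded by a postcommit (or premerge) job. The
# "source" field is the tag that distinguishes the two kinds of data.
payload = {
    "source": "postcommit",        # or "premerge"
    "commit_sha": "0123abcd",
    "workflow_run": 123456,
    "failures": [
        {
            "file": "llvm/test/CodeGen/X86/example.ll",
            "message": "FileCheck error: expected string not found in input",
        },
    ],
}
```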
premerge/premerge-advisor.md (Outdated)
> workflow failure and then update that comment everytime we get new data. This
> prevents creating much noise. This does mean that the comment might get buried
> in long running reviews, but those are not the common case and people should
> learn to expect to look for the comment earlier in the thread in such cases.
Shouldn't the comment about which test failures the user can ignore go into the Summary, at the very top?
As in the workflow summary page?
This section is specifically talking about adding comments mirroring that section. The info will be prominently placed within the comment, but the overall ordering of that thread relative to other comments in the PR will change as more comments get added.
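For illustration, a sketch of the find-or-create comment logic using the GitHub REST API; pagination is omitted, and the marker string and token handling are hypothetical:

```python
import requests

API = "https://api.github.com"
MARKER = "<!-- premerge-advisor -->"  # hidden marker used to find our own comment later

def upsert_advisor_comment(repo: str, pr_number: int, body: str, token: str) -> None:
    """Post the advisor comment once, then edit it in place on later failures."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    comments_url = f"{API}/repos/{repo}/issues/{pr_number}/comments"
    # Note: this only scans the first page of comments; real code would paginate.
    for comment in requests.get(comments_url, headers=headers, timeout=10).json():
        if MARKER in comment.get("body", ""):
            # Found the existing advisor comment: update it instead of adding noise.
            requests.patch(
                f"{API}/repos/{repo}/issues/comments/{comment['id']}",
                headers=headers,
                json={"body": f"{MARKER}\n{body}"},
                timeout=10,
            ).raise_for_status()
            return
    requests.post(
        comments_url,
        headers=headers,
        json={"body": f"{MARKER}\n{body}"},
        timeout=10,
    ).raise_for_status()
```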
> in long running reviews, but those are not the common case and people should
> learn to expect to look for the comment earlier in the thread in such cases.
>
> ## Alternatives Considered
It looks like there are a number of off-the-shelf commercially available and open source flake detector systems. Here's a quote from Gemini:
> A unit test flake detector identifies tests that pass and fail inconsistently without code changes by rerunning tests multiple times and analyzing the results for differing outcomes. These tools use historical data and CI/CD pipeline logs to flag unreliable tests, with features like auto-rerunning, historical analysis, and test quarantine for managing flagged tests. Popular examples include features within CI platforms like Azure DevOps, Mergify, and TeamCity, as well as third-party tools like Trunk.io and the flaky-tests-detection PyPI package.
I think it's worth discussing the build/buy/re-use tradeoffs here with at least a few of these.
Added a section to cover this. This actually ended up being a pretty useful exercise.
trunk.io looks almost exactly like what we want, other than that it (along with all the other alternatives) only works with flaky tests rather than failures at HEAD. We really need both to create a cohesive solution for LLVM.
premerge/premerge-advisor.md (Outdated)
> failure information so that it can be queried later. We plan on having jobs
> extract failure logs and upload them to a web server. This web server in
> addition to having an endpoint to accept uploads will have an endpoint that will
> accept test failure information (logs and filenames) and return whether or not
What's the difference between accepting "uploads" and accepting "test failure information"? I think I know what is meant (PR info vs. postsubmit testing info), but that is not really clear here.
Reworded. One is supposed to be just uploading failure information, and the other is supposed to be requesting explanations.
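For illustration, a minimal flask sketch of the two endpoints as described (one for uploading failure information, one for requesting explanations); the routes, payload fields, and in-memory store are hypothetical stand-ins for the real implementation:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
known_failures: list[dict] = []  # stand-in for the real SQLite-backed storage

@app.route("/upload", methods=["POST"])
def upload():
    """Accept failure information uploads from premerge and postcommit jobs."""
    known_failures.extend(request.get_json().get("failures", []))
    return jsonify({"status": "ok"})

@app.route("/explain", methods=["POST"])
def explain():
    """Given a PR's failures, report which ones are already known (flaky or broken at HEAD)."""
    queried = request.get_json().get("failures", [])
    known_files = {failure["file"] for failure in known_failures}
    return jsonify({
        "results": [
            {"file": failure["file"], "known": failure["file"] in known_files}
            for failure in queried
        ]
    })
```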
premerge/premerge-advisor.md (Outdated)
> about flaky/already broken tests is in
> [the section below](#informing-the-user). We plan on having both the premerge
> and [postcommit jobs](post-submit-testing.md) upload failure information to the
> web server. This enables the web server to know about failures that are not the
Maybe you should say something here about the data from the postcommit jobs being stored in the DB; failures from the precommit jobs get compared against the DB to determine whether the failures are new or already existed.
Added some text to cover this.
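For illustration, a sketch of that comparison step, assuming the hypothetical `failures` table sketched earlier; the real matching heuristic may normalize log text, consider recency, or track flake rates:

```python
import sqlite3

def is_known_failure(conn: sqlite3.Connection, file: str, message: str) -> bool:
    """Return True if a premerge failure matches one already recorded from postcommit runs."""
    row = conn.execute(
        "SELECT 1 FROM failures"
        " WHERE source = 'postcommit' AND file = ? AND message = ?"
        " LIMIT 1",
        (file, message),
    ).fetchone()
    return row is not None
```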
premerge/premerge-advisor.md (Outdated)
> list, they will be added to the currently active failure list. Failures in the
> currently active list not present in the latest postcommit run will be removed
> from the currently active list. In addition to data from postcommit testing, if
> a PR lands with unexplained premerge failures, those failures are also added to
What do you mean by "unexplained premerge failures" here?
Reworded this because the system as implemented does not work like this.
premerge/premerge-advisor.md (Outdated)
> information/links showing the tests are flaky (and the flakiness rate) or are
> broken in `main`. If all of the test failures are due to flakes or failures in
> `main`, the comment will indicate to the user that they can safely merge their
> PR despite the test failures. We plan to construct the comment in a manner
"comment" => "review comment" (maybe)?. Also, since (I think), we're also going to put this information into the Summary section, you should mention that.
It's a normal comment, not a review comment.
It's also not going into the summary section currently. If/when it does, we can update the doc to match.
At a high level I think the design looks pretty good now. I've asked for additions or clarifications in a few places. I also have a bunch of editorial suggestions, but those can wait for now.
> keeping `main` green all the time and deflake tests rather than work on a
> premerge advisor. Both of these alternatives are considered below.
>
> ### Deflaking Tests and Keeping Main Green
Since this section appears to be describing something we considered but did not do (and are not planning on doing?), I would prefix the section title with "Alternative: ". Just a suggestion.
It's a subtitle under the Alternatives Considered section, so it should be clear that it is nested within that section when actually rendered (although that does depend on the markdown renderer).