If you want to skip reading the context and go directly to the getting started section, click here.
- Time-based completion: Finish thousands of tests in your target timeframe (2-5 minutes).
- Dynamic auto scaling: Auto-adjust runners based on test load.
- Smart distribution: Balance workload by execution time, not test count.
In the below example, we see more than three thousand tests run in just 1.5 minutes with a desired run time of 2 minutes in total.
This action covers both execution modes in Playwright:
- When `fullyParallel=true`: parallel run of all individual test cases on runners
- When `fullyParallel=false`: parallel run of all individual test files on runners
We will explain this by looking at how Playwright Sharding works under the hood and the problems its implementation brings.
Playwright Sharding is an out-of-the-box solution from Playwright to allow distributed runs on any machine. Its inner workings and GitHub usage examples have two main flaws.
Playwright sharding distributes tests based on the total count of tests (balancing shards), not based on how much time each test takes to complete. Since sharding is not aware of each test's duration when distributing tests across runners, it results in situations like the one below.
Playwright gives a GitHub actions example that shows how we can use a GitHub matrix strategy to distribute tests on a fixed number of runners (4 in the given example). This results in inefficiencies as shown below.
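To make the imbalance concrete, here is a small illustration with hypothetical test timings (the six-test suite and its durations are made up) comparing count-based splitting against time-based splitting:

```javascript
// Hypothetical test durations in seconds (not from a real run).
const tests = [
  { name: "t1", time: 60 }, { name: "t2", time: 55 },
  { name: "t3", time: 5 },  { name: "t4", time: 4 },
  { name: "t5", time: 3 },  { name: "t6", time: 3 },
];

const sum = (shard) => shard.reduce((total, t) => total + t.time, 0);

// Count-based sharding (what Playwright does): two shards of 3 tests each.
const shardA = tests.slice(0, 3);
const shardB = tests.slice(3);
console.log(sum(shardA), sum(shardB)); // 120 10 — shard A gates the whole run

// Time-based distribution: greedily assign each test (longest first)
// to whichever shard currently has the smallest total run time.
const shards = [[], []];
[...tests].sort((a, b) => b.time - a.time).forEach((t) => {
  const least = shards.reduce((min, s) => (sum(s) < sum(min) ? s : min));
  least.push(t);
});
console.log(shards.map(sum)); // [ 64, 66 ] — a near-even split
```

Count-based shards finish in 120 s and 10 s respectively, so the run is gated at 120 s, while the time-aware split finishes both shards in roughly 65 s.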
| Aspect | Playwright Sharding | RunWright |
|---|---|---|
| Timely feedback on pull requests | Test runs that take a long time cannot be run with every pull request. | Fast and predictable run times make it possible to run system tests with every PR. |
| Trust in tests | Tests that aren't run with each PR don't get fixed with each PR. They are often run after new changes are already merged into the main branch. As seen frequently, such tests give false positives due to new changes and break the team's trust in them. | Tests that are run with every PR get fixed with the PRs. They provide timely feedback to developers, give true positives, and improve the team's trust in them. |
| Maintenance fatigue | Tests that are not fixed with PRs get passed on to QAs. When this happens frequently, which it often does, it results in maintenance fatigue in QAs. QAs find themselves demotivated and stuck in a never-ending cycle of fixing broken tests, with little to no time to do anything else that is meaningful. | When developers are responsible for fixing the tests that are broken by their own changes, it frees up time for testers in the team to do more meaningful work such as exploratory testing, writing new tests for missing functionality, learning new ways of testing, and mentoring team members on testing and automation. |
| To increase test coverage or not? | Increased test run times create pressure on the team to limit test suite growth and over-optimize existing tests rather than adding new tests to increase test coverage. | When teams have a setup that can always finish tests in a fixed time (say 2 to 5 minutes), it encourages them to write new tests to increase test coverage for missing functionality instead of over-optimizing to keep run times in check. |
| Runner scaling efficiency | Adding more runners has diminishing returns and doesn't guarantee proportional time savings. | Smart auto-scaling based on test load gives consistent and directly proportional performance benefits. |
| Costs and returns | As we have seen, with more added runners the infrastructure costs grow in proportion, but with diminishing performance results. | Infrastructure costs are always in proportion to our test run demands, and we only pay for what we use. Nothing more. Nothing less. |
| Scalability potential | The approach doesn't scale well with an increased number of tests. | Excellent scalability that grows efficiently with test suite expansion, always keeping total run time fixed to our desired time (say 2 to 5 minutes, regardless of the total tests to run). |
Key Takeaway: RunWright transforms system testing from a burden into an enabler, allowing teams to maintain fast feedback loops while scaling their test suites confidently.
** At the time of writing this document, there are no other known solutions (paid or open source) that can do this using Playwright and GitHub.
To build a solution that is "time aware" and that can "auto-scale" based on the "current test load," there are a few things that we need.
Namely:

- T_i = TestRunTimeForEachTest(i) = execution time of test i (from state.json)
  - We get this value from the `state.json` file, which is generated by a custom state-reporter.js file and committed by a post-commit hook.
- N = total number of tests to run.
  - We get the test scope by running the Playwright command with the `--list` option.
- TargetRunTime = total desired time to complete the run (in minutes).
  - We get this as input from the user.
- TotalLoad = Σ T_i = total test load (in terms of test run time).
  - We iterate over each runner to keep its Σ T_i <= TargetRunTime.
  - Note that the total run time for each runner is affected by the number of parallel threads, as explained in more detail in the next section.
- Cores = number of cores per runner.
  - Default cores on GitHub Linux public runners: 4.
  - Default cores on GitHub Linux private runners: 2.
  - For enterprise projects, it is possible to request custom, more powerful runners with higher core counts.
  - For Linux runners, the action can calculate the cores at run time with this command: `NUM_CORES=$(nproc)`
- Threads = parallel threads per runner.
  - The recommended number of threads per runner is half the cores, i.e. Threads = Cores / 2.
- Runners = total number of required runners.
  - We calculate the optimal number of required runners as shown in the next section, using all the information above.
- Providing runners as a GitHub dynamic matrix.
  - GitHub's `fromJSON` function and `GITHUB_OUTPUT` variables make it possible to pass a dynamic matrix from one job to another.
  - Note: it is not straightforward to pass matrix variables through other mechanisms (such as environment variables or user-supplied workflow inputs). For this reason, creating a dynamic matrix remains a bit of a mystery, and most teams end up using a hardcoded matrix in their workflows.
  - Pro tip: limit the maximum number of runners to a sensible value (say 20) in your caller workflow to avoid spinning up hundreds of runners for a huge test set with very long tests and a short desired total run time. Note that this value cannot be passed as an input variable, so it must be hardcoded in the workflow files.
Equation

Since we know every individual TestRunTimeForEachTest(i) from state.json, the total workload is:

`TotalLoad = Σ T_i` (for i = 1 … N)

Total parallel capacity available on the runners:

`Capacity = Runners × Threads × TargetRunTime`

Equating load and capacity:

`Σ T_i = Runners × Threads × TargetRunTime`

Solving for Runners:

`Runners = ⌈ Σ T_i / (Threads × TargetRunTime) ⌉`
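As a sanity check, the runner formula `Runners = ⌈ Σ T_i / (Threads × TargetRunTime) ⌉` can be exercised with concrete numbers (the figures below are illustrative, not from a real run):

```javascript
// Illustrative numbers only (not from a real run).
const totalLoadSeconds = 6000;    // Σ T_i: e.g. 3000 tests averaging 2 s each
const targetRunTimeSeconds = 120; // desired total run time: 2 minutes
const cores = 4;                  // GitHub-hosted public Linux runner default
const threads = cores / 2;        // recommended workers per runner: Cores / 2

// Runners = ceil( Σ T_i / (Threads × TargetRunTime) )
const runners = Math.ceil(totalLoadSeconds / (threads * targetRunTimeSeconds));
console.log(runners); // 25
```

So a 6000-second load with 2 workers per runner and a 2-minute budget needs 25 runners.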
Finally, we piece all this information together in this custom runwright GitHub action, which gives you:

- a `dynamic-matrix`
- and a `test-load-distribution-json`

as output variables.
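A caller workflow might consume these outputs roughly like this (the job and step names here are illustrative; check the action's README for the exact inputs and output names):

```yaml
jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      dynamic-matrix: ${{ steps.runwright.outputs.dynamic-matrix }}
      test-load-distribution-json: ${{ steps.runwright.outputs.test-load-distribution-json }}
    steps:
      - id: runwright
        # ... invoke the runwright action here ...

  run-tests:
    needs: plan
    runs-on: ubuntu-latest
    strategy:
      matrix:
        runner-id: ${{ fromJSON(needs.plan.outputs.dynamic-matrix) }}
    steps:
      - run: echo "Running the test share for runner ${{ matrix.runner-id }}"
```

The key mechanism is `fromJSON` over a job output: that is what turns a string produced at run time into a real matrix, which is not possible with plain environment variables.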
- Dev/Tester adds or updates a test.
- They try to commit the changes.
- A pre-commit git hook automatically runs `npx playwright test --only-changed` to run the affected tests.
- The custom state-reporter.js, registered as a reporter in the playwright.config.ts file, runs after the tests finish and updates the state.json file with the affected tests' run times.
- The changes staged at the time of running the pre-commit hook are committed (except the state.json file, which is updated as a result of the pre-commit hook itself).
- A post-commit hook then commits the state.json file as well (skipping the pre-commit hooks this time).
- The user provides a desired total run time (either pre-defined and hard-coded for pull-request/push triggers, say 2 or 4 minutes, or entered manually when using a workflow_dispatch trigger).
- Based on this input run time and the information in the state.json file, the runwright action has everything it needs to calculate the minimum required runners for the test load.
- The exact tests to run on each runner are also calculated by the action and returned to the caller workflow.
- Each runner then runs the tests in its scope and creates a blob report for the tests it has run.
- Once all runners have finished, another job consolidates all the blob reports into a single final test report.
- You can now verify whether the total test run time stayed within your desired run time. Your desired total run time should always be a little higher than your slowest test, since that is your limiting factor. Tip: if you have very lengthy tests, try breaking them down into smaller atomic or partial integration tests.
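As an illustration of the time-aware packing idea described above, here is a sketch (not the action's actual code; treating threads as a simple multiplier on each runner's time budget is a simplification):

```javascript
// Sketch of time-aware distribution: pack tests onto runners so that each
// runner's summed test time stays within the per-runner budget.
// `state` is a { testPath: runTimeMs } mapping, like state.json.
function distribute(state, targetRunTimeMs, threads) {
  // Simplified model: threads run tests concurrently, so a runner can
  // absorb roughly threads × targetRunTime worth of sequential test time.
  const budget = targetRunTimeMs * threads;
  const runners = [[]];
  const loads = [0];
  // Longest tests first gives a better greedy packing.
  const entries = Object.entries(state).sort((a, b) => b[1] - a[1]);
  for (const [test, time] of entries) {
    // Prefer the least-loaded runner; open a new one if it would overflow.
    let i = loads.indexOf(Math.min(...loads));
    if (loads[i] + time > budget && loads[i] > 0) {
      runners.push([]);
      loads.push(0);
      i = loads.length - 1;
    }
    runners[i].push(test);
    loads[i] += time;
  }
  return runners;
}

// Hypothetical state.json contents (times in ms).
const state = {
  "a.spec.ts": 90_000, "b.spec.ts": 80_000,
  "c.spec.ts": 50_000, "d.spec.ts": 20_000,
};
console.log(distribute(state, 120_000, 1).length); // 3
console.log(distribute(state, 120_000, 2).length); // 1
```

Note that with a 2-minute budget and 1 thread, 3 runners are needed even though the ideal ⌈Σ T_i / budget⌉ is 2: no pair of the long tests fits one budget, which is why the desired run time must stay above the slowest tests.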
The main steps involved are:

1. Add a `pre-commit` hook file as shown here.
   - This will run `--only-changed` tests on local commits.
2. Copy the state-reporter.js file and put it in the repository root.
   - This will create a `state.json` file that contains the mapping of each test path to the time it took to run (in ms).
3. Update the reporters in the playwright.config.ts file to include this reporter as shown below.
   - `reporter: [["list"], ["html"], ["github"], ["./state-reporter.js"]],`
4. Add a `post-commit` hook file as shown here.
   - This will automatically stage and commit the updated `state.json` file to the feature branch.
5. Add a reusable workflow that can take inputs from the user to run Playwright commands and finish tests in x minutes.
6. Add an example trigger workflow that shows how to use the reusable workflow to run the desired tests. Here are a few examples of trigger workflows:
Step 2: Run tests based on the defined triggers (push to main, pull_request to main, schedule, on demand, etc.)
- To test the setup, use one of the below workflows (either in your own test repository or fork the above sandbox repository to try it out).
For reference, an example-workflow.yml file is also available in the root of the RunWright GitHub project.
- If you find any issues, use the issues page to raise them.
- Do not use sharding-related commands in the input playwright command to run, since this solution is meant to overcome the flaws of sharding. Using sharding again would introduce those shortcomings again.
- If you are using custom, more powerful GitHub runners, use the same custom runner type for the job that evaluates RunWright as for the subsequent job that runs the tests. This is important to correctly calculate the number of workers (threads), which is half the runner's core count.
| Sr No | Test Description | Test Condition | Expected Result | Actual Result | Status |
|---|---|---|---|---|---|
| 1 | Run with 0 tests found | `npx playwright test --grep="non-existent-test"` | Action should handle this gracefully, set RUNNER_COUNT=1, and exit successfully | ✅ Action exits gracefully with a "No tests found" message | ✅ PASS |
| 2 | Test with fullyParallel=true | `npx playwright test` with `fullyParallel=true` | Should distribute individual tests across runners | ✅ Looking into the runwright job, we can see that tests from the same file are distributed across different runners | ✅ PASS |
| 3 | Test with fullyParallel=false | `npx playwright test` with `fullyParallel=false` | Should distribute test files across runners | ✅ File-level distribution works correctly. No tests from the same file + project were seen on different runners. With each file taking 102 seconds to run and 2 workers, each runner getting 2 files is also accurate. Slowest runner time was 2 min 48 s. Total run time in the HTML report: 1.9 min | ✅ PASS |
| 4 | Run with very few tests (< 2 min total run time) | `npx playwright test --grep='Wait for 5 seconds'` | Should create 1 runner with optimal worker allocation | ✅ Creates 1 runner, completes in under 2 minutes | ✅ PASS |
| 5 | Run with all tests (~30 min when run sequentially) | `npx playwright test` | Should create around 8 runners with optimal worker allocation | ✅ Creates 9 runners. Slowest runner time was 2 min 59 s. Total run time in the HTML report: 2.1 min | ✅ PASS |
| 6 | Run ~1.5k tests in 2 minutes | [Test Command Placeholder] | Should create the optimal number of runners to finish within 2 minutes | ✅ Completes in ~2 minutes with dynamic runner allocation (as seen in the early tests image; runner id not available) | ✅ PASS |
| 7 | Run ~3k tests in 2 minutes | `npx playwright test` | Should scale up runners appropriately to meet the time constraint | ✅ Scales to multiple runners, finishes in ~2 minutes | ✅ PASS |
| 8 | Test missing from state.json | Delete the 20 s and 25 s tests of file06 for firefox from state.json, then run `npx playwright test --grep='Wait for 20 seconds' --project=firefox` | Should fail with a clear error message and suggestions. No graceful failure, to avoid a "false positive" situation | ✅ Provides a clear error with post-commit hook guidance | ✅ PASS |
| 9 | Single project configuration | `npx playwright test --project='chromium'` | Should work with a single browser project | ✅ Handles single project scenarios correctly | ✅ PASS |
| 10 | Multiple project configuration | `npx playwright test --project='chromium' --project='webkit'` | Should group tests by project within runners | ✅ Correctly groups and distributes multi-project tests | ✅ PASS |
| 11 | CPU core detection | [check any of the previous runs] | Should detect available cores and calculate optimal workers | ✅ Detects cores correctly, sets workers = cores/2 | ✅ PASS |
| 12 | Dynamic matrix generation | [check any of the previous runs] | Should create a proper GitHub Actions matrix format | ✅ Generates a valid JSON array for the matrix strategy | ✅ PASS |
| 13 | Load balancing accuracy | [check any of the previous runs] | Distribution should be based on execution time, not test count | ✅ Uses actual test execution times for optimal distribution | ✅ PASS |
| 14 | Runner utilization | [check any of the previous runs] | All runners should finish at approximately the same time | ✅ Even load distribution across all runners | ✅ PASS |
| 15 | Browser caching | [check any of the previous runs] | Should cache and reuse Playwright browsers efficiently | ✅ Implements a proper browser caching strategy | ✅ PASS |
| 16 | Error handling for malformed state.json | [Test Command Placeholder] | Should provide a clear error when state.json is corrupted | ✅ Handles JSON parsing errors gracefully | ✅ PASS |
| 17 | Large test suite scalability | Run with a 0.5-second target time (before the minimum-time restriction was added) | Should handle test suites with 5k+ tests efficiently | ✅ Scales appropriately for large test suites (tested with 3k+ tests) | ✅ PASS |
| 18 | Custom runner types compatibility | [Test Command Placeholder] | Should work with custom GitHub runner configurations | Expected to be compatible with custom runner specifications | NOT-YET-TESTED |
| 19 | Output format validation | [check any of the previous runs] | All outputs should be valid JSON and consumable by workflows | ✅ All outputs are properly formatted and consumable | ✅ PASS |
| 20 | Invalid time input (< 1 minute) | [Test Command Placeholder] | Should handle the minimum time constraint appropriately | ✅ Validates the minimum 1-minute requirement | ✅ PASS |
- It could be a good idea to regenerate the `state.json` file from scratch every few days or weeks to avoid accumulating redundant test paths and names.
- Add an option for when a user doesn't want to limit by time but wants to limit the maximum number of runners to use.
  - Instructions were added in the so-how-does-it-work -> runners -> pro-tip section to explain how to achieve this on the caller's end.