If you want to skip reading the context and go directly to the getting started section, click here.
- Time-based completion: Finish thousands of tests in your target timeframe (2-5 minutes).
- Dynamic auto scaling: Auto-adjust runners based on test load.
- Smart distribution: Balance workload by execution time, not test count.
In the below example, we see more than three thousand tests run in just 1.5 minutes with a desired run time of 2 minutes in total.
This action covers both execution modes in Playwright:
- When `fullyParallel=true`: parallel run of all individual test cases on runners
- When `fullyParallel=false`: parallel run of all individual test files on runners
We will explain this by looking at how Playwright Sharding works under the hood and the problems its implementation brings.
Playwright Sharding is an out-of-the-box solution from Playwright to allow distributed runs on any machine. Its inner workings and GitHub usage examples have two main flaws.
Playwright sharding distributes tests based on the total count of tests (balancing shards), not based on how much time each test takes to complete. Since sharding is not aware of each test's duration when distributing tests across runners, it results in situations like the one below.
Playwright gives a GitHub actions example that shows how we can use a GitHub matrix strategy to distribute tests on a fixed number of runners (4 in the given example). This results in inefficiencies as shown below.
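To make the imbalance concrete, here is a small illustration with hypothetical test timings (the six-test suite and its durations are made up) comparing count-based splitting against time-based splitting:

```javascript
// Hypothetical test durations in seconds (not from a real run).
const tests = [
  { name: "t1", time: 60 }, { name: "t2", time: 55 },
  { name: "t3", time: 5 },  { name: "t4", time: 4 },
  { name: "t5", time: 3 },  { name: "t6", time: 3 },
];

const sum = (shard) => shard.reduce((total, t) => total + t.time, 0);

// Count-based sharding (what Playwright does): two shards of 3 tests each.
const shardA = tests.slice(0, 3);
const shardB = tests.slice(3);
console.log(sum(shardA), sum(shardB)); // 120 10 — shard A gates the whole run

// Time-based distribution: greedily assign each test (longest first)
// to whichever shard currently has the smallest total run time.
const shards = [[], []];
[...tests].sort((a, b) => b.time - a.time).forEach((t) => {
  const least = shards.reduce((min, s) => (sum(s) < sum(min) ? s : min));
  least.push(t);
});
console.log(shards.map(sum)); // [ 64, 66 ] — a near-even split
```

Count-based shards finish in 120 s and 10 s respectively, so the run is gated at 120 s, while the time-aware split finishes both shards in roughly 65 s.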
| Aspect | Playwright Sharding | RunWright |
|---|---|---|
| Timely feedback on pull requests | Test runs that take a long time cannot be run with every pull request. | Fast and predictable run times make it possible to run system tests with every PR. |
| Trust in tests | Tests that aren't run with each PR don't get fixed with each PR. They are often run after new changes are already merged into the main branch. As seen frequently, such tests give false positives due to new changes and break the team's trust in them. | Tests that are run with every PR get fixed with the PRs. They provide timely feedback to developers, give true positives, and improve the team's trust in them. |
| Maintenance fatigue | Tests that are not fixed with PRs get passed on to QAs. When this happens frequently, which it often does, it results in maintenance fatigue in QAs. QAs find themselves demotivated and stuck in a never-ending cycle of fixing broken tests, with little to no time to do anything else that is meaningful. | When developers are responsible for fixing the tests that are broken by their own changes, it frees up time for testers in the team to do more meaningful work such as exploratory testing, writing new tests for missing functionality, learning new ways of testing, and mentoring team members on testing and automation. |
| To increase test coverage or not? | Increased test run times create pressure on the team to limit test suite growth and over-optimize existing tests rather than adding new tests to increase test coverage. | When teams have a setup that can always finish tests in a fixed time (say 2 to 5 minutes), it encourages them to write new tests to increase test coverage for missing functionality instead of over-optimizing to keep run times in check. |
| Runner scaling efficiency | Adding more runners has diminishing returns and doesn't guarantee proportional time savings. | Smart auto-scaling based on test load gives consistent and directly proportional performance benefits. |
| Costs and returns | As we have seen, with more added runners the infrastructure costs grow in proportion, but with diminishing performance results. | Infrastructure costs are always in proportion to our test run demands, and we only pay for what we use. Nothing more. Nothing less. |
| Scalability potential | The approach doesn't scale well with an increased number of tests. | Excellent scalability that grows efficiently with test suite expansion, always keeping total run time fixed to our desired time (say 2 to 5 minutes, regardless of the total tests to run). |
Key Takeaway: RunWright transforms system testing from a burden into an enabler, allowing teams to maintain fast feedback loops while scaling their test suites confidently.
** At the time of writing this document, there are no other known solutions (paid or open source) that can do this using Playwright and GitHub.
To build a solution that is "time aware" and that can "auto-scale" based on the "current test load," there are a few things that we need.
Namely:

- T_i = TestRunTimeForEachTest(i) = execution time of test i (from state.json)
  - We get this value from the `state.json` file, which is generated by a custom state-reporter.js file and committed by a post-commit hook.
- N = total number of tests to run.
  - We get the test scope by running the Playwright command with the `--list` option.
- TargetRunTime = total desired time to complete the run (in minutes).
  - We get this as input from the user.
- TotalLoad = Σ T_i = total test load (in terms of test run time).
  - We iterate over each runner to keep its Σ T_i <= TargetRunTime.
  - Note that the total run time for each runner is affected by the number of parallel threads, as explained in more detail in the next section.
- Cores = number of cores per runner.
  - Default cores on GitHub Linux public runners: 4.
  - Default cores on GitHub Linux private runners: 2.
  - For enterprise projects, it is possible to request custom, more powerful runners with higher core counts.
  - For Linux runners, the action can calculate the cores at run time with this command: `NUM_CORES=$(nproc)`
- Threads = parallel threads per runner.
  - The recommended number of threads per runner is half the cores, i.e. Threads = Cores / 2.
- Runners = total number of required runners.
  - We calculate the optimal number of required runners as shown in the next section, using all the information above.
- Providing runners as a GitHub dynamic matrix.
  - GitHub's `fromJSON` function and `GITHUB_OUTPUT` variables make it possible to pass a dynamic matrix from one job to another.
  - Note: it is not straightforward to pass matrix variables through other mechanisms (such as environment variables or user-supplied workflow inputs). For this reason, creating a dynamic matrix remains a bit of a mystery, and most teams end up using a hardcoded matrix in their workflows.
  - Pro tip: limit the maximum number of runners to a sensible value (say 20) in your caller workflow to avoid spinning up hundreds of runners for a huge test set with very long tests and a short desired total run time. Note that this value cannot be passed as an input variable, so it must be hardcoded in the workflow files.
Equation

Since we know every individual TestRunTimeForEachTest(i) from state.json, the total workload is:

`TotalLoad = Σ T_i` (for i = 1 … N)

Total parallel capacity available on the runners:

`Capacity = Runners × Threads × TargetRunTime`

Equating load and capacity:

`Σ T_i = Runners × Threads × TargetRunTime`

Solving for Runners:

`Runners = ⌈ Σ T_i / (Threads × TargetRunTime) ⌉`
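As a sanity check, the runner formula `Runners = ⌈ Σ T_i / (Threads × TargetRunTime) ⌉` can be exercised with concrete numbers (the figures below are illustrative, not from a real run):

```javascript
// Illustrative numbers only (not from a real run).
const totalLoadSeconds = 6000;    // Σ T_i: e.g. 3000 tests averaging 2 s each
const targetRunTimeSeconds = 120; // desired total run time: 2 minutes
const cores = 4;                  // GitHub-hosted public Linux runner default
const threads = cores / 2;        // recommended workers per runner: Cores / 2

// Runners = ceil( Σ T_i / (Threads × TargetRunTime) )
const runners = Math.ceil(totalLoadSeconds / (threads * targetRunTimeSeconds));
console.log(runners); // 25
```

So a 6000-second load with 2 workers per runner and a 2-minute budget needs 25 runners.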
Finally, we piece all this information together in this custom runwright GitHub action, which gives you:

- a `dynamic-matrix`
- and a `test-load-distribution-json`

as output variables.
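A caller workflow might consume these outputs roughly like this (the job and step names here are illustrative; check the action's README for the exact inputs and output names):

```yaml
jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      dynamic-matrix: ${{ steps.runwright.outputs.dynamic-matrix }}
      test-load-distribution-json: ${{ steps.runwright.outputs.test-load-distribution-json }}
    steps:
      - id: runwright
        # ... invoke the runwright action here ...

  run-tests:
    needs: plan
    runs-on: ubuntu-latest
    strategy:
      matrix:
        runner-id: ${{ fromJSON(needs.plan.outputs.dynamic-matrix) }}
    steps:
      - run: echo "Running the test share for runner ${{ matrix.runner-id }}"
```

The key mechanism is `fromJSON` over a job output: that is what turns a string produced at run time into a real matrix, which is not possible with plain environment variables.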
- Dev/Tester adds or updates a test.
- They try to commit the changes.
- A pre-commit git hook automatically runs `npx playwright test --only-changed` to run the affected tests.
- The custom state-reporter.js, registered as a reporter in the playwright.config.ts file, runs after the tests finish and updates the state.json file with the affected tests' run times.
- The changes staged at the time of running the pre-commit hook are committed (except the state.json file, which is updated as a result of the pre-commit hook itself).
- A post-commit hook then commits the state.json file as well (skipping the pre-commit hooks this time).
- The user provides a desired total run time (either pre-defined and hard-coded for pull-request/push triggers, say 2 or 4 minutes, or entered manually when using a workflow_dispatch trigger).
- Based on this input run time and the information in the state.json file, the runwright action has everything it needs to calculate the minimum required runners for the test load.
- The exact tests to run on each runner are also calculated by the action and returned to the caller workflow.
- Each runner then runs the tests in its scope and creates a blob report for the tests it has run.
- Once all runners have finished, another job consolidates all the blob reports into a single final test report.
- You can now verify whether the total test run time stayed within your desired run time. Your desired total run time should always be a little higher than your slowest test, since that is your limiting factor. Tip: if you have very lengthy tests, try breaking them down into smaller atomic or partial integration tests.
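As an illustration of the time-aware packing idea described above, here is a sketch (not the action's actual code; treating threads as a simple multiplier on each runner's time budget is a simplification):

```javascript
// Sketch of time-aware distribution: pack tests onto runners so that each
// runner's summed test time stays within the per-runner budget.
// `state` is a { testPath: runTimeMs } mapping, like state.json.
function distribute(state, targetRunTimeMs, threads) {
  // Simplified model: threads run tests concurrently, so a runner can
  // absorb roughly threads × targetRunTime worth of sequential test time.
  const budget = targetRunTimeMs * threads;
  const runners = [[]];
  const loads = [0];
  // Longest tests first gives a better greedy packing.
  const entries = Object.entries(state).sort((a, b) => b[1] - a[1]);
  for (const [test, time] of entries) {
    // Prefer the least-loaded runner; open a new one if it would overflow.
    let i = loads.indexOf(Math.min(...loads));
    if (loads[i] + time > budget && loads[i] > 0) {
      runners.push([]);
      loads.push(0);
      i = loads.length - 1;
    }
    runners[i].push(test);
    loads[i] += time;
  }
  return runners;
}

// Hypothetical state.json contents (times in ms).
const state = {
  "a.spec.ts": 90_000, "b.spec.ts": 80_000,
  "c.spec.ts": 50_000, "d.spec.ts": 20_000,
};
console.log(distribute(state, 120_000, 1).length); // 3
console.log(distribute(state, 120_000, 2).length); // 1
```

Note that with a 2-minute budget and 1 thread, 3 runners are needed even though the ideal ⌈Σ T_i / budget⌉ is 2: no pair of the long tests fits one budget, which is why the desired run time must stay above the slowest tests.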
The main steps involved are:

1. Add a `pre-commit` hook file as shown here.
   - This will run `--only-changed` tests on local commits.
2. Copy the state-reporter.js file and put it in the repository root.
   - This will create a `state.json` file that contains the mapping of each test path to the time it took to run (in ms).
3. Update the reporters in the playwright.config.ts file to include this reporter as shown below.
   - `reporter: [["list"], ["html"], ["github"], ["./state-reporter.js"]],`
4. Add a `post-commit` hook file as shown here.
   - This will automatically stage and commit the updated `state.json` file to the feature branch.
5. Add a reusable workflow that can take inputs from the user to run Playwright commands and finish tests in x minutes.
6. Add an example trigger workflow that shows how to use the reusable workflow to run the desired tests. Here are a few examples of trigger workflows:
Step 2: Run tests based on the defined triggers (push to main, pull_request to main, schedule, on demand, etc.)
- To test the setup, use one of the below workflows (either in your own test repository or fork the above sandbox repository to try it out).
For reference, an example-workflow.yml file is also available in the root of the RunWright GitHub project.
- If you find any issues, use the issues page to raise them.
- Do not use sharding-related commands in the input playwright command to run, since this solution is meant to overcome the flaws of sharding. Using sharding again would introduce those shortcomings again.
- If you are using custom, more powerful GitHub runners, use the same custom runner type for the job that evaluates RunWright as for the subsequent job that runs the tests. This is important to correctly calculate the number of workers (threads), which is half the runner's core count.
| Sr No | Test Description | Test Condition | Expected Result | Actual Result | Status |
|---|---|---|---|---|---|
| 1 | Run with 0 tests found | `npx playwright test --grep="non-existent-test"` | Action should handle this gracefully, set RUNNER_COUNT=1, and exit successfully | ✅ Action exits gracefully with a "No tests found" message | ✅ PASS |
| 2 | Test with fullyParallel=true | `npx playwright test` with `fullyParallel=true` | Should distribute individual tests across runners | ✅ Looking into the runwright job, we can see that tests from the same file are distributed across different runners | ✅ PASS |
| 3 | Test with fullyParallel=false | `npx playwright test` with `fullyParallel=false` | Should distribute test files across runners | ✅ File-level distribution works correctly. No tests from the same file + project were seen on different runners. With each file taking 102 seconds to run and 2 workers, each runner getting 2 files is also accurate. Slowest runner time was 2 min 48 s. Total run time in the HTML report: 1.9 min | ✅ PASS |
| 4 | Run with very few tests (< 2 min total run time) | `npx playwright test --grep='Wait for 5 seconds'` | Should create 1 runner with optimal worker allocation | ✅ Creates 1 runner, completes in under 2 minutes | ✅ PASS |
| 5 | Run with all tests (~30 min when run sequentially) | `npx playwright test` | Should create around 8 runners with optimal worker allocation | ✅ Creates 9 runners. Slowest runner time was 2 min 59 s. Total run time in the HTML report: 2.1 min | ✅ PASS |
| 6 | Run ~1.5k tests in 2 minutes | [Test Command Placeholder] | Should create the optimal number of runners to finish within 2 minutes | ✅ Completes in ~2 minutes with dynamic runner allocation (as seen in the early tests image; runner id not available) | ✅ PASS |
| 7 | Run ~3k tests in 2 minutes | `npx playwright test` | Should scale up runners appropriately to meet the time constraint | ✅ Scales to multiple runners, finishes in ~2 minutes | ✅ PASS |
| 8 | Test missing from state.json | Delete the 20 s and 25 s tests of file06 for firefox from state.json, then run `npx playwright test --grep='Wait for 20 seconds' --project=firefox` | Should fail with a clear error message and suggestions. No graceful failure, to avoid a "false positive" situation | ✅ Provides a clear error with post-commit hook guidance | ✅ PASS |
| 9 | Single project configuration | `npx playwright test --project='chromium'` | Should work with a single browser project | ✅ Handles single project scenarios correctly | ✅ PASS |
| 10 | Multiple project configuration | `npx playwright test --project='chromium' --project='webkit'` | Should group tests by project within runners | ✅ Correctly groups and distributes multi-project tests | ✅ PASS |
| 11 | CPU core detection | [check any of the previous runs] | Should detect available cores and calculate optimal workers | ✅ Detects cores correctly, sets workers = cores/2 | ✅ PASS |
| 12 | Dynamic matrix generation | [check any of the previous runs] | Should create a proper GitHub Actions matrix format | ✅ Generates a valid JSON array for the matrix strategy | ✅ PASS |
| 13 | Load balancing accuracy | [check any of the previous runs] | Distribution should be based on execution time, not test count | ✅ Uses actual test execution times for optimal distribution | ✅ PASS |
| 14 | Runner utilization | [check any of the previous runs] | All runners should finish at approximately the same time | ✅ Even load distribution across all runners | ✅ PASS |
| 15 | Browser caching | [check any of the previous runs] | Should cache and reuse Playwright browsers efficiently | ✅ Implements a proper browser caching strategy | ✅ PASS |
| 16 | Error handling for malformed state.json | [Test Command Placeholder] | Should provide a clear error when state.json is corrupted | ✅ Handles JSON parsing errors gracefully | ✅ PASS |
| 17 | Large test suite scalability | Run with a 0.5-second target time (before the minimum-time restriction was added) | Should handle test suites with 5k+ tests efficiently | ✅ Scales appropriately for large test suites (tested with 3k+ tests) | ✅ PASS |
| 18 | Custom runner types compatibility | [Test Command Placeholder] | Should work with custom GitHub runner configurations | Expected to be compatible with custom runner specifications | NOT-YET-TESTED |
| 19 | Output format validation | [check any of the previous runs] | All outputs should be valid JSON and consumable by workflows | ✅ All outputs are properly formatted and consumable | ✅ PASS |
| 20 | Invalid time input (< 1 minute) | [Test Command Placeholder] | Should handle the minimum time constraint appropriately | ✅ Validates the minimum 1-minute requirement | ✅ PASS |
- It could be a good idea to regenerate the `state.json` file from scratch every few days or weeks to avoid accumulating redundant test paths and names.
- Add an option for when a user doesn't want to limit by time but wants to limit the maximum number of runners to use.
  - Instructions were added in the so-how-does-it-work -> runners -> pro-tip section to explain how to achieve this on the caller's end.