
The one and only solution that can finish a given Playwright test load within a predefined execution time by dynamically scaling runners up or down (e.g., finish 3k tests in 2 minutes)


PramodKumarYadav/runwright


RunWright

If you want to skip the context, go directly to the Getting Started section.

🚀 Core features

  • Time-based completion: Finish thousands of tests in your target timeframe (2-5 minutes).
  • Dynamic auto scaling: Auto-adjust runners based on test load.
  • Smart distribution: Balance workload by execution time, not test count.

In the example below, more than three thousand tests finish in just 1.5 minutes, against a desired total run time of 2 minutes.

[Figure: fast test run]

Scope

This action supports both of Playwright's execution modes:

  • fullyParallel=true: individual test cases are distributed across runners and run in parallel.
  • fullyParallel=false: individual test files are distributed across runners and run in parallel.
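To make the two modes concrete, here is a minimal TypeScript sketch of how tests could be grouped into distributable units before they are assigned to runners. The TestEntry/WorkUnit shapes and the toWorkUnits helper are hypothetical, not RunWright's actual code. With fullyParallel=false, Playwright by default runs the tests of one file in order in a single worker, so the sketch keeps them together and sums their durations; the project is part of the key, so the same file can still run for different browser projects on different runners.

```typescript
// Hypothetical shapes, not RunWright's actual data model.
interface TestEntry {
  project: string; // e.g. "chromium"
  file: string; // e.g. "tests/file06.spec.ts"
  title: string; // assumed unique within a file for this sketch
  durationSec: number;
}

interface WorkUnit {
  key: string;
  tests: TestEntry[];
  durationSec: number;
}

// fullyParallel=true: every test is its own unit.
// fullyParallel=false: all tests of one file (per project) stay in one unit,
// because by default Playwright runs them in order in a single worker.
function toWorkUnits(tests: TestEntry[], fullyParallel: boolean): WorkUnit[] {
  const units = new Map<string, WorkUnit>();
  for (const t of tests) {
    const key = fullyParallel ? `${t.project}|${t.file}|${t.title}` : `${t.project}|${t.file}`;
    const unit: WorkUnit = units.get(key) ?? { key, tests: [], durationSec: 0 };
    unit.tests.push(t);
    unit.durationSec += t.durationSec;
    units.set(key, unit);
  }
  return Array.from(units.values());
}

const sample: TestEntry[] = [
  { project: "chromium", file: "a.spec.ts", title: "t1", durationSec: 5 },
  { project: "chromium", file: "a.spec.ts", title: "t2", durationSec: 20 },
  { project: "firefox", file: "a.spec.ts", title: "t1", durationSec: 6 },
];
console.log(toWorkUnits(sample, true).length); // 3
console.log(toWorkUnits(sample, false).length); // 2
```

In file mode the chromium unit carries 25 seconds of load, which is what a runner actually has to absorb for that file.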

Why this action? What's wrong with Playwright Sharding?

To answer this, let's look at how Playwright sharding works and what problems its implementation brings.

With Playwright Sharding

Sharding is Playwright's built-in way to split a test run across multiple machines. Its inner workings and its GitHub Actions usage example have two main flaws.

1. Playwright sharding results in uneven test distribution on runners.

Playwright sharding balances shards by test count, not by how long each test takes to complete. Because it does not know each test's run time when distributing tests across runners, some runners end up with much more work than others, as shown below.

[Figure: uneven distribution]
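The effect described above can be reproduced with a small TypeScript sketch that compares a count-based split with a time-aware one. splitByCount is a deliberate simplification (equal-sized contiguous chunks), not Playwright's exact sharding algorithm, and splitByTime is a plain greedy "longest test first, onto the least-loaded runner" heuristic; both are illustrative only.

```typescript
// Simplified count-based split: equal-sized contiguous chunks, ignoring durations.
// (An illustration of count-based balancing, not Playwright's exact algorithm.)
function splitByCount(durations: number[], shards: number): number[][] {
  const size = Math.ceil(durations.length / shards);
  const groups: number[][] = [];
  for (let i = 0; i < shards; i++) groups.push(durations.slice(i * size, (i + 1) * size));
  return groups;
}

// Time-aware split: longest test first, always onto the group with the least total time.
function splitByTime(durations: number[], shards: number): number[][] {
  const groups: number[][] = Array.from({ length: shards }, () => [] as number[]);
  const totals: number[] = new Array<number>(shards).fill(0);
  for (const d of [...durations].sort((a, b) => b - a)) {
    let target = 0;
    for (let g = 1; g < shards; g++) if (totals[g] < totals[target]) target = g;
    groups[target].push(d);
    totals[target] += d;
  }
  return groups;
}

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

const durations = [120, 100, 90, 10, 5, 5, 5, 5]; // seconds per test
console.log(JSON.stringify(splitByCount(durations, 2).map(sum))); // [320,20]
console.log(JSON.stringify(splitByTime(durations, 2).map(sum))); // [150,190]
```

With the same eight tests and two runners, the count-based split leaves one runner with 320 seconds of work and the other with 20, while the time-aware split ends at 150 and 190 seconds. The slowest runner decides the total run time, so it finishes much earlier.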

2. Fixed runners that do not scale up or down based on test load.

Playwright's documentation gives a GitHub Actions example that uses a matrix strategy to distribute tests across a fixed number of runners (4 in that example). Because the runner count does not change with the test load, this leads to the inefficiencies shown below.

[Figure: fixed runners]

With RunWright

1. Even test load distribution, based on your chosen total run time.

[Figure: even load distribution]

2. Dynamic runners that scale up or down based on test load.

[Figure: dynamic scaling of runners]

Pros and Cons: Playwright Sharding vs RunWright

| Aspect | 🚫 Playwright Sharding | ✅ RunWright |
|---|---|---|
| Timely feedback on pull requests | 🐌 Test runs that take a long time cannot be run on every pull request. | ⚡️ Fast and predictable run times make it possible to run system tests on every PR. |
| Trust in tests | 📉 Tests that aren't run on each PR don't get fixed in that PR. They often run only after new changes are already merged into the main branch. As is frequently seen, such tests then give false positives because of those new changes and break the team's trust in them. | 📈 Tests that run on every PR get fixed in that PR. They give developers timely feedback, produce true positives, and build the team's trust in them. |
| Maintenance fatigue | 😩 Tests that are not fixed in PRs get passed on to QAs. When this happens frequently, which it often does, QAs end up with maintenance fatigue: they feel demotivated and stuck in a never-ending cycle of fixing broken tests, with little or no time for anything else meaningful. | 😇 When developers are responsible for fixing tests broken by their own changes, testers are free to do more meaningful work, such as exploratory testing, writing new tests for missing functionality, learning new ways of testing, and mentoring team members on testing and automation. |
| To increase test coverage or not? | 📉 Longer test run times pressure the team to limit test suite growth and over-optimize existing tests instead of adding new tests to increase coverage. | 📈 When a setup always finishes tests within a fixed time (say 2 to 5 minutes), teams are encouraged to write new tests for missing functionality without worrying about over-optimizing to keep run times in check. |
| Runner scaling efficiency | 📉 Adding more runners has diminishing returns and doesn't guarantee proportional time savings. | 💡 Smart auto-scaling based on test load gives consistent, directly proportional performance benefits. |
| Costs and returns | 💸 As seen above, infrastructure costs grow in proportion to the number of added runners, but with diminishing performance returns. | 💰 Infrastructure costs always stay in proportion to test run demand, and you pay only for what you use. Nothing more. Nothing less. |
| Scalability potential | 🔒 The approach doesn't scale well as the number of tests grows. | 🚀 Scales efficiently as the test suite grows, always keeping the total run time fixed at your desired value (say 2 to 5 minutes, regardless of the total number of tests). |

Key Takeaway: RunWright transforms system testing from a burden into an enabler, allowing teams to maintain fast feedback loops while scaling their test suites confidently.

** At the time of writing, there are no other known solutions (paid or open source) that can do this with Playwright and GitHub.

💡 So how does it work?

To build a solution that is "time aware" and can "auto-scale" based on the current test load, we need a few pieces of information.

🔍 Namely:

  • T_i = TestRunTimeForEachTest(i) = execution time of test i.
    • We get this value from the state.json file, which is generated by a custom state-reporter.js reporter and committed by a post-commit hook.
  • N = total number of tests to run.
    • We get the test scope by running the Playwright command with the --list option.
  • TargetRunTime = desired total time to complete the run (in minutes).
  • TotalLoad = Σ T_i (summed over i = 1..N) = total test load, expressed as total test run time.
    • We iterate over the runners, keeping the load assigned to each runner within what it can finish in TargetRunTime.
    • Note that how much load a runner can finish in TargetRunTime depends on its number of parallel threads, as explained in more detail in the next section.
  • Cores = number of CPU cores per runner.
  • Threads = parallel threads (Playwright workers) per runner, set to half of Cores.
  • Runners = total number of required runners.
    • We calculate the optimal number of runners from all the information above, as shown in the next section.
    • The runners are provided to the workflow as a GitHub dynamic matrix.
      • GitHub's fromJSON function and the GITHUB_OUTPUT file make it possible to pass a dynamic matrix from one job to another.
      • Note: passing matrix values through other mechanisms (such as environment variables or workflow input variables) is not straightforward. For this reason, dynamic matrices remain a bit of a mystery, which is why most teams end up hardcoding the matrix in their workflows.
    • Pro tip: Users can cap the maximum number of runners at a sensible limit (say 20) in their caller workflow, to avoid spinning up hundreds of runners when a huge test set with very long tests is combined with a short target run time. Note that this limit cannot be passed as an input variable, so it must be hardcoded in the workflow file.
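A minimal TypeScript sketch of the GITHUB_OUTPUT step, assuming a matrix shape of {"runner": [1, 2, ...]}. The shape and both helper names are hypothetical; only the dynamic-matrix output name comes from this README. A step writes name=value lines to the file whose path is in the GITHUB_OUTPUT environment variable, and a later job can read the value back with fromJSON(...) in its strategy.matrix.

```typescript
// Hypothetical matrix shape: one entry per runner, e.g. {"runner":[1,2,3]}.
interface Matrix {
  runner: number[];
}

function buildMatrix(runners: number): Matrix {
  // Always at least one runner, even when the computed count is 0.
  const count = Math.max(1, Math.floor(runners));
  return { runner: Array.from({ length: count }, (_, i) => i + 1) };
}

// Formats a single-line "name=value" entry for $GITHUB_OUTPUT.
// JSON.stringify without indentation keeps the value on one line.
// In a real step you would append this string to the file at process.env.GITHUB_OUTPUT.
function githubOutputLine(name: string, value: unknown): string {
  return `${name}=${JSON.stringify(value)}\n`;
}

console.log(githubOutputLine("dynamic-matrix", buildMatrix(3)).trim());
// dynamic-matrix={"runner":[1,2,3]}
```

Provided the first job exposes the step output as a job output, the test job can then use something like `matrix: ${{ fromJSON(needs.<job-id>.outputs.dynamic-matrix) }}`, where `<job-id>` is a placeholder for your own job name.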

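📐 Equation

Since we know every individual TestRunTimeForEachTest(i) from state.json, the total workload is:

$$\text{TotalLoad} = \sum_{i=1}^{N} T_i$$

Total parallel capacity available on the runners (each runner runs Threads tests in parallel for up to TargetRunTime):

$$\text{Capacity} = \text{Runners} \times \text{Threads} \times \text{TargetRunTime}$$

Equating load and capacity:

$$\sum_{i=1}^{N} T_i = \text{Runners} \times \text{Threads} \times \text{TargetRunTime}$$

Solving for Runners (rounded up to a whole runner, with T_i and TargetRunTime in the same time unit):

$$\text{Runners} = \left\lceil \frac{\sum_{i=1}^{N} T_i}{\text{Threads} \times \text{TargetRunTime}} \right\rceil$$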
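The runner count, Runners = ⌈Σ T_i / (Threads × TargetRunTime)⌉, can be sketched in TypeScript as follows. This is a hypothetical helper, not RunWright's source; it assumes all durations and the target are in seconds, Threads = Cores / 2 (rounded down, minimum 1), and at least one runner even when no tests are found.

```typescript
// Hypothetical helper: workers per runner, following the "half of the cores" rule.
function workersPerRunner(cores: number): number {
  return Math.max(1, Math.floor(cores / 2));
}

// Runners = ceil(sum(T_i) / (Threads * TargetRunTime)), never fewer than 1.
// All durations and the target are assumed to be in seconds.
function requiredRunners(durationsSec: number[], targetRunTimeSec: number, cores: number): number {
  if (targetRunTimeSec <= 0) throw new Error("targetRunTimeSec must be positive");
  const totalLoad = durationsSec.reduce((sum, t) => sum + t, 0);
  const capacityPerRunner = workersPerRunner(cores) * targetRunTimeSec;
  return Math.max(1, Math.ceil(totalLoad / capacityPerRunner));
}

// ~30 minutes of sequential load: 360 tests x 5 s = 1800 s, 2-minute target, 4-core runner.
const durations = Array.from({ length: 360 }, () => 5);
console.log(requiredRunners(durations, 120, 4)); // 8
```

With about 30 minutes of sequential load (1800 s), a 2-minute target, and 4-core runners (2 workers each), the formula gives ⌈1800 / 240⌉ = ⌈7.5⌉ = 8, in line with the "around 8 runners" expectation in test 5 of the table further below. Treat the result as a lower bound: tests cannot be split across runners, so packing real durations can need one more runner.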

Finally, the custom RunWright GitHub action pieces all this information together and provides two output variables:

  • dynamic-matrix
  • test-load-distribution-json
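One way to turn a runner count into a per-runner test list (the kind of data a test-load-distribution-json output carries) is sketched below. It is an assumption, not RunWright's documented algorithm: start from the formula's lower bound, hand each test to the currently least-loaded runner (longest tests first), and add a runner if any runner ends up over its capacity of Threads × TargetRunTime. The Item/RunnerPlan shapes and the distribute function are hypothetical.

```typescript
interface Item {
  id: string;
  durationSec: number;
}

interface RunnerPlan {
  runner: number;
  loadSec: number;
  tests: string[];
}

// Hypothetical distribution: least-loaded-runner greedy (longest test first), growing the
// runner count until every runner's load fits within threads * targetRunTimeSec.
function distribute(items: Item[], threads: number, targetRunTimeSec: number): RunnerPlan[] {
  if (threads < 1 || targetRunTimeSec <= 0) throw new Error("threads and target must be positive");
  const slow = items.find((it) => it.durationSec > targetRunTimeSec);
  if (slow) throw new Error(`${slow.id} takes longer than the target run time`);

  const capacity = threads * targetRunTimeSec;
  const sorted = [...items].sort((a, b) => b.durationSec - a.durationSec);
  const total = sorted.reduce((sum, it) => sum + it.durationSec, 0);
  // Start from the formula's lower bound.
  let runners = Math.max(1, Math.ceil(total / capacity));

  for (;;) {
    const plans: RunnerPlan[] = Array.from({ length: runners }, (_, i) => ({
      runner: i + 1,
      loadSec: 0,
      tests: [] as string[],
    }));
    for (const item of sorted) {
      // Longest test first, always onto the runner with the least load so far.
      const target = plans.reduce((min, p) => (p.loadSec < min.loadSec ? p : min));
      target.tests.push(item.id);
      target.loadSec += item.durationSec;
    }
    if (plans.every((p) => p.loadSec <= capacity)) return plans;
    runners++; // some runner overflowed its capacity; retry with one more
  }
}

const plans = distribute(
  [
    { id: "a", durationSec: 100 },
    { id: "b", durationSec: 90 },
    { id: "c", durationSec: 80 },
    { id: "d", durationSec: 60 },
    { id: "e", durationSec: 30 },
  ],
  2,
  120,
);
console.log(JSON.stringify(plans));
// [{"runner":1,"loadSec":190,"tests":["a","d","e"]},{"runner":2,"loadSec":170,"tests":["b","c"]}]
```

Here 360 seconds of load on 2-worker runners with a 120-second target needs 2 runners, loaded at 190 s and 170 s. Keeping each runner's total within Threads × TargetRunTime is necessary but not sufficient: how Playwright spreads the tests over the workers inside a runner also matters, and the slowest single test always sets a floor.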

How does the end-to-end setup look?

Locally

  • A developer or tester adds or updates a test.
  • They try to commit the changes.
  • A pre-commit git hook automatically runs npx playwright test --only-changed to run the affected tests.
  • The custom state-reporter.js, added as a reporter in playwright.config.ts, runs after the tests finish and updates state.json with the run times of the affected tests.
  • The changes that were staged when the pre-commit hook ran are committed (except state.json, which was modified by the pre-commit hook itself).
  • After the commit, a post-commit hook commits the updated state.json as well (skipping the pre-commit hook this time).
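The bookkeeping in the steps above can be sketched as a pure function that merges freshly measured run times into the existing state. Inside a real reporter, the measurements would come from Playwright's onTestEnd(test, result) hook, where result.duration is in milliseconds. The State shape, the key format, and the updateState helper are assumptions for illustration, not the actual state-reporter.js.

```typescript
// Hypothetical state.json shape: test key -> last measured duration in milliseconds.
type State = Record<string, number>;

interface FinishedTest {
  key: string; // e.g. "chromium > tests/file06.spec.ts > Wait for 20 seconds" (illustrative format)
  durationMs: number;
}

// Merges the durations of the tests that just ran into the existing state.
// Entries for tests that did not run (e.g. unchanged ones under --only-changed) are kept as-is.
// Keys are sorted so the committed file produces small, stable diffs.
function updateState(state: State, finished: FinishedTest[]): State {
  const merged: State = { ...state };
  for (const t of finished) merged[t.key] = t.durationMs;
  const sorted: State = {};
  for (const key of Object.keys(merged).sort()) sorted[key] = merged[key];
  return sorted;
}

const before: State = { "chromium > a.spec.ts > t1": 5000, "chromium > a.spec.ts > t2": 20000 };
const after = updateState(before, [
  { key: "chromium > a.spec.ts > t2", durationMs: 18000 },
  { key: "firefox > a.spec.ts > t1", durationMs: 6000 },
]);
console.log(Object.keys(after).length); // 3
```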

Remote (on GitHub)

  • The user provides a desired total run time: either predefined and hard-coded for pull_request/push triggers (say 2 or 4 minutes), or entered manually when using a workflow_dispatch workflow.
  • From this run time and the data in state.json, the RunWright action calculates the minimum number of runners required for the test load.
  • The action also calculates exactly which tests each runner should run and returns this to the caller workflow.
  • Each runner runs the tests in its scope and creates a blob report for them.
  • Once all runners have finished, another job merges the blob reports into a final consolidated test report.
  • You can then verify whether the total test run time stayed within your desired run time. The desired total run time should always be somewhat longer than your slowest test, since that test is the limiting factor. Tip: If you have very long tests, try breaking them down into smaller atomic or partial integration tests.

[Figure: end-to-end setup]
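Before computing runners, a few pre-flight checks follow from the remote steps above and from tests 8 and 20 in the table further below: the target must be at least 1 minute, every listed test must already have a run time in state.json (failing loudly rather than guessing), and the target must exceed the slowest test. A minimal TypeScript sketch, where validateRun and the seconds-based state shape are hypothetical:

```typescript
// Hypothetical pre-flight checks; durations are in seconds, keyed by a test identifier.
function validateRun(
  listedTests: string[],
  durationsSec: Record<string, number>,
  targetRunTimeSec: number,
): void {
  if (targetRunTimeSec < 60) {
    throw new Error("Target run time must be at least 1 minute.");
  }
  // Fail instead of guessing a duration, so an incomplete state.json cannot produce a misleading plan.
  const missing = listedTests.filter((key) => !Object.prototype.hasOwnProperty.call(durationsSec, key));
  if (missing.length > 0) {
    throw new Error(
      `${missing.length} test(s) missing from state.json, e.g. "${missing[0]}". ` +
        "Run them locally so the post-commit hook commits an updated state.json.",
    );
  }
  // The slowest single test is a hard floor for the total run time.
  const slowest = listedTests.reduce((max, key) => Math.max(max, durationsSec[key]), 0);
  if (slowest >= targetRunTimeSec) {
    throw new Error(`Slowest test takes ${slowest}s; choose a target run time above that.`);
  }
}

validateRun(["a", "b"], { a: 5, b: 40 }, 120); // passes silently
```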

Getting Started

There are 3 main steps involved:

Step 1: One-time setup (in your test project)

Step 2: Run tests based on the defined triggers (push to main, pull_request to main, schedule, on demand, etc.)

For reference, an example-workflow.yml file is also available in the root of the RunWright GitHub project.

Step 3: Report found issues

  • If you find any issues, use the issues page to raise them.

Things to remember

  • Do not use sharding-related options (such as --shard) in the input Playwright command. This solution exists to overcome the flaws of sharding, and using sharding again would reintroduce them.
  • If you use custom (larger) GitHub runners, use the same runner type for the job that evaluates RunWright as for the subsequent job that runs the tests. This is required to calculate the number of workers (threads) correctly, since workers are set to half the runner's CPU cores.
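The first point can be enforced with a simple guard on the input command. A minimal TypeScript sketch; assertNoSharding is a hypothetical helper and only looks for Playwright's --shard option (written as --shard=1/4 or --shard 1/4).

```typescript
// Hypothetical guard: reject commands that already shard, since RunWright does the splitting.
function assertNoSharding(command: string): void {
  // Matches "--shard=1/4" or "--shard 1/4" as a standalone option, not "--grep shard".
  if (/(^|\s)--shard(=|\s|$)/.test(command)) {
    throw new Error(`Remove --shard from the Playwright command: ${command}`);
  }
}

assertNoSharding("npx playwright test --project=chromium"); // passes silently
```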

🧪 Tests

| Sr No | Test description | Test condition | Expected result | Actual result | Status |
|---|---|---|---|---|---|
| 1 | Run with 0 tests found | npx playwright test --grep="non-existent-test" | Action should handle this gracefully, set RUNNER_COUNT=1, and exit successfully | ✅ Action exits gracefully with a "No tests found" message | ✅ PASS |
| 2 | Test with fullyParallel=true | npx playwright test with fullyParallel=true | Should distribute individual tests across runners | ✅ In the runwright job, tests from the same file are distributed across different runners | ✅ PASS |
| 3 | Test with fullyParallel=false | npx playwright test with fullyParallel=false | Should distribute test files across runners | ✅ File-level distribution works correctly: no tests from the same file + project appear on different runners. With each file taking 102 seconds and 2 workers, each runner getting 2 files is also accurate. Slowest runner time was 2 min 48 s. Total run time in HTML report: 1.9m | ✅ PASS |
| 4 | Run with very few tests (< 2 min total run time) | npx playwright test --grep='Wait for 5 seconds' | Should create 1 runner with optimal worker allocation | ✅ Creates 1 runner; completes in less than 2 minutes | ✅ PASS |
| 5 | Run with all tests (~30 min when run sequentially) | npx playwright test | Should create around 8 runners with optimal worker allocation | ✅ Creates 9 runners. Slowest runner time was 2 min 59 s. Total run time in HTML report: 2.1m | ✅ PASS |
| 6 | Run ~1.5k tests in 2 minutes | [Test Command Placeholder] | Should create the optimal number of runners to finish within 2 minutes | ✅ Completes in ~2 minutes with dynamic runner allocation (as seen in the early test image; runner ID not available) | ✅ PASS |
| 7 | Run ~3k tests in 2 minutes | npx playwright test | Should scale up runners appropriately to meet the time constraint | ✅ Scales to multiple runners; finishes in ~2 minutes | ✅ PASS |
| 8 | Test missing from state.json | Delete the 20- and 25-second tests of file06 for firefox from state.json, then run npx playwright test --grep='Wait for 20 seconds' --project=firefox | Should fail with a clear error message and suggestions. No graceful fallback, to avoid "false positive" runs. | ✅ Provides a clear error with post-commit hook guidance | ✅ PASS |
| 9 | Single project configuration | npx playwright test --project='chromium' | Should work with a single browser project | ✅ Handles single-project scenarios correctly | ✅ PASS |
| 10 | Multiple project configuration | npx playwright test --project='chromium' --project='webkit' | Should group tests by project within runners | ✅ Correctly groups and distributes multi-project tests | ✅ PASS |
| 11 | CPU core detection | [check any of previous runs] | Should detect available cores and calculate optimal workers | ✅ Detects cores correctly; sets workers = cores/2 | ✅ PASS |
| 12 | Dynamic matrix generation | [check any of previous runs] | Should create a proper GitHub Actions matrix format | ✅ Generates a valid JSON array for the matrix strategy | ✅ PASS |
| 13 | Load balancing accuracy | [check any of previous runs] | Distribution should be based on execution time, not test count | ✅ Uses actual test execution times for optimal distribution | ✅ PASS |
| 14 | Runner utilization | [check any of previous runs] | All runners should finish at approximately the same time | ✅ Even load distribution across all runners | ✅ PASS |
| 15 | Browser caching | [check any of previous runs] | Should cache and reuse Playwright browsers efficiently | ✅ Implements a proper browser caching strategy | ✅ PASS |
| 16 | Error handling for malformed state.json | [Test Command Placeholder] | Should provide a clear error when state.json is corrupted | ✅ Handles JSON parsing errors gracefully | ✅ PASS |
| 17 | Large test suite scalability | Run with 0.5-second tests (before the minimum-time restriction was added) | Should handle test suites with 5k+ tests efficiently | ✅ Scales appropriately for large test suites (tested with 3k+ tests) | ✅ PASS |
| 18 | Custom runner types compatibility | [Test Command Placeholder] | Should work with custom GitHub runner configurations | ✅ Compatible with custom runner specifications | NOT-YET-TESTED |
| 19 | Output format validation | [check any of previous runs] | All outputs should be valid JSON and consumable by workflows | ✅ All outputs are properly formatted and consumable | ✅ PASS |
| 20 | Invalid time input (< 1 minute) | [Test Command Placeholder] | Should handle the minimum time constraint appropriately | ✅ Validates the minimum 1-minute requirement | ✅ PASS |

Troubleshooting

  • It may be a good idea to regenerate the state.json file from scratch every few days or weeks, so that it does not accumulate redundant (stale) test paths and names.
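As a lighter alternative between full regenerations, stale entries can be pruned against the current test list. A minimal TypeScript sketch; pruneState is hypothetical and assumes state.json maps test keys to durations. It must be given the complete, unfiltered test list (for example from listing the full suite with --list); otherwise entries for tests outside a --grep or --project filter would be dropped too.

```typescript
// Keep only the state entries whose key is still part of the current test list.
function pruneState(
  durations: Record<string, number>,
  currentKeys: string[],
): Record<string, number> {
  const keep = new Set(currentKeys);
  const pruned: Record<string, number> = {};
  for (const key of Object.keys(durations)) {
    if (keep.has(key)) pruned[key] = durations[key];
  }
  return pruned;
}

console.log(JSON.stringify(pruneState({ a: 1, b: 2 }, ["a", "c"]))); // {"a":1}
```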

What's next?

  • Add an option for users who don't want to limit by time but want to cap the maximum number of runners instead.

Like my work and want to support or sponsor?

Buy Me a Coffee Sponsor Me on GitHub
