Flaky tests

What's a flaky test?

It's a test that sometimes fails, but if you retry it enough times, it eventually passes.

Quarantined tests

When a test frequently fails in master, a ~"master:broken" issue should be created. If the test cannot be fixed in a timely fashion, it affects the productivity of all developers, so it should be placed in quarantine by assigning it the :quarantine metadata with the issue URL:

it 'should succeed', quarantine: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
  expect(response).to have_gitlab_http_status(:ok)
end

This means the test is skipped unless RSpec is run with --tag quarantine:

bin/rspec --tag quarantine

Before putting a test in quarantine, you should make sure that a ~"master:broken" issue exists for it so it doesn't stay in quarantine forever.

Once a test is in quarantine, there are three choices:

  • Should the test be fixed (i.e. get rid of its flakiness)?
  • Should the test be moved to a lower level of testing?
  • Should the test be removed entirely (e.g. because there's already a lower-level test, it duplicates another same-level test, or it tests too much)?

Quarantined tests on the CI

Quarantined tests are run on the CI in dedicated jobs that are allowed to fail:

  • rspec-pg-quarantine (CE & EE)
  • rspec-pg-quarantine-ee (EE only)

Automatic retries and flaky test detection

On our CI, we use RSpec::Retry to automatically retry a failing example a few times (see spec/spec_helper.rb for the precise retry count).
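
For illustration, here's a minimal sketch of how such retry wiring typically looks with the rspec-retry gem. This is a sketch of the mechanism only, assuming the gem's documented run_with_retry API; the actual configuration lives in spec/spec_helper.rb and may differ:

# Sketch only: assumes the rspec-retry gem's documented API;
# the RETRIES variable is described further below.
require 'rspec/retry'

RSpec.configure do |config|
  # Log every retry so flaky examples show up in the CI output.
  config.verbose_retry = true
  config.display_try_failure_messages = true

  config.around(:each) do |example|
    # RETRIES=1 means one extra attempt, i.e. two runs in total.
    example.run_with_retry retry: ENV.fetch('RETRIES', '0').to_i + 1
  end
end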

We also use a home-made RspecFlaky::Listener that records flaky examples in a JSON report file on master (in the retrieve-tests-metadata and update-tests-metadata jobs).

This was originally implemented in: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/13021.
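
In general terms, a flaky-example listener can be hooked into RSpec's reporter as sketched below. RspecFlaky::Listener's real interface is internal to GitLab, and the :retry_attempts metadata key and report path here are assumptions, so treat this purely as an illustration of the mechanism:

require 'json'

# Sketch only: a generic RSpec listener recording examples that only
# passed after a retry (assuming rspec-retry exposes the attempt count
# through the example's :retry_attempts metadata).
class FlakyExampleListener
  def initialize
    @flaky = []
  end

  # Invoked for every example that eventually passed.
  def example_passed(notification)
    example = notification.example
    @flaky << example.full_description if example.metadata[:retry_attempts].to_i > 0
  end

  # Invoked once at the end of the run: write the JSON report
  # (the file name is illustrative).
  def dump_summary(_notification)
    File.write('flaky-examples.json', JSON.pretty_generate(@flaky))
  end
end

RSpec.configure do |config|
  config.reporter.register_listener(
    FlakyExampleListener.new, :example_passed, :dump_summary)
end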

If you want to enable retries locally, you can use the RETRIES environment variable. For instance, RETRIES=1 bin/rspec ... would retry the failing examples once.

Problems we had in the past at GitLab

Order-dependent flaky tests

These flaky tests can fail depending on the order in which they run relative to other tests.

To identify the tests that lead to such failure, we can use rspec --bisect, which would give us the minimal test combination to reproduce the failure:

rspec --bisect ee/spec/services/ee/merge_requests/update_service_spec.rb ee/spec/services/ee/notes/quick_actions_service_spec.rb ee/spec/services/epic_links/create_service_spec.rb ee/spec/services/ee/issuable/bulk_update_service_spec.rb
Bisect started using options: "ee/spec/services/ee/merge_requests/update_service_spec.rb ee/spec/services/ee/notes/quick_actions_service_spec.rb ee/spec/services/epic_links/create_service_spec.rb ee/spec/services/ee/issuable/bulk_update_service_spec.rb"
Running suite to find failures... (2 minutes 18.4 seconds)
Starting bisect with 3 failing examples and 144 non-failing examples.
Checking that failure(s) are order-dependent... failure appears to be order-dependent

Round 1: bisecting over non-failing examples 1-144 . ignoring examples 1-72 (1 minute 11.33 seconds)
...
Round 7: bisecting over non-failing examples 132-133 . ignoring example 132 (43.78 seconds)
Bisect complete! Reduced necessary non-failing examples from 144 to 1 in 8 minutes 31 seconds.

The minimal reproduction command is:
  rspec ./ee/spec/services/ee/issuable/bulk_update_service_spec.rb[1:2:1:1:1:1,1:2:1:2:1:1,1:2:1:3:1] ./ee/spec/services/epic_links/create_service_spec.rb[1:1:2:2:6:4]

We can reproduce the test failure with the minimal reproduction command above. If we change the order of the tests, they pass.
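
Order-dependence is usually caused by state that leaks between examples: process-global variables, class-level memoization, or database rows that aren't cleaned up. A hypothetical illustration (the Gitlab::SomeFeature class is made up):

RSpec.describe 'order dependence' do
  it 'changes a global default' do
    # Leaks: the original value is never restored after this example.
    Gitlab::SomeFeature.default_enabled = false
    expect(Gitlab::SomeFeature.default_enabled).to be(false)
  end

  it 'assumes the default is untouched' do
    # Passes when run first, fails when run after the example above.
    expect(Gitlab::SomeFeature.default_enabled).to be(true)
  end
end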

Time-sensitive flaky tests
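
These tests typically assert on the current time or on durations, and fail around day or month boundaries or when CI is slow. A common cure is to freeze time; here's a minimal sketch using ActiveSupport's TimeHelpers (the spec itself is illustrative):

RSpec.describe 'time-sensitive behavior' do
  include ActiveSupport::Testing::TimeHelpers

  it 'marks issues created today as created today' do
    # Without travel_to, midnight can pass between creating the record
    # and asserting on it, and the Date.current comparison then fails.
    travel_to Time.zone.local(2020, 6, 15, 23, 59, 30) do
      issue = Issue.new(created_at: Time.zone.now)
      expect(issue.created_at.to_date).to eq(Date.current)
    end
  end
end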

Array order expectation
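
A query with no explicit ORDER BY returns rows in an unspecified order, so asserting an exact order is flaky; RSpec's match_array (or contain_exactly) makes the comparison order-insensitive. A minimal sketch with illustrative model and factory names:

it 'returns all labels' do
  labels = create_list(:label, 3)

  # Flaky: database row order is not guaranteed without ORDER BY.
  # expect(Label.all.to_a).to eq(labels)

  # Deterministic: compares elements regardless of order.
  expect(Label.all).to match_array(labels)
end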

Feature tests

Capybara viewport size related issues
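
Elements that sit outside the visible viewport, or layouts that collapse at small widths, can make clicks and visibility checks fail intermittently. Pinning the window size in a hook helps; a sketch using Capybara's documented Window#resize_to (the size and hook placement are illustrative):

RSpec.configure do |config|
  config.before(:each, :js) do
    # Pin the browser window so responsive layouts render consistently.
    page.current_window.resize_to(1280, 1024)
  end
end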

Capybara JS driver related issues

PhantomJS / WebKit related issues

Capybara expectation times out
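
Capybara matchers such as have_content and have_css retry until Capybara.default_max_wait_time elapses, but expectations run against a one-off snapshot (like page.text) don't wait at all, so they fail whenever an asynchronous update is still in flight. A minimal sketch (the button and content strings are illustrative):

it 'shows the merge result' do
  click_button 'Merge'

  # Flaky: page.text is read once, possibly before the async update lands.
  # expect(page.text).to include('Merged')

  # Robust: have_content retries until Capybara.default_max_wait_time.
  expect(page).to have_content('Merged')
end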

Resources


Return to Testing documentation