Fix flaky ChangelogBackendTestCase timeout in assertChangelogAttributesInRootDSE #607
Merged
vharseko merged 2 commits into copilot/fix-inverted-logic-substring-check from Mar 31, 2026
…0s to fix flaky test
Agent-Logs-Url: https://github.com/OpenIdentityPlatform/OpenDJ/sessions/c1bf34bd-39f6-486b-aacd-6f05ba144d41
Co-authored-by: vharseko <6818498+vharseko@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix timing issue in ChangelogBackendTestCase test" to "Fix flaky ChangelogBackendTestCase timeout in assertChangelogAttributesInRootDSE" on Mar 31, 2026
8f999cf into copilot/fix-inverted-logic-substring-check
10 of 12 checks passed
searchInChangeNumberModeOnOneSuffixMultipleTimes was intermittently failing with firstchangenumber: expected "1" but was "0" because change number assignment is asynchronous and the 3-second retry window in assertChangelogAttributesInRootDSE was too short on slower runners (notably macOS).
Change
ChangelogBackendTestCase.java: Increase TestTimer.maxSleep in assertChangelogAttributesInRootDSE from 3 s to 10 s, matching the timeout already used by searchChangelog and other retry loops in the same class.
Original prompt
Problem
The CI build on PR #599 is failing with a test failure in ChangelogBackendTestCase.searchInChangeNumberModeOnOneSuffixMultipleTimes. The failure is a timing/race condition unrelated to PR #599's changes (which fix AttributeValuePasswordValidator).
Failing assertion
The test publishes 4 update messages to the replication server and then immediately asserts that firstchangenumber in the Root DSE equals 1. However, change number assignment is asynchronous: the replication server needs time to process changes. The current 3-second timeout in assertChangelogAttributesInRootDSE is insufficient, especially on macOS runners.
Root Cause
In opendj-server-legacy/src/test/java/org/opends/server/backends/ChangelogBackendTestCase.java, the method assertChangelogAttributesInRootDSE (line 854) uses a TestTimer with a maxSleep of only 3 seconds.
Other similar retry loops in the same test class (e.g., searchChangelog at line 1124) already use 10 seconds. The 3-second timeout is inconsistent and causes flaky failures.
Fix
In opendj-server-legacy/src/test/java/org/opends/server/backends/ChangelogBackendTestCase.java, change the maxSleep at line 858 in the assertChangelogAttributesInRootDSE method from 3 seconds to 10 seconds.
This aligns the timeout with the rest of the test infrastructure in this class and gives the replication server sufficient time to assign change numbers before asserting.
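The effect of widening the retry window can be illustrated with a minimal, self-contained sketch. It is not the OpenDJ code: RetrySketch and its repeatUntilSuccess method are hypothetical stand-ins for the TestTimer used in ChangelogBackendTestCase, and the background thread merely simulates the replication server assigning a change number after some delay. The poll interval and the simulated delay are arbitrary assumptions.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

    /**
     * Hypothetical retry helper (not OpenDJ's TestTimer): re-runs the assertion
     * until it passes, or rethrows the last AssertionError once maxSleep has elapsed.
     */
    public static void repeatUntilSuccess(Duration maxSleep, Duration interval, Runnable assertion)
            throws InterruptedException {
        long start = System.nanoTime();
        while (true) {
            try {
                assertion.run();
                return; // assertion passed, stop retrying
            } catch (AssertionError e) {
                if (System.nanoTime() - start >= maxSleep.toNanos()) {
                    throw e; // retry window exhausted, surface the failure
                }
                Thread.sleep(interval.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the firstchangenumber attribute, initially 0.
        AtomicInteger firstChangeNumber = new AtomicInteger(0);

        // Simulates asynchronous change number assignment (delay is an assumption).
        Thread assigner = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            firstChangeNumber.set(1);
        });
        assigner.start();

        // A 10-second window tolerates slow assignment; a window shorter than
        // the assignment delay would fail with "expected 1 but was 0".
        repeatUntilSuccess(Duration.ofSeconds(10), Duration.ofMillis(50), () -> {
            int actual = firstChangeNumber.get();
            if (actual != 1) {
                throw new AssertionError("firstchangenumber: expected 1 but was " + actual);
            }
        });
        assigner.join();
        System.out.println("firstchangenumber = " + firstChangeNumber.get());
    }
}
```

A longer maxSleep costs nothing when the assertion passes early, because the loop returns on the first success; it only lengthens the run when the assertion would otherwise have failed.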
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.
Assistant: I'll start by fetching the job logs and the workflow definition to understand the failure.
Assistant: ## Failing Job Analysis
The build is failing due to 1 actual test failure (the PgSQL skips are expected and not the cause of the build failure).
Root Cause
The failing test is in ChangelogBackendTestCase.searchInChangeNumberModeOnOneSuffixMultipleTimes at line 589, which calls assertChangelogAttributesInRootDSE (line 861), which in turn calls assertAttributeValue (line 1431).
What's Happening
In the test method searchInChangeNumberModeOnOneSuffixMultipleTimes (line 581), the very first call, at line 589, is to assertChangelogAttributesInRootDSE. It asserts that firstchangenumber in the Root DSE equals 1. However, the Root DSE is returning 0 instead.
Looking at assertChangelogAttributesInRootDSE (line 854), there's a TestTimer that retries for up to 3 seconds (line 858). The change number assignment is asynchronous: the replication server needs time to assign change numbers after update messages are published. In this case, the 3-second timeout is insufficient; the firstchangenumber attribute hasn't been updated from 0 to 1 before the timer expires.
Solution
Increase the timeout in assertChangelogAttributesInRootDSE to give the replication server more time to assign change numbers. Change line 858 from 3 seconds to a longer duration (e.g., 10 seconds).
This is a timing/race condition issue. The test sets the purge delay to 0 at line 583, then publishes 4 update messages and immediately expects the change numbers to be assigned. On macOS runners (where this job ran), the...
This pull request was created from Copilot chat.