
@surajitshil-03 (Contributor) commented Jan 13, 2026

Context

Adding a delay in the Agent listener code before retrying a request when a retriable exception is thrown by the server.

Related work-item: AB#2338439


Description

When the server is unavailable (not an authentication error), the server returns a different exception (e.g., VssServiceResponseException with status code 404). The agent retries the request indefinitely, and each retry invokes the OAuth token provisioning mechanism on the server. This behavior significantly increases load on an already unavailable server.
Hence we have increased the backoff delay in the agent code before retrying requests.


Risk Assessment (Low / Medium / High)

Low


Unit Tests Added or Updated (Yes / No)

NA


Additional Testing Performed

Manually tested by connecting the agent to devfabric and then stopping the TFS devfabric web service. The agent delayed the requests based on the continuous error count, as shown in the screenshots below (see also the sketch after the list).

  • For error count <= 2: (screenshot)
  • For error count > 2 and <= 5: (screenshot)
  • For error count > 5: (screenshot)
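For context, below is a minimal sketch of the tiered delay selection that these screenshots exercise. Only the 60–90 second band for the middle tier comes from the diff quoted further down in this thread; the helper, the other bands, and the method name are illustrative assumptions, not the exact values in the agent code.

using System;

static class BackoffSketch
{
    private static readonly Random _random = new Random();

    // Rough stand-in for the agent's BackoffTimerHelper.GetRandomBackoff:
    // picks a random delay between min and max.
    private static TimeSpan GetRandomBackoff(TimeSpan min, TimeSpan max)
    {
        double seconds = min.TotalSeconds + _random.NextDouble() * (max.TotalSeconds - min.TotalSeconds);
        return TimeSpan.FromSeconds(seconds);
    }

    // Delay to wait before the next GetNextMessage retry, chosen from a band
    // that widens as the count of consecutive errors grows.
    public static TimeSpan GetNextRetryInterval(int continuousError)
    {
        if (continuousError <= 2)
        {
            // assumed short band for the first couple of failures
            return GetRandomBackoff(TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(30));
        }
        else if (continuousError <= 5)
        {
            // band from the quoted diff: random backoff [60, 90]
            return GetRandomBackoff(TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(90));
        }
        else
        {
            // assumed longest band once the server has been failing for a while
            return GetRandomBackoff(TimeSpan.FromSeconds(100), TimeSpan.FromSeconds(120));
        }
    }
}

Banding on the consecutive-error count (rather than resetting per request) is what makes the listener progressively quieter while the server stays unavailable.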

Change Behind Feature Flag (Yes / No)

No


Tech Design / Approach

This is done to reduce the load on the server when the agent continuously makes requests to it.


Documentation Changes Required (Yes/No)

No


Logging Added/Updated (Yes/No)

NA


Telemetry Added/Updated (Yes/No)

NA


Rollback Scenario and Process (Yes/No)

NA


Dependency Impact Assessed and Regression Tested (Yes/No)

NA

@surajitshil-03 (Contributor, Author)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@surajitshil-03 changed the title from "Adding a delay in the listener when it retries to get connection" to "Adding a delay in the listener when there is exception in the connection" on Jan 13, 2026
@surajitshil-03 marked this pull request as ready for review on January 13, 2026 12:35
@surajitshil-03 requested review from a team as code owners on January 13, 2026 12:35
@surajitshil-03 changed the title from "Adding a delay in the listener when there is exception in the connection" to "Enhancing the delay in the listener when there is exception in the connection" on Jan 13, 2026
@rajmishra1997 (Contributor) left a comment


Suggestion: for a higher number of error responses from the server, can we add a log/warning to show that continuous error responses are being received?
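A minimal sketch of the kind of warning this suggestion asks for; the variable name continuousError and the threshold of 5 come from the diff context below, while the helper name, message text, and use of the console instead of the agent's own tracing facility are assumptions for illustration only.

using System;

static class ContinuousErrorWarning
{
    // Hypothetical helper: warn once the listener has seen more than
    // `threshold` consecutive error responses from the server.
    public static void WarnIfNeeded(int continuousError, int threshold = 5)
    {
        if (continuousError > threshold)
        {
            // In the agent this would presumably go through its tracing/warning
            // facility rather than the console.
            Console.Error.WriteLine(
                $"Received {continuousError} consecutive error responses from the server; " +
                "backing off before the next retry.");
        }
    }
}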

else if (continuousError > 2 && continuousError <= 5)
{
    // random backoff [60, 90]
    _getNextMessageRetryInterval = BackoffTimerHelper.GetRandomBackoff(TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(90), _getNextMessageRetryInterval);
}
Contributor

Why do we have a custom implementation of backoff based on the retry count? Why not use the standard backoff?

@surajitshil-03 (Contributor, Author)

We are increasing the backoff as the number of retries grows and the server is still unavailable, so that we reduce the load on the server for as long as it remains unavailable.

Basically, in the ICM related to these changes, the ask was that if the server is not available, the agent should not keep making requests. So, to reduce the frequency of requests, we have increased the delay based on the retry count.

@surajitshil-03 (Contributor, Author) commented Jan 14, 2026

Yes, we could add a simple exponential backoff delay that increases with the attempt number, but since this retry-count-based custom backoff logic was already there, we decided to just increase the delay based on the number of continuous errors.

Earlier it was segregated into <= 5 attempts and greater than that; now we have divided it further.
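For comparison, here is a minimal sketch of the "simple exponential backoff" alternative mentioned above. This is not what the PR implements; the base delay, cap, and jitter factor are illustrative assumptions.

using System;

static class ExponentialBackoffSketch
{
    private static readonly Random _random = new Random();

    // Capped exponential backoff with jitter: the delay grows with the attempt
    // number instead of being picked from fixed retry-count bands.
    public static TimeSpan GetDelay(int attempt, double baseSeconds = 15, double capSeconds = 120)
    {
        double exponential = Math.Min(capSeconds, baseSeconds * Math.Pow(2, Math.Max(0, attempt - 1)));
        double jitter = _random.NextDouble() * 0.2 * exponential; // up to 20% extra
        return TimeSpan.FromSeconds(Math.Min(capSeconds, exponential + jitter));
    }
}

The PR keeps the existing retry-count-banded approach and simply widens the bands, as explained in the replies above.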
