Skip to content

IceAgent partial timerqueue shutdown fix and the turnconnection sending data after shutdown#2053

Open
sirknightj wants to merge 2 commits into
1.10.3from
cleanup-iceagentshutdown-andturnconnection-infiniteloop
Open

IceAgent partial timerqueue shutdown fix and the turnconnection sending data after shutdown#2053
sirknightj wants to merge 2 commits into
1.10.3from
cleanup-iceagentshutdown-andturnconnection-infiniteloop

Conversation

@sirknightj
Copy link
Copy Markdown
Contributor

What was changed?*

  1. Stop all three timer queues and shutdown all the turn connections when 'iceAgentShutdown' is called.
  2. Add missing error log in 'freeIceAgent'.
  3. Stop sending data after turn 'shutdown' was called.

Why was it changed?

  1. When timer queue cancel is called, sometimes it takes more than the allotted timeout to stop. When this occurs, CHK will jump to cleanup before stopping the other timer queues. Even after 'iceAgentShutdown' was called (via 'freeSampleStreamingSession'), we sometimes saw the ICE candidate connectivity checks continuing. We often see this log line in the logs, which indicates that the partial cleanup occurred:
iceAgentShutdown(): TurnConnection shutdown did not complete within 1 seconds
  1. To make debugging easier and more robust.
  2. Noticed that after 'turnConnectionShutdown' was called, packets were still being sent.

How was it changed?

  1. CHK was changed to CHK_LOG_ERR, which will ensure that when iceAgentShutdown is called, all of the timer queues will be shutdown and all of the turn connections too.
  2. Added LOG_CHK_ERR.
  3. In 'turnConnectionSendData', we check the value of the 'stopTurnConnection' sentinel, which is changed to 'true' when it is in the process of shutting down.

What testing was done for the changes?

  • Used our standard benchmarking tools to check for any regressions in peer connection connected rate and time to first frame.
  • When run on a constrained device, we saw an increase in the performance for peer connection connected success rate.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment thread src/source/Ice/IceAgent.c
// Shutdown all the timers during iceAgentShutdown; do not partially shut down timers.
if (pIceAgent->iceAgentStateTimerTask != MAX_UINT32) {
CHK_STATUS(timerQueueCancelTimer(pIceAgent->timerQueueHandle, pIceAgent->iceAgentStateTimerTask, (UINT64) pIceAgent));
CHK_LOG_ERR(timerQueueCancelTimer(pIceAgent->timerQueueHandle, pIceAgent->iceAgentStateTimerTask, (UINT64) pIceAgent));
Copy link
Copy Markdown
Contributor

@hassanctech hassanctech Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good so we do not abort trying to cancel the other timers. However you should also set the retStatus so the overall function can still return the ERROR status to the caller to notify it that the iceAgentShutdown operation did not fully succeed.

Perhaps something like this:

retStatus = timerQueueCancelTimer(pIceAgent->timerQueueHandle, pIceAgent->iceAgentStateTimerTask, (UINT64) pIceAgent));
CHK_LOG_ERR(retStatus);

Note that CHK_LOG_ERR is a no-op if the retStatus passed to it is a STATUS_SUCCESS.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One adjustment to that suggestion is that we'll also need to not set the status back to a success after something took too long, so an additional check will be needed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ If it fails to cancel timer, then what action we are taking? 2/ Considering these three changes which are well within If block and in a shutdown function. Do we need either CHK_STATUS or CHK_LOG_ERR? 3/ CHK_LOG_ERR will print error log. Customers often take this message as an error in a state machine causing them confusion


CHK(pTurnConnection != NULL && pDestIp != NULL, STATUS_NULL_ARG);
CHK(pBuf != NULL && bufLen > 0, STATUS_INVALID_ARG);
CHK(!ATOMIC_LOAD_BOOL(&pTurnConnection->stopTurnConnection), STATUS_SUCCESS); // Do not send data after stop
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

stefankiesz
stefankiesz previously approved these changes Sep 11, 2024
Copy link
Copy Markdown
Contributor

@stefankiesz stefankiesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Base automatically changed from 1.10.3 to main October 4, 2024 00:02
@sirknightj sirknightj dismissed stefankiesz’s stale review October 4, 2024 00:02

The base branch was changed.

@sirknightj sirknightj changed the base branch from main to develop April 7, 2025 18:53
@sirknightj sirknightj changed the base branch from develop to develop-pre-1.11.0 April 7, 2025 18:53
@sirknightj sirknightj changed the base branch from develop-pre-1.11.0 to develop April 7, 2025 18:53
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Nov 4, 2025

This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.

@github-actions github-actions Bot added the Stale label Nov 4, 2025
@sirknightj sirknightj removed the Stale label Nov 4, 2025
@sirknightj sirknightj changed the base branch from release-v1.12.1 to develop-pre-1.11.0 February 26, 2026 23:44
@sirknightj sirknightj changed the base branch from develop-pre-1.11.0 to 1.10.3 February 26, 2026 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants