Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize parallel calls to Conn.Close and Conn.handshake #671

Merged

Conversation

Danielius1922
Copy link
Contributor

Description

Reference issue

Fixes #...

Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.18%. Comparing base (032d60c) to head (fc419a4).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #671      +/-   ##
==========================================
+ Coverage   80.11%   80.18%   +0.07%     
==========================================
  Files         101      101              
  Lines        5355     5371      +16     
==========================================
+ Hits         4290     4307      +17     
+ Misses        695      694       -1     
  Partials      370      370              
Flag Coverage Δ
go 80.21% <100.00%> (+0.07%) ⬆️
wasm 64.46% <100.00%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Sean-Der
Copy link
Member

Overall LGTM @Danielius1922!

Looks like a different race was found in CI. Once you can get it all to go green happy to test and can fix up any nits myself :)

@Danielius1922
Copy link
Contributor Author

Overall LGTM @Danielius1922!

Looks like a different race was found in CI. Once you can get it all to go green happy to test and can fix up any nits myself :)

Yeah the issue is that c.handshakeLoopsFinished.Wait() (from Conn.Close) and c.handshakeLoopsFinished.Add(2) (from Conn.handshake) are called from different goroutines. Since Close can be called from a different goroutine, the calls on c.handshakeLoopsFinished need to be synchronized.
I think the idea in Conn.Close is that it calls the internal Conn.close method, which might interrupt an ongoing handshake and then we wait using c.handshakeLoopsFinished.Wait(), so there is no ongoing handshake after the call to Conn.Close.

Handshakes themselves are serialized using Conn.handshakeMutex. I could try locking it in Conn.Close, I'm afraid of causing a deadlock if Conn.handshake on some code path invokes Conn.Close.
Better safe than sorry here I think, so I'm gonna introduce a channel, assign/get it under a lock and wait on it outside the lock.

@@ -405,7 +411,12 @@ func (c *Conn) Write(p []byte) (int, error) {
// Close closes the connection.
func (c *Conn) Close() error {
err := c.close(true) //nolint:contextcheck
c.handshakeLoopsFinished.Wait()
c.closeLock.Lock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poor c.closeLock gets locked and unlocked thrice in a call to c.Close (twice in close + the call here), I guess I could get the channel in c.close and return it? Doesn't fit semantically for close to return the channel, but it is an internal function, so it doesn't matter much.

cancelHandshakeReader := c.cancelHandshakeReader
c.closeLock.Unlock()

cancelHandshaker()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sean-Der is it important to invoke these cancels at the beginning? if not I could move thes copying to line 1179 (after closeLock gets locked) and the invocations to 1187 (after it gets unlocked)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests succeed locally when I do it like that, so maybe the order of calls here isn't imporant, in which case I can do it like that and avoid multiple locking/unlocking of the lock.

@Sean-Der Sean-Der marked this pull request as ready for review August 22, 2024 16:18
@Sean-Der Sean-Der force-pushed the adam/bugfix/parallel-handshake-and-close branch from 9d39efb to fc419a4 Compare August 22, 2024 16:42
@Sean-Der Sean-Der merged commit 1a02350 into pion:master Aug 22, 2024
15 checks passed
@Sean-Der
Copy link
Member

Tagged v3.0.2

@Danielius1922 I added you to the org! Feel free to make branches and tags.

I think the concurrency of Conn can be cleaned up a lot. If you want to take a hack at it I am fully in support :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants