Fix goroutine leak in EphemeralGC on node cancel #2538
Merged
ghost
approved these changes
Apr 17, 2025
kradalby
approved these changes
Apr 23, 2025
Collaborator
kradalby
left a comment
Looks great! thank you!
There is a goroutine leak in EphemeralGarbageCollector: each call to Schedule() creates a goroutine, but those goroutines are not terminated when a node is canceled or when the garbage collector is shut down, which leaves them dangling.
Wrote a test to confirm:
=== RUN TestEphemeralGarbageCollectorGoRoutineLeak
ephemeral_garbage_collector_test.go:16: Initial number of goroutines: 2
ephemeral_garbage_collector_test.go:68: Final number of goroutines: 102
ephemeral_garbage_collector_test.go:71:
Error Trace: github.com/juanfont/headscale/hscontrol/db/ephemeral_garbage_collector_test.go:71
Error: "102" is not less than or equal to "7"
Test: TestEphemeralGarbageCollectorGoRoutineLeak
Messages: There are significantly more goroutines after GC usage, which suggests a leak
--- FAIL: TestEphemeralGarbageCollectorGoRoutineLeak (0.30s)
FAIL
FAIL github.com/juanfont/headscale/hscontrol/db 0.317s
FAIL
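For readers unfamiliar with this kind of check, here is a minimal sketch of how a goroutine-count leak test can work. It is not the test from this PR: `goroutineGrowth`, `leakyWait`, and `cleanWait` are hypothetical names, and the leaky pattern (a goroutine waiting on the channel of a timer that is later stopped) is only an assumption about how such a leak can arise.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// goroutineGrowth runs fn and reports how many more goroutines exist
// afterwards. It polls for up to 500ms so that goroutines which are about
// to exit get a chance to do so before the final count is taken.
func goroutineGrowth(fn func()) int {
	before := runtime.NumGoroutine()
	fn()
	deadline := time.Now().Add(500 * time.Millisecond)
	for runtime.NumGoroutine() > before && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	return runtime.NumGoroutine() - before
}

// leakyWait (hypothetical) starts n goroutines that each wait on a timer
// channel. Stopping a timer does not close its channel, so every goroutine
// stays blocked once its timer has been stopped.
func leakyWait(n int) {
	for i := 0; i < n; i++ {
		t := time.NewTimer(time.Hour)
		go func() { <-t.C }()
		t.Stop()
	}
}

// cleanWait (hypothetical) does the same, but each goroutine also selects
// on a shared done channel, so closing done lets all of them exit.
func cleanWait(n int) {
	done := make(chan struct{})
	for i := 0; i < n; i++ {
		t := time.NewTimer(time.Hour)
		go func() {
			select {
			case <-t.C:
			case <-done:
			}
		}()
		t.Stop()
	}
	close(done)
}

func main() {
	fmt.Println("leaky growth:", goroutineGrowth(func() { leakyWait(100) }))
	fmt.Println("clean growth:", goroutineGrowth(func() { cleanWait(100) }))
}
```

Because the runtime may start or stop background goroutines of its own, a check like this is usually written with some tolerance rather than an exact count, which matches the "less than or equal to" assertion in the log above.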
The fix ensures that dangling goroutines are cleaned up and that all timers are stopped. I also added a safety check in Schedule() to prevent scheduling after Close(). No current code paths appear to call them in this order, but the check ensures the case is handled correctly if someone does so in the future.
After this fix, I also added several tests to confirm that cancel and reschedule are not affected by the same problem (they are not).
=== RUN TestEphemeralGarbageCollectorGoRoutineLeak
ephemeral_garbage_collector_test.go:21: Initial number of goroutines: 2
ephemeral_garbage_collector_test.go:71: Final number of goroutines: 2
--- PASS: TestEphemeralGarbageCollectorGoRoutineLeak (0.50s)
=== RUN TestEphemeralGarbageCollectorReschedule
--- PASS: TestEphemeralGarbageCollectorReschedule (0.10s)
=== RUN TestEphemeralGarbageCollectorCancelAndReschedule
--- PASS: TestEphemeralGarbageCollectorCancelAndReschedule (0.20s)
=== RUN TestEphemeralGarbageCollectorCloseBeforeTimerFires
--- PASS: TestEphemeralGarbageCollectorCloseBeforeTimerFires (0.10s)
=== RUN TestEphemeralGarbageCollectorScheduleAfterClose
ephemeral_garbage_collector_test.go:205: Initial number of goroutines: 2
ephemeral_garbage_collector_test.go:248: Final number of goroutines: 2
--- PASS: TestEphemeralGarbageCollectorScheduleAfterClose (0.20s)
=== RUN TestEphemeralGarbageCollectorConcurrentScheduleAndClose
ephemeral_garbage_collector_test.go:260: Initial number of goroutines: 2
ephemeral_garbage_collector_test.go:331: Final number of goroutines: 2
--- PASS: TestEphemeralGarbageCollectorConcurrentScheduleAndClose (0.35s)
=== RUN TestEphemeralGarbageCollectorOrder
--- PASS: TestEphemeralGarbageCollectorOrder (7.00s)
=== RUN TestEphemeralGarbageCollectorLoads
--- PASS: TestEphemeralGarbageCollectorLoads (10.01s)
PASS
ok github.com/juanfont/headscale/hscontrol/db 18.487s
Given how long these tests take to run, you might not want to run them by default.