Contributor

@Enkelmann Enkelmann commented Mar 28, 2025

Querying the whole database becomes a performance issue if a lot of nodes exist in it, even if all those nodes are offline. In PeerChangedResponse we actually know which nodes need to be queried, so we can restrict the database queries to these nodes.

In my tests, PeerChangedResponse usually needed to query fewer than 5 nodes, so the change improved performance by an order of magnitude for a database with several hundred nodes (most of them offline).
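As a rough illustration of the idea (not the code from this PR), restricting a query to a known set of IDs usually comes down to adding an `IN (...)` filter. The sketch below builds such a statement by hand; the table name `nodes`, the column name `id`, and the helper `buildNodeQuery` are assumptions made up for this example.

```go
package main

import "strings"

// buildNodeQuery returns a SELECT statement and its arguments.
// The table "nodes" and the column "id" are hypothetical names.
func buildNodeQuery(ids []uint64) (string, []any) {
	if len(ids) == 0 {
		// No filter given: the query scans the whole table.
		return "SELECT * FROM nodes", nil
	}
	// One placeholder per requested ID keeps the query parameterized.
	placeholders := make([]string, len(ids))
	args := make([]any, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	query := "SELECT * FROM nodes WHERE id IN (" + strings.Join(placeholders, ",") + ")"
	return query, args
}
```

With only a handful of changed IDs, the filtered statement touches a few rows regardless of how many offline nodes are stored.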

  • have read the CONTRIBUTING.md file
  • raised a GitHub issue or discussed it on the project's chat beforehand
  • added unit tests
  • added integration tests
  • updated documentation if needed
  • updated CHANGELOG.md

I can also write a test if desired. The PR should not change any behavior, so I wanted to base tests on existing tests for PeerChangedResponse. However, I did not find any existing tests for this function, so I am happy for pointers where to start for testing PeerChangedResponse.

Collaborator

@kradalby kradalby left a comment

I think the general change makes sense. In my observation, a change request rarely covers more than a subset of nodes, as you call out.

I am up for this, but the nodes need to be fetched at once.

) ([]byte, error) {
resp := m.baseMapResponse()

peers, err := m.ListPeers(node.ID)
Collaborator

I agree this is not necessary, but instead of listing all, we should just list the ones from the change map.

It would make more sense to make ListPeers(node NodeID, peerIDs ...NodeID) and only fetch the ones that are needed here.

Contributor Author

I opted against changing ListPeers for now. It is also used for full map responses, where I think all peers are needed (although I have not looked too deeply into this function yet). But I can of course change the naming scheme if you like something like ListPeersSubset better than ListNodesSubset.

Collaborator

I think modifying both ListNodes and ListPeers to take an optional list argument would be best.

changedNodes = append(changedNodes, peer)
for _, changedID := range changedIDs {
if changedID != node.ID {
changedNode, err := m.GetNode(changedID)
Collaborator

The problem with doing it this way, and not as mentioned above, is that we don't get transactional consistency. In this case, nodes might change between calls to GetNode. While this might not be an obvious problem, it might lead to problems that are incredibly hard to debug.
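To make the consistency concern concrete, here is a minimal in-memory sketch in which a read lock stands in for a database transaction. The `store` type, `newStore`, and `snapshotNodes` are hypothetical names for this example and are not part of Headscale. Each `GetNode` call takes and releases the lock on its own, so a writer can slip in between two calls, while `snapshotNodes` reads every requested node under one lock acquisition.

```go
package main

import "sync"

type NodeID uint64

type Node struct {
	ID       NodeID
	Hostname string
}

// store is a hypothetical in-memory stand-in for the database.
type store struct {
	mu    sync.RWMutex
	nodes map[NodeID]Node
}

func newStore(nodes ...Node) *store {
	s := &store{nodes: make(map[NodeID]Node, len(nodes))}
	for _, n := range nodes {
		s.nodes[n.ID] = n
	}
	return s
}

// GetNode locks per call, so two consecutive calls may observe
// different states if a writer runs in between.
func (s *store) GetNode(id NodeID) (Node, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	n, ok := s.nodes[id]
	return n, ok
}

// snapshotNodes reads all requested nodes under a single lock
// acquisition, so the result reflects one point in time.
// Unknown IDs are skipped.
func (s *store) snapshotNodes(ids []NodeID) []Node {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]Node, 0, len(ids))
	for _, id := range ids {
		if n, ok := s.nodes[id]; ok {
			out = append(out, n)
		}
	}
	return out
}
```

The same reasoning applies to a real database: one query (or one transaction) sees one state, whereas a loop of single-row lookups may not.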

Contributor Author

I have reworked the code to use exactly one database query. I will also add a test to the database query function at least.

@Enkelmann
Contributor Author

The code is reworked to use exactly one database query and I also added a unit test. I have some concern that my unit test does not cover the case of a NodeID being zero, although the code to add nodes in the test seems to suggest that this case cannot happen anyway.

Collaborator

@kradalby kradalby left a comment

Much better. A couple of comments to simplify and to keep fewer functions/code around.

CHANGELOG.md Outdated
[#2493](https://github.com/juanfont/headscale/pull/2493)
- If a OIDC provider doesn't include the `email_verified` claim in its ID
tokens, Headscale will attempt to get it from the UserInfo endpoint.
- Improve performance by only querying relevant nodes from the database for node updates
Collaborator

Please link the PR.

Contributor Author

Done

return nodes, nil
}

func (hsdb *HSDatabase) ListNodesSubset(nodeIDs types.NodeIDs) (types.Nodes, error) {
Collaborator

Suggested change
- func (hsdb *HSDatabase) ListNodesSubset(nodeIDs types.NodeIDs) (types.Nodes, error) {
+ func (hsdb *HSDatabase) ListNodes(nodeIDs ...types.NodeID) (types.Nodes, error) {

I would prefer to just have the same function for this, but use the list expansion argument like this so it remains optional for getting all.
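A minimal sketch of what such a variadic signature could look like, using an in-memory map instead of the real database. The `memDB` type and its fields are assumptions for this example only; the actual Headscale implementation may differ.

```go
package main

import "sort"

type NodeID uint64

type Node struct {
	ID       NodeID
	Hostname string
}

// memDB is a hypothetical stand-in for the database layer.
type memDB struct {
	nodes map[NodeID]Node
}

// ListNodes returns every node when called without arguments and
// only the nodes with the given IDs otherwise. Unknown IDs are
// skipped. Results are sorted by ID for a stable order.
func (db *memDB) ListNodes(nodeIDs ...NodeID) []Node {
	var out []Node
	if len(nodeIDs) == 0 {
		for _, n := range db.nodes {
			out = append(out, n)
		}
	} else {
		for _, id := range nodeIDs {
			if n, ok := db.nodes[id]; ok {
				out = append(out, n)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}
```

Callers can write `db.ListNodes()` for all nodes, `db.ListNodes(1, 3)` for specific ones, or `db.ListNodes(ids...)` to expand an existing slice. Note that in this sketch an expanded empty slice takes the same branch as passing no arguments.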

Contributor Author

This is implemented.

return peers, nil
}

func (m *Mapper) ListNodes(nodeIDs ...types.NodeIDs) (types.Nodes, error) {
Collaborator

You need to pass it as types.NodeID (without the s). Right now you are passing it as a list of lists.

Contributor Author

That is by design. When using an optional argument we face the problem of having two different types of empty lists, both of which are needed:

  • The empty list that results from not using the optional argument. We want to return all nodes in this case.
  • The empty list that results from using the optional argument with a list where all nodes have been filtered out. We want to return an empty list here and not all nodes.

One could use the nil and []types.NodeID{} variants of empty lists to distinguish these two cases. But in my experience, code depending on different behavior between these two empty list variants is a future bug waiting to happen. I know that my design also has a hidden footgun in that one could pass more than one list of nodes as parameters to the function. But I would prefer having two ListNodes functions before changing the parameter type to ...types.NodeID.
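The two kinds of empty list can be observed directly in Go. The sketch below is only an illustration of the language behavior under discussion; `classify` and `classifyLists` are hypothetical helpers, not functions from the PR. A variadic parameter is nil when no arguments are passed, while a slice expanded with `...` is passed through unchanged, so an empty but non-nil slice stays non-nil. A variadic parameter of slice type avoids relying on that distinction.

```go
package main

type NodeID uint64
type NodeIDs []NodeID

// classify reports how a variadic ...NodeID parameter arrived.
func classify(ids ...NodeID) string {
	switch {
	case ids == nil:
		return "nil"
	case len(ids) == 0:
		return "empty"
	default:
		return "non-empty"
	}
}

// classifyLists does the same for a variadic ...NodeIDs parameter,
// where "no argument" and "an empty list" differ in len(lists).
func classifyLists(lists ...NodeIDs) string {
	if len(lists) == 0 {
		return "no filter"
	}
	total := 0
	for _, l := range lists {
		total += len(l)
	}
	if total == 0 {
		return "empty filter"
	}
	return "non-empty filter"
}
```

Here `classify()` yields "nil" and `classify([]NodeID{}...)` yields "empty", which is the fragile distinction mentioned above, whereas `classifyLists()` and `classifyLists(NodeIDs{})` differ without any nil check. The cost is that `classifyLists` also accepts several lists at once.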

Side remark: I will wait until we have settled on a design for ListNodes before I take a second look at ListPeers.

Collaborator

> The empty list of using the optional argument, but with a list where all nodes have been filtered out. We want to return an empty list here and not all nodes.

This scenario does not make any sense to me. In the case where all nodes have been filtered out, you would have an empty list of IDs, and you would return early instead of calling the database.
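A caller-side guard of the kind described here could look roughly like the following sketch. The `countingDB` type, its `hostnames` method, and `changedPeers` are invented for this example; the query counter exists only to show that the database is not touched when nothing remains after filtering.

```go
package main

type NodeID uint64

// countingDB is a hypothetical stand-in that records how often
// it is queried.
type countingDB struct {
	queries int
	known   map[NodeID]string
}

// hostnames looks up the hostnames for the given IDs in one "query".
func (db *countingDB) hostnames(ids []NodeID) []string {
	db.queries++
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if h, ok := db.known[id]; ok {
			out = append(out, h)
		}
	}
	return out
}

// changedPeers drops the node's own ID from changedIDs and only
// touches the database if at least one peer ID is left.
func changedPeers(db *countingDB, self NodeID, changedIDs []NodeID) []string {
	peerIDs := make([]NodeID, 0, len(changedIDs))
	for _, id := range changedIDs {
		if id != self {
			peerIDs = append(peerIDs, id)
		}
	}
	if len(peerIDs) == 0 {
		// Nothing to look up: return early and skip the query.
		return nil
	}
	return db.hostnames(peerIDs)
}
```

The trade-off is that every caller has to remember this check, since the lookup function itself no longer distinguishes "no filter" from "filter with no entries".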

Contributor Author

@Enkelmann Enkelmann Apr 3, 2025

Well, the proposed code does check it and returns early, right before the database access would otherwise occur. In my opinion this is better than relying on the caller to know that they have to check it before calling ListNodes.

As context: In my load tests this case did actually happen and there was no check in PeerChangedResponse before querying the database for zero nodes.

Collaborator

I think it will be fine with a docstring explaining the filter. Current behaviour is not that ergonomic and I would want it to be.

Contributor Author

Although I do not like it from a coding-style perspective, I implemented it now.


ghost commented Apr 7, 2025

Pull Request Revisions

| Revision | Description |
| --- | --- |
| r2 | Added peer filtering to ListPeers. Expanded ListPeers method to support optional peer ID filtering in database and mapper layers |
| r1 | Optimize node database query performance. Enhanced ListNodes to support filtering by node IDs, improving database query efficiency and reducing unnecessary data retrieval |
✅ AI review completed for r2

@Enkelmann
Contributor Author

Ok, ListNodes now returns all nodes when no parameter is given and the behavior is documented in a docstring. If you like this version, I can take a look at ListPeers.

Collaborator

kradalby commented Apr 7, 2025

Looks great, thank you!

@Enkelmann
Contributor Author

I added the same functionality to ListPeers (currently not used anywhere).
As far as I can tell the integration test failures have nothing to do with the PR, so I have not done anything about them.

Collaborator

@kradalby kradalby left a comment

Great! thank you!

@kradalby kradalby merged commit 0d31347 into juanfont:main Apr 8, 2025
271 of 276 checks passed