Skip to content

Conversation

@Rorical
Copy link
Contributor

@Rorical Rorical commented Mar 5, 2024

https://www.rfc-editor.org/rfc/rfc7636.html
To fix the error "Could not exchange code for the token" when using the PKCE method, a verifier should be generated and used during the authentication process.

  • read the CONTRIBUTING guidelines
  • raised a GitHub issue or discussed it on the projects chat beforehand
  • added unit tests
  • added integration tests
  • updated documentation if needed
  • updated CHANGELOG.md

Summary by CodeRabbit

  • New Features

    • Enhanced OpenID Connect (OIDC) registration and callback process with Proof Key for Code Exchange (PKCE) for improved security.
    • Introduced a new data structure, RegistrationInfo, to manage OIDC-related information effectively.
  • Bug Fixes

    • Improved validation processes to ensure secure handling of authorization code exchanges.

@kradalby kradalby added the OIDC OpenID Connect related issues label Jul 18, 2024
@kradalby
Copy link
Collaborator

I imagine this can go in and be rebased on top of #2020, whenever that is ready.

@kradalby
Copy link
Collaborator

@Rorical if you have time to rebase this in the next few days, then it might be able to make it into the next release.

@Rorical
Copy link
Contributor Author

Rorical commented Nov 29, 2024

This could be resolved soon. I'm away from my headscale instance for months and haven't realized current status of this project, so I need some time to check what's going on.

Based on the purpose of pkce, I recommend to add a oidc config parameter EnablePKCE, since pkce will only provide a extra layer of security and totally compitable with currect setup. However, user should have the option to enable or disable this feature.

@Rorical
Copy link
Contributor Author

Rorical commented Nov 29, 2024

Testing and documentation will be added soon

t.Fatalf("unexpected users: %s", diff)
}

// Test that PKCE verifier was properly used in authentication
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this test verify that the PKCE works properly? this test looks more or less the same as the normal test, but with the option enabled?

I would expect two test cases:

  • One with the option enabled, and where the client is joined correctly
  • One with the option enabled, where the verification fails and the client is not joined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MockOIDC seems automatically handle code_challenge parameter for pkce, so I expect that normal test should work in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats fine, but we need to make sure that a node is denied if the pkce is wrong/not valid to ensure that the feature works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PKCE verifier is used by OIDC provider to check the validity of client. E.G. if any man-in-middle attack is performed to sniff the authorization code, then it can be used to get access token. Do you mean that "pkce is wrong/not valid" refers that we fake a pkce verifier in headscale to check if the OIDC provider can deny that request?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would expect that we test that the MitM attack is caught, so something with a "failing" test, where the node is unable to join because PKCE ensures it doesnt get through.

@kradalby kradalby added this to the v0.24.0 milestone Dec 2, 2024
Repository owner deleted a comment from coderabbitai bot Dec 2, 2024
@kradalby
Copy link
Collaborator

@Rorical any opportunity to look at the integration tests that are failing? You probably also need to rebase to get a docker fix that broke all the builds.

@Rorical
Copy link
Contributor Author

Rorical commented Dec 17, 2024

Apologize for the delay. The failing test proves to be harder to implement for me, as I can not get the nix test working for me.

@kradalby
Copy link
Collaborator

No worries, can you do:

cd cmd/gh-action-integration-generator
go generate

and commit the updated workflow? this will run the new tests, I will get to reviewing a bit later, thank you!

@Rorical
Copy link
Contributor Author

Rorical commented Dec 19, 2024

The test should works now. Is it possible to still make this change into v0.24.0? I really need this feature to have my instance continue working...

@kradalby
Copy link
Collaborator

If the tests pass, I can try to review it tomorrow, but I am hesitant adding it this late in the release cycle and adding new features in beta, so I won't guarantee it.

Copy link
Collaborator

@kradalby kradalby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure about the tests, but I did test this with my setup and kanidm and it does work.

return resp, err
}

func TestOIDCAuthenticationWithPKCEVerifierTampering(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the challenge with this test is that it doesnt verify that the test work, since the positive test uses the tailscale client, while the negative test does something completely different by running a http client.

I am conflicted, I am not sure if this tests does what is needed, but not sure how we can achieve it with the client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually the negative test does something completely same as the positive test did, however the only difference is that I tried to modify the request from user to oidc server (with a custom http transport). The whole redirection logic is completely same:

  1. User request login from tailscale
  2. tailscale request headscale, headscale return machinekey headscale.instance/register/mkey:xxxxxxxxxxx
  3. user request register endpoint, the endpoint redirect user to oidc login oidc.instance/oidc/authorize?xxxxxx
  4. oidc server process request and redirect to headscale headscale.instance/oidc/callback?xxxxxx

What the http client did is that it modify the parameter code_challenge for the third step, so the login request from the headscale redirection is invalid. This is what we did to mimic MiTM. The oidc server will still proceed, but the response will not pass the code verifier stored inside headscale instance.

This is an actual log generated by the test:

2024/12/19 16:33:53 running tailscale up for user user1
2024/12/19 16:33:53 ts-unstable-dnh7da running tailscale up
2024/12/19 16:34:23 ts-unstable-dnh7da login url: https://172.22.0.5:8080/register/mkey:809f6ad985579a4ccabf7ee7880e20c7efb049c6f952dd174b6c93f617c50d1f
2024/12/19 16:34:23 ts-unstable-dnh7da logging in with url
2024/12/19 16:34:23 RoundTrip: GET https://172.22.0.5:8080/register/mkey:809f6ad985579a4ccabf7ee7880e20c7efb049c6f952dd174b6c93f617c50d1f
2024/12/19 16:34:23 Processing GET request
2024/12/19 16:34:23 No code_challenge found in GET query params
2024/12/19 16:34:23 Response status: 302 Found
2024/12/19 16:34:23 RoundTrip: GET http://172.22.0.3:34663/oidc/authorize?access_type=offline&client_id=superclient&code_challenge=6cg09A-VPrE-wgpCxUtfArY6ammErUF4vP3A2LXsm1M&code_challenge_method=S256&redirect_uri=https%3A%2F%2Fhs-oidcauthpkce-lrbl4l%3A8080%2Foidc%2Fcallback&response_type=code&scope=openid+profile+email&state=dcf4af7abc68da1876db3617094bf63b
2024/12/19 16:34:23 Processing GET request
2024/12/19 16:34:23 Found GET verifier: 6cg09A-VPrE-wgpCxUtfArY6ammErUF4vP3A2LXsm1M
2024/12/19 16:34:23 Modified URL to: http://172.22.0.3:34663/oidc/authorize?access_type=offline&client_id=superclient&code_challenge=6cg09A-VPrE-wgpCxUtfArY6ammErUF4vP3A2LXsm1M_tampered&code_challenge_method=S256&redirect_uri=https%3A%2F%2Fhs-oidcauthpkce-lrbl4l%3A8080%2Foidc%2Fcallback&response_type=code&scope=openid+profile+email&state=dcf4af7abc68da1876db3617094bf63b
2024/12/19 16:34:23 Response status: 302 Found
2024/12/19 16:34:23 RoundTrip: GET https://hs-oidcauthpkce-lrbl4l:8080/oidc/callback?code=bzi7BK5ZeYWCS3J7Es0PTIkKZ5riN4yP&state=dcf4af7abc68da1876db3617094bf63b
2024/12/19 16:34:23 Processing GET request
2024/12/19 16:34:23 No code_challenge found in GET query params
2024/12/19 16:34:23 Response status: 400 Bad Request
2024/12/19 16:34:23 ts-unstable-dnh7da response code of oidc login request was 400 Bad Request
2024/12/19 16:34:23 body: could not exchange code for token: oauth2: "invalid_grant" "Invalid code verifier. Code challenge did not match hashed code verifier."
2024/12/19 16:34:23 auth got error: status code not OK

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am really missing something, I think what bothers me is that it isnt the tailscale client that is doing the request, but a httpClient.Do(req). I think the "correct" way to do this is to have the request go through a proxy and that could be passed as an option to hsic or tsic. where you can give it a func that will modify the requests.

That way we can have unmodified (and only one version of) runTailscaleUp, and have failing and passing nodes in the same test, some just get tampered with and some dont, compared to a dedicated test for each.

This is a lot of code that is very specific to one test, and a proxy in between headscale and tailscale for tampering would be quite useful.

I dont want to put it on you to have to write that for this test tho, so I wonder if we will do the following:

  • Can you squash up the commits so there is two commits:
    • one with the code change, changelog and doc/config changes
    • one with the tests
  • Make a PR with only the code change commit
  • Make a draft PR with the tests
  • Make an issue about adding a tampering proxy for headscale integration tests and tag the draft PR.

Then we can merge the code change without tests for now, and do it in a generic way so we dont have to maintain a lot of specific code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see your point. by that way we can have a proxy which could also be useful for some other tests.

return resp, err
}

func TestOIDCAuthenticationWithPKCEVerifierTampering(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am really missing something, I think what bothers me is that it isnt the tailscale client that is doing the request, but a httpClient.Do(req). I think the "correct" way to do this is to have the request go through a proxy and that could be passed as an option to hsic or tsic. where you can give it a func that will modify the requests.

That way we can have unmodified (and only one version of) runTailscaleUp, and have failing and passing nodes in the same test, some just get tampered with and some dont, compared to a dedicated test for each.

This is a lot of code that is very specific to one test, and a proxy in between headscale and tailscale for tampering would be quite useful.

I dont want to put it on you to have to write that for this test tho, so I wonder if we will do the following:

  • Can you squash up the commits so there is two commits:
    • one with the code change, changelog and doc/config changes
    • one with the tests
  • Make a PR with only the code change commit
  • Make a draft PR with the tests
  • Make an issue about adding a tampering proxy for headscale integration tests and tag the draft PR.

Then we can merge the code change without tests for now, and do it in a generic way so we dont have to maintain a lot of specific code.

@Rorical Rorical force-pushed the main branch 2 times, most recently from 4310d11 to 0f9810b Compare December 22, 2024 14:50
@kradalby
Copy link
Collaborator

kradalby commented Dec 22, 2024 via email

@Rorical
Copy link
Contributor Author

Rorical commented Dec 22, 2024

I'm trying to open a new PR for this.

@Rorical
Copy link
Contributor Author

Rorical commented Dec 22, 2024

Closed due to #2314

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OIDC OpenID Connect related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants