Skip to content

Conversation

@roganlynch
Copy link
Contributor

Fixes #2899

Summary

This PR fixes the inconsistent DNS resolver behavior reported in issue #2899. The core issue was that DNS resolvers were not being sent correctly based on the override_local_dns setting.

Changes

Root Cause

When override_local_dns=false, the code was incorrectly setting FallbackResolvers instead of Resolvers. FallbackResolvers are only used when primary DNS fails, causing DNS to never work.

The Fix

Updated dnsToTailcfgDNS() in hscontrol/types/config.go to implement the correct logic:

  • MagicDNS supersedes override_local_dns: When MagicDNS is enabled, always send Resolvers
  • When MagicDNS is disabled:
    • override_local_dns=true → Send Resolvers
    • override_local_dns=false/unset → No Resolvers (clients use local DNS)

Behavior Matrix

MagicDNS override_local_dns Resolvers Sent?
true true/false/unset ✅ Always
false true ✅ Yes
false false/unset ❌ No

Testing

  • ✅ Updated unit tests in hscontrol/types/config_test.go
  • ✅ Added comprehensive integration tests in integration/dns_override_local_test.go
  • ✅ All unit tests pass

Files Changed

  • hscontrol/types/config.go: Fixed DNS configuration logic
  • hscontrol/types/config_test.go: Updated test expectations
  • integration/dns_override_local_test.go: Added integration tests

@roganlynch roganlynch force-pushed the fix/dns-override-local-2899 branch from edcad88 to a0f46d4 Compare December 12, 2025 22:50
Copy link
Collaborator

@kradalby kradalby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful, some direction.

// - When MagicDNS=false:
// - override_local_dns=true: Send Resolvers
// - override_local_dns=false/unset: No Resolvers
func TestDNSOverrideLocalBehavior(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think help on this is very helpful, it has been quite painful, and I have never found time to sit down and test all the upstream Tailscale SaaS options to map out exactly how it should work, which often has resulted in this subtle bugs or strange behaviours.

I do think that we are starting a bit in the wrong end here with resolving this once and for all, either by method, or just by the naming.

This is very oriented around testing "what our settings do", while it should focus on first mapping out a test case/scenario for every behaviour of the possible configuration in Tailscale SaaS, and name them accordingly, and then change the Headscale code to behave in the correct way.

The best example of a time I managed to work like this is probably the recent Tags reimplementation in #2931.

I appreciate someone digging into this as it requires due diligence and I have not had time to sit down with it, or put it on the current roadmap.

Copy link
Contributor Author

@roganlynch roganlynch Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a thinking member of the public I care about exactly three outcomes for my clients:
1.) Am I using the "all singing all dancing dns proxy service"
2.) Am I sending alternate DNS servers IPs to my clients
3.) Am I letting the client choose/keep its own DNS

This is MVP.

Making it "work just like TS service" is more of a nice to have than a base requirement - where I come from.

There are undoubtedly many more knobs available with the business logic TS head end and I am not interested in "feature parity" as much as "good enuff"™

Sending overriding DNS servers to the client when they think they've disabled it is obviously not "as intended"

And MagicDNS will not work if it's not sent to the client...

I don't understand what was meant/intended by "Fallback Servers" - If you don't have working DNS servers and you don't understand why you're stuck - I don't have a big enough clue bat to cure what ails you.

If MagicDNS needs proxy targets (at all times) these should be named accordingly in the code and set to one or more of the usual suspects... Unless they're explicitly mapped to configured values.

MagicDNS - overrides the Client DNS - cause that's where the magic happens
- Uses plausible public defaults as its proxy targets in the absence of user configured targets

Override True - ships the customer provided/configured dns directly (w/o proxy)

Override False - ships nothings


Please let me know if I've missed something important that TS provides that we care about deeply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this as a bug fix ( where what I hear you describing above is a desire for feature parity through testing)

Obviously low-level DNS resolution changes will catch flack from outlier users making use of misbehavior - Hopefully I've made a reasonable case for just shipping this (or something akin to it) and then making it incrementally better over time.

LMK - Love the software. Thanks for the feedback.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously low-level DNS resolution changes will catch flack from outlier users making use of misbehavior

I think the particular challenge with this part of the code is that it has been implemented, re-implemented, re-implemented and re-implemented a lot of times, and always not fully correct. This means that over time we have broken the people relying on or working around misbehavior many times.

I feel that unless we can give it the care it needs to sit down and implement it as it should be, we should not patch it further until we have the time to do it properly.
I get that this is likely a bug fix, but without doing the effort of sitting down and validating how it is suppose to work, I cannot really say if that is the case or not by reading the code.

I would greatly appreciate help to get it up to scratch, but it's been "good enuff"™ for too long, so we need to do it properly and partial changes feels like it will cause more work to be untangled when we have time to do it fully.

Copy link
Contributor Author

@roganlynch roganlynch Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to get your input of what more/different use-cases need coverage to "implement it as it should be"; I don't have access to the TS headend code. I am not interested in implementing a test/validation harness for TS headend, especially for what I see as a relatively straightforward branching codepath. I don't have any history with what might make this part of the code "radioactive" - but I am not worried about losing traction with the userbase - as headscale is the only OS alternative to TS commercial offering I am currently aware of.

Would you share what "up to scratch" might look like from your perspective? --understanding I am interested in making the tool actually useful (no sharp edges on the user/admin/operator facing components), not in achieving (transient) ivory tower feature parity.

I see only the three cases described in my diff - MagicDNS (proxied DNS), HS propagated Direct DNS, No Propagated DNS. Please, please, please, let me know if I am being dumb. It wouldn't be the first time.

Would having more integration test cases with different configuration options be a win? Better error messages for incompatible option selections? Better documentation of each configuration knob?

I understand having FUD about what's apparently been a sore spot in the codebase - I've experienced that pain first hand for the last 18+ months running it from my homelab. But, perhaps because I took my drama/improv classes too seriously - I strongly believe in "entering the danger" - Tackling the challenge head on.

I know that if the user gets DNS assigned when they believe that they've disabled the option - the software has violated its implicit contract - which qualifies as a sharp edge from my perspective.

I am also happy to let sleeping dogs lie - if it's too soon to go back to fixing this codepath - if there are exigent circumstances.

Thanks for the dialogue. -Rogan

Copy link
Contributor Author

@roganlynch roganlynch Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've just pushed a "more comprehensive" change set that attempts to comply with the articulated theory of operation in tailcfg.DNSConfig

This add optional config fields for SplitDNSFallbackResolvers that will default to the Global resolvers (previous behavior - and what TS documents) if the setting is omitted - but explicitly setting them will obviously supersede that default behavior.

And perhaps most importantly added an integration check to make sure that IF override_local_dns=false - nothing should leak out into the TS Client env.

Reading through the TS code nothing appears to correlate with the override DNS functionality with a 1:1 correspondence. It treats the presence of Resolvers, plus some Magic/Split DNS flags as indicative of the desired state. Thus the closest approximations of OverrideLocalDNS=false ≈≈ DontConfigureDNS=True or DontConfigureResolvers=True

Clients need resolvers if they have SplitDNS (routed proxy), MagicDNS (Full Proxy), or No Proxy.

LMK if you have cycles to discuss this as I feel like this implementation is at least worth beta-testing.

@roganlynch roganlynch force-pushed the fix/dns-override-local-2899 branch from 3444ba3 to 31d87c8 Compare December 15, 2025 14:35
@roganlynch roganlynch force-pushed the fix/dns-override-local-2899 branch from 9804eba to 063f31f Compare December 15, 2025 23:04
@roganlynch roganlynch force-pushed the fix/dns-override-local-2899 branch from f0d768b to 62a306e Compare December 17, 2025 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Different behavior depending on override_local_dns definition

2 participants