@Ekleog
Member

@Ekleog Ekleog commented Feb 26, 2025

This adds the concept of a desync to the auto-upgrade service. A desync is basically a sub-nixpkgs, that is allowed to stay behind when auto-upgrading the system.

This has been especially useful for me for home-manager, which relatively often fails building on the stable channel. With a desync containing just home-manager and the few packages I want it to have available, I have stopped seeing any failure of the auto-upgrade service: instead, the home-manager part of my system is the only one that stays 2-3 auto-upgrades behind, until the module gets quite quickly fixed upstream.

This is useful for people who don't want to baby-sit their auto-upgrade mechanism, but still want to monitor failures so that they don't end up with a very outdated system.

I've been using this exact file for something like 2 months now, and it's been working great for me! I baby-sat it daily during that time, and it has been working fine without a single issue for a month of daily auto-upgrades. During this month, there have been a few times where home-manager in particular stopped building and then resumed building.

As I know this is a relatively controversial change, WDYT? I'll leave this PR open for a while, and if it gathers enough positive attention, I'll write up some release notes to get it ready to land 😄

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Feb 26, 2025
@Ekleog
Member Author

Ekleog commented Feb 26, 2025

Note for the reviewer: the first commit contains the actual changes; the second commit is a huge lot of noise caused by the requirement to run nixfmt on everything

Contributor

@MattSturgeon MattSturgeon left a comment

I'm not very familiar with auto-upgrade, but I have some minor feedback:

}
];

environment.systemPackages = lib.optional (cfg.desync != { }) upgradeDesyncsPkg;
Contributor

Probably better to use mkIf than optional. Unless this was for a specific reason such as avoiding inf-rec.

Suggested change
environment.systemPackages = lib.optional (cfg.desync != { }) upgradeDesyncsPkg;
environment.systemPackages = lib.mkIf (cfg.desync != { }) [
upgradeDesyncsPkg
];

Member Author

I'm not sure I see the advantage of using mkIf here?

Contributor

optional false results in a definition of [ ] while mkIf false (usually*) results in no definition.

Practically speaking, there's not much difference to the final value. But there is a subtle difference in how the definitions get merged, how they'd appear in warnings, etc.

* one exception being "lazy" types that have an emptyValue used when undefined

Member Author

That's actually why I prefer optional whenever possible: IIRC mkIf false results in something like { if = false; value = <thunk>; }, which I'd expect would slow down eval. While otherwise there would be basically no practical impact.

Does that make sense, or has the module system changed too much since I last looked at it?

Contributor

You are correct, mkIf will wrap the value. However imo it's better to try to use the module system idiomatically, rather than attempt to micro-optimise these kinda things.

As for which is actually more optimal, I'm unsure. On the one hand mkIf false will result in one less definition being passed to the type's merge function. On the other hand, merging an empty list as one of the definitions shouldn't be expensive.
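
For reference, the wrapping discussed in this thread can be inspected directly. This is a minimal sketch, assuming `<nixpkgs>` resolves on `NIX_PATH`; the attribute names `wrapped` and `plain` are made up for illustration. It can be evaluated with `nix-instantiate --eval --strict` on a file containing the expression.

```nix
let
  # Assumption: <nixpkgs> is available on NIX_PATH.
  lib = import <nixpkgs/lib>;
in
{
  # mkIf does not evaluate to its content. It returns a tagged attrset
  # that the module system unwraps (or discards) later, during merging:
  #   { _type = "if"; condition = false; content = [ "bar" ]; }
  wrapped = lib.mkIf false [ "bar" ];

  # optional is a plain list function, so a false condition yields an
  # ordinary empty list that still counts as a definition: [ ]
  plain = lib.optional false "bar";
}
```

The fields are named `condition` and `content` rather than `if` and `value`, but the shape matches what was recalled above.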


One scenario where mkIf false is better than an unconditional "empty" definition is when dealing with multiple override priorities.

E.g. a user defines mkDefault [ "foo" ] and some other module defines mkIf false [ "bar" ]. Due to using mkIf, the second definition is skipped and the highestPrio is (mkDefault null).priority (1000). Therefore the final value is [ "foo" ].

If we tweak that example to use if then else or optionals instead of mkIf, then the second definition is now unconditional* and will always override the mkDefault definition. Therefore the highestPrio will be defaultOverridePriority (100) and the final value will be [ ].

* the definition's value is conditional, but the definition itself is not.
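
The priority scenario above can be reproduced outside NixOS with `lib.evalModules`. This is a sketch under assumptions: `<nixpkgs>` resolves on `NIX_PATH`, and the option name `foo` and the helper `eval` are hypothetical names chosen for the example.

```nix
let
  # Assumption: <nixpkgs> is available on NIX_PATH.
  lib = import <nixpkgs/lib>;

  # Hypothetical helper: evaluate a tiny module set in which a "user"
  # sets a mkDefault value and "some other module" adds secondDef.
  eval = secondDef:
    (lib.evalModules {
      modules = [
        { options.foo = lib.mkOption { type = lib.types.listOf lib.types.str; }; }
        { foo = lib.mkDefault [ "foo" ]; } # priority 1000
        { foo = secondDef; }
      ];
    }).config.foo;
in
{
  # The mkIf false definition is discarded, so only the mkDefault
  # definition remains: [ "foo" ]
  withMkIf = eval (lib.mkIf false [ "bar" ]);

  # The empty list is an unconditional definition at priority 100,
  # which overrides the mkDefault definition: [ ]
  withOptionals = eval (lib.optionals false [ "bar" ]);
}
```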


Although, in this specific scenario, it seems almost impossible that environment.systemPackages would ever end up with a lower priority than defaultOverridePriority, given it is defined by the majority of modules.

Member Author

TBH my dislike of the module system is probably most of the reason why I'm avoiding mkIf whenever possible. I'm a bit hyped by the reusable module system (if I recall the name right) because it's getting close to my old nixtos idea, though I haven't checked how it's implemented or tried it yet.

All that to say, I don't think this can be a problem, but if it's important to you just let me know and I'll adjust!

@github-actions github-actions bot added 6.topic: module system About "NixOS" module system internals 6.topic: lib The Nixpkgs function library labels Feb 26, 2025
@nix-owners nix-owners bot requested review from infinisil and roberth February 26, 2025 18:16
@github-actions github-actions bot added 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. labels Feb 26, 2025
Member

@roberth roberth left a comment

This looks very useful, thank you!

@@ -1,12 +1,84 @@
{
Member

The name desync has a negative connotation for me, as I associate it with something like an electrical breakdown of sorts, but clearly this feature is intended to be useful.

The way I would conceptualize this feature is

  1. The NixOS configuration consists of multiple inputs
  2. The inputs can move independently, or perhaps in a constrained manner where some inputs are not allowed to be newer than the main input (the nixpkgs for NixOS)
  3. Progression of updates is gated by certain checks
  4. Configuring these inputs in conjunction with Nixpkgs and/or NixOS concepts is nice and useful, i.e. setting up an overlay, perhaps even replace a NixOS module with disabledModules, for the daring.

Maybe that's a bit more general and capable than what's implemented, but that's ok.
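
To make point 4 concrete, here is a rough sketch of wiring a second input into the main configuration through an overlay. It does not use this PR's options: `trackPath` is a hypothetical placeholder for wherever the independently moving nixpkgs checkout lives, and `home-assistant` is only an example package.

```nix
{ ... }:
let
  # Hypothetical location of the second, independently updated nixpkgs.
  trackPath = /path/to/second/nixpkgs;
in
{
  nixpkgs.overlays = [
    (final: prev:
      let
        # Import the second nixpkgs for the same platform as the main one,
        # without inheriting the main configuration's config or overlays.
        trackPkgs = import trackPath {
          system = prev.stdenv.hostPlatform.system;
          config = { };
          overlays = [ ];
        };
      in
      {
        # The package and its whole dependency closure come from trackPath.
        home-assistant = trackPkgs.home-assistant;
      })
  ];
}
```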

It's tricky to find a short name that does justice to all these aspects, but depending on where the emphasis should be, here are some possible alternatives:

  • selection
  • gate
  • signal / pulse (making the feature itself a multiplexer, but let's name it after the parts we're configuring, not the behavior)
  • track / lane

I hope you're open to a different name, and that my ramblings are useful somehow.

Member Author

What you're mentioning is exactly what I tried to implement indeed! Just, I'm not sure I understand your point 4 😅

In what you're suggesting, I guess track and gate are the two I'd prefer! I'd still love to hear feedback from other people, I hadn't noticed the negative connotation to desync at all :)

@SuperSandro2000
Member

I will call it here: that will unexpectedly break your setup and the example alone is already enough: What if matrix-synapse has an update that changes the module in a backwards incompatible way? Then unexpected things happen.

@roberth
Member

roberth commented Feb 27, 2025

I will call it here: that will unexpectedly break your setup and the example alone is already enough: What if matrix-synapse has an update that changes the module in a backwards incompatible way? Then unexpected things happen.

Can be mitigated with clear documentation and the inclusion of tests to gate the upgrade.
Only mitigated though.
OTOH for something like a browser or a CLI tool, it should be rare to have bad interactions and breakage.

@SuperSandro2000
Member

A browser is another bad example like most graphical programs. Especially QT stuff super easily breaks mixing versions. Similar things for hardware accelerated things.

@roberth
Member

roberth commented Feb 28, 2025

A browser is another bad example like most graphical programs. Especially QT stuff super easily breaks mixing versions. Similar things for hardware accelerated things.

IIUC, these aren't true overlays that use the main Nixpkgs' dependencies; the packages bring their own closure with dependencies from the Nixpkgs version in which they were packaged. It'd have to be a runtime problem. I have a home-manager that's ahead of NixOS, and it seems to work fine.

Hardware acceleration is known to cause problems, but isn't that mostly a Nix-on-other-Linux problem? On NixOS I believe this is mostly solved with the /run symlinks.

Anyway, I might just be lucky or misinformed or something, and it's good to err on the side of caution. Feel free to use simpler examples, e.g. just "basic CLI tools"

@Ekleog
Member Author

Ekleog commented Feb 28, 2025

I will call it here: that will unexpectedly break your setup and the example alone is already enough: What if matrix-synapse has an update that changes the module in a backwards incompatible way? Then unexpected things happen.

It only breaks if you use the matrix-synapse from the desync in an overlay and matrix-synapse stops building and there is a backward-incompatible change exactly at that time. A backward-incompatible change that I'd expect has probably never happened yet in the time desyncs will usually live, ie. 14 days by default. For a module to be incompatible with a package 14 days old, it needs to be very seriously unstable.

The only case I could think of would be a program completely changed its configuration, and the derivation has been updated alongside the module, to keep things working. This would happen only with pretty unstable packages, that I'd expect would also break in other ways relatively regularly.

Also, and more importantly, if I had handled that issue manually, I would do exactly what desyncs do: use a a-few-days-old version of matrix-synapse until the rest is fixed, so that I could get the security fixes that land on nixpkgs — the gain/risk ratio is clearly beneficial IMO.

If you really fear the module being incompatible with the desync's packages, then you can also disable the matrix-synapse module from the outer nixpkgs, and import it from the desync's path instead. TBH I personally don't see the point, but you can. Do you want to update the example to give that example? (Disclaimer: I did test on my servers with exactly matrix-synapse and home-assistant, a scheme with a disabledModule would be untested from my end)
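
As an untested illustration of that disabledModules scheme: the sketch below follows the module replacement pattern from the NixOS manual. `desyncPath` is a hypothetical placeholder, since how the desync's path is exposed is not shown here, and the module file location is an assumption based on recent nixpkgs.

```nix
{ ... }:
let
  # Hypothetical: path to the desync's nixpkgs checkout.
  desyncPath = /path/to/desync/nixpkgs;
in
{
  # Drop the matrix-synapse module shipped with the outer nixpkgs.
  # Assumption: the module lives at this path in both checkouts.
  disabledModules = [ "services/matrix/synapse.nix" ];

  # Import the same module from the desync instead, so that the module
  # and the package it configures come from the same nixpkgs revision.
  imports = [
    (desyncPath + "/nixos/modules/services/matrix/synapse.nix")
  ];
}
```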

Or maybe do you want to add a scary paragraph somewhere in the documentation? If so I would likely be unable to write a good one myself (because I don't really believe in that fear), but I'll be happy to use one you'd write!

@github-actions github-actions bot added the 8.has: documentation This PR adds or changes documentation label Feb 28, 2025
@SuperSandro2000
Member

SuperSandro2000 commented Mar 1, 2025

For a module to be incompatible with a package 14 days old, it needs to be very seriously unstable.

No, it doesn't. The smallest change in any module can trigger serious bugs. Even a missing proxyWebsockets = true, like the one that was required to get mobilizon working again (611f484), can break everything and render the program next to unusable.

The only case I could think of would be a program completely changed its configuration, and the derivation has been updated alongside the module, to keep things working. This would happen only with pretty unstable packages, that I'd expect would also break in other ways relatively regularly.

The program doesn't need to completely change its config. It is enough to change one bit of its config, which doesn't even need to be considered a breaking change by upstream. This could be as simple as a changed default which conflicts with NixOS, a new required nginx rule or a slightly changed regex for a location. Such changes happen to e.g. Nextcloud somewhat regularly (71dd9c3), as do changes like c82195d, 611f484 or b069f4f.

Also, and more importantly, if I had handled that issue manually, I would do exactly what desyncs do: use a a-few-days-old version of matrix-synapse until the rest is fixed, so that I could get the security fixes that land on nixpkgs — the gain/risk ratio is clearly beneficial IMO.

But then you are on your own and everyone will tell you that this is generally a bad idea to do. If issues like I linked above occur because of this module, we are probably unable to debug them or to provide any kind of support as we are basically undermining the basic promise of NixOS: to have a declarative config with one definitive source. We cannot promise this then anymore. Basically we would need to reject any issue that occurs when this module is enabled.

The more I think about this, the more I am against this insanity.


Basically any change where old code is removed from a module with an update now exposes an undefinable risk which could easily become a security risk. What if we remove a location rule which blocked access to some paths because the new version of a program no longer requires it, and by accident the program no longer builds on aarch64? What then?

Just imagine that we removed 63ef033 again because upstream removed the well known value?!

The idea behind this is madness.

To list another commit which could easily break your system 6cd8810

@Ekleog
Member Author

Ekleog commented May 25, 2025

If it's decided that this PR will be one big squash commit, then I guess these points will be moot. But I don't think it should be that way and I think most other nixpkgs committers agree with me. And in that case the branch history needs to be clean.

There is literally nothing I could split into commits. I can squash it all now and then we merge, but it'd make no sense. Literally the only thing that could be split into commits would be the extension function type; everything else is an atomic change that'd make no sense to split.

It is still more state and more impurity. Being of the same kind doesn't make it zero. And it's not exactly of the same kind anyway. These are not listed in nix-channel --list, for instance. You don't manage them with the same mechanism or knowledge.

Changes to the configuration are not listed in nix-channel --list either. I agree it'd be great to have a single comprehensive solution to find the source of old generations, but it's not actually something we have already, so introducing desyncs we would not lose anything.

This being said, I do think we could add something like nixos-version --sources that'd list both desyncs and channel versions. And even the git configuration hash, if the configuration.nix is in a git repo. This would allow people to actually find which versions they're running on. But that change is orthogonal to this, as we already have no way to associate generations to channels or configuration versions.

Even if that's correct, which I doubt, I think the desync concept would be a small niche within that subset.

Maybe that's true, but it probably should not. I've been using nixos for something like 9 years without desyncs. A year ago I introduced alerting when my auto-upgrades failed. That scared me enough to actually spend the effort implementing desyncs — I would never have guessed that my software was sometimes several weeks old due to various packages failing to build. Several weeks is enough for attackers to get to my system and use known vulnerabilities. I'm lucky I was never attacked, or maybe I was just saved by the fact that nixos is still niche enough that attackers don't care about it.

I do think that, especially if nixos becomes more widespread, we need some solution to the problem that security fixes are blocked by any failing package. Desyncs are definitely not the best solution, but they are AFAICT the only one we can implement without very deep and significant changes to nix, which would likely be worse in "impurity" than desyncs are.

Should this be in nixpkgs at all?

I think there's something I don't understand. Can you give me an example for how desyncs could lead to negative side effects?

I can read three arguments against:

  • The added maintenance cost. This is the one I understand most. However, considering the auto-upgrade module seems to be edited roughly once a year for non-treewide/trivial changes, I don't think maintenance would be significantly increased.
  • The added user support issues. To be honest, I don't think this is a problem. The description warnings are already largely dissuasive enough that people asking for support with a desync could just not see an answer and that'd be fine.
  • A system breaking due to "mismatched" nixpkgs versions. My 4-month experiment with matrix-synapse and home-assistant is running with python packages, which AFAIK are the worst case with regard to impurity. Yet everything has been working flawlessly for 4 months on my systems. Sure, we can't be sure it will be fine forever, but any theoretical issue raised so far would require a combination of both a changed module that still evaluates fine, and a corresponding package that stops building at the same time. And the issue would have to be live at a time hydra performs a channel bump. On the other hand, in far more frequent situations, security issues are being left live on machines due to missed updates.

@Ekleog Ekleog mentioned this pull request May 25, 2025
13 tasks
@ElvishJerricco
Contributor

I'll remove my blocking review, as I think the code accomplishes the goal of this PR well enough. That said, I do hope that before it's merged we'll find a better way to come to a consensus than hoping the right people come by and vote with thumbs.

@ElvishJerricco ElvishJerricco dismissed their stale review May 26, 2025 03:01

Code seems fine

Member

@winterqt winterqt left a comment

While I think this idea has merit, I agree with @ElvishJerricco and @K900 that this probably shouldn’t be in-tree.

The added user support issues. To be honest, I don't think this is a problem. The description warnings are already largely dissuasive enough that people asking for support with a desync could just not see an answer and that'd be fine.

This alone is a reason we shouldn’t have this in-tree — people will use it in crazy ways, things will break, and the people who handle user support will regret that this was ever merged. I do not think descriptions are enough; we should just not be supplying users with footguns at all, no matter how useful.

Of course, nothing is stopping people from pulling this into their own configs — that’s the beauty of NixOS, after all — I just don’t think we should be adding new footguns, no matter how many past options we now wish were never added.

@roberth
Member

roberth commented May 26, 2025

This looks very useful, thank you!

While true, I have changed my mind on merging this, due to second order effects I did not previously consider.

This may be a fork in the road. We could either
a. Normalize desyncs. Package maintenance does not change or degrades over time
b. Improve our processes and prevent build failures.

Of course the effect may be less pronounced, but we should prefer the latter.

@Ekleog
Member Author

Ekleog commented May 26, 2025

@winterqt Thank you for your message, but I still don't understand what is a footgun in this PR. I literally myself did the worst possible thing (desyncing python versions without even containers, one even using an overlay), and nothing broke for 4 months. The footgun-y issues are entirely theoretical, and I have yet to see even one come to light in practice.

I think what detractors are missing is that desyncs are bound to at most 2 weeks of desynchronization by default, and packages basically never change that much in such a short amount of time in practice. Using it as a footgun would be like writing a systemd service that exposes sudo sh on an unauthenticated TCP port, and then coming to complain that nixos let one do that.

I still strongly believe that all the warnings added in the doc are basically meaningless, because there are no issues with the thing in the first place. I added them upon request because there are indeed theoretical issues, but these issues are much more theoretical than the docs in this PR claim.

@roberth I think I understand your comment, and it is indeed a drawback I hadn't considered; probably the worst one listed to date. OTOH I don't really believe in improving nixpkgs' processes seeing how things have (not) changed over the past few years, but… 🤷 Thank you for the comment!

I just know I personally will be relying more and more on desyncs as time goes, considering how things went.

@zimbatm
Member

zimbatm commented May 29, 2025

@winterqt is your opinion that strong that it's worth losing @Ekleog as a contributor? I'm asking because sometimes we tend to get lost in the arguments and miss other aspects.

I can't speak for Ekleog of course, but if I had worked on a feature for 4 months, painstakingly working through all the reviews and nitpicks, only to then be blocked at the last minute, not based on strong technical argument that would be revealed, but based on hypothetical scenarios, I would be thinking twice before contributing anything meaningful to nixpkgs again. From an engineer's perspective, where I get reward from creating things, the only thing that's worse than that is getting my PR reverted.

I'm not saying there is no merit to your opinion, but we can't accurately predict the future either. A better way to find out, is to merge the PR, wait for the future to happen, and then change/remove the feature accordingly. Then ask @Ekleog if they are willing to help fix issues. That would feel more collaborative and constructive to me and more in line with our values.

@Ma27
Member

Ma27 commented May 29, 2025

The footgun-y issues are entirely theoretical, and I have yet to see even one come to light in practice.

The big question is however, on what scale was this tested? How many different setups were involved? Are the scenarios theoretical or unrealistic?

Sorry for being pedantic here, but calling something "hypothetical" is something I heard too often before seeing it blow up in production a few weeks later.

@winterqt is your opinion that strong that it's worth losing @Ekleog as a contributor

While I understand what you're getting at here and I appreciate the intention, I'm not sure if this is the right signal: there are several maintainers now voicing severe concerns. I don't think it is a good strategy to merge things regardless, just to make people happy, tbh. Especially since @roberth stated that he has changed his mind (#385143 (comment)).

Related, personally I don't want to end up in a situation where I don't voice my concerns about patches because I have to be afraid of being deemed responsible for making them quit.

This is another symptom of missing architectural governance: while we've finally reached a point where certain subsystems have actual ownership, this is absolutely not the case for nixpkgs. There's -- to my knowledge -- no team that will be consulted when it comes to larger-scale nixpkgs changes (and this is pretty much a change in comparison to how the system behaved before), so this seems to have reached by far too few people who should've seen this in the first place.

@Ekleog
Member Author

Ekleog commented May 29, 2025

@zimbatm Just to be transparent, this is not the only thing that's pushing me away in practice, and @winterqt actually answered after I had passed the turning point — I've had other PRs where contributing has been less than pleasant (the ongoing RFC, the rustic module, and maybe others I don't remember). I'm mostly of an overall feeling that the merging process has been made much more taxing, be it due to CI or human reviewers who are maybe just so exhausted they don't care about more than saying "no" any longer.

As you mention a PR being reverted it's actually "fun", because my first bad experience since coming back was actually my rustic PR getting reverted. I don't have issues with the revert itself: I do believe the committer who landed it was a bit too eager to land, and would have waited 1-2 weeks before doing so myself. But I do have issues with the way it was done, and the tone of exchanges.

All that to say, this PR is not a turning point for me, just the straw that broke the camel's back! I'll probably go back to only contributing small changes and just merging them myself for a few years, and come back then to see if I'm getting motivation again for bigger changes.

This being said, I do think that comments like "the whole feature is bad" should have come a few months ago when I first opened the PR and repeatedly pinged matrix, and not now that it was basically ready to land. @SuperSandro2000 had voiced that idea, but I was considering it handled by the various warnings. I think this is emblematic of the architectural issue @Ma27 mentioned, and I do believe that my several pings should have been enough considering the current governance.


@Ma27

The big question is however, on what scale was this tested? How many different setups were involved? Are the scenarios theoretical or unrealistic?

I have tested it on 3 machines, all on nixos-unstable:

  • one recent x86-64 laptop, only for cups inside a nixos container
  • one old x86-64 server, for matrix-synapse, that I put in an overlay and outside any container — this is the "worst case scenario" I've tested I'm speaking of
  • one aarch64 rpi, for home-assistant outside any container plus cups in a container

Cups inside nixos-container seems pretty wobbly, but I didn't hit any issue related to desyncs, just networking issues and similar.

These three machines have been happily living since I added them to desyncs, not blocking my updates. I haven't set metrics to check how much they were desynced in practice. But for both home-assistant and cups, I set them as desyncs at a time they were not building, and they worked fine as soon as I completed the process of setting up the desync.

IIRC both times there was already a PR up for the fix, but waiting for it would have meant a few more days of alerts and missed security fixes.

Sorry for being pedantic here, but calling something "hypothetical" is something I heard too often before seeing it blow up in production a few weeks later.

I totally understand this concern, and actually now would be the best time to land this — we've just released 25.05, which means we'd have a full 6 months of baking this in nixos-unstable before it reaches stable; and we could easily revert it until then, even though reverting after it reaches stable would be harder.

If we have stronger doubts, we could even rename it to autoUpgrade.experimental.desync.*; which would make it clear that it's a particularly experimental feature, even for nixos-unstable.

Related, personally I don't want to end up in a situation where I don't voice my concerns about patches because I have to be afraid of being deemed responsible for making them quit.

I totally agree, and I'd like to be clear that I'm not threatquitting! I actually will most likely be stepping back regardless of the result here until I find some motivation again.

This being said, I do think there are solutions to the concerns voiced, like the "experimental" in the option name, that would allow to validate whether the concerns are valid or not; and the current answers I've been reading of "we should not do that regardless of any change" are IMO not good community building. (Just in case, I don't read your comment like this, and the questions you asked are definitely good questions!)

And to be fully transparent... the only good concern I've read why we should not do that regardless of any change was actually well-voiced by @roberth in their change of opinion — the risk of package support deteriorating is a real one that we can't really test better. This being said, I do strongly hope that any package maintainer would not desync their own packages, so I do hope we can try desyncs despite that drawback.

@ElvishJerricco
Contributor

As I said before, I am very sorry that I failed to notice this PR earlier. I think there is mischaracterization that because of this failure the PR is being treated unfairly. That isn't true.

A better way to find out, is to merge the PR, wait for the future to happen, and then change/remove the feature accordingly

Our users should be the subjects of a great experiment? This sounds reckless to me.

is your opinion that strong that it's worth losing @Ekleog as a contributor?

If these are the stakes then it is artificial. Why should Ekleog's contributions hold more weight than my own, or anyone else's? It is not a normal conclusion that because your change is not included therefore you will scorch the project from your efforts. We shouldn't cater Nixpkgs to Ekleog's intentions.

not based on strong technical argument that would be revealed, but based on hypothetical scenarios.

This is an extremely disingenuous representation of the arguments made by me and others. The opinions against this PR are not hypothetical. I am also a nixpkgs contributor, so are the others. We're not making up the idea that these sorts of things lead to maintenance struggles. We've lived it. Indeed, this PR will create confusion and struggle that I personally am not looking forward to dealing with.


I feel this PR is experiencing a crisis of identity. It has become political, but that doesn't change its substance. Indeed, the objection I have is the result of analysis of the substance, not a political one. Is my opinion strong enough "that it's worth losing @Ekleog as a contributor"? It's not for me to decide. These are the norms in nixpkgs. If one doesn't like it, there's not much to be done.

@zimbatm
Member

zimbatm commented May 30, 2025

I understand the risks, but I feel like they are blown out of proportion. My own sense is that we will get at most 10 users of this feature in the next release cycle. We won't see any difference in quality of the packages because people still have to learn to patch them after 2 weeks. The impact scope is limited because this feature is behind a flag, which also helps redirect users back to @Ekleog if they are having any issues.

So how do we find out who is right or wrong?

That's what I mean by hypothetical: it's the nature of the argument. I don't mean to disparage you. I believe you are being truthful and basing your own assessment on your extensive personal experience. At the same time, nobody's predictions of the future are 100% accurate. That's why I am asking @winterqt (but this also applies to you) whether you are really sure, especially since this is coming in so late in the PR.

Another approach could be to say: since there was an issue with the process and this was found out late, we will let the experiment run behind an experimental flag, note all the objections, and then re-assess at the next NixOS release how things are going. And then you'll have the pleasure of telling me I was wrong :-). That would feel more fair and collaborative to me.

@alyssais
Member

> We won't see any difference in quality of the packages because people still have to learn to patch them after 2 weeks.

This makes it more likely the fix will end up being done by one of the small number of extremely active, experienced Nixpkgs contributors for whom it's easier to fix the problem than ignore it, before those two weeks, rather than a new contributor for whom not having an easy way out might provide the boost they need to get over hesitance to contribute to Nixpkgs for the first time.

> re-assess at the next NixOS release how things are going.

How will we be able to evaluate that? There will be no way to tell how many contributions we would have received without this feature but haven't because of it.

@zimbatm
Member

zimbatm commented May 31, 2025

We don't have the infrastructure to measure it. That would be the same if desync was hosted somewhere else really. Last month we received contributions from 872 different people. Assuming a 1/100 contributor ratio, we have about 87k users? If 100 of them adopt desync next month, we will lose 1 PR? Assuming the service is one maintained by the core contributors?

What we can see is how many issues are being raised against desync and check the user footgun claim, and gain more practical experience with it. Maybe we find out that desync kinda sucks, and in the process discover a better way to do things. That's what feature flags are for.

Even as an experienced nixpkgs contributor, sometimes it's nice to be able to apply a security upgrade immediately, and handle the side-quest afterwards. Especially in institutions where they have strict rules, I can see how a feature like this could encourage adoption, and lead to more contributors in aggregate.

I think the main concern with this PR is that it puts more pressure on core contributors. But I really think the risk is inflated. There are other, bigger levers to pull, like improving the tooling, process and contributor experience, that would have a bigger impact than anything this PR could do. And if you asked @Ekleog nicely, maybe he could be inclined to help out.

@zimbatm
Member

zimbatm commented Jun 14, 2025

It looks like everyone is camped on their position, and since we have nobody empowered to make a decision, this PR is probably going to stay blocked.

@Ekleog you're welcome to host the module in https://github.com/nix-community/srvos or in another repo in the org if you want.

@Ekleog
Member Author

Ekleog commented Jun 14, 2025

Thank you for the proposal! TBH I'm relatively happy with my current setup that just overrides the auto-upgrade module. If it could help other people if I upstream it to srvos then why not, but TBH I don't think I have the time/energy to do that without someone I'd know it'd help.

But thanks! I'll leave the PR open for the day we get proper governance, to decide whether we should close or land it 😄

@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 16, 2025
@nixpkgs-ci nixpkgs-ci bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Dec 12, 2025
@winterqt
Member

cc @NixOS/nixpkgs-core

@winterqt winterqt requested a review from a team December 12, 2025 21:45
@nixpkgs-ci nixpkgs-ci bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Dec 12, 2025
@wolfgangwalther
Contributor

> I'll leave the PR open for the day we get proper governance, to decide whether we should close or land it.

The Nixpkgs core team briefly looked into this. In general, we prefer to defer decisions to the subject matter experts. While there is not yet a formal NixOS team, the people most involved in NixOS development were quite strongly against inclusion of this and gave substantive argumentation to that effect. If @Ekleog would still be interested in driving this forward given a review of that decision, we can look into the specifics here, but by default we would consider this to have been rejected.

@Ekleog
Member Author

Ekleog commented Dec 13, 2025 via email

@SuperSandro2000
Member

> what's the point of NixOS compared to eg. Ansible on Fedora, if not the ability to mix packages at different versions?

The NixOS configuration is predictable and (ideally) fully declarative. Ansible depends on the state of the system before running it, and that prior state completely influences the resulting state. Running the same Ansible playbook on totally different hosts will result in totally different outputs system-wise. Applying the same NixOS configuration to different systems ideally results in identical systems. Mixing different versions on NixOS is always a conscious choice and not the outcome of mixing state accumulated over time.

And one other big point is that you get a module which is known to be working with the shipped package, without having to adjust multiple config files all over the place and having to configure all kinds of things over and over for every host and every user out there. You basically shift from "I build up a service and all its direct dependencies" to "I want to consume a service".

@wolfgangwalther
Contributor

I looked at what the code does a bit more now and have a question for my understanding: Shouldn't this kind of thing be easily possible with your favorite dependency manager (flakes, npins, whatever) and some update tooling (for example https://github.com/Mic92/update-flake-inputs/)?

You can then just have two different references to nixpkgs and update them separately. Pull the desynced packages from one instance, the remaining system from the other. You can advance the latter more quickly than the former - auto-updates, built in CI and automatically merged, will just do that for you.
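As a rough illustration of the two-reference idea described above, here is a minimal flake sketch. It is not taken from this PR: the input name `nixpkgs-desynced`, the host name `myhost`, the `./configuration.nix` path and the choice of `hello` as the held-back package are all hypothetical placeholders.

```nix
{
  # Two independent pins of nixpkgs. Both follow the same branch here;
  # the lock file is what lets them sit at different revisions.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
  inputs.nixpkgs-desynced.url = "github:NixOS/nixpkgs/nixos-25.05";

  outputs = { nixpkgs, nixpkgs-desynced, ... }: {
    # "myhost" is a placeholder host name.
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix # placeholder path to the existing configuration
        {
          nixpkgs.overlays = [
            (final: prev: {
              # Take this one package from the slower-moving pin.
              # `hello` stands in for whatever should be held back.
              hello = nixpkgs-desynced.legacyPackages.${prev.stdenv.hostPlatform.system}.hello;
            })
          ];
        }
      ];
    };
  };
}
```

With a layout like this, update tooling can bump the `nixpkgs` input on every run and bump `nixpkgs-desynced` only when the held-back package builds again; on recent Nix versions a single input can be updated with `nix flake update nixpkgs`. Note that this only covers flake-based setups, not the nix-channel-based one.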

What am I missing that requires this to be a built-in feature in NixOS?

@Ekleog
Member Author

Ekleog commented Dec 15, 2025

@wolfgangwalther It might be! I don't know; I don't use dependency managers myself, I'm only using nix-channel and no flakes yet. Just like I assume anyone who uses the nix-channel-based autoUpgrade module doesn't use dependency managers, and these users are AFAIR the only ones who could benefit from using this PR.

The only thing requiring this to be a built-in feature in NixOS is the existence of the autoUpgrade module — removing the autoUpgrade module would definitely make sense to me as an alternative to landing this PR, to explicitly indicate that we don't support this use case.

But currently the autoUpgrade module is dangerous, in that it can leave a system vulnerable to known security vulnerabilities for days and sometimes weeks due to another totally unrelated package failing to build.

@alyssais
Member

alyssais commented Jan 6, 2026

Perhaps we need a note about the risk in the documentation for system.autoUpgrade.enable, as well as a recommendation to have some sort of monitoring for failed auto-upgrades?
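For readers wondering what such monitoring could look like, here is a minimal sketch, assuming the upgrade runs as the `nixos-upgrade` systemd unit that the autoUpgrade module defines. The `notify-upgrade-failure` unit and the `upgrade-monitor` journal tag are hypothetical names, and writing to the journal is only a stand-in for a real alert such as a mail or webhook call.

```nix
{ pkgs, ... }:
{
  system.autoUpgrade.enable = true;

  # Start the (hypothetical) notification unit whenever the upgrade unit fails.
  systemd.services.nixos-upgrade.onFailure = [ "notify-upgrade-failure.service" ];

  systemd.services.notify-upgrade-failure = {
    description = "Report a failed NixOS auto-upgrade";
    serviceConfig.Type = "oneshot";
    # Placeholder action: log at error priority under a recognizable tag.
    # Replace with whatever alerting mechanism is actually monitored.
    script = ''
      echo "nixos-upgrade.service failed" \
        | ${pkgs.systemd}/bin/systemd-cat -p err -t upgrade-monitor
    '';
  };
}
```

This does not fix a failing build; it only makes sure a failed upgrade is noticed rather than leaving the system silently outdated.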

Labels

2.status: merge conflict - This PR has merge conflicts with the target branch
6.topic: lib - The Nixpkgs function library
6.topic: module system - About "NixOS" module system internals
6.topic: nixos - Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS
8.has: changelog - This PR adds or changes release notes
8.has: documentation - This PR adds or changes documentation
8.has: module (update) - This PR changes an existing module in `nixos/`
10.rebuild-darwin: 1-10 - This PR causes between 1 and 10 packages to rebuild on Darwin.
10.rebuild-darwin: 1 - This PR causes 1 package to rebuild on Darwin.
10.rebuild-linux: 1-10 - This PR causes between 1 and 10 packages to rebuild on Linux.

Projects

Status: Done
