nixos/rustic: init module #373042
Conversation
@emilylange you had concerns about the code, could you point them out?
```nix
else if v ? postgres then
  v.postgres
```
The integration with PostgreSQL seems overly specific. Nixpkgs has 20+ database packages; should this module grow to include them all? To me, it would make more sense for the operator to compose this module with the existing postgresqlBackup module. The latter would dump the databases on a schedule, then this one would back them up. Wdyt?
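To make the suggestion concrete, here is a rough sketch of the composition I have in mind; the services.rustic side uses hypothetical option names, since the final schema is up to this PR:

```nix
{
  # Existing module: dump all databases to /var/backup/postgresql on a schedule.
  services.postgresqlBackup = {
    enable = true;
    backupAll = true;
    startAt = "*-*-* 01:15:00";
  };

  # This PR's module then picks the dumps up as an ordinary files backup.
  # (Hypothetical option names, only meant to illustrate the composition.)
  services.rustic.backups.postgres-dumps = {
    files = [ "/var/backup/postgresql" ];
  };
}
```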
Hmm… While it makes sense, it looks to me like postgresqlBackup can only write to a file, whereas here I'm using rustic's ability to stream a backup from stdin.
To give an example, I have one database of 13G (a pretty old matrix-synapse database), and I know some people have much bigger databases. I don't really want to write it to disk each day and then delete it for the backups — especially as I'm considering increasing the frequency of backups, thanks to rustic's deduplication.
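For reference, the kind of pipeline I mean is roughly the following. This is only a sketch: it assumes rustic's stdin handling mirrors restic's (`-` as the source plus a `--stdin-filename` flag), so the exact flags should be double-checked against the rustic documentation.

```nix
# Hypothetical sketch of what the generated command backup boils down to for
# my matrix-synapse database; the dump never touches the local disk.
# Assumes pg_dump and rustic are on PATH and the rustic profile is configured.
pkgs.writeShellScript "rustic-backup-synapse-db" ''
  set -euo pipefail
  sudo -u postgres pg_dump matrix-synapse \
    | rustic backup - --stdin-filename matrix-synapse/dump.sql
''
```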
Modifying the postgresqlBackup module to add a rustic backup location would be possible, but would likely be pretty complex (from an end-user UI perspective). And I'd argue it'd be even less well-placed there.
As for other databases, I already checked and mysql/mariadb seem to only allow a single global mysqldump, without an equivalent of the PostgreSQL-specific pg_dumpall-for-globals plus one pg_dump per database; so I don't think it'd make sense to add them, as people can just use the command backup type to back everything up at once. I unfortunately don't know enough about the other databases to comment.
WDYT? If you have any other idea to support no-write-to-disk backup of databases I'd be happy to try it out!
The only alternative I can think of is something like a dynamic backup type that'd let users give one command generating a JSON list of backups, from which one backup per entry would then be generated. But that'd be pretty hard to use, as the current postgres backup type already exhibits, and it'd lose the plug-and-play aspect of the NixOS expert system :/
```nix
else if o.pipeJsonInto ? command then
  " --json | ${o.pipeJsonInto.command} \"$1\""
else if o.pipeJsonInto ? prometheus then
  " --json | ${prometheusWriteScript o.pipeJsonInto.prometheus.nodeExporterCollectorFolder} \"$1\""
```
As with PostgreSQL, in my opinion the integration with Prometheus is overly specific for this module. There are other observability clients, such as Amazon CloudWatch Agent added after 24.11; should this module know about them all?
How easy would it be for the operator to integrate this module with Prometheus themselves using the generic constructs?
AFAICT it'd be very hard: the JSON output of rustic would only be put in journalctl, and parsing metrics from there would be of significant complexity.
This being said, if we do allow the pipeJsonInto.command variant, then the complexity is basically this:
```nix
{
  prometheusNumericMetrics = [
    "files_new"
    "files_changed"
    "files_unmodified"
    "total_files_processed"
    "total_bytes_processed"
    "dirs_new"
    "dirs_changed"
    "dirs_unmodified"
    "total_dirs_processed"
    "total_dirsize_processed"
    "data_blobs"
    "tree_blobs"
    "data_added"
    "data_added_packed"
    "data_added_files"
    "data_added_files_packed"
    "data_added_trees"
    "data_added_trees_packed"
    "backup_duration"
    "total_duration"
  ];
  prometheusTimeMetrics = [
    "backup_start"
    "backup_end"
  ];
  prometheusAllMetrics = prometheusNumericMetrics ++ prometheusTimeMetrics;
  prometheusJqTemplate = lib.concatMapStrings (m: ''
    rustic_backup_${m}{id=\"{{id}}\"} {{${m}}}
  '') prometheusAllMetrics;
  prometheusWriteScript =
    f:
    pkgs.writeScript "rustic-write-to-prometheus" ''
      #!${pkgs.bash}/bin/bash
      set -euo pipefail
      ${pkgs.jq}/bin/jq -r --arg template "${prometheusJqTemplate}" '
        . as $json
        | $template
        | gsub("{{id}}"; "'"$1"'")
        ${lib.concatMapStrings (
          m: ''| gsub("{{${m}}}"; $json.summary.${m} | tostring)''
        ) prometheusNumericMetrics}
        ${lib.concatMapStrings
          # jq's `fromdate` builtins do not support sub-second accuracy (as of early 2025)
          (
            m:
            ''| gsub("{{${m}}}"; $json.summary.${m} | sub(".[0-9]{0,9}Z"; "Z") | fromdateiso8601 | tostring)''
          )
          prometheusTimeMetrics
        }
      ' > "${f}/rustic-backup-$1.prom.next"
      mv "${f}/rustic-backup-$1.prom.next" "${f}/rustic-backup-$1.prom"
    '';

  # Then pipeJsonInto.command = prometheusWriteScript myFolder
}
```

This requires quite a bit of knowledge of rustic-specific information, and I thought it'd make sense to integrate that directly into NixOS.
I also believe that we can accept further PRs adding support for the Amazon CloudWatch Agent or similar: just like for postgresql, the generic infrastructure can be reused so that the target-specific cost stays relatively small, and by upstreaming it into nixpkgs we avoid each user of such a system having to re-pay the cost of implementing it.
```
#!${pkgs.bash}/bin/bash
set -euo pipefail
TZ="" ${cfg.package}/bin/rustic backup${ifFilesArgs + ifCommandArgs + genericArgs + pipeJsonIntoCmd}
```
Why clear TZ?
This is required for the JSON output to be reasonably easy to parse: otherwise, the dates in the JSON output are given in the local timezone, which makes parsing unbelievably hard.
Thinking more about it, maybe it'd make sense to add that only when pipeJsonInto is non-empty?
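Concretely, I'm thinking of something along these lines (a sketch against this module's internals; `o` is the per-backup config used above, and I'm assuming pipeJsonInto defaults to an empty attrset):

```nix
# Only force a fixed timezone when the JSON output is actually piped into a
# parser; otherwise leave the operator's local timezone alone.
tzPrefix = lib.optionalString (o.pipeJsonInto != { }) ''TZ="" '';
```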
```nix
enable = lib.mkEnableOption "rustic";
package = lib.mkPackageOption pkgs "rustic" { };

checkProfiles = lib.mkOption {
```
Why would the operator not want this?
I've noticed that other modules that provide automated checks also provide a way to disable them, so I followed suit 😅 I don't really see a point in disabling it either, but I guess automated checks are sometimes broken and need to be bypassed? That's also why it defaults to true.
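For reference, the escape hatch is just a boolean defaulting to true, roughly like this (a sketch; the actual description wording in the module differs):

```nix
checkProfiles = lib.mkOption {
  type = lib.types.bool;
  default = true;
  description = ''
    Whether to validate the generated rustic profiles at build time.
    Mainly useful to disable when the check itself is broken and needs to be
    bypassed.
  '';
};
```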
```nix
'';
};

profiles = lib.mkOption {
```
Is the idea that the operator can specify profiles and/or backups, with profiles being files the operator writes themselves, while backups are generated from a config?
The idea is that profiles define the configuration of rustic, whereas backups define a specific backup target.
The split allows a user to e.g. define a profile that's not used for any automated backup, to trigger on-demand backups with sudo rustic -P my-manual-backup.
OTOH, having backups without profiles would likely be pretty useless.
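To make the split concrete, a configuration could look roughly like this; treat the exact option shape as illustrative rather than final:

```nix
services.rustic = {
  enable = true;

  # Profile only: never referenced by an automated backup, exists purely for
  # on-demand `sudo rustic -P my-manual-backup ...` invocations.
  profiles.my-manual-backup = {
    repository.repository = "/srv/backups/manual";
    repository.password-file = "/root/rustic-manual-password";
  };

  # Profile plus an automated backup target built on top of it.
  profiles.daily = {
    repository.repository = "/srv/backups/daily";
    repository.password-file = "/root/rustic-daily-password";
  };
  backups.home = {
    profile = "daily";
    files = [ "/home" ];
  };
};
```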
- [git-worktree-switcher](https://github.com/mateusauler/git-worktree-switcher), switch between git worktrees with speed. Available as [programs.git-worktree-switcher](#opt-programs.git-worktree-switcher.enable)

- [rustic](https://github.com/rustic-rs/rustic), a Rust-based backup tool. Available as [services.rustic](#opt-services.rustic.enable).
As this module is quite complex, IMO it necessitates some unit tests.
Makes sense, I actually just got some motivation/time and added a basic integration test, validating at least the main functionality (including prometheus metrics generation, though checking the exact syntax would be hard) :)
```nix
backupConfigToCmd k {
  command = v.postgres // {
    filename = "${v.postgres.prefix}/globals.sql";
    command = "${pkgs.sudo}/bin/sudo -u postgres ${config.services.postgresql.package}/bin/pg_dumpall --globals-only";
```
Fwiw the ${pkgs.sudo}/bin/sudo is wrong here: you'd need /run/wrappers/bin/sudo given that the store must not have setuid binaries in it.
However, why is it done like this in the first place? I'd argue it's preferable to reduce the permissions here as far as possible to not run all of the units as root.
Ideally, we'd have a readonly postgres user here, but unfortunately the postgresql ensure part is pretty ugly already and I don't see something like "create a readonly superuser" happening there for now[1].
[1] OK, actually I'm starting a bachelor's thesis that tries to get some ideas on state management with NixOS soon. So we may end up with answers eventually, but that's the status quo to work with.
Hmm, this means it might make sense to make the user configurable here and document in the manual what to do (i.e. something like create unix user auth, create corresponding DB user with read-only permissions). Does this make sense?
So the ${pkgs.sudo}/bin/sudo is actually on purpose: this service does not require gaining any privileges, and will already run as root. So we don't need a wrapper to use sudo only to lower privileges.
The service needs to be run as root anyway, because the password to the backup repo will (should) be readable only by root.
Based on that, the only part we could additionally drop permissions for would be just running the pg_dump/pg_dumpall command. And TBH I'm not sure it'd be worth adding infrastructure just to run these very specific commands without root rights, as they're made to be run as the postgres superuser.
Does that make sense?
> because the password to the backup repo will (should) be readable only by root.
This sounds like a use for LoadCredential though.
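Roughly like this (a sketch; the unit and file names are made up, and the profile would still need to reference the credential path):

```nix
systemd.services."rustic-backup-home".serviceConfig = {
  # systemd exposes the secret to this unit only, at
  # $CREDENTIALS_DIRECTORY/repo-password (also reachable as %d/repo-password
  # in unit files), without it ever entering the nix store.
  LoadCredential = [ "repo-password:/etc/rustic/repo-password" ];
};
```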
Ooh I hadn't known about that! It's very interesting, thank you :)
Here are the changes that come to mind now that you mention it:
- Rework the `backups` set to not be `attrsOf (sum type)`, and instead be `backups.files = attrsOf submodule`, `backups.postgres = attrsOf submodule` and `backups.command = attrsOf submodule`. This will allow defining them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for `command` with a predefined command), and the sum type doesn't actually bring much value.
- Add to the common backup options a `runAs` and a `credentialsProfile` option. `credentialsProfile` would get loaded by systemd credentials and passed as an additional `-P` argument; it is always mandatory, unless `backups.credentialsProfile` is set, in which case it gets used automatically. This is to discourage people from keeping their credentials in the nix store. `runAs` runs the systemd service as this user:
  - For files backups, `runAs` defaults to `root`
  - For command backups, `runAs` is mandatory
  - For postgres backups, `runAs` defaults to `postgres`
Now, the main question I'm left with is: how to make it as easy as possible for the user to restore or run manual commands like listing snapshots? They can always use -P /root/the-password, but e.g. the absence of the .toml extension is pretty counter-intuitive. But OTOH each backup could have different creds, so I can't just add a wrapped rustic to systemPackages. But then maybe we can just ignore this problem anyway.
WDYT, both about the change ideas and the "run manually" problem?
Note that I am not part of the postgres maintainers, so take the following with a grain of salt.
A handful of modules across nixpkgs implement either of the approaches documented in https://nixos.org/manual/nixos/unstable/#module-services-postgres-initializing-extra-permissions to bootstrap users, database tables, or enable extensions.
You can find most by searching for either User = "postgres" or $PSQL.
A read-only user for backups should, based on my reading I did a while ago, simply use the pg_read_all_data role from https://www.postgresql.org/docs/15/predefined-roles.html.
For example:
```nix
systemd.services.postgresql.postStart = lib.mkAfter ''
  $PSQL -c 'GRANT "pg_read_all_data" TO "rustic-postgres-ro";'
'';
```

Note that this restarts postgres.service on initial deploy, unlike the oneshot/sidecar approach.
Then your actual service that does the backup could use
```nix
serviceConfig = {
  User = "rustic-postgres-ro";
  DynamicUser = true;
};
```

and whatever credentials you may need should likely use the above-mentioned LoadCredential.
You may also want to set and configure CacheDirectory for the local repo cache.
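I.e. something like the following; whether rustic picks the directory up automatically or needs to be pointed at it explicitly is worth checking:

```nix
systemd.services."rustic-backup-home".serviceConfig = {
  # Gives the unit a writable /var/cache/rustic-backup-home, which also works
  # with DynamicUser and can hold the local repository cache.
  CacheDirectory = "rustic-backup-home";
};
```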
Thank you for your research about pg_read_all_data!
However, I'd disagree about both DynamicUser and CacheDirectory. By using a regular user with the default rustic repo cache, we allow the user to easily run sudo rustic restore, sudo -u rustic-postgres-ro rustic snapshots, etc.
I do believe this is an important use case, and would feel quite sad if restoring or listing snapshots were any more complex than it could be — good restoring UX is much more important than good backup UX, as restores usually happen in a rush and under stress.
With this in mind, here is the new plan for changes I'm thinking of:
- Rework the `backups` set to not be `attrsOf (sum type)`, and instead be `backups.files = attrsOf submodule`, `backups.postgres = attrsOf submodule` and `backups.command = attrsOf submodule`. This will allow defining them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for `command` with a predefined command), and the sum type doesn't actually bring much value but does add complexity.
- Add to the common backup options a `runAs` and a `credentialsProfile` option. `credentialsProfile` would get loaded by systemd credentials and passed as an additional `-P` argument; it is always mandatory, unless `backups.credentialsProfile` is set, in which case it gets used automatically. This is to discourage people from keeping their credentials in the nix store. `runAs` runs the systemd service as this user (a usage sketch follows right after this list):
  - For files backups, `runAs` defaults to `root`
  - For command backups, `runAs` is mandatory
  - For postgres backups, `runAs` defaults to `rustic-postgres-ro`, which gets `pg_read_all_data` permissions on the postgres databases (I'll test this role on my database and confirm bit-perfect equivalence with a backup made with `postgres`; if it's not, I'll just keep `postgres` for now)
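As mentioned above, here is the usage sketch of what a postgres backup could look like under this plan; every name is hypothetical until actually implemented:

```nix
services.rustic.backups.postgres.main = {
  # Which profile provides the repository settings.
  profile = "daily";
  # Loaded via systemd LoadCredential and passed as an extra -P argument,
  # so the secrets never end up in the nix store.
  credentialsProfile = "/etc/rustic/daily-credentials.toml";
  # User the unit runs as; defaults to rustic-postgres-ro for postgres
  # backups, which is granted pg_read_all_data.
  runAs = "rustic-postgres-ro";
};
```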
WDYT about this updated plan?
```
#!${pkgs.bash}/bin/bash
set -euo pipefail
${systemctl} start --no-block rustic-postgres-globals-${k}.service
```
This is the first line in the ExecStart, isn't it? And also the reason for the root requirement, correct?
So I'm wondering why not
- give fewer permissions where possible
- use `Wants` for this use case.
For the "give less permissions" part, I think I answered above.
For Wants here, the only reason I did it this way here is for consistency with the rest of the rustic-postgres-db-${k}@$db.service deps — this way all the "deps" of rustic-backup-${k} are handled in the same way.
Do you think it'd be better to handle the only static dep with Wants? If yes I'll be happy to switch to it!
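For concreteness, the Wants-based variant would look something like this, with a made-up backup name `example` (an After= ordering could be added if the globals dump had to finish first, which the current --no-block approach doesn't guarantee either):

```nix
systemd.services."rustic-backup-example" = {
  # Static dependency instead of `systemctl start --no-block` in the script.
  wants = [ "rustic-postgres-globals-example.service" ];
};
```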
Great news! I received a greenlight. Here is my current plan for the future of this PR, please advise (even if you think it's good, just a 👍 helps a lot in making sure I'm going the right way) :)
Hey! Sorry for the delay in coming back to this. I'm sorry, but I will have to give up on this PR. I have had only limited brain power to allocate to nixpkgs in the past two months. My two current big projects are #385143 and NixOS/rfcs#185, and I have been pushing back reminders to get to them and to this PR every week for the last few months. As luck would have it, my free time came when #385143 was first in my "to-answer" stack, so I answered there earlier today and spent a fair amount of time handling the last review comments. However, the latest exchanges finished consuming all the emotional availability I had. Sorry about that! After the few years during which I was less active, I have been under the impression since I came back last year that the NixOS community has become much less fun to contribute to, even though the governance issues seem to be going better. I hoped that my bit of time could be enough for me to contribute things long past due, but now I'm facing the issue that I do not actually want to contribute that much any longer. I hope someone can take over this PR and push it to the end! And love y'all 💜
I'll leave the PR open so that it's easy to find for anyone who'd want to take it over. On my end, I'm already using this module on my local machine as an extra module, and will keep doing so until I get some notification that it has landed upstream.
I've never had that much pushback on a PR (the one you linked, on auto-upgrade). I salute your courage for trying to fight for what you think is right! Thank you so much for this PR! I feel a little guilty for having merged so quickly; I didn't expect anyone to care enough to revert it. Best of luck in contributing, your efforts are very much appreciated!
Thank you for your message! And no worries, you were maybe a bit over-enthusiastic, but to be honest it's much more pleasant to be in a community of over-enthusiastic people who don't care about a few breaking changes and consider it fine to revert when needed, than in what seems to have become of the nixpkgs community… so I hope you can stay this way 💜
And likewise, I hope that despite all the pushback you still find the energy to contribute. You are part of what makes nixpkgs move forward!
Second attempt at #372935.
This has been running smoothly on my 4 systems for quite a while, on both aarch64 and x86_64 Linux.
Considering the complexity of the module, I'll wait for (either a very long time or) a review from another committer before landing. But I can already say I'm very happy with it personally :)
The only future work I'd see on it would be to add easy toggles to automatically run a FS-level snapshot before backing up, which I'd likely implement for ZFS and maybe BTRFS. Or maybe I'd PR that upstream directly, idk, it's not on the short-term todo-list anyway.
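To sketch what I mean, with entirely hypothetical option names (nothing like this exists in the module yet):

```nix
services.rustic.backups.files.home = {
  files = [ "/home" ];
  # Hypothetical toggle: take a snapshot of the dataset first and back up its
  # read-only view (e.g. /home/.zfs/snapshot/<name>) instead of the live data,
  # so the backup sees a consistent point in time.
  zfsSnapshot.dataset = "rpool/home";
};
```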
cc @emilylange @happysalada @NobbZ @philipmw