You just finished setting up all your services and it works fine - how do you now prepare for eventual drive failure?

@ssdfsdf3488sd@lemmy.world

Virtualize the machine with Proxmox, use Proxmox Backup Server, and load the VM on a new system if the machine currently running it fails catastrophically.

HeartyBeast

carefully configured services on your rpi

I have a backup on an SD card waiting for the day the SD card fails. Slot it in and reboot.

adONis

Most of the docker services use mounted folders/files, which I usually store in the user's home folder /home/username/Docker/servicename.

Now, my personal habit of choice is to have user folders on a separate drive and mount them into /home/username. Additionally, one can also mount /var/lib/docker this way. I also spin up all of these services with portainer. The benefit is, if the system breaks, I don’t care that much, since everything is on a separate drive. In case of needing to re-setup everything again, I just spin up portainer again which does the rest.

However, this is not a backup, which should be done separately in one way or the other. But it’s for sure safer than putting all the trust into one drive/sdcard etc.

@Skies5394@lemmy.ml

On my main server, ZFS snapshots of my container appdata, VM VHDs and docker image live on an SSD RAID1. That is also backed up in full once per night to the RAID10 array, then rsynced to the backup server, which in turn uploads to the cloud.

The data on the RAID is backups, repos or media that I’ve deposited there for an extra copy or for serving via Plex/Jellyfin. I have extra copies of the data, and if I were to lose the array totally, I wouldn’t be pleased, but my personal pictures/videos wouldn’t be in danger.

I run two backup servers, which both upload to the cloud. One takes bare metal images of all my computers (sans the servers' bulk drives), the other takes live folders.

This is more due to convenience so that I can pull a bare metal image to restore a device, or easily go find a file with versioning online if necessary on both accounts.

As a wise man said, you can never have too many backups.

@friend_of_satan@lemmy.world

I’ve had a complete drive failure twice within the last year (really old hardware) and my ansible + docker + backup made it really easy to recover from. I got new hardware and was back up and running within a few hours.

All of your service setup should be automated (through docker-compose or ansible or whatever) and all your configuration data should be backed up. This should make it easy to migrate services from one machine to another, and also to recover from a disaster.

dr_robot

My configuration and deployment is managed entirely via an Ansible playbook repository. In case of absolute disaster, I just have to redeploy the playbook. I do run all my stuff on top of mirrored drives so a single failure isn’t disastrous if I replace the drive quickly enough.

For when that’s not enough, the data itself is backed up hourly (via ZFS snapshots) to a spare pair of drives and nightly to S3 buckets in the cloud (via restic). Everything automated with systemd timers and some scripts. The configuration for these backups is part of the playbooks of course. I test the backups every 6 months by trying to reproduce all the services in a test VM. This has identified issues with my restoration procedure (mostly due to potential UID mismatches).
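The UID mismatch problem mentioned above is the kind of thing a small check can catch during a test restore. Here is a minimal sketch, assuming the restored data sits in a directory and you know which numeric UIDs your services expect. The function name `find_uid_mismatches` and its parameters are made up for illustration and are not part of restic, ZFS or Ansible.

```python
import os
from pathlib import Path


def find_uid_mismatches(root, expected_uids):
    """Return sorted (path, uid) pairs under root whose owner is not in expected_uids.

    Hypothetical helper: expected_uids is the set of numeric UIDs the
    services on the target machine are assumed to run as.
    """
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            uid = path.lstat().st_uid  # lstat: report the link itself, not its target
            if uid not in expected_uids:
                mismatches.append((str(path), uid))
    return sorted(mismatches)
```

Running something like `find_uid_mismatches("/restore/appdata", {1000})` in the test VM (path and UID are placeholders) would list every file that came back owned by an unexpected user.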

And yes, I have once been forced to reinstall from scratch and I managed to do that rather quickly through a combination of playbooks and well tested backups.

@subtext@lemmy.world

Dang I really like your idea of testing the backup in a VM… I was worried about how I’d test mine since I only have the one machine, but a VM on my desktop or something should do just fine.

@namelivia@lemmy.world

I have all my configuration as Ansible and Terraform code, so everything can be destroyed and recreated with no effort.

When it comes to the data, I wrote a bash script to copy, compress, encrypt and upload it. Not sure if this is the best but it is how I’m dealing with it right now.
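For anyone wanting to try the same copy, compress, encrypt pipeline, here is a rough sketch of the first steps. It is not the script described above. It assumes `gpg` is installed for the encryption step, and the function names are invented for illustration. The upload step is left out because it depends entirely on the storage provider.

```python
import hashlib
import tarfile
from pathlib import Path


def make_archive(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar archive and return its path."""
    source = Path(source_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(source), arcname=source.name)
    return Path(archive_path)


def sha256_of(path):
    """Checksum of the archive, useful for verifying the upload later."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def gpg_encrypt_command(archive_path):
    """Build (but do not run) a gpg command for symmetric AES256 encryption.

    Assumption: gpg is on PATH; it will ask for a passphrase when run.
    """
    return ["gpg", "--output", f"{archive_path}.gpg",
            "--cipher-algo", "AES256", "--symmetric", str(archive_path)]
```

The command list can be passed to `subprocess.run(..., check=True)`. Keeping it as a list avoids shell quoting problems with odd file names.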

rentar42

I’ve got a similar setup, but use Kopia for backup which does all that you describe but also handles deduplication of data very well.

For example I’ve added older less structured backups to my “good” backup now and since there is a lot of duplication between a 4 year old backup and a 5 year old backup it barely increased the storage space usage.
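To illustrate why overlapping backups add so little storage, here is a toy content-addressed store. It is not how Kopia is implemented: it uses fixed-size chunks for simplicity, and the `ChunkStore` class is purely hypothetical. The idea is that each chunk is stored under its hash, so a chunk seen before costs nothing extra.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # sha256 hex digest -> chunk bytes

    def add(self, data):
        """Store data and return the list of chunk digests that rebuilds it."""
        manifest = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            manifest.append(digest)
        return manifest

    def restore(self, manifest):
        """Rebuild the original bytes from a manifest."""
        return b"".join(self.chunks[d] for d in manifest)

    def stored_bytes(self):
        """Total bytes actually held, after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

Adding two backups that share most of their chunks grows `stored_bytes()` only by the chunks that differ, while both manifests can still be restored in full.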

@tetris11@lemmy.ml

Radical suggestion:

Once a year you buy a hard drive that can handle all of your data.
rsync everything to it
unplug it, put it back in cold storage

@CarbonatedPastaSauce@lemmy.world

I actually run everything in VMs and have two hypervisors that sync everything to each other constantly, so I have hot failover capability. They also back up their live VMs to each other every day or week depending on the criticality of the VM. That way I also have some protection against OS issues or a wonky update.

Probably overkill for a self hosted setup but I’d rather spend money than time fixing shit because I’m lazy.

@surewhynotlem@lemmy.world

HA is not redundancy. It may protect from a drive failure but it completely ignores data corruption issues.

I learned this the hard way when my cryptomator decided to corrupt some of my files, and I noticed but didn’t have backups.

@CarbonatedPastaSauce@lemmy.world

That’s why I also do backups, as I mentioned.

rentar42

yeah, there’s a bunch of lessons that tend to only be learned the hard way, despite most guides mentioning them.

similarly to how RAID should not be treated as a backup.

rentar42

There’s lots of very good approaches in the comments.

But I’d like to play the devil’s advocate: how many of you have actually recovered from a disaster that way? Ideally as a test, of course.

A backup system that has never done a restore operation must be assumed to be broken. Similar logic should be applied to disaster recovery.
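One cheap way to test a restore is to compare checksums of the restored tree against the live data. The sketch below assumes both trees are reachable as local directories. `tree_digests` and `compare_restore` are illustrative names, not part of any backup tool.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes loads the whole file; fine for a sketch, stream for big files
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(original, restored):
    """Return (missing, changed): files absent from, or different in, the restore."""
    a = tree_digests(original)
    b = tree_digests(restored)
    missing = sorted(set(a) - set(b))
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return missing, changed
```

Two empty lists mean every file in the original made it back intact. Files that changed after the backup was taken will show up as changed, so this works best against a quiet system or a snapshot.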

And no: I use a combined Ansible/Docker approach that I’m reasonably sure could quite easily recover most stuff, but I’ve not yet fully rebuilt from just that.

Kaldo

I’m not sure what Ansible does that a simple Docker Compose doesn’t yet but I will look into it more!

My real backup test run will be soon I think - for now I’m moving from windows to docker, but eventually I want to get an older laptop, put linux on it and just move everything to the docker on it instead and pretend it’s a server. The less “critical” stuff I have on my main PC, the less I’m going to cry when I inevitably have to reinstall the OS or replace the drives.

rentar42

I just use Ansible to prepare the OS, set up a dedicated user, install/setup Rootless Docker and then Sync all the docker compose files from the same repo to the appropriate server and launch/update as necessary. I also use it to centrally administer any cron jobs like for backup.

Basically if I didn’t forget anything (which is always possible) I should be able to pick a brand new RPi with an SSD and replace one of mine with a single command.

It also allows me to keep my entire setup “documented” and configured in a single git repository.

@deepdive@lemmy.world

While rsync is great, I only partially recovered from an outage… Containers with databases need special care: dumping their database…
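As a sketch of what dumping the database can look like before the file-level backup runs: the comment above does not say which database was involved, so this assumes PostgreSQL running in a Docker container, and the container, user and database names are placeholders.

```python
import subprocess


def pg_dump_command(container, db_user, db_name):
    """Build the command that dumps db_name from inside the given container.

    Assumption: the container runs PostgreSQL and ships the pg_dump client.
    """
    return ["docker", "exec", container, "pg_dump", "-U", db_user, db_name]


def dump_database(container, db_user, db_name, out_path):
    """Run the dump and write the resulting SQL to out_path on the host."""
    with open(out_path, "wb") as out:
        subprocess.run(
            pg_dump_command(container, db_user, db_name),
            stdout=out,
            check=True,  # raise if docker or pg_dump fails, so a bad dump is noticed
        )
```

Backing up the dump file instead of the live data directory avoids copying database files while they are being written, which is the usual reason a plain rsync of a running database restores badly.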

Lesson learned !

Outcide

Back everything up
rm -rf /
Now rebuild.

Congratulations, you now know what’s required. :-P

@RegalPotoo@lemmy.world

Infrastructure as code/config as code.

The configuration of all the actual machines is managed by Puppet, with all its configs in a git repo. All the actual applications are deployed on top of Kubernetes, with all the configurations managed by helmfile and also tracked in git. I don’t set anything up - I describe how I want things configured, and the tools do the actual work.

There is a “cold start” issue in my scheme: Puppet requires a server component that runs on Kubernetes, but I can’t deploy onto Kubernetes until the host machines have had their Puppet manifests applied. At that point, though, I can just read the code and do enough of the config by hand to bootstrap everything from scratch if I have to.

@ikidd@lemmy.world

I run everything on a 2-node proxmox cluster with ZFS mirror volumes and replication of the VMs and CTs between them, run PBS with hourly snapshots, and sync that to multiple USB drives I swap off site.

The docker VM can be ZFS snapshotted before major updates so I can rollback.

idunnololz

I eat a cyanide tablet. Drive won’t fail on me if I’m dead. Taps temple

@DLSantini@lemmy.ml

Pre…pare…? What’s that? Some sorta fruit?
