this post was submitted on 19 Mar 2026

Selfhosted


I recently made a huge mistake. My self-hosted setup is more of a production environment than something I do for fun. The old Dell PowerEdge in my basement stores and serves tons of important data, or at least data that is important to me and my family: documents, photos, movies, etc. It's all there in that big black box.

A few weeks ago, I decided to migrate from Hyper-V to Proxmox VE (PVE). Hyper-V Server 2019 is out of mainstream support and I'm trying to aggressively reduce my dependence on Microsoft. The migration was a little time-consuming but overall went off without a hitch.

I had been using Veeam for backups, but Veeam's Proxmox support is kind of "meh" and it made sense to move to Proxmox Backup Server (PBS) since I was already using their virtualization system. My server uses hardware RAID and has two virtual disk arrays: one for VM virtual disk storage and one for backup storage. Previously, Veeam was dumping backups to the backup storage array and copying them to S3 storage offsite. I should note that storing backups on the same host being backed up is not advisable. However, sometimes you have to make compromises, especially if you want to keep costs down, and I figured that as long as I stayed on top of the offsite replications, I would be fine in the event of a major hardware failure.

With the migration to Proxmox, the plan was to offload the backups to a physical PBS server on-site, which would then replicate them to another PBS host in the cloud. There were some problems with the new on-site PBS server which left me looking for a stop-gap solution.

Here's where the problems started. Proxmox VE can back up to storage without the need for PBS, so I started doing that just so I had some sort of backups. I quickly learned that PBS can replicate storage from other PBS servers. It cannot, however, replicate storage from Proxmox VE. I thought, "Ok. I'll just spin up a PBS VM and dump backups to the backup disk array like I was doing with Veeam."
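For anyone who hasn't used the PVE-native route: those stop-gap backups can be driven from the CLI with `vzdump`. A minimal sketch; the VM ID and storage name below are made up, substitute your own:

```shell
# Snapshot-mode backup of a single guest (VM ID 101 is hypothetical)
# to a PVE storage target named "backup-array" (also hypothetical).
vzdump 101 --storage backup-array --mode snapshot --compress zstd

# Or back up every guest on the node in one pass.
vzdump --all --storage backup-array --mode snapshot --compress zstd
```

The GUI scheduler under Datacenter → Backup does the same thing on a timer; the point is you get working backups with zero PBS involvement.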

Hyper-V has a very straightforward process for giving VMs direct access to physical disks. It's doable in Proxmox VE (which is built on Debian) but less straightforward. I spun up my PBS VM, unmounted the backup disk array from the PVE host, and assigned it as mapped storage to the new PBS VM. ...or at least I thought that's what I did.
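For reference, the usual way to hand a whole physical disk to a VM in PVE is `qm set` with a persistent `/dev/disk/by-id` path. A sketch of the safe version of what I was attempting; the VM ID and disk serial here are hypothetical:

```shell
# List the stable disk identifiers first, before touching anything.
ls -l /dev/disk/by-id/

# Attach the disk to VM 102 by its persistent by-id path.
# Never use /dev/sdX names here; they can change between boots.
qm set 102 -scsi1 /dev/disk/by-id/scsi-360000000000000000000000000000001

# Confirm the mapping landed on the VM and disk you intended.
qm config 102 | grep scsi1
```

Using the by-id path (and actually reading it back with `qm config`) is exactly the verification step I skipped.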

I got everything configured and started running local backups, which ran like complete and utter shit. I thought, "Huh. That's strange. Oh well, it's temporary anyway." and went on with my day. About two days later, I go to access Paperless-ngx and it won't come up. I check the VM console. VM is frozen. I hard reset it aaaannnnddd now it won't boot. I start digging into it and find that the virtual HDD is corrupt. fsck is unable to repair it and I'm scratching my head trying to figure out what is going on.

I continued investigating until I noticed something. The physical disk ID that's mapped to the PBS VM is the same as the ID of the host's VM storage disk. At that point, I realize just how fucked I actually am. The host server and the PBS VM have been trying to write to the same disk array for the better part of two days. There's a solid chance that the entire disk is corrupt and unrecoverable. VM data, backups, all of it. I'm sweating bullets because there are tons of important documents, pictures of my kids, and other stuff in there that I can't afford to lose.

Half a day working the physical disk over with various data recovery tools confirmed my worst fears: Everything on it is gone. Completely corrupted and unreadable.

Then I caught a break. After I initially unmounted the [correct] backup array from PVE, it had just been sitting there untouched. Every once in a great while, my incompetence works out to my advantage, I guess. All the backups that were created directly from PVE, without PBS, were still intact. A few days old at this point, but still way better than nothing. As I write this, I'm waiting on the last restore to finish. I managed to successfully restore all the other VMs.

What's really bad about this is that I'm a veteran. I've been in IT in some form for almost 20 years. I know better. Making mistakes is OK and is just part of learning. But you have to plan for the fact that you WILL make mistakes and systems WILL fail. If you don't, you might find yourself up shit creek without a paddle.

So what did I do wrong in this situation?

  • First, I failed to adequately plan ahead. I knew there were risks involved, but I failed to appreciate the seriousness of those risks, much less mitigate them. What I should have done was buy a high-capacity external drive and use it to make absolutely sure I had a known-good backup of everything, stored separately from my server. My inner cheapskate talked me out of it. That was a mistake.
  • Second, I failed to verify, verify, verify, and verify again that I was using the correct disk ID. I already said this once but I'll repeat it: storing backups on the host being backed up is ill-advised. In an enterprise environment, it would be completely unacceptable. With self-hosting, it's understandable, especially given that redundancy is expensive. If you are storing backups on the server being backed up, even if it's on removable storage, you need to make sure you have a redundant offsite backup and that it is fully functional.
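That "verify, verify, verify" step can be made mechanical instead of relying on eyeballs. A quick pre-flight sketch before mapping any physical disk to a VM; the disk serial below is hypothetical:

```shell
# Resolve the by-id symlink to the kernel device it currently points at.
readlink -f /dev/disk/by-id/scsi-360000000000000000000000000000002

# See which block devices the PVE host itself is using,
# and which storages are active on the node.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
pvesm status

# Check every VM config for the disk you're about to map.
# If the serial already appears anywhere, stop: something else owns it.
grep -r "scsi-360000000000000000000000000000002" /etc/pve/qemu-server/
```

If that last `grep` had been run before the mapping, the duplicate disk ID would have surfaced in seconds instead of after two days of concurrent writes.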

Luck is not a disaster recovery plan. That was a close call for me. Way too close for my comfort.

top 8 comments
[–] otacon239@lemmy.world 9 points 1 hour ago* (last edited 1 hour ago)

Hindsight is 20/20, but for anyone else reading this, my method for server transfers like this is to have a physical offline backup from the start of the transfer process. (Obviously you'd need more disks if you have a big array, but at that scale you should have enough experience to handle it.)

Once I have the physical backup, I set it aside, unplugged, until the entire process is done and I confirm it all went well. Then I feel safe enough to use that drive again after the first week or so goes smoothly.

[–] fozid@feddit.uk 4 points 1 hour ago

Appreciate hearing about your experience 👍 I don't use any of those tools, but I self-host a complex setup, and it's always good to hear how people operate and resolve issues, even if it's luck that saves the day 👍

[–] irmadlad@lemmy.world 9 points 2 hours ago

storing backups on the host being backed up is ill advised

I've had this notion for a long time. I do store backups on a separate drive on the server, but those are replicated almost immediately, elsewhere. I learned my backup lesson quite a while ago and I do not wish to repeat that disaster.

[–] Brkdncr@lemmy.world 12 points 2 hours ago

Sounds like you’re missing an on-site copy on different storage and an offsite copy.

[–] Infernal_pizza@lemmy.dbzer0.com 6 points 2 hours ago (1 children)

If you hadn't got lucky with the backup array would your existing off site backups have been any help or were they too out of date?

[–] jubilationtcornpone@sh.itjust.works 5 points 2 hours ago (1 children)

To a point. I completely revamped the off-site backups when I ditched Veeam. Four (out of probably ten) VMs were backed up off-site successfully in PBS. I think I restored two of those because they were newer than what I had locally. The rest never made it off-site, probably due to the local PBS VM choking on disk write conflicts.

Ah I see, I didn't realise you'd got rid of the old backups. Glad you managed to get most of it back!

And at least you have an off site backup, I still haven't found a decent way to do it that isn't prohibitively expensive

[–] civ@lemmy.civl.cc 1 points 2 hours ago

Yikes. I back up to a local PBS VM too, but I'm passing the NVMe controller through as a PCI device and it works with no issues. I also back up to the cloud and an external SSD, so I'm not just trusting my Proxmox host.