Background and Scope
Data handling and backup can be hard, and everything below is my work-in-progress notes and effort towards achieving a system that works for me.
Being self-employed in technology, I’ve got quite an array of devices, machines, and systems to support even just in my household. Keeping everything backed up and synced in a way that’s sufficiently hands-off and automatic to be reliable, as well as easy to recover from in the event of data loss, is not a simple task. In particular, it’s critical to adopt a 3-2-1 backup strategy: 3 copies of your data, on 2 different storage media, with at least 1 copy off-site.
In the simplest “one laptop” case, this is straightforward: have one backup regularly scheduled to a USB hard drive. Maybe every time you sit down at your desk, you plug it in, and Time Machine (Mac) or File History (Windows) takes care of it. Then, on top of that, install your cloud backup provider of choice to regularly take things off-site. Hint: IDrive.com is an excellent deal, and supports client-side encryption to keep everything private. That gives you 3 copies of the data (laptop, backup USB disk, and IDrive), 2 different physical media (laptop and USB disk, plus IDrive), and 1 off-site copy (IDrive).

Why these copies? A second local copy so that when you delete a file accidentally, you can quickly go find the backup, and when your laptop gets stolen from the car, you’ve still got everything on your desk. One remote backup so that if your house gets robbed or burns down, your data is still safe off-site, or so that when you delete a file while you’re on vacation, you can grab it from the cloud backup. 3-2-1 provides near-complete coverage of data threats, which is why it’s the gold standard.
My case is a bit trickier – I have a bunch of different machines, LOTS of data, some big chunks that need to get shifted around kind of frequently (big virtual machines I shelve and retrieve for infrequent jobs), and so on. In particular, the simple 3-2-1 case above doesn’t work because “production” data, or the “primary copy” isn’t all in one spot. Some “primary” data lives on removable drives. Some lives on my NAS. Some lives on one laptop, some on another machine.
I’ve chosen to build my data management and backup scheme around a Synology NAS, specifically a DS1019+. Synology DSM offers an immense array of software tools to handle nearly anything: out of the box, you get Photo Station/Video Station/Music Station for media serving, “Media Server” for making that content available via DLNA, Synology Drive as basically a feature-by-feature clone of Google Drive, including both backup and “synced folder” functionality… And that’s without even touching third-party packages. With the more powerful units like the “plus” series my DS1019+ hails from, you also get virtual machine capability and, in particular, native Docker support. With Docker, the entire Docker Hub full of containerized applications is ready to install on your NAS. Interestingly, this means there are two officially-supported and almost equally-easy ways to install Plex on your Synology NAS.
I’ve got a few different goals I’d like to achieve with my home IT setup:
- A Dropbox-like shared folder, synced between all my machines, with the main files I use day to day. This should have the ability to exclude individual files or folder trees, so as to keep some heavy-weight data off of machines that won’t need it. It should also include version history, so deleted or modified files can be rolled back in time.
- A means of regular system-level backup from each machine to the NAS, such that a complete recovery would be possible from the NAS copy. That is, if I leave the house any given morning and my laptop gets stolen from the car, I shouldn’t lose any data. This might be an image-level or file-level backup, though ideally it would have image-level restore capability (see below).
- A means of copying that system-level backup to cloud storage for offsite safe-keeping.
- Data where the primary copy is on removable storage must also be 3-2-1 backed up, first to the NAS and then to cloud storage, and this process should be as automatic as possible. Ideally, this should be set up as immediate sync to NAS as soon as the drive gets connected. The use-case here is my Photos drive where I keep my Lightroom catalog and video to edit. The media library is simply too big to remain on internal storage, and primary storage on the NAS device would be too slow for access.
- Probably some more I’m leaving out, TBD.
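For the removable-drive goal, the “immediate sync to NAS as soon as the drive gets connected” idea can be approximated with a small polling script. This is only a sketch under assumptions: the mount points, share name, poll interval, and the use of rsync are all placeholders for whatever your setup actually looks like.

```python
#!/usr/bin/env python3
"""Sketch: mirror a removable drive to the NAS whenever it appears.

All paths are assumptions -- a "Photos" drive on macOS and an
SMB-mounted Synology share. Call watch() to run the loop.
"""
import os
import subprocess
import time

SRC = "/Volumes/Photos"             # removable drive mount point (assumed)
DEST = "/Volumes/NAS/PhotosBackup"  # SMB-mounted Synology share (assumed)
POLL_SECONDS = 30

def drive_present(path):
    """A drive counts as present when its mount point exists and is non-empty."""
    return os.path.isdir(path) and bool(os.listdir(path))

def mirror(src, dest):
    """One-way mirror; --delete keeps the NAS copy exact. Deletions land in
    the Synology #recycle bin if the share has recycling enabled."""
    subprocess.run(["rsync", "-a", "--delete", src + "/", dest + "/"],
                   check=True)

def watch():
    """Poll forever: mirror whenever both ends are mounted."""
    while True:
        if drive_present(SRC) and drive_present(DEST):
            mirror(SRC, DEST)
        time.sleep(POLL_SECONDS)
```

On macOS a launchd job with a `WatchPaths` entry on /Volumes would be the more idiomatic trigger than polling, but the polling version is easier to reason about and debug.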
Here’s some interesting stuff I’ve come across.
Image Backup vs File Backup
File backup is a scheme where your system is copied to a backup location on a file-by-file basis. For instance, Dropbox syncs your local folder with a cloud folder, and you can see the individual files and folders on either side. Image backup is a scheme where your entire system is boiled down into one monolithic image file on disk (whatever disk that is). You might be able to browse your files, but only with some kind of tool that lets you peer inside the image. Acronis uses such a scheme, backing up your entire system to a .tib image, which contains everything about your system’s disk. You can browse the image, but only by going through the Acronis interface. The image file contains sufficient data to restore your entire system exactly as it was at time of backup.
Some schemes adopt a bit of a blend of both, Time Machine in particular. TM is a file-level backup, and doesn’t bother copying a number of system directories easily restored by the macOS installer. Yet it DOES save everything to a monolithic disk image (or slightly-less-monolithic .sparsebundle), and allows for “exactly as it was” system-level restore. You can browse the backup filesystem using either the Time Machine UI, or simply by mounting the .sparsebundle on your Mac.
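The distinction is easy to see in a toy example. Here a tar archive stands in for the monolithic image (Acronis .tib files and Apple sparsebundles are their own proprietary formats; tar just makes the shape of the idea visible).

```python
#!/usr/bin/env python3
"""Toy illustration of file-level vs image-level backup. A tar archive
stands in for the monolithic image -- NOT what Acronis or Time Machine
actually use, just the same shape of idea."""
import os
import shutil
import tarfile

def file_level_backup(src, dest):
    """File-level: the destination is a browsable mirror of the source tree."""
    shutil.copytree(src, dest, dirs_exist_ok=True)

def image_level_backup(src, image_path):
    """Image-level: everything rolled into one opaque file on disk."""
    with tarfile.open(image_path, "w:gz") as tar:
        tar.add(src, arcname=os.path.basename(src))

def list_image(image_path):
    """Peering inside the image requires a tool, the way the Acronis
    and Time Machine UIs let you browse their images."""
    with tarfile.open(image_path, "r:gz") as tar:
        return tar.getnames()
```

With the file-level copy you can find any document with ordinary filesystem tools; with the image you need `list_image` (or a mount) before you can see anything at all.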
Convenient Features of Synology DSM
DSM has come a long way since I started using it, in particular by implementing support for Btrfs. Btrfs is a copy-on-write filesystem, which has huge implications for deduplication and filesystem integrity. It enables a concept called “snapshotting” where, in effect, data is “tagged” at a particular point in time, and any additions/modifications/deletions from that point forward are stored as a delta. Synology DSM uses this to allow for scheduled snapshots, where every hour, say, the entire filesystem is frozen in time without taking up any more disk space than “every unique byte that’s ever been written.” This is a great way to make a kind of pseudo-backup that protects against user error/accident like “rm -r” or accidentally saving a document after deleting half its contents.
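The snapshotting idea can be modeled in a few lines. This is emphatically not how Btrfs is implemented (it works copy-on-write at the extent level, not on whole files), just a toy illustration of why a pile of snapshots costs only the unique content ever written.

```python
#!/usr/bin/env python3
"""Toy model of snapshots over deduplicated storage. Each snapshot is
just a mapping of path -> content hash; identical content is stored
once. Not Btrfs -- an illustration of the cost model only."""
import hashlib

class SnapshotStore:
    def __init__(self):
        self.blocks = {}     # hash -> content, stored once ever
        self.snapshots = []  # list of {path: hash} manifests

    def take_snapshot(self, files):
        """files: {path: bytes}. Unchanged content adds no new storage."""
        manifest = {}
        for path, content in files.items():
            h = hashlib.sha256(content).hexdigest()
            self.blocks.setdefault(h, content)  # dedup: keep one copy
            manifest[path] = h
        self.snapshots.append(manifest)
        return len(self.snapshots) - 1

    def restore(self, snap_id):
        """Rewind: rebuild the tree exactly as it was at snapshot time."""
        return {p: self.blocks[h] for p, h in self.snapshots[snap_id].items()}
```

Taking an hourly snapshot of a mostly-unchanged share is nearly free here, and an accidental “rm -r” is undone by restoring the previous manifest, which is the essence of the DSM feature.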
Synology Drive Server takes this a step further and offers versioning: if enabled, every file change or deletion creates a new version, so a share can be rewound to any point in time, any file can be rolled back to a previous version, and recently deleted files can be recovered. Eventually, depending on version retention settings, old versions and long-deleted files can be expunged automatically.
Finally, across all shared folders, DSM supports the concept of recycling bins, where any deleted files will be placed in purgatory until some scheduled future deletion.
Keep these features in mind when considering the following.
Synology Universal Search for Deleted Files
Synology Universal Search is cool, since it lets you search for any document name across your entire NAS (at least, within indexed folders), as well as search the contents of certain kinds of files. You might be wondering: with Synology Drive file history and Recycle Bins, can I search for deleted files or old versions? No. Universal Search DOES NOT index recycle bins, and DOES NOT index the Drive file history database. I’ve tested and searched at length, and can’t find a way to make it search these, even though, at least in the case of recycle bins, it’s only a ‘#’ away. Synology support swears Universal Search WILL look in the Drive history database, but I can’t make that work or find a way to enable it, so I’m pretty sure it’s not actually true. Given my test showing that Universal Search ignores #recycle directories, that seems even more likely.
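The workaround is to search the #recycle directories yourself, e.g. over SSH on the NAS. A minimal sketch, assuming the usual /volume1 DSM volume root (an assumption; adjust for your own volumes):

```python
#!/usr/bin/env python3
"""Manually search Synology #recycle directories by filename, since
Universal Search won't. Run on the NAS itself, e.g. over SSH;
/volume1 is the usual DSM volume root (assumed)."""
import fnmatch
import os

def search_recycle_bins(volume_root, pattern):
    """Return paths under any #recycle directory whose filename matches
    a case-insensitive glob pattern like 'report*'."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(volume_root):
        # Only look at paths that sit somewhere under a #recycle dir.
        if "#recycle" in dirpath.split(os.sep):
            for name in filenames:
                if fnmatch.fnmatch(name.lower(), pattern.lower()):
                    matches.append(os.path.join(dirpath, name))
    return matches

if __name__ == "__main__":
    for hit in search_recycle_bins("/volume1", "*.docx"):
        print(hit)
```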
Using GoodSync with Synology
Of all “folder sync” utilities, GoodSync is the most robust I’ve come across in terms of feature set. I say “folder sync” utilities because lots of tools do “syncing of files” – Dropbox, Synology Drive, Sync.com, etc. The set of tools that tackles the task “take a folder on the left and a folder on the right and make them equal” is smaller – tools like FreeFileSync, SyncToy, SyncBack, and of course GoodSync.
There are actually a handful of ways to use GoodSync with a Synology NAS – since GS supports all kinds of folder pairs, you can have your pick of access technologies on the NAS side, like FTP, SMB, AFP, and so on. I’ll call “use a normal remote file access scheme on one side” method 1. The other way, method 2, is to use GoodSync Connect on the Synology side. This is a DSM package, available in Package Center, that theoretically improves performance by working around the inherent performance limitations of whatever file access protocol you would otherwise use.
GoodSync and Synology Recycle Bins
One question I have about GS Connect is how it behaves with the recycle bin functionality on the NAS, since that’s quite a useful feature to avoid data loss. If you connect the NAS side via SMB (or presumably FTP/AFP/etc.), it works just as you’d hope: delete a file on the left side, GS syncs the deletion to the right side (on the NAS), and that file goes into the recycle bin instead of actually getting deleted. There’s one caveat: GoodSync supports ITS OWN recycle bin functionality, INCLUDING versioning, in Options > Save deleted/replaced files to recycle bin. If you enable this, the “deleted” file actually just gets moved into a hidden _gsdata_ folder on the NAS side, and thus never triggers the Synology recycle bin functionality. Watch out for this, since it’s a bit confusing, although one upshot is that Universal Search WILL index things inside a _gsdata_ recycle bin, but NOT the Synology #recycle directory. If you connect the NAS side via GS Connect instead, things you delete from the local side of the sync pair get deleted from the NAS without #recycling. So if you’re going to use the GS server on your Synology and you want a recycle bin, you’ll want to use GoodSync’s built-in one.
Acronis True Image Considerations
Acronis True Image is an image-level backup utility that works across Mac and Windows. With respect to Mac clients, it’s a little less ideal than Time Machine because, while you can opt to keep a version history and browse files in the backup, the interface for doing so is far less seamless than TM’s. However, Time Machine backed up to a network location has always been a bit fiddly and error-prone: I’ve never had a network TM backup that didn’t eventually require substantial manual repair effort, or a complete reset. Mac disk images, from an engineering perspective, are unfortunately a bit old-fashioned and fragile.
One major question with respect to keeping a True Image backup on the NAS is whether I can simply mirror that backup to the cloud, in a sane way, to satisfy the off-site requirement above. It’s probably OK in my case to keep an image-level backup in the cloud AND file-level backups separately, even though that’s duplicated storage: IDrive is, for now, very cheap. What I DON’T want to do is copy the entire .tib image up to the cloud every time it changes. Luckily, IDrive advertises “block-level incremental backups and restores to optimize transfer speed,” which I’d like to test in the case of a .tib backup. We’d expect that if I make an Acronis backup, then add one file to the desktop, then make an incremental, the .tib file’s checksum will change, but the IDrive incremental backup of that file should still finish almost instantly.
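For reference, “block-level incremental” generally means: chunk the file at fixed offsets, hash each chunk, and upload only the chunks whose hashes changed since the last run. IDrive’s actual implementation isn’t public, so this is just a sketch of the general technique, with an arbitrary block size.

```python
#!/usr/bin/env python3
"""Sketch of fixed-size block-level incremental upload: hash the file
in blocks, then upload only blocks whose hashes changed. The general
technique only -- not IDrive's actual (unpublished) implementation."""
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB block size -- an arbitrary choice

def block_hashes(data):
    """Hash the file contents in fixed-size blocks."""
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

def changed_blocks(old_hashes, new_data):
    """Return (index, block) pairs needing upload: blocks whose hash
    differs from last time, plus any blocks past the old end of file."""
    out = []
    for i, h in enumerate(block_hashes(new_data)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            out.append((i, new_data[i * CHUNK:(i + 1) * CHUNK]))
    return out
```

Under this scheme, growing a ~9GB image by ~3GB of new data should cost roughly 3GB of upload plus local hashing time, regardless of the image’s total size.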
To test this, I set up a macOS VM and made a True Image backup of it to my NAS. I then set IDrive running on the Synology to back up that directory. True Image, it seems, compresses its backup image; the initial image size was 8.61GB. I created an IDrive backup set and clicked “go” on the Synology, and used my ISP’s router to confirm that, indeed, about that much data got uplinked in 15 minutes or so (9.04GB per the router, calculated from “average traffic over the measurement period”).
I then downloaded a beefy Ubuntu 20.04 ISO as a stand-in for “changed a bunch of stuff on the computer” and fired off a new True Image backup. The image file grew from 8.61GB to 11.98GB, or by about 3.37GB – notably more than the 2.27GB the downloaded ISO takes up, so I’m not sure where the extra came from. The image backup IS set to store versions, so perhaps enough other files changed to require the extra delta space.
Regardless, I then ran an incremental backup on the same set with IDrive on the Synology. It claimed to be backing up 11.98GB, but the initial few GB blew by very quickly, and looking at the router statistics, only a bit over 2.5GB actually uploaded in the roughly 15 minutes the incremental took. Note that the incremental’s upload rate was quite a bit slower than the initial backup’s, and I’m not sure whether that’s down to processing time or to IDrive throttling my upload for whatever reason. The long and short of it: I’ve verified that IDrive does indeed do a block-level incremental backup, and will only upload the CHANGES in a given file. Thus, IDrive is a good option for replicating image files from NAS storage to cloud storage.
“Acronis True Image” is damaged and can’t be opened (Mac error)
Installing and uninstalling True Image 2020, I ran into an error where I couldn’t reinstall TI from a fresh download. I came across this forum post which led me to the answer: Acronis makes a “cleanup tool for <windows/mac>” which you can run to eliminate all traces of True Image. If you run into the error above, use the cleanup tool, then installation should succeed, which is how things worked out for me. As such, it seems like the cleanup tool is probably the best way to accomplish a clean, full uninstall if you ultimately decide against True Image.
To Be Continued
There’s much more to report here, but in the interest of ever publishing this article I’ll leave it at this for now.