Secure personal backup in the Cloud(s) using Linux

Like everyone, I have important data on my computer. Like everyone, I have a backup (several, actually) of this important data; you do too, don’t you? But while this backup is good enough in case I have a hardware failure, it won’t help me if my apartment gets flooded or catches fire. That’s because the data and its backup are stored in the same place. Several solutions exist.

I could burn discs. I used to burn CDs (mostly for photographs), then I briefly began burning DVDs, before I realized that I could not keep up with the data this way:

  • More and more important papers (bills, invoices…) are provided in electronic form, so that it’s become logical to just manage all these papers on the computer, with the help of a scanner.
  • Digital photographs get bigger and bigger with each new generation of digital cameras…
  • And discs are not well suited to data that changes over time.

I could have used a backup-only hard disk: I would have bought the disk, backed up the data onto it, and then brought it to some relatives’ place. But this method introduces long delays while the data physically travels, which means I would probably have made no more than one backup per month. There had to be a better way.

The Cloud

I finally chose to use “the Cloud”, in other words network storage on the Internet. There are some nice, very affordable offers, such as Amazon’s Glacier.

But for my humble personal needs, I chose a different course, and went the free way, relying only on freely available Internet storage. However, the technique exposed below is perfectly usable with paid accounts, which provide bigger storage spaces and better reliability.

Besides, in order to ease the backup process, I selected only “mountable” storage offers; most are. For my backups, I currently use:

  • Mega
  • Hubic
  • Yandex Disk

I chose these because they are rather easy to use on Debian Linux. I can add more if I need to. I may eventually even use storage plans that do not allow “mounting” in the Linux filesystem, but do allow synchronizing… Should one of these storage services disappear, I would only need to replace it and synchronize the data again (read further below).

Anyway, the different places where the data resides (mount points, or places where data gets synchronized) have to be merged somehow. For this task, I chose mhddfs, again because it is easily usable on Debian Linux, and also because I have read good reports about its reliability. I now have a single 80GB mount point, transparently handled by mhddfs.
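
For illustration, the union mount itself boils down to a single mhddfs invocation over the individual mount points; this is only a sketch, the real call (with its logging option) appears in the mount script further below:

mhddfs /backup/.mega,/backup/.yandex,/backup/.hubic /backup/cloud -o logfile=/backup/local/mhddfs.log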

Security

Having network storage is nice, but given the level of (lack of) trust I grant this kind of storage regarding privacy, it is essential that the data (and much of the metadata) gets encrypted on my own computer. Besides, as I only have ADSL at home, which implies a very poor upload speed, I need to be able to send only the data for files that have changed locally. The tools I chose are:

  • rsyncrypto for the encryption: this tool (once again readily available in Debian) encrypts files in a very nice way:
    • The encryption is rsync-friendly, although this did not help much, since rsync proved to be too stressful for some remote storage mount points (e.g. Mega).
    • Each file is encrypted using its own key, so that the accidental discovery of a file’s key does not disclose the contents of other files.
    • Both the file-names and the paths (directory structure) get completely hidden behind a list of actual random file-names.
    • Any file can be individually decrypted without having to set up some kind of filesystem, as EncFS for example would require; a securely-kept file-map makes finding the right files to process a breeze (see the decryption sketch right after this list).
    To be honest, though, I am a bit concerned that this tool has not seen any update since 2013… Please send me suggestions if you know of a better alternative for my use case ;-)
  • some custom scripts for easing the backup from a local cache of rsyncrypto-encrypted data to the mhddfs mount point.
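
As promised in the list above, here is a minimal decryption sketch. ENCNAME is a hypothetical placeholder for one of the random names found in the file-map, and I assume the per-file key sits under the same random name (if that key file is missing, rsyncrypto should be able to recreate it from the private key anyway). Note that decryption needs the private key, not the certificate:

ENCNAME='xxxxxxxx'   # hypothetical: one random name taken from /backup/secret/rsyncrypto.filemap
rsyncrypto -d \
  /backup/local/rsyncrypto.enc/$ENCNAME \
  /tmp/restored.file \
  /backup/secret/rsyncrypto.keys/$ENCNAME \
  /backup/secret/rsyncrypto.key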

Data layout

Here’s the layout that I’m using, on which my helper scripts are based (a directory-creation sketch follows the list):

  • /backup/local/ is where encryption is done, and where I keep log files and such.
  • /backup/local/megafuse.cache/ is a cache for the Mega mount point, as my /tmp is not suitable for the task.
  • /backup/local/rsyncrypto.enc/ is where encrypted files are stored.
  • /backup/secret/rsyncrypto.crt is the certificate created during the rsyncrypto initialization (see “Preparations” below).
  • /backup/secret/rsyncrypto.key is the master key to rule them all, and must be kept from Sauron.
  • /backup/secret/rsyncrypto.filelist is the list of files/directories to encrypt.
  • /backup/secret/rsyncrypto.filemap gives the mapping between real paths and random names.
  • /backup/secret/rsyncrypto.keys/ is where per-file keys are stored.
  • /backup/.mega/ is the mount point for Mega.
  • /backup/.hubic/ is the mount point for Hubic.
  • /backup/.yandex/ is the mount point for Yandex disk.
  • /backup/cloud/ is the mount point for the union of the above cloud storage.
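
As mentioned above, here is a small sketch of how this layout can be prepared beforehand; nothing clever, just mkdir, plus a restrictive mode on /backup/secret since the master key and the file-map live there:

mkdir -p /backup/local/megafuse.cache /backup/local/rsyncrypto.enc \
         /backup/secret/rsyncrypto.keys \
         /backup/.mega /backup/.hubic /backup/.yandex /backup/cloud
chmod 700 /backup/secret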

Preparations

The first step was to create the master key and certificate:

openssl req -nodes -newkey rsa:1536 -x509 \
  -keyout /backup/secret/rsyncrypto.key \
  -out /backup/secret/rsyncrypto.crt
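
An optional sanity check (just a sketch) confirms that both files came out as expected:

openssl x509 -in /backup/secret/rsyncrypto.crt -noout -subject -enddate
openssl rsa -in /backup/secret/rsyncrypto.key -noout -check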

Then I just had to put into /backup/secret/rsyncrypto.filelist the list of places that I wished to back up, for example:

/nas/shared/Paperwork/
/nas/shared/financial/
/nas/shared/sharedMaildir/
/nas/shared/photos/
/nas/private/yves/Maildir/
/nas/private/iris/Maildir/

Finally, on one of my network storage accounts, I created a Backup directory.

Usage

The first helper script handles the encryption of data. Its only purpose is to make sure I do not forget or mistype some parameters:

#!/bin/bash
# usage: rsyncrypto.sh

(date
 # arguments: file list, destination for encrypted files, per-file keys directory, certificate
 rsyncrypto -v --delete --changed --trim=0 --filelist \
   --name-encrypt=/backup/secret/rsyncrypto.filemap \
   /backup/secret/rsyncrypto.filelist \
   /backup/local/rsyncrypto.enc/ \
   /backup/secret/rsyncrypto.keys/ \
   /backup/secret/rsyncrypto.crt
 date) \
  1> >(tee /backup/local/rsyncrypto.log) \
  2> >(tee /backup/local/rsyncrypto.err)

The second helper script handles the proper mounting of the different network storage accounts, and their grouping under a single mount point:

#!/bin/bash
# usage: cloud_mount.sh
# Ctrl+C to abort

fs=(mega hubic yandex mhddfs)

fs_mega='/backup/.mega'
fs_hubic='/backup/.hubic'
fs_yandex='/backup/.yandex'
fs_mhddfs='/backup/cloud'

mount_hubic=(/usr/bin/hubicfuse "$fs_hubic" -o noauto_cache,sync_read,allow_other)
mount_mega=(bg_log /backup/local/megafuse.log /usr/bin/MegaFuse -c /backup/local/megafuse.conf)
mount_yandex=(mount -t davfs https://webdav.yandex.com/ "$fs_yandex")
mount_mhddfs=(/usr/bin/mhddfs "${fs_mega},${fs_yandex},${fs_hubic}" "$fs_mhddfs" -o logfile=/backup/local/mhddfs.log)

# run a command in the background, redirecting its output to a log file
function bg_log() {
  local log="$1"; shift
  "$@" &>"$log" &
}
# mount filesystem $1 (if not already mounted) and wait until it shows up
function ensurefs() {
  local -n loc=fs_$1
  local -n cmd=mount_$1
  if ! mount | grep -qF "$loc"; then
    "${cmd[@]}"
    while ! mount | grep -qF "$loc"; do sleep 5s; done
  fi
}
# on interruption, unmount everything in reverse order
function leave() {
  for f in $(printf '%s\n' "${fs[@]}" | tac); do eval umount "\$fs_$f"; done
  exit 1
}
trap leave INT QUIT ABRT TERM

set -x
for f in "${fs[@]}"; do ensurefs "$f"; done

After both of the above scripts have run, the time has come to actually back up the data, which is handled by the next helper script:

#!/bin/bash
# usage: sync.sh [-C]
# -C: overwrite cloud-list cache data

MAP='/backup/secret/rsyncrypto.filemap'
CACHE='/backup/local/cloud.find'
LOCAL='/backup/local/rsyncrypto.enc'
CLOUD='/backup/cloud/Backup'

# $1: sign, $2: file, $3: old size, $4: new size
function log() {
  # look the encrypted name up in the file-map, to print the real path alongside it
  local realf="$(grep -E -a -m 1 -o "/$2 [[:print:]]+" "$MAP" | cut -d' ' -f2)"
  printf '%s %s (%10s → %10s) %s\n' "$1" "$2" "$3" "$4" "$realf"
}

(
echo CACHE $(date)
# with -C, rebuild the cache of what the cloud currently holds
[ "$1" == '-C' ] \
  && find "$CLOUD/" -type f -printf '%f\tc\t%s\n' >"$CACHE"

echo SYNC $(date)
# merge the local listing (tag 'l') with the cloud cache (tag 'c'), sorted by name
{ { find "$LOCAL/" -type f -printf '%f\tl\t%s\n'; cat "$CACHE"; } | sort
  echo
} | {
  IFS=$'\t' read f t s
  while true; do case "$t" in
    '')
      break
      ;;
    c)
      f1="$f"; s1="$s"; IFS=$'\t' read f t s
      if [ "$f" == "$f1" ]; then
        if [ "$s" != "$s1" ]; then
          # FILE UPDATE
          log '±' "$f" "$s1" "$s" &
          cp -f "$LOCAL/$f" "$CLOUD/${f:0:2}/" \
            && sed -i "/^$f"$'\t/s/\tc\t.*$/\tc\t'"$s/" "$CACHE"
        fi
        IFS=$'\t' read f t s
      else
        # OLD FILE
        log '−' "$f1" "$s1" '' &
        rm -f "$CLOUD/${f1:0:2}/$f1" \
          && sed -i "/^$f1"$'\t/d' "$CACHE"
        [ $(grep -c "^${f1:0:2}" "$CACHE") -lt 2 ] \
          && rm -rf "$CLOUD/${f1:0:2}"
      fi
      ;;
    l)
      # NEW FILE
      log '+' "$f" '' "$s" &
      grep -q "^${f:0:2}" "$CACHE" || mkdir "$CLOUD/${f:0:2}"
      cp -f "$LOCAL/$f" "$CLOUD/${f:0:2}/" \
        && printf '%s\tc\t%s\n' "$f" "$s" >>"$CACHE"
      IFS=$'\t' read f t s
      ;;
  esac; done
}

echo END $(date) ) 2>&1 \
  | tee "${CACHE%/*}/sync.log"

This script must be used with the -C option the first time, and each time there is any doubt concerning the quality of the local cache of the cloud’s contents. This custom script sort-of does what rsync should have done: it knows what is on the remote storage (or scans the mount point in order to find out), and then writes only the data that needs to be written. It ends up being better suited than rsync, though, because earlier tests showed that some network storage offers have a hard time handling a large number of files in a single directory. This script therefore writes each file into a subdirectory named after the first two characters of its (encrypted) name.
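
For illustration, the bucket is simply derived from the first two characters of the random encrypted name; with a hypothetical name it looks like this:

f='4fe2a7c1'                              # hypothetical random name from the file-map
echo "/backup/cloud/Backup/${f:0:2}/$f"   # → /backup/cloud/Backup/4f/4fe2a7c1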

After everything has been backed up, all that remains is to unmount each mount point. Sometimes one mount point or another gets stuck, and then the right commands need to be run in the right order, otherwise the terminal gets stuck too. In those cases, it helps to kill the processes that handle the FUSE filesystems. I have a helper script for unmounting too, but it is very specific to my situation. Here it is, as an example; I won’t pretend it is a good example, though :-D

#!/bin/bash
# usage: cloud_umount.sh [-t] [-9]
# -t: test if Mega is mounted before attempting to umount
# -9: force-kill processes while unmounting

while getopts t9 opt; do case "$opt" in
  t) killif=true ;;
  9) kill_9=true ;;
esac; done

# with -t, the block below decides whether the Mega mount point looks healthy;
# a non-zero status at its end is what triggers the actual unmounting
[ -n "$killif" ] \
  && { mount | grep -qF _mega || exit 0; } \
  && { [ ! -e /backup/Mega.ERR ] || exit 0; } \
  && ls /backup/.mega/ 2>/backup/Mega.ERR | grep -qF Backup \
  && rm -f /backup/Mega.ERR

if [ $? -ne 0 ]; then
  set -x
  [ -n "$kill_9" ] && killall -9 MegaFuse
  [ -n "$kill_9" ] && killall -9 hubicfuse
  fuser -k -M ${kill_9:+-9} -m /backup/cloud
  fuser -k -M ${kill_9:+-9} -m /backup/.yandex
  fuser -k -M ${kill_9:+-9} -m /backup/.mega
  fuser -k -M ${kill_9:+-9} -m /backup/.hubic
  umount /backup/cloud
  umount /backup/.yandex
  umount /backup/.mega
  umount /backup/.hubic
fi

And then?

Sometimes, I run my last helper script, the purpose of which is to clean rsyncrypto’s file-map by removing references to files that have been deleted:

#!/bin/bash
# usage: cleanFilemap.sh

# keep only the entries whose real file still exists
tr '\0' '\n' </backup/secret/rsyncrypto.filemap \
  | while read e f; do if [ -f "/$f" ]; then echo "$e $f"; fi; done \
  | tr '\n' '\0' >/backup/secret/rsyncrypto.filemap.new
# show what was dropped; install the new map only if something actually changed
diff \
  <(tr '\0' '\n' </backup/secret/rsyncrypto.filemap) \
  <(tr '\0' '\n' </backup/secret/rsyncrypto.filemap.new) \
  | sort -k1,1 -k3,3 | grep '^[<>]' \
  && mv -i /backup/secret/rsyncrypto.filemap.new /backup/secret/rsyncrypto.filemap \
  || rm -f /backup/secret/rsyncrypto.filemap.new

One last thing: all the encrypted data is worthless if there is no way to decrypt it. It is thus important, after each new backup, to ensure that the contents of the /backup/secret directory, as well as the credentials for each network storage account, are saved somewhere else (a USB flash drive, whatever…). This amounts to less than 1GB… And of course, some random file must be restored once in a while to ensure that restoring works as expected.
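
That last point can be scripted too. The sketch below is hypothetical (it is not one of the helper scripts above): it picks a random entry from the file-map, pulls the encrypted copy back from the cloud union, decrypts it, and compares the result with the live file. As before, it assumes the per-file key is stored under the same random name; if it is not, rsyncrypto should recreate it from the private key.

#!/bin/bash
# usage: restore_check.sh (hypothetical sketch, see the caveats above)

tr '\0' '\n' </backup/secret/rsyncrypto.filemap | shuf -n 1 | {
  read e f
  enc="${e##*/}"                       # the random name, as used on the cloud side
  tmp="$(mktemp)"
  # decrypt straight from the cloud union; the last argument is the private key
  rsyncrypto -d "/backup/cloud/Backup/${enc:0:2}/$enc" "$tmp" \
    "/backup/secret/rsyncrypto.keys/$enc" /backup/secret/rsyncrypto.key
  # identical contents mean that this file, at least, is restorable
  if cmp -s "$tmp" "/$f"; then echo "restore OK: /$f"; else echo "MISMATCH: /$f"; fi
  rm -f "$tmp"
}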
