
Ferdinand Agyei-Yeboah

Creating a backup solution with S3 and cron

August 05, 2020

It’s important to always have a backup of your data. In general, you should view your computer as a tool (replaceable), but your data as indispensable (not replaceable). A recent boot failure prompted me to build a basic backup solution using S3.

Configure a centralized synced folder

First, I centralized all my data into a single folder that would be backed up. I kept this folder on my Desktop for easy visibility.

I also removed the default Desktop/Documents/Videos links in the Ubuntu file manager, so that I don’t accidentally store files in any other locations. It was pretty easy to do; I just followed this guide, which removed those folders from Nautilus. You can optionally delete these folders from your home directory as well.

Create S3 bucket and Configure IAM User

Next, we need to create an S3 bucket, create an IAM user (and download its credentials so we can interact with S3 programmatically), and install the AWS CLI. Sign up for an AWS account if you do not have one.

Follow this guide to learn how to create an IAM user and an S3 bucket. Although not strictly needed, more general information on IAM users can be found here.

Install and configure the AWS CLI

Run aws configure to set up the credentials. Choose the region in which you are creating your AWS resources.

AWS Configure
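For reference, an aws configure session looks roughly like the following; the key values and region below are placeholders, not real credentials:

```shell
$ aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: us-east-1
Default output format [None]: json
```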

(or open ~/.aws/credentials and add the credentials manually)

AWS Credentials
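The credentials file follows the standard INI layout the AWS CLI reads; the key values below are placeholders:

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```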

Create Syncing Script

Next, I created a script to sync my local folder with my S3 bucket. The script is shown below (folder paths redacted):

#!/bin/bash
aws s3 sync "${LOCAL_FOLDER_PATH}" "s3://${S3_PATH}" \
--exclude "*/node_modules/*" \
--exclude "*/build/*" \
--exclude "*/target/*" \
--exclude "*.classpath" \
--exclude "*.factorypath" \
--exclude "*.settings/*" \
--exclude "*.iml" \
--exclude "*.project" \
--no-follow-symlinks \
--sse \
--delete

The script syncs the local folder with the S3 path, excluding some files that should not be uploaded (node_modules, target, etc.).

  • The --no-follow-symlinks flag prevents possible infinite recursion through symlinks.
  • --sse enables server-side encryption on the files; when no algorithm is specified it defaults to AES-256 (you can alternatively use keys managed with AWS KMS).
  • --delete removes any files in the S3 bucket that are not present in the local folder, so the two stay truly in sync rather than the S3 folder only accumulating new files and never deleting old ones.
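Before scheduling the script, it can be worth previewing what a sync would do, especially since --delete can remove objects. The AWS CLI supports a --dryrun flag that prints each operation without performing it; the paths here are the same redacted placeholders as above:

```shell
# Preview the sync without transferring or deleting anything
aws s3 sync "${LOCAL_FOLDER_PATH}" "s3://${S3_PATH}" --delete --dryrun
```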

Schedule the script with cron

Lastly, I configured the script to run every day at 6pm.

To view the current user’s cron configuration, use crontab -l

crontab -l

Edit the crontab to run your script whenever you desire:

crontab -e

A sample of the edited crontab that runs the sync script at 6pm every day:

SHELL=/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
# Run S3 sync at 6pm everyday
0 18 * * * /sync-folder-to-s3.sh

Things to note when using crontab

  • The script will only run if the computer is on when the scheduled time arrives.
  • A user crontab only runs jobs for that user. There are other ways to schedule jobs, including system-wide crontabs and anacron, which runs missed scheduled executions after the fact.
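One more practical point: cron discards a job’s output unless local mail is configured, so it helps to redirect the script’s output to a log file you can check later. A sketch of the crontab line with logging added (the log path is just an example):

```shell
# Append stdout and stderr to a log file for troubleshooting
0 18 * * * /sync-folder-to-s3.sh >> "$HOME/s3-sync.log" 2>&1
```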
