Storages

Currently supported storages:

  • Amazon Simple Storage Service (S3)
  • Rackspace Cloud Files (Mosso)
  • Ninefold Cloud Storage
  • Dropbox Web Service
  • Remote Server (Protocols: FTP, SFTP, SCP, RSync)
  • Local Storage

Amazon Simple Storage Service (S3)

store_with S3 do |s3|
  s3.access_key_id      = 'my_access_key_id'
  s3.secret_access_key  = 'my_secret_access_key'
  s3.region             = 'us-east-1'
  s3.bucket             = 'bucket-name'
  s3.path               = '/path/to/my/backups'
  s3.keep               = 10
end

You will need an Amazon AWS (S3) account. You can get one here.

AWS Regions

  • us-east-1 - US Standard (Default)
  • us-west-2 - US West (Oregon)
  • us-west-1 - US West (Northern California)
  • eu-west-1 - EU (Ireland)
  • ap-southeast-1 - Asia Pacific (Singapore)
  • ap-southeast-2 - Asia Pacific (Sydney)
  • ap-northeast-1 - Asia Pacific (Tokyo)
  • sa-east-1 - South America (Sao Paulo)

Multipart Uploading

Amazon's Multipart Uploading will be used to upload any of your final package files that are larger than the default chunk_size of 5 MiB. Each package file less than or equal to the chunk_size will be uploaded in a single request. This may be changed using:

store_with S3 do |s3|
  # Minimum allowed setting is 5.
  s3.chunk_size = 10 # MiB
end

Error Handling

Each request involved in transmitting your package files will be retried if an error occurs. By default, each failed request will be retried 10 times, pausing 30 seconds before each retry. These defaults may be changed using:

store_with S3 do |s3|
  s3.max_retries = 10
  s3.retry_waitsec = 30
end

If a failed request was uploading a chunk_size portion of a file, only that portion will be re-transmitted. For files smaller than chunk_size, the whole file upload will be attempted again. For this reason, it's best not to set chunk_size too high.

When an error occurs that causes Backup to retry the request, the error will be logged. Note that these messages will be logged as informational messages, so they will not generate warnings.

Data Integrity

All data is uploaded along with an MD5 checksum, which AWS uses to verify the data received. If the uploaded data fails this integrity check, the error will be handled as stated above and the data will be retransmitted.

Server-Side Encryption

You may configure your AWS S3 stored files to use Server-Side Encryption by adding the following:

store_with S3 do |s3|
  s3.encryption = :aes256
end

Reduced Redundancy Storage

You may configure your AWS S3 stored files to use Reduced Redundancy Storage by adding the following:

store_with S3 do |s3|
  s3.storage_class = :reduced_redundancy
end

Rackspace Cloud Files (Mosso)

store_with CloudFiles do |cf|
  cf.api_key   = 'my_api_key'
  cf.username  = 'my_username'
  cf.container = 'my_container'
  cf.path      = '/path/to/my/backups'
  cf.keep      = 5
  cf.auth_url  = 'lon.auth.api.rackspacecloud.com'
end

The cf.auth_url option allows you to provide a non-standard auth URL for the Rackspace API. By default the US API will be used; to use a different region's API, provide the relevant URL for that region. The example above demonstrates usage for the London region.

You will need a Rackspace Cloud Files account. You can get one here.

Ninefold Cloud Storage

store_with Ninefold do |nf|
  nf.storage_token   = 'my_storage_token'
  nf.storage_secret  = 'my_storage_secret'
  nf.path            = '/path/to/my/backups'
  nf.keep            = 10
end

You will need a Ninefold account. You can get one here.

Dropbox Web Service

store_with Dropbox do |db|
  db.api_key    = 'my_api_key'
  db.api_secret = 'my_api_secret'
  # Dropbox Access Type
  # The default value is :app_folder
  # Change this to :dropbox if needed
  # db.access_type = :dropbox
  db.path       = '/path/to/my/backups'
  db.keep       = 25
end

To use the Dropbox service as a backup storage, you need a Dropbox account and a Dropbox App, which provides the api_key and api_secret used above.

The default db.access_type is :app_folder. This is the default for Dropbox accounts. If you have contacted Dropbox and upgraded your account to Full Dropbox Access, then you will need to set the db.access_type to :dropbox.

NOTE The first link I provided is a referral link. If you create your account through that link, then you should receive an additional 500MB storage (2.5GB total, instead of 2GB) for your newly created account.

FOR YOUR INFORMATION you must run your backup to Dropbox manually the first time in order to authorize your machine with your Dropbox account. When you run the backup manually, Backup will provide a URL which you must visit with your browser. Once you've authorized your machine, Backup will write the session out to a cache file; from then on it will use that cache file and won't prompt you to authorize again, so you can run the backup in the background as normal (for example, from a cron task).

Chunked Uploader

The Dropbox Storage uses Dropbox's /chunked_upload API. By default, this will upload the final backup package file(s) in chunks of 4 MiB. If an error occurs while uploading a chunk, Backup will retry the failed chunk 10 times, pausing 30 seconds between retries. If you wish to customize these values, you can do so as follows:

store_with Dropbox do |db|
  db.chunk_size     = 4 # MiB
  db.chunk_retries  = 10
  db.retry_waitsec  = 30
end

Note: This has nothing to do with Backup's Splitter. If you have a Splitter defined on your model using split_into_chunks_of, your final backup package will still be split into multiple files, and each of those files will be uploaded to Dropbox.

Also note that in Backup versions prior to 3.3.0, the Splitter was required to upload files to Dropbox that were larger than 150MB. This is no longer the case. You may still use the Splitter, and you may now split your final backup package into chunks larger than 150MB.
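
For example, a model that uses both the Splitter and the Dropbox storage might look like the following sketch. The 250 MB chunk size is just an assumed example value; see the Splitter page for details.

Backup::Model.new(:my_backup, 'My Backup') do
  # Archives, Databases...

  # Split the final package into chunks (value in MB; see the Splitter page).
  split_into_chunks_of 250

  store_with Dropbox do |db|
    db.api_key    = 'my_api_key'
    db.api_secret = 'my_api_secret'
    db.path       = '/path/to/my/backups'
    db.keep       = 25
  end
end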

Remote Server (FTP)

store_with FTP do |server|
  server.username = 'my_username'
  server.password = 'my_password'
  server.ip       = '123.45.678.90'
  server.port     = 21
  server.path     = '~/backups/'
  server.keep     = 5
end

TIP Use SFTP if possible; it's a more secure protocol.

Remote Server (SFTP)

store_with SFTP do |server|
  server.username = 'my_username'
  server.password = 'my_password'
  server.ip       = '123.45.678.90'
  server.port     = 22
  server.path     = '~/backups/'
  server.keep     = 5
end

Remote Server (SCP)

store_with SCP do |server|
  server.username = 'my_username'
  server.password = 'my_password'
  server.ip       = '123.45.678.90'
  server.port     = 22
  server.path     = '~/backups/'
  server.keep     = 5
end

Remote/Local Server (RSync)

Say you just transferred a backup of about 2000MB. Twelve hours later, the Backup gem packages a new backup file for you that appears to be 2050MB in size. Rather than transferring the whole 2050MB to the remote server, rsync will look up the difference between the source and destination backups and only transfer the bytes that changed. In this case it will transfer only around 50MB rather than the full 2050MB.

Note: If you only want to sync particular folders on your filesystem to a backup server then be sure to take a look at Syncers. They are, in most cases, more suitable for this purpose.

Configuring the RSync Storage

There are 3 different modes of remote operation available:

  • :ssh (default) -- Connects to the remote host via SSH and does not require the use of an rsync daemon.

  • :ssh_daemon -- Connects via SSH, then spawns a single-use rsync daemon to allow certain daemon features to be used.

  • :rsync_daemon -- Connects directly to an rsync daemon on the remote host via TCP.

Note that :ssh and :ssh_daemon modes transfer data over an encrypted connection. :rsync_daemon does not.

The following are all of the available configuration options, along with information about their use depending on which mode you are using:

store_with RSync do |storage|
  ##
  # :ssh is the default mode if not specified.
  storage.mode = :ssh # or :ssh_daemon or :rsync_daemon
  ##
  # May be a hostname or IP address.
  storage.host = "123.45.678.90"
  ##
  # When using :ssh or :ssh_daemon mode, this will be the SSH port (default: 22).
  # When using :rsync_daemon mode, this is the rsync:// port (default: 873).
  storage.port = 22
  ##
  # When using :ssh or :ssh_daemon mode, this is the remote user name used to connect via SSH.
  # This only needs to be specified if different than the user running Backup.
  #
  # The SSH user must have a passphrase-less SSH key setup to authenticate to the remote host.
  # If this is not desirable, you can provide the path to a specific SSH key for this purpose
  # using SSH's -i option in #additional_ssh_options
  storage.ssh_user = "ssh_username"
  ##
  # If you need to pass additional options to the SSH command, specify them here.
  # Options may be given as a String (as shown) or an Array (see additional_rsync_options).
  # These will be added to the rsync command like so:
  #   rsync -a -e "ssh -p 22 <additional_ssh_options>" ...
  storage.additional_ssh_options = "-i '/path/to/id_rsa'"
  ##
  # When using :ssh_daemon or :rsync_daemon mode, this is the user used to authenticate to the rsync daemon.
  # This only needs to be specified if different than the user running Backup.
  storage.rsync_user = "rsync_username"
  ##
  # When using :ssh_daemon or :rsync_daemon mode, if a password is needed to authenticate to the rsync daemon,
  # it may be supplied here. Backup will write this password to a temporary file, then use it with rsync's
  # --password-file option.
  storage.rsync_password = "my_password"
  # If you prefer to supply the path to your own password file for this option, use:
  storage.rsync_password_file = "/path/to/password_file"
  ##
  # If you need to pass additional options to the rsync command, specify them here.
  # Options may be given as an Array (as shown) or as a String (see additional_ssh_options).
  storage.additional_rsync_options = ['--sparse', "--exclude='some_pattern'"]
  ##
  # When set to `true`, rsync will compress the data being transferred.
  # Note that this only reduces the amount of data sent.
  # It does not result in compressed files on the destination.
  storage.compress = true
  ##
  # The path to store the backup package file(s) to.
  #
  # If no `host` is specified, this will be a local path.
  # Otherwise, this will be a path on the remote server.
  #
  # In :ssh mode, relative paths (or paths that start with '~/') will be relative to the directory
  # the `ssh_user` is placed in upon logging in via SSH.
  #
  # For both local and :ssh mode operation, if the given path does not exist, it will be created.
  # (see additional notes about `path` below)
  #
  # For :ssh_daemon and :rsync_daemon modes, `path` will be a named rsync module; optionally followed
  # by a path. In these modes, the path referenced must already exist on the remote server.
  #
  storage.path = "~/backups"
end

If no host is configured, the operation will be local and the only options used would be path and additional_rsync_options.
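
For example, a local-only RSync storage might be configured with a minimal sketch like this (the destination path and extra option are just placeholders):

store_with RSync do |storage|
  # No `host` is set, so the operation is local.
  storage.path = '~/backups'
  # Any additional rsync options may still be given.
  storage.additional_rsync_options = ['--sparse']
end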

Using Compression:
If you use this storage option, only the Gzip Compressor should be used with your backup model, and only if your version of gzip supports the --rsyncable option. That option makes gzip compress data using an algorithm that allows rsync to efficiently detect changes. Otherwise, even a small change in the original data will result in nearly the entire archive being transferred.
See the Compressor page for more information.
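
A minimal compressor setup for this (the same one used in the rotation example further below) would be:

compress_with Gzip do |gzip|
  # Requires a version of gzip that supports --rsyncable
  gzip.rsyncable = true
end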

Using Encryption:
An Encryptor should not be added to your backup model when using this storage option. Encrypting the final archive will make it impossible for rsync to distinguish changes between the source and destination files, so the entire backup archive would be transferred even if only a small change was made to the original files.

Additional Notes Regarding path:
Currently for :ssh mode or when operating locally, the given path will have an additional directory added to it named after the backup model's trigger. For example, if you set path to ~/backups, and your trigger is :my_backup, then the final path where your backup package file(s) will be stored will be ~/backups/my_backup/. As mentioned above, this path will be created if needed.

This will be changed with Backup v4.0. At that time, the creation of this additional directory will no longer be done and your backup package file(s) will simply be stored in the path as given.

Note that :ssh_daemon and :rsync_daemon modes already behave this way: no additional directory is added to the path given in these modes. However, for these modes the path you specify must already exist.

I encourage you to look into using :ssh_daemon mode. Setting this up can be as simple as adding an rsyncd.conf file (with 0644 permissions) in the $HOME dir of the ssh_user on the remote system (most likely the same username running the backup) with the following contents:

[backup-module]
path = backups
read only = false
use chroot = false

Then simply use storage.path = 'backup-module', making sure ~/backups exists on the remote.
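
Putting that together, an :ssh_daemon storage using the module above might look like this sketch (the host and ssh_user values are placeholders):

store_with RSync do |storage|
  storage.mode     = :ssh_daemon
  storage.host     = "my.backup-server.com"
  storage.ssh_user = "ssh_username"
  # 'backup-module' is the module defined in ~/rsyncd.conf above;
  # its `path = backups` directory must already exist on the remote.
  storage.path     = "backup-module"
end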

Splitter:
If you use the Splitter with your backup model, understand that the RSync Storage will never remove any files from path. For example, say your backup usually results in 2 chunk files being stored: my_backup.tar-aa and my_backup.tar-ab. Then one day it results in 3 chunks - an additional my_backup.tar-ac - because of a ton of files you had meant to delete. The next day you delete them and your backup returns to its normal 2 chunks, but that 3rd my_backup.tar-ac file will remain until you delete it.

Cycling:
The RSync Storage option does not support cycling, so you cannot specify server.keep = num_of_backups here. With this storage, only one copy of your backup archive will exist on the remote, which rsync updates with the changes it detects.

If you're looking for a way to keep rotated backups, you can simply change the path each time the backup runs.
For example, to keep:

  • Monthly backups
  • Weekly backups, rotated each month
  • Daily backups, rotated each week
  • Hourly backups, rotated every 4 hours

Create the following backup model:

Backup::Model.new(:my_backup, 'My Backup') do
  # Archives, Databases...

  # Make sure you compress your Archives and Databases
  # using an rsync-friendly algorithm  
  compress_with Gzip do |gzip|
    gzip.rsyncable = true
  end

  store_with RSync do |storage|
    time = Time.now
    if time.hour == 0   # first hour of the day
      if time.day == 1  # first day of the month
        # store a monthly
        path = time.strftime '%B'             # January, February, etc...
      elsif time.sunday?
        # store a weekly
        path = "Weekly_#{ time.day / 7 + 1 }" # Weekly_1 thru Weekly_5
      else
        # store a daily
        path = time.strftime '%A'             # Monday thru Saturday
      end
    else
      # store an hourly
      path = "Hourly_#{ time.hour % 4 + 1 }"  # Hourly_1 thru Hourly_4
    end
    storage.path = "~/backups/#{ path }"
  end
end

Then simply set up cron to run the job every hour.
Note that this will require space for 27 full backups.
You could use a different storage.host for the monthly, weekly, etc...
Remember that for :ssh_daemon and :rsync_daemon modes, each of these paths must already exist.

Or of course, think of your own use cases (and let me know if you figure out any good ones!).

Local Storage (Copy)

store_with Local do |local|
  local.path = '~/backups/'
  local.keep = 5
end

If multiple Storage options are configured for your backup, the Local Storage option should be listed last. This allows the Local Storage to transfer the final backup package file(s) using a move operation. If a Local Storage is configured but is not the last Storage listed in your backup model, a warning will be issued and the final backup package file(s) will be transferred locally using a copy operation. This is because each configured Storage is performed in the order in which it appears in your model.
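
For example, with an SFTP storage and a Local storage configured, the Local storage would be listed last so the move operation can be used. This sketch reuses the placeholder values from the examples above:

Backup::Model.new(:my_backup, 'My Backup') do
  # Archives, Databases...

  store_with SFTP do |server|
    server.username = 'my_username'
    server.password = 'my_password'
    server.ip       = '123.45.678.90'
    server.path     = '~/backups/'
    server.keep     = 5
  end

  # Listed last, so the final package file(s) can be moved into place.
  store_with Local do |local|
    local.path = '~/backups/'
    local.keep = 5
  end
end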

Storage File Size Restrictions

Most storage services place restrictions on the size of files being stored. To work around these limits, see the Splitter page.

Storage Cycling

Each Storage (except for RSync) supports the keep setting, which specifies how many backups to keep at this location.

store_with SFTP do |sftp|
  sftp.keep = 5
end

Once the keep limit has been reached, the oldest backup will be removed.

Note that if keep is set to 5, then the 6th backup will be transferred and stored, before the oldest is removed.

For more information, see the Cycling page.

Default Configuration

If you are backing up to multiple storage locations, you may want to specify default configuration so that you don't have to rewrite the same lines of code for each of the same storage types. For example, say that the Amazon S3 storage always has the same access_key_id and secret_access_key. You could add the following to your ~/Backup/config.rb:

Backup::Storage::S3.defaults do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
end

Now, every S3 storage that should use the access_key_id and secret_access_key we just specified may simply omit them in its store_with block, like so:

store_with S3 do |s3|
  s3.bucket = "some-bucket"
  # no need to specify access_key_id
  # no need to specify my_secret_access_key
end

You would set defaults for CloudFiles by using:

Backup::Storage::CloudFiles.defaults do |storage|
  # ...and so forth for every supported storage location.
end