[Pybackpack] backup support types

Thu Aug 23 06:51:35 BST 2007

I've been thinking a bit about the kinds of things a user would do with
pybackpack and how best to make them more obvious. Here are the things I
can think of:

1. Use it to make reverse diffs of changes to a set of files over a
period of time - as it is being used now with rdiff-backup.
   Problems: reverse diffs are conceptually difficult for users
             restoring a specific file from a specific date is difficult
               from the pybackpack interface
             backups require rdiff-backup on the remote system as well

2. Use it to make a copy of files from place one to place two regularly.
This is not exactly a backup in the sense that most sysadmins think of
one. There's no history, there's no going back to a specific date or
time. However, for a huge number of users this is what they think of
when they think of backups. They think: copy all of these files to that
other place. Or - make all the files in this place match that other
place. In short, for them a backup is just a type of synchronization
tool which has the virtue of keeping a separate copy.

3. Use it to make N copies of the files to a specific place. This is
sort of like use case 1 but not as intelligent. Instead of diffs, some
users want ALL of the items in multiple copies so they can swap the
whole set out. In short, some people are morons. :)

4. Encrypt the files before you back them up, to keep everything safe
and sound and away from the prying eyes of your storage provider.

It seems like for the first case rdiff-backup does what it should;
however, the rdiff-backup interface needs to be a bit more pythonic and
offer a lot more in the way of progress callbacks.  For the second
and third use cases it seems like the user doesn't need the overhead of
rdiff-backup; they really just need plain rsync or even just cp/scp.

For the 4th case it seems like duplicity is the only tool for the job.

For case 3 scp/sftp (maybe in the form of python-paramiko so the
interface is nice) would be easiest.

For case 2, rsync helps a lot to speed up synchronizing the copy, and
librsync's python interface (which is in both rdiff-backup and
duplicity) would probably be the best place to start.

So, I'm wondering if a reasonable set of goals for multiple
backup-backends would be:

rdiff-backup: add support for specific-file restore and progress
calculation

librsync: a generic rsync interface in pybackpack

sftp/scp: using python-paramiko to shuffle the files along

duplicity: encrypted file stores (requires more config data for this
backup set)

Does this sound completely full of crazy? It's nearly 2am here and I
admit that crazy has a tendency to run amuck. I was mostly thinking that
having a few different backend types to support from the start would
ensure that the abstraction layer for pybackpack is not too tied to any
one of them.
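
To make the abstraction-layer idea a bit more concrete, here is a rough
sketch of what a common backend interface with progress callbacks might
look like. Everything in it is hypothetical: the names BackupBackend and
CopyBackend and the progress(done, total, path) signature are made up
for illustration and are not pybackpack's actual API. Only the simplest
backend (a plain local copy, use case 2) is filled in; rdiff-backup,
sftp/scp and duplicity backends would implement the same backup()
method.

```python
import os
import shutil
from abc import ABC, abstractmethod


class BackupBackend(ABC):
    """Hypothetical common interface for all backends (an assumption)."""

    @abstractmethod
    def backup(self, source_dir, destination, progress=None):
        """Back up source_dir to destination and return the file count.

        progress, if given, is called as progress(done, total, path)
        after each file has been handled.
        """


class CopyBackend(BackupBackend):
    """Use case 2: a plain copy with no history, local paths only."""

    def backup(self, source_dir, destination, progress=None):
        # Walk the tree first so the total is known before copying starts.
        files = []
        for root, _dirs, names in os.walk(source_dir):
            for name in names:
                full = os.path.join(root, name)
                files.append(os.path.relpath(full, source_dir))

        total = len(files)
        for done, rel in enumerate(files, start=1):
            target = os.path.join(destination, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # copy2 also preserves timestamps and permission bits.
            shutil.copy2(os.path.join(source_dir, rel), target)
            if progress is not None:
                progress(done, total, rel)
        return total
```

A GUI would then only ever talk to BackupBackend, passing in a callback
that updates its progress bar, and would not need to know which tool is
doing the work underneath.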

Let me know how ridiculous you think this is, and thanks again for this
tool.

-sv