Sunday, June 08, 2014

RAID, Archives and Tape v Disk

There's a long-running question in I.T. Operations: how best to archive data? [What media to use?]
This question arose again for me as I was browsing a retail site.

Conclusions:

  1. The break-even for 6.25TB (compressed) and 2.5TB (uncompressed) tapes is 85 and 140 tapes respectively, or
    • $13,150 and $17,400 of capital investment.
  2. At just 2 times data duplication, uncompressed tapes are not cost effective.
    • Enterprise backups show data duplication rates of 20-50 times.
  3. Compressed tapes are cost-effective up to 5-times data duplication.
    • If you run 10 Virtual Machines and do full backups, you've passed that threshold.


An LTO-5 internal SAS drive (HP) costs $2,550; 3TB tapes are $45 and 6.25TB tapes $90.

The 6,250GB figure assumes 2.5:1 compression, according to LTO.ORG; uncompressed, the tape holds 2,500GB. The same assumption sits behind the claimed 400MB/sec write speed, which only applies to compressible data.
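As a rough check, the native figures can be recovered by dividing the marketed ones by the compression ratio. This minimal Python sketch assumes the 2.5:1 ratio from LTO.ORG applies to both capacity and speed and uses decimal units (1GB = 1,000MB); the function names are illustrative only. Data that is already compressed (video, zip archives) won't shrink further, so for that kind of data expect the native figures.

```python
# Sketch: convert marketed "compressed" LTO figures back to native values,
# assuming the 2.5:1 ratio quoted by LTO.ORG (hypothetical helper names).
COMPRESSION_RATIO = 2.5

def native_from_compressed(value: float, ratio: float = COMPRESSION_RATIO) -> float:
    """Convert a compressed capacity or speed back to its native value."""
    return value / ratio

def hours_to_fill(capacity_gb: float, speed_mb_s: float) -> float:
    """Hours to write a full tape at a sustained speed (1GB = 1,000MB)."""
    return capacity_gb * 1000 / speed_mb_s / 3600

compressed_gb, compressed_mb_s = 6250, 400
native_gb = native_from_compressed(compressed_gb)
native_mb_s = native_from_compressed(compressed_mb_s)
print(f"native: {native_gb:.0f} GB at {native_mb_s:.0f} MB/s")  # native: 2500 GB at 160 MB/s
print(f"fill time: {hours_to_fill(native_gb, native_mb_s):.1f} h")  # fill time: 4.3 h
```

Either way, a full tape takes a little over four hours of uninterrupted streaming, which is why keeping the drive fed matters.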

LTO tapes now support LTFS, the Linear Tape File System, available across multiple platforms.

So is tape really $15/TB (compressed) or $37.50/TB (uncompressed), versus $80/TB for 1TB USB drives?
Is the gap $65/TB or roughly $40/TB?
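One way to frame the duplication question in the conclusions: if disk backups keep a single copy of each file (as the snapshot approach described later does) while tape stores the same data N times, tape only wins while N × (tape $/TB) stays below the disk $/TB. This sketch uses the quoted $90 tape and $80/TB USB prices and counts media only, ignoring drives, server and software; the names are hypothetical.

```python
# Media-only cost comparison (hypothetical names; prices as quoted in the post).
TAPE_PRICE = 90.0   # $ per tape
USB_PER_TB = 80.0   # $ per TB for 1TB USB drives

def tape_cost_per_tb(capacity_tb: float, price: float = TAPE_PRICE) -> float:
    """Dollars per TB for one tape at the given capacity."""
    return price / capacity_tb

def max_duplication(tape_per_tb: float, disk_per_tb: float = USB_PER_TB) -> float:
    """How many copies tape can hold before it costs more than one copy on disk.

    Assumes disk keeps each file once while tape stores it N times.
    """
    return disk_per_tb / tape_per_tb

for label, capacity_tb in [("compressed", 6.25), ("uncompressed", 2.5)]:
    per_tb = tape_cost_per_tb(capacity_tb)
    print(f"{label}: ${per_tb:.2f}/TB, gap ${USB_PER_TB - per_tb:.2f}/TB, "
          f"break-even at {max_duplication(per_tb):.1f}x duplication")
```

With these prices it gives $14.40/TB and $36.00/TB, slightly under the figures quoted, and media-only break-even points of about 5.6x and 2.2x duplication. Adding the capital costs lowers those ceilings further, in line with conclusions 2 and 3.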

At a minimum, you need two drives for archival use, because you must regularly read and rewrite the data on your tapes: both to confirm you still can (many parts of the chain can fail) and because tightly wound tape on spools is prone to "print-through", where the bits become corrupted.
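Reading tapes back is only useful if you can tell good data from bad. A common approach, sketched here as an assumption rather than anything the backup packages mentioned below necessarily do, is to record a checksum for every file when it is written and re-hash the files on each read-back pass. The sketch assumes the archive can be read as an ordinary directory tree (for example through LTFS); `build_manifest` and `verify` are hypothetical names.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1MB chunks so large archive files needn't fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-read every file and return the paths that are missing or changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```

Store the manifest somewhere other than the tape it describes, and run `verify` on the second drive during each refresh cycle; an empty list means every file read back intact.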

You also need a SAS Host Bus Adapter (HBA) to connect the two drives: one from HP costs ~$300, another $215.

If you're supporting compressed writes at 400MB/sec, you'll need 10Gbps Ethernet interfaces: at least two for this important server, which means a 10Gbps switch and 10Gbps network adapters in each client machine. Alternatively, you might stick with 1Gbps, load the server with Ethernet interfaces and provide a decent local buffer (5-10TB), allowing you to back up multiple clients at once and keep data streaming to the tape drive.

If you can't keep the data flowing to the drive, it has to stop, reverse, then accelerate back up to write speed. Not only is this as slow as it sounds (5-10 times slower), but it also substantially increases wear on the drive. Yes, drives wear out, especially the heads, which are in direct contact with the tape. Too many times I've worked with unmaintained hardware that finally failed, leaving 30 or more unreadable tapes in its wake.
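The bandwidth figures above can be checked with some quick arithmetic: 400MB/sec is 3.2Gbps, more than three times the raw rate of a 1Gbps link. This sketch assumes roughly 90% of line rate is usable after protocol overhead (an assumption, not a measurement) and also estimates how long a full disk buffer can keep the drive streaming by itself. The helper names are hypothetical.

```python
import math

ONE_GBPS_MB_S = 1000 / 8  # 1Gbps = 125MB/sec raw line rate

def links_needed(drive_mb_s: float, link_mb_s: float = ONE_GBPS_MB_S,
                 efficiency: float = 0.9) -> int:
    """Links required to keep a drive streaming.

    `efficiency` is an assumed usable fraction of line rate after protocol overhead.
    """
    return math.ceil(drive_mb_s / (link_mb_s * efficiency))

def buffer_hours(buffer_tb: float, drive_mb_s: float) -> float:
    """Hours a full disk buffer can feed the drive with no new data arriving."""
    return buffer_tb * 1_000_000 / drive_mb_s / 3600

print(links_needed(400))                        # 4
print(links_needed(400, link_mb_s=10_000 / 8))  # 1
print(f"{buffer_hours(5, 400):.1f} h")          # 3.5 h
```

So four bonded 1Gbps links, or a single 10Gbps link, are needed just to match the compressed write speed, and a 5TB buffer covers only a few hours of streaming.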

The server will need a bit of heft to run the database for the backup software you'll use (Veritas, Legato, Backup Exec or Tivoli), and you'll want at least 4 drives in a RAID for that buffer. The last thing you can afford in backups and archives is to lose the data stored in the buffer; it may be your only copy.
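Usable buffer space depends on the RAID level chosen. This sketch uses the standard capacity formulas for RAID 5 (one drive's worth of parity), RAID 6 (two) and RAID 10 (mirrored pairs), ignoring filesystem overhead and hot spares; the 3TB drive size in the example is hypothetical.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity of a simple RAID set, ignoring filesystem overhead and spares."""
    if level == "raid5":   # one drive's worth of parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb
    if level == "raid6":   # two drives' worth of parity
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb
    if level == "raid10":  # striped mirrored pairs
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")

# Hypothetical buffer: four 3TB drives.
for level in ("raid5", "raid6", "raid10"):
    print(level, usable_tb(level, 4, 3.0))  # raid5 9.0, raid6 6.0, raid10 6.0
```

RAID 5 gives the most space, but RAID 6 survives any two drive failures, which matters when the buffer may hold your only copy.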

So there we have $1,000 in disk drives, $4,000 in the server, $5,000 in the two tape drives and $2,000-5,000 in the backup software: roughly $12,000-15,000 in total. If you're grafting the drives and backup software onto an existing file server, you'll need more expensive software licenses (that's how they charge).

And you still get to manually load, unload, store and retrieve those tapes: a process that's been fraught for more than 50 years, which is why even modest-sized sites have used small "tape stackers" or robots. They're clean, fast and reliable, and work 24/7 on housekeeping.

Did I mention your Disaster Recovery site? You'll need more drives there too, an up-to-date copy of the backup/archive database and a licensed copy of that program. And regular testing of full restores.

What's the point of all this palaver if you don't check it works?

The "gotcha" here isn't losing the data, but the cost of keeping not just one copy of your archived files, but many: not a few copies, but the same data stored 20-50 times. Even then, you might lose all your data if, as with Microsoft's "Pink"/Sidekick service, there's an Oops! with the backup catalogue, or if the data you need exists on only a couple of tapes and there are problems finding or reading them. I've had or seen all of those problems. If you store your backups on RAID volumes, you know exactly how well protected they are, and accessing any file or folder in real time is trivial.

Over the last 15 years, enterprises have moved to Virtual Tape Libraries (VTLs), driving the explosion of "Purpose Built Backup Appliances" (PBBAs) into a $3 billion/year business growing at ~20% a year.

There's a very simple problem caused by the proliferation of Virtual Machines: if you back up entire systems, you end up with a very large number of copies (~100) of exactly the same system files. And you absolutely need an automated system to handle backing up all the VM images.
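Storing each distinct piece of content once, with an index pointing at it, is the idea behind deduplicating backup appliances. This minimal Python sketch works on whole files and keys them by SHA-256 digest; real appliances generally deduplicate at the level of smaller blocks or chunks, and the VM file names below are hypothetical.

```python
import hashlib

def dedup(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Whole-file, content-addressed deduplication.

    Each distinct content is stored once under its SHA-256 digest; `index`
    maps every original file name to the digest holding its data.
    """
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy of each content
        index[name] = digest
    return store, index

# Hypothetical example: 100 VMs sharing one identical system file,
# each with its own small config file.
system_file = bytes(4096)
files: dict[str, bytes] = {}
for vm in range(100):
    files[f"vm{vm:03d}/system.dll"] = system_file
    files[f"vm{vm:03d}/config.ini"] = f"hostname=vm{vm:03d}".encode()

store, index = dedup(files)
logical = sum(len(d) for d in files.values())
physical = sum(len(d) for d in store.values())
print(f"{len(files)} files, {len(store)} unique; {logical / physical:.0f}x reduction")
```

The 200 backed-up files collapse to 101 stored objects, because the shared system file is kept only once.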

I'm sure they roll some data onto tape and export it "somewhere secure". But that's a last resort.

Unless you've got a mainframe and can afford a VTL/PBBA, your best bet at the moment is to use USB drives, or even sets of RAID-protected drives. They're small, light, cheap, robust and highly portable. They don't need expensive software to manage them, and you're guaranteed to be able to read them on new equipment immediately.

If you run OS X, you already have 'snapshots' to disk (Time Machine). The system stores copies of new data only; unchanged files are linked into the new backup tree. The same mechanism is available on other systems: 'rsnapshot' and 'rsync' on Linux/Unix are free.
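The hard-link technique can be sketched in a few lines of Python. This is a simplified illustration, not how Time Machine, rsnapshot or rsync are implemented: it treats a file as unchanged when its size and modification time match the previous snapshot (similar in spirit to rsync's default quick check), `snapshot` is a hypothetical name, and it needs a filesystem that supports hard links.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

def snapshot(source: Path, previous: Optional[Path], dest: Path) -> tuple[int, int]:
    """Create a complete-looking snapshot of `source` under `dest`.

    Files whose size and mtime match the copy in `previous` are hard-linked
    from it rather than copied, so each new snapshot only stores changed data.
    Returns (files_copied, files_linked).
    """
    copied = linked = 0
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file():
            s, o = src.stat(), old.stat()
            if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                os.link(old, target)  # unchanged: share the existing data
                linked += 1
                continue
        shutil.copy2(src, target)  # new or changed; copy2 keeps mtime for next time
        copied += 1
    return copied, linked
```

Calling it once a day with the previous day's snapshot as `previous` gives a series of full directory trees that share unchanged files, the same idea behind `rsync --link-dest`.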

This is still an unresolved argument for many people. The PBBA sales figures suggest that, for an increasing number of firms, tapes are dead.
