tag:blogger.com,1999:blog-298751432017-04-28T09:12:15.220-07:00SteveJ's lab-notesMy "Laboratory Note Book" on a Miscellanea of Topics. <br> If I believe I.T. isn't a "professional discipline" and two of the missing elements are "Lab Note Books" and "Robust Critique" (as in the Academic sense of Robust Defence) - then I've got to do as I say...Steve Jenkinnoreply@blogger.comBlogger80125tag:blogger.com,1999:blog-29875143.post-35567680212733029922014-12-22T19:58:00.000-08:002016-11-02T19:35:39.502-07:00Disk / Storage TimelineFirst cut at timeline of significant events in Disk and Storage, ignoring <a href="http://www.allaboutcircuits.com/vol_4/chpt_15/4.html">"historical" devices</a> like floppies and bubble memory. <a href="http://edwgrochowski.com/publications.html">Edward Grochowski</a>'s 2012 "<a href="http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2012/20120821_S102A_Grochowski.pdf">Flash Memory Summit</a>" talk tracks multiple storage capacity, price & technology from 1990.<br /><br />First commercial computers were built in 1950 and 1951: <a href="http://www.leo-computers.org.uk/">LEO</a>[UK], <a href="http://en.wikipedia.org/wiki/Konrad_Zuse">Zuse</a>[DE] and <a href="http://www.computerhistory.org/revolution/early-computer-companies/5/100">UNIVAC</a>[US].<br />LEO claim the first working Application in 1951.<br /> [1949: <a href="http://en.wikipedia.org/wiki/BINAC" style="background-image: none; color: #0b0080; font-family: sans-serif; font-size: 13px; line-height: 20px; text-decoration: none;" title="BINAC">BINAC</a> built by the Eckert–Mauchly Computer Corporation for Northrup]<br /><br />Ignored technologies include:<br />Tapes: used in the first computers as large, cheap linear access storage.<br />Drums: in use a little later and continued for some time, often in specialist roles (paging).<br /><a name='more'></a><br /><table><tbody><tr><th></th><td><b>Disk / Storage Timeline</b></td> 
</tr><tr><td></td><td></td></tr><tr><td>1956</td><td>IBM 305/350 RAMAC, Disk. [Volumes, DASD, extents, Partitioned Data Sets, CHS-addressing, count-key-data structure - variable-length blocks ]</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>1964</td><td>IBM OS/360 "Partitioned Data Sets", count-key-data disks, IBM-DOS et al: Datasets and ISAM</td></tr><tr><td>1969</td><td>Unix V5?. PDP-7/11, hierarchical file system, inodes + "setuid" bit (patented).</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>1974</td><td>UNIX V6, CACM. 512by sectors, RK-05 [2.5MB + 36.5MB tape (2400', 1600bpi)]</td></tr><tr><td></td><td>tools: ar, tar, dump, restore</td></tr><tr><td></td><td>PDP-11. Hierarchical fsys, single-root + mount, pipes, setuid, dirs, inodes, hard links, {owner, groups, permissions}, times.</td></tr><tr><td></td><td>read(), write(), seek(), stat(), open(), close(), creat(), unlink(), link()</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>1974</td><td>HSM by IBM. Tape Cartridges. IBM 3850 Mass Storage Facility. 6250bpi?</td></tr><tr><td>1974</td><td>CP/M, Kildall, Digital Research. File System with "A:" volumes.</td></tr><tr><td></td></tr><tr><td>1976</td><td>PWB, Programmer's Workbench. SCCS - Version Control, RJE (Remote Job Entry), shell, troff -mm, make, {find, xargs, cpio, expr, egrep}, yacc, lex.</td></tr><tr><td></td></tr><tr><td>1979</td><td>IBM “Fixed-block Architecture” 3370, 512by sectors</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>1980</td><td>FAT-12, vols 32MB/512by, 256MB/4K sectors</td></tr><tr><td>1981</td><td>ACID for DB’s. 
Jim Gray, “Transaction Concept: Virtues & Limitations”</td></tr><tr><td>1982</td><td>RCS (Revision Control System), BSD</td></tr><tr><td>1983</td><td>4.2BSD, symlinks [1978: DEC and RDOS]</td></tr><tr><td>1983</td><td>MBR boot, partitions</td></tr><tr><td>1983</td><td>Reuter & Haerder, coined 'ACID': "Principles of transaction-oriented database recovery”</td></tr><tr><td>1983</td><td>Tolerant Systems [later Veritas]. check-pointing Apps, Journaling FileSys +N-plex (RAID 1, N-vol)</td></tr><tr><td>1983</td><td>‘locate/updatedb’ utility for Unix. DB of all files on system.</td></tr><tr><td>1984</td><td>FAT-16, vols 2GB vol, 4GB with Large File Support</td></tr><tr><td>1984</td><td>NFS, Sun.</td></tr><tr><td>1984</td><td>compress (LZW)</td></tr><tr><td>1985</td><td>High Sierra (CD-ROM), ECMA</td></tr><tr><td>1986</td><td>CVS (Concurrent Version Sys) in shell</td></tr><tr><td>1986</td><td>Filesystem Switch, System VR3</td></tr><tr><td>1986</td><td>RFS - System VR3.</td></tr><tr><td>1986</td><td>IDE/ATA Logical Block Addrs, lba-22 [lba-28, 1996, all drives. max 128GiB/137GB] lba-48 current. 128PiB. FileSys/inode functionality</td></tr><tr><td>1988</td><td>ISO 9660 (CD-ROM)</td></tr><tr><td>1988</td><td>Tivoli Storage Mgr, TSM. HSM with GPFS.</td></tr><tr><td>1989</td><td>Lotus Notes. DB/file replication.</td></tr><tr><td>1989</td><td>RAID: Rethinking Single Large Expensive Drives. IBM 3390, last of this line: 1989-1993</td></tr><tr><td>1989</td><td>Veritas, VxVM, LVM? AT&T, SUN, AIX, HP-UX.</td></tr><tr><td>1989</td><td>PKzip. 1st ".zip” archival & compression format</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>1990</td><td>ACL papers, MITRE Corp.</td></tr><tr><td>1990</td><td>CVS (Concurrent Version Sys) in 'C'</td></tr><tr><td>1990</td><td>EMC Symmetrix 4200 integrated cached disk array, 24Gb. 
RAID-0, striped.</td></tr><tr><td>1991</td><td>Plan 9, Overlay Mounts, isolated user views, fossil (dedupe), cached storage</td></tr><tr><td>1991</td><td>Veritas File System, VxFS. JFS on HP-UX</td></tr><tr><td>1992</td><td>RBAC paper. Ferraiolo & Kuhn, Mandatory & Discretionary Access Controls.</td></tr><tr><td>1992</td><td>WAFL, NetApp</td></tr><tr><td>1993</td><td>MOSIX distributed O/S. SSI?</td></tr><tr><td>1993</td><td>NetApp Snapshots, RAID-4 (single parity, WAFL) 1st filer shipped. NAS: NFS, SMB</td></tr><tr><td>1993</td><td>Trusted Solaris (B1), ACL's</td></tr><tr><td>1993</td><td>ext2, 32TB</td></tr><tr><td>1994</td><td>IBM patent on RBAC</td></tr><tr><td>1994</td><td>MS-DOS 6.22 DoubleSpace [1991 1st vrs]</td></tr><tr><td>1994</td><td>Rock Ridge, Joliet (CD-ROM)</td></tr><tr><td>1994</td><td>StorageTek Iceberg: RAID + Thin Provisioning/Over-Allocation</td></tr><tr><td>1994</td><td>XFS, SGI, 8EB</td></tr><tr><td>1995</td><td>ATA/IDE interface on all new consumer disks</td></tr><tr><td>1995</td><td>Beowulf clusters</td></tr><tr><td>1996</td><td>FAT-32, 2TB / 16TB</td></tr><tr><td>1996</td><td>Garth Gibson paper, Object Stores [successor to RAID]</td></tr><tr><td>1996</td><td>NDMP: Network Data Management Protocol, NetApp, Legato. direct backup of Datastores.</td></tr><tr><td>1996</td><td>Solaris MC: Single System Image, SSI</td></tr><tr><td>1996</td><td>rsync - FOSS replication</td></tr><tr><td>1997</td><td>1Gbps FCAL. SAN’s begin.</td></tr><tr><td>1997</td><td>RBAC DB products</td></tr><tr><td>1998</td><td>HFS+, Apple</td></tr><tr><td>1998</td><td>Unixware Cluster: Single System Image</td></tr><tr><td>1999</td><td>1Gbps Ethernet. IEEE 802.3ab, twisted-pair</td></tr><tr><td>1999</td><td>ISO-9660 (CD-ROM)</td></tr><tr><td>1999</td><td>Intel EFI 0.9. 
GUID Partition Table: GPT</td></tr><tr><td>1999</td><td>Linuxcare Bootable Business Card: cloop - compressed ISO [Knoppix] avg 2.5:1 ratio, 64Kb pages</td></tr><tr><td>1999</td><td>VAX Multipathing [OpenVMS]</td></tr><tr><td>1999</td><td>ext3, 32TB</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>2000</td><td>Subversion (client-server version control)</td></tr><tr><td>2002</td><td>10Gbps Ethernet Fibre. 802.3ae</td></tr><tr><td>2002</td><td>ATA-6 spec: LBA-48 and CHS obsolete</td></tr><tr><td>2002</td><td>FAT-X </td></tr><tr><td>2003</td><td>Power over Ethernet. 802.3af</td></tr><tr><td>2003</td><td>VMware: vMotion, live image migration</td></tr><tr><td>2004</td><td>Data Domain: Purpose-built backup appliances, PBBA. 1.25TB. 50:1 compression, Data DeDupe</td></tr><tr><td>2004</td><td>Panasas ships 1st Object Store Product</td></tr><tr><td>2004</td><td>ZFS, 256ZB</td></tr><tr><td>2004</td><td>Lustre 1.2.0, March 2004, Linux kernel 2.6</td></tr><tr><td>2005</td><td>UEFI boot + GPT. OS/X 10.4.0 ‘Tiger’ on Intel</td></tr><tr><td>2005</td><td>Full text search, consumer desktop. “Spotlight”, OS/X 10.4</td></tr><tr><td>2005</td><td>exFAT, 512TB</td></tr><tr><td>2005</td><td>Git - distributed version control</td></tr><tr><td>2005</td><td>Mercurial - distributed version control</td></tr><tr><td>2006</td><td>ext4, 1EB</td></tr><tr><td>2007</td><td>Btrfs, Oracle </td></tr><tr><td>2007</td><td>Fusion-io PCI Flash demonstrated</td></tr><tr><td>2008</td><td>Flash ‘TRIM’ command defined. ACS-2 command set.</td></tr><tr><td>2008</td><td>VMware: Storage vMotion, Local Datastores to/from SAN/shared datastores</td></tr><tr><td>2009</td><td>Advanced Format: 4096by sectors, hard disk</td></tr><tr><td>2009</td><td>SquashFS, 2.6.29 kernel. max 1Mb pages</td></tr><tr><td>2009</td><td>Spotlight full-text search, iOS 3.0</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr><tr><td>2010</td><td>40&100Gbps Ethernet. 802.3ba. 
Fibre mostly.</td></tr><tr><td>2011</td><td>PureStorage: Flash Array, 1st data centre all-flash device</td></tr><tr><td>2012</td><td>VMware: Storage vMotion, Local Datastores to/from local datastores</td></tr><tr><td>2014</td><td>Seagate Kinetic drive, ethernet [2x 1Gbps SGMII ethernet], 1Kb Keys, 1Mb Objects</td></tr><tr><td>2014/5</td><td>WD/HGST Open Ethernet Drive Architecture</td></tr><tr><td>2015</td><td>Seagate release SMR (Shingled) drives. Market seems underwhelmed.</td></tr><tr><td>2016</td><td>Samsung release 16TB SSD in 2.5", smaller, faster, less power-hungry than HDD's. x8 GB/Rack Unit</td></tr><!-- rule --><tr><td></td><td><hr /></td></tr></tbody></table>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-42759157760588216712014-07-04T23:44:00.003-07:002014-07-07T22:26:39.509-07:00OS/X Time Machine, performance comparison to command line tools.A performance comparison for Mac Owners:<br /><br /><b>Q:</b> Just how quick is Apple’s Time Machine?<br /><b>A: </b>Way faster than you can do with OS/X command line tools.<br /><br />The headline is that command line tools take 80 minutes to do what Time Machine does in 3-10 mins.<br /><a name='more'></a><br /><br />Time Machine does this in 3mins (1st log example) or 10 mins (2nd log example),<br />vs 38mins for “cp -al” + 42mins for rsync. [80 mins]<br /><br />While there’s only 40Mb-50Mb in actual directory entries [2.1M * 21ch], which can notionally be written in seconds, each directory takes one block minimum on disk, plus an inode. I wasn’t able to quickly discover the size of an inode. In V6 Unix, it was 512by, from memory.<br /><br />Finder’s “Get Info” on any small file shows the “Allocation Unit” - 4Kb on my HFS+ volumes.<br />or "/usr/bin/stat -f %k ." - the ‘optimum’ block size for a volume.<br /><br />That’s 1.4GB, minimum, to be written. 
[360,000 * 4Kb = 360 * 4MB = 1.4GB]<br />Wikipedia maintains that HFS+ allows hard-links of directories to support Time Machine. This was removed from Unix pre SVR4 to prevent “cycles” in filesystems (infinite recursion).<br /><br /><hr />Outputs from my Mac Mini, OS/X Mavericks.<br />Details:<br />Internal 300GB 2.5" drive, external USB 1TB 2.5" drive. Both 5400RPM<br />358,000 directories and 2.1M files<br /><hr /><code><br />$ uname -a </code><br /><code></code><br /><div style="font-family: Courier; font-size: 12px;">Darwin mini 13.2.0 Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64 x86_64</div><div style="font-family: Courier; font-size: 12px;"><br /></div><code>$ sudo cp -a / /Volume/SJ-1TB-2014/bkup/last</code><br /><code># full initial copy to USB drive<br /><br />real 574m59.327s<br />user 0m57.402s<br />sys 26m43.417s<br /></code><br /><br /><code>mini:SJ-1TB-2014 steve$ df -h .<br />Filesystem Size Used Avail Capacity iused ifree %iused Mounted on<br />/dev/disk2s1 932Gi 262Gi 669Gi 29% 68776493 175413457 28% /Volumes/SJ-1TB-2014<br /></code><br /><br /><code>mini:SJ-1TB-2014 steve$ df -h /<br />Filesystem Size Used Avail Capacity iused ifree %iused Mounted on<br />/dev/disk0s2 297Gi 250Gi 47Gi 85% 65515369 12417533 84% /<br /></code><br /><br /><code>mini:SJ-1TB-2014 steve$ time sudo gcp -al bkup/last/ bkup/new</code><br /><code> # Apple's "cp" does not support "-l" option. 
This used gnu's "cp" from Darwin Ports.<br /><br />real 38m53.462s<br />user 0m19.785s<br />sys 7m2.447s<br /></code><br /><br />Try 2: [user & sys time similar, no explanation of longer elapsed time]<br /><code>real 68m13.564s, user 0m22.919s, sys 7m35.730s<br /></code><br /><br /><code>mini:SJ-1TB-2014 steve$ df -h .<br />Filesystem Size Used Avail Capacity iused ifree %iused Mounted on<br />/dev/disk2s1 932Gi 264Gi 667Gi 29% 69220039 174969911 28% /Volumes/SJ-1TB-2014<br /><br />Fri 4 Jul 2014 14:09:28 EST<br />mini:SJ-1TB-2014 steve$ time sudo rsync -aHSx / bkup/new # trimmed errors<br />rsync warning: some files vanished before they could be transferred (code 24) at /SourceCache/rsync/rsync-42/rsync/main.c(992) [sender=2.6.9]<br /><br />real 41m0.578s<br />user 1m24.502s<br />sys 8m25.340s<br /></code><br /><br /><hr /><code><br />$ time sudo rm -rf bkup/new<br /><br />real 52m32.459s<br />user 0m10.641s<br />sys 5m31.523s<br /></code><br /><hr /><br />Dirs - 360,000<br /><code><br /># time find bkup/last -type d|wc -l # added commas<br />358,314<br /><br />real 12m39.225s<br />user 0m7.775s<br />sys 2m55.545s<br /></code><br /><br />Files - 2.1M<br /><code><br /># time find bkup/last -type f|wc -l # added commas<br />2,109,216<br /><br />real 13m31.380s<br />user 0m8.334s<br />sys 2m54.752s<br /></code><br /><hr />Time Machine logs, using command:<br /><code><br />$ grep com.apple.backupd /var/log/system.log [selected and trimmed]<br /></code><br /><i><br /></i> <i>2.5 minutes, no system activity, trimmed</i><br /><code><br />Jul 4 09:06:37 mini com.apple.backupd[33481]: Starting automatic backup<br />Jul 4 09:08:06 mini com.apple.backupd[33481]: Copied 245 items (9.3 MB) from volume Macintosh HD. Linked 3392.<br />Jul 4 09:08:36 mini com.apple.backupd[33481]: Copied 35 items (2 KB) from volume Macintosh HD. 
Linked 582.<br />Jul 4 09:08:52 mini com.apple.backupd[33481]: Backup completed successfully.<br /></code><br /><i><br /></i> <i>10.5 minutes, system active. full log.</i><br /><code><br />Jul 4 10:10:05 mini com.apple.backupd[34131]: Starting automatic backup<br />Jul 4 10:10:08 mini com.apple.backupd[34131]: Backing up to /dev/disk1s1: /Volumes/SJ-1TB-SG/Backups.backupdb<br />Jul 4 10:10:20 mini com.apple.backupd[34131]: Will copy (13.6 MB) from Macintosh HD<br />Jul 4 10:10:20 mini com.apple.backupd[34131]: Found 202 files (13.6 MB) needing backup<br />Jul 4 10:10:20 mini com.apple.backupd[34131]: 2.97 GB required (including padding), 321.67 GB available<br />Jul 4 10:16:38 mini com.apple.backupd[34131]: Copied 377 items (13.6 MB) from volume Macintosh HD. Linked 3548.<br />Jul 4 10:16:49 mini com.apple.backupd[34131]: Will copy (19.3 MB) from Macintosh HD<br />Jul 4 10:16:49 mini com.apple.backupd[34131]: Found 131 files (19.3 MB) needing backup<br />Jul 4 10:16:49 mini com.apple.backupd[34131]: 2.97 GB required (including padding), 321.66 GB available<br />Jul 4 10:19:16 mini com.apple.backupd[34131]: Copied 310 items (19.3 MB) from volume Macintosh HD. 
Linked 2862.<br />Jul 4 10:19:50 mini com.apple.backupd[34131]: Created new backup: 2014-07-04-101949<br />Jul 4 10:20:33 mini com.apple.backupd[34131]: Starting post-backup thinning<br />Jul 4 10:20:33 mini com.apple.backupd[34131]: No post-backup thinning needed: no expired backups exist<br />Jul 4 10:20:33 mini com.apple.backupd[34131]: Backup completed successfully.<br /></code><br /><i><br /></i> <i>7.5 minutes, system active, trimmed</i><br /><code><br />Jul 4 14:39:33 mini com.apple.backupd[37148]: Starting automatic backup<br />Jul 4 14:44:45 mini com.apple.backupd[37148]: Created new backup: 2014-07-04-144444<br />Jul 4 14:45:29 mini com.apple.backupd[37148]: Starting post-backup thinning<br />Jul 4 14:47:02 mini com.apple.backupd[37148]: Deleted /Volumes/SJ-1TB-SG/Backups.backupdb/mini too/2014-07-03-134254 (8.2 MB)<br />Jul 4 14:47:02 mini com.apple.backupd[37148]: Post-backup thinning complete: 1 expired backups removed<br />Jul 4 14:47:05 mini com.apple.backupd[37148]: Backup completed successfully.<br /></code>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-69378723168208342402014-06-18T23:58:00.000-07:002014-12-23T16:57:56.293-08:00RAID-1: Errors and Erasures calculations<b>RAID-1 Overheads</b> (treating RAID-1 and RAID-10 as identical)<br /><br />N = number of drives mirrored. N=2 for duplicated<br />G = number of drive-sets in a Volume Group.<br />\(N \times G\) is the total number of drives in Volume Group.<br />An array may be composed of many Volume Groups.<br /><br /><i>Per-Disk:</i><br /><ul><li>Effective Capacity</li><ul><li>N=2. \( 1 \div 2 = 50\% \) [duplicated]</li><li>N=3. 
\(1 \div 3 = 33.3\% \) [triplicated]</li></ul></ul><ul><li>I/O Overheads & scaling</li><ul><li>Capacity Scaling: linear to <i>max disks</i>.</li><li>Random <i>Read</i>: \(N \times G \rm\ of\ rawdisk = N \times G \rm\ singledrive = RAID-0\)</li><li>Random <i>Write</i>: \(1 \times G \rm\ of\ rawdisk = 100\% \rm\ singledrive\)</li><li>Streaming <i>Read</i>: \(N \times G \rm\ of\ rawdisk = N \times G \rm\ singledrive = RAID-0\)</li><li>Streaming <i>Write</i>: \(1 \times G \rm\ of\ rawdisk = 100\% \rm\ singledrive\)</li></ul></ul><i></i><br /><a name='more'></a><i>RAID Array Overheads</i>:<br /><ul><li><i>Read</i>: Nil. 100% of available bandwidth of \(N \times G\) drives, same as RAID-0</li><li><i>Write</i>: single drive performance per replicant (50%, for N=2)</li><ul><li>Total RAID bandwidth increases linearly with scale.</li></ul><li>CPU & RAM: Low: buffering, scheduling, block addrs calculation and error handling.</li><ul><li>Zero Parity calculation.</li></ul><li>Cache needed: zero or small cache needed</li><li>Impact of Caching: </li><ul><li>Random I/O:</li><ul><li>Nil for low locality/readback of blocks</li><li>High impact for high locality/readback of writes</li><li>High in coalescing spread Random I/O to streams</li></ul><li>Streaming write: 50% total available bandwidth (N=2)</li></ul></ul><i>Tolerance to failures and errors</i><br /><ul><li>For N=2</li><ul><li>Error recovery:</li><ul><li>concurrently, 'read duplicate': 1/2 revolution + seek time to block on alt. 
drive</li><li>reread drive, 1 revolution, no seek, for "soft error".</li><ul><li>Number of reseeks for "soft" vs "hard" error </li></ul><li>On "hard error", mark block 'bad' and map to a spare block,</li><ul><li>write block copy </li></ul></ul><li>failure of second drive in a matched pair = Data Loss Event</li><li>Up to \(N \div 2\) drive failures <i>possible</i> without Data Loss</li></ul></ul><br /><i>Read Error correction cost</i><br />TBC<br /><br /><b>RAID-1 Rebuild</b><br /><br /><i>Performance Impact of Disk Failure</i><br /><ul><li>N=2, nominal write IO/sec unaffected</li><ul><li>read IO/sec: for 1/N'th of RAID address space, 50% nominal throughput</li><ul><li>For G = 12, total read bandwidth reduces from \(N \times G = 2 \times 12 = 24\) times single drive throughput to \((N \times G) - 1 = (2 \times 12) - 1 = 23\) times = 4% reduction in read bandwidth.</li></ul></ul><li>N=3, nominal write IO/sec unaffected</li><ul><li>read IO/sec: for 1/N'th of RAID address space, 66.6% nominal throughput for a single drive-set.</li><ul><li>For G = 12, total read bandwidth reduces from \(3 \times 12 = 36\) to 35, or 2.78%.</li></ul></ul></ul><i>Performance Impact of RAID Rebuild</i><br /><ul><li>N=2,</li><ul><li>streaming copy of primary drive to spare,</li><ul><li>blocks interrupted, on average, by every Nth access</li><li>time to rebuild affected by Array Utilisation</li><li>For G = 12, 8.3% reduction in read throughput, write throughput same.</li></ul><li>or for distributed spares, streaming reads spread across (N-1) drives, limited by streaming throughput of destination drive</li></ul><li>N=3,</li><ul><li>streaming copy of 2nd drive to spare</li><ul><li>rebuild time consistent and minimum possible</li><li>impact is 33% read performance for 1/N'th of RAID address space</li><ul><li>For G = 12, 5.5% reduction in read throughput, write throughput same.</li></ul></ul></ul></ul><br /><br /><b>RAID-1 Failures (Erasures)</b><br /><br />A 3% AFR (250,000hr MTBF) for 5,000 drives gives 150 
failed drives per year, or 3 per week, approx. 1 every 50-hours.<br /><br />A 2TB drive (16Tbit, or 1.6x10<sup>13</sup> bits) at a sustained 1Gbps, will take a minimum 4.5 hours to scan.<br /><br />For the benchmark configuration, RAID-1 with two drives, we'll have a whole-drive Data Loss event if the source drive fails during the rebuild (via copy) of a failed drive to a spare.<br /><br />What is the probability of a second drive failure within that time?<br />\begin{equation}<br />\begin{split}<br />P_{fail2nd}& = 4.5 hrs \div 250,000hrs\\<br />& = 0.000018\\<br />& = 1.8\times10^{-5}<br />\end{split}<br />\end{equation}<br />Alternatively, how often will a drive rebuild in a single array fail due to a second drive failure?<br />\begin{equation}<br />\begin{split}<br />N_{fail2nd}& = \frac{1}{P_{fail2nd}}\\<br />& = 250,000hrs \div 4.5 hrs\\<br />& = \rm 1\ in\ 55,555\rm\ events<br />\end{split}<br />\end{equation}<br />At 150 events/year, this translates to:<br />\begin{equation}<br />\begin{split}<br />Y_{fail2nd}& = \frac{N_{fail2nd}}{150}\\<br />& =\rm 55,555\ events \div 150\ events/year\\<br />& = \rm\ once\ in\ 370\ years<br />\end{split}<br />\end{equation}<br />If you're an individual owner, that risk is probably acceptable. If you're a vendor with 100,000 units in the field, is that an acceptable rate?<br />\begin{equation}<br />\begin{split}<br />YAgg_{fail2nd}& = \frac{Units}{Y_{fail2nd}}\\<br />& = 100,000 \div 370\\<br />& = \rm\ 270\ dual\mathpunct{-}failures\ per\ \it year<br />\end{split}<br />\end{equation}<br /><br />Array Vendors may not be happy with that level of customer data-loss. They can engineer their product to default to using triplicated RAID-1. 
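<br /><br />As a cross-check, the dual-failure chain above (P, N, Y, YAgg) can be re-run in a few lines of Python. This is only a sketch under the same simplification the text uses: the chance of losing the surviving drive during a rebuild is taken as rebuild time divided by MTBF. The function and parameter names are mine, not from any library.<br />

```python
def second_failure_stats(mtbf_hours, rebuild_hours, rebuilds_per_year, units_in_field):
    """Re-run the text's estimate chain for a mirrored pair (N=2).

    Assumption carried over from the text: P(second failure during a
    rebuild) is approximated by rebuild_hours / mtbf_hours.
    All names here are illustrative only.
    """
    p_second = rebuild_hours / mtbf_hours                     # P_fail2nd
    rebuilds_per_loss = 1 / p_second                          # N_fail2nd
    years_per_loss = rebuilds_per_loss / rebuilds_per_year    # Y_fail2nd, one array
    fleet_losses_per_year = units_in_field / years_per_loss   # YAgg_fail2nd
    return p_second, rebuilds_per_loss, years_per_loss, fleet_losses_per_year


# Figures from the text: 250,000 hr MTBF, 4.5 hr rebuild,
# 150 drive failures/year per array, 100,000 arrays in the field.
p, n, y, agg = second_failure_stats(250_000, 4.5, 150, 100_000)
print(f"P_fail2nd    = {p:.2e}")
print(f"N_fail2nd    = 1 in {n:,.0f} rebuilds")
print(f"Y_fail2nd    = once in {y:,.0f} years")
print(f"YAgg_fail2nd = {agg:,.0f} dual-failures per year")
```

Changing the rebuild time or MTBF shows how sensitive the fleet-wide figure is to the scan rate assumed for the drive.<br /><br />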
The Storage Industry has a problem as no standard naming scheme exists to differentiate dual, triple or more mirrors.<br />\begin{equation}<br />\begin{split}<br />P_{fail3rd}& = P_{fail2nd} \times 4.5 hrs \div 250,000hrs\\<br />& = (0.000018)^2 =(1.8\times10^{-5})^2\\<br />& = 3.24\times10^{-10}<br />\end{split}<br />\end{equation}<br />\begin{equation}<br />\begin{split}<br />N_{fail3rd}& = \frac{1}{P_{fail3rd}}\\<br />& = 1 \div 3.24\times10^{-10}\\<br />& = \rm 1\ in\ 3,080,000,000\ (3.08\times 10^9)\ events<br />\end{split}<br />\end{equation}<br />\begin{equation}<br />\begin{split}<br />Y_{fail3rd}& = \frac{N_{fail3rd}}{150}\\<br />& = \rm 3.08\times 10^9\ events \div 150\ events/year\\<br />& = \rm\ once\ in\ 20,576,066\ (2.0576066\times 10^7)\ years<br />\end{split}<br />\end{equation}<br />\begin{equation}<br />\begin{split}<br />YAgg_{fail3rd}& = \frac{Units}{Y_{fail3rd}}\\<br />& = 100,000 \div 20,576,066\\<br />& = \rm\ 0.004860\ triple{-}failures\ per\ year\\<br />& = \rm\ one\ triple{-}failure\ per\ 205.76\ years<br />\end{split}<br />\end{equation}<br />Or, per 100,000 units of 5,000 drives, one triple-failure in 205 years with triplicated RAID-1.<br /><br /><br /><br /><br /><b>RAID-1 Errors</b><br /><br />TBCSteve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-66458345019071253012014-06-12T20:02:00.000-07:002014-12-23T16:59:22.311-08:00mathjax test & Demo<a href="http://www.mathjax.org/">MathJax</a> setup in Blogger:<br /><a href="http://mytechmemo.blogspot.com.au/2012/02/how-to-write-math-formulas-in-blogger.html">http://mytechmemo.blogspot.com.au/2012/02/how-to-write-math-formulas-in-blogger.html</a><br /><br /><a href="http://cdn.mathjax.org/mathjax/latest/test/examples.html">MathJax Examples</a><br /><br />Note:<br /><ol><li>I had to hunt for the "HTML/Javascript" gadget, down the list aways.</li><li>I ended up putting the gadget in as a <i>footer.</i></li><li>You'll have to add 
that gadget to all blogs you want it to work for.</li><li>Preview and Edit mode don't compute the TeX. You need to save the doc, then view the post.</li><li>In compose "Options", "Line Breaks", I'm using 'Press "Enter" for line breaks.</li><li>The "MyTechMemo" author doesn't use the exact code he suggests, though it works for me. His actual gadget is:</li></ol><blockquote class="tr_bq"><code> Powered by <a href="http://www.mathjax.org/docs/1.1/start.html">MathJax</a><br /><br /><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"><br /></script><br /></code></blockquote>Alternate Hub Config in gadget, replace just first line.<br /><pre>MathJax.Hub.Config({<br /> TeX: { equationNumbers: { autoNumber: "AMS" } },<br /> tex2jax: {<br /> inlineMath: [ ['$','$'], ["\\(","\\)"] ],<br /> displayMath: [ ['$$','$$'], ["\\[","\\]"] ],<br /> processEscapes: true }<br /> });<br /></pre><br />Using "all", numbers all equations.<br />"AMS" numbers only specified equations.<br /><blockquote class="tr_bq"><code><script type="text/x-mathjax-config"><br />MathJax.Hub.Config({<br />TeX: { equationNumbers: {autoNumber: "all"} }<br />});<br /></script><br /></code></blockquote><br /><a name='more'></a><br /><br /><hr />Equation Examples:<br /><a href="http://cdn.mathjax.org/mathjax/latest/test/sample-eqnum.html">http://cdn.mathjax.org/mathjax/latest/test/sample-eqnum.html</a><br /><br />Numbered Equn:<br />\begin{equation}<br />E = mc^2<br />\end{equation}<br /><br />Align:<br />\begin{align} <br />a_1& =b_1+c_1\\ <br />a_2& =b_2+c_2-d_2+e_2 <br />\end{align}<br /><br />Split:<br />\begin{equation}<br />\begin{split}<br />a& =b+c-d\\<br />& \quad +e-f\\<br />& =g+h\\<br />& =i<br />\end{split}<br />\end{equation}<br /><br /><a href="http://cdn.mathjax.org/mathjax/latest/test/sample.html">http://cdn.mathjax.org/mathjax/latest/test/sample.html</a><br /><b>The probability of getting \(k\) heads when flipping \(n\) coins 
is:</b><br /><br />\[P(E) = {n \choose k} p^k (1-p)^{ n-k} \]<br /><br /><hr />Raw Code:<br /><br /><pre>\begin{equation}<br />E = mc^2<br />\end{equation}<br /><br />\begin{align} <br />a_1& =b_1+c_1\\ <br />a_2& =b_2+c_2-d_2+e_2 <br />\end{align}<br /><br />\begin{equation}<br />\begin{split} <br />a& =b+c-d\\ <br />& \quad +e-f\\ <br />& =g+h\\ <br />& =i <br />\end{split} <br />\end{equation}<br /><br />\[P(E) = {n \choose k} p^k (1-p)^{ n-k} \] </pre><br /><hr /><br />Using AsciiMath (with TeX and MathML), <a href="http://www1.chapman.edu/~jipsen/mathml/asciimathsyntax.html">http://www1.chapman.edu/~jipsen/mathml/asciimathsyntax.html</a><br /><code><br /><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_HTMLorMML"><br /></script><br /></code><br /><br />Or, without Tex:<br /><code><br /><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=AM_HTMLorMML"><br /></script><br /></code><br /><br />`x = (-b +- sqrt(b^2-4ac))/(2a) .`<br /><br />Raw Code (back ticks as start/end tokens):<br /><pre>`x = (-b +- sqrt(b^2-4ac))/(2a) .`<br /></pre><br />Alternate start/end tokens, part of main package, doesn't work in Mathjax:<br /><br />amath x^2 or a_(m n) or a_{m n} or (x+1)/y or sqrtx endamath<br />`x^2 or a_(m n) or a_{m n} or (x+1)/y or sqrtx`<br /><br />Raw Code:<br /><pre>amath x^2 or a_(m n) or a_{m n} or (x+1)/y or sqrtx endamath<br /></pre><br />TeX graph example from AsciiMath, doesn't work in MathJax.<br />`\begin{graph} width=300; height=200; xmin=-6.3; xmax=6.3; xscl=1; plot(cos(log(x))); plot(2.2sin(x)) \end{graph}`<br /><br /><pre>\begin{graph} width=300; height=200; xmin=-6.3; xmax=6.3; xscl=1; plot(cos(log(x))); plot(2.2sin(x)) \end{graph}<br /></pre><br /><hr />Example 1:<br /><a href="http://mytechmemo.blogspot.com.au/2012/12/determine-whether-ellipse-intersect.html">http://mytechmemo.blogspot.com.au/2012/12/determine-whether-ellipse-intersect.html</a><br /><br 
/>Assume an ellipse of width \(\sigma\) and length \(\kappa \sigma\) is centered at \((x_0, y_0)\), and has angle \(\theta_0\) with the \(x\)-axis. How do we determine whether it intersects a horizontal line or a vertical line? <br /><br />It turns out the criteria is very simple:<br /><ul><li> The ellipse intersects a horizontal line \(y = y_1\) if and only if the following equation holds: \[ \triangle_1 = \sigma^2 \left(\cos^2(\theta_0) + \kappa^2 \sin^2(\theta_0)\right) - (y_1-y_0)^2 \geq 0 \] </li><li> The ellipse intersects a vertical line \(x = x_1\) if and only if the following equation holds:<br />\[ \triangle_2 = \sigma^2 \left(\sin^2(\theta_0) + \kappa^2 \cos^2(\theta_0)\right) - (x_1-x_0)^2 \geq 0 \]</li></ul><br /><hr />Raw code:<br /><br /><pre>Assume an ellipse of width \(\sigma\) and length \(\kappa \sigma\) is centered at \((x_0, y_0)\), and has angle \(\theta_0\) with the \(x\)-axis. How do we determine whether it intersects a horizontal line or a vertical line?<br /><br />It turns out the criteria is very simple:<br /><ul><br /><li>The ellipse intersects a horizontal line \(y = y_1\) if and only if the following equation holds: \[ \triangle_1 = \sigma^2 \left(\cos^2(\theta_0) + \kappa^2 \sin^2(\theta_0)\right) - (y_1-y_0)^2 \geq 0 \]</li><br /><li>The ellipse intersects a vertical line \(x = x_1\) if and only if the following equation holds: \[ \triangle_2 = \sigma^2 \left(\sin^2(\theta_0) + \kappa^2 \cos^2(\theta_0)\right) - (x_1-x_0)^2 \geq 0 \]</li><br /></ul><br /></pre><br /><hr /><br />Example 2:<br /><a href="http://mytechmemo.blogspot.com.au/2012/04/color-balls.html">http://mytechmemo.blogspot.com.au/2012/04/color-balls.html</a><br /><br />Thus if we let \(\mu_i\) be the expected number of steps getting to one color when the bag has i distinct colors, then we have <br />\[<br />\mu_i = 1 + \frac{i(i-1)}{n(n-1)} \mu_{i-1}+\left(1-\frac{i(i-1)}{n(n-1)}\right) \mu_i.<br />\]<br />From this we can use induction to prove that<br />\[<br />\mu_i 
= \frac{i-1}{i} n(n-1).<br />\]<br />Therefore we have<br />\[<br />\mu_n = (n-1)^2.<br />\]<br /><br /><hr />Raw code:<br /><br /><pre>Thus if we let \(\mu_i\) be the expected number of steps getting to one color when the bag has i distinct colors, then we have <br />\[<br />\mu_i = 1 + \frac{i(i-1)}{n(n-1)} \mu_{i-1}+\left(1-\frac{i(i-1)}{n(n-1)}\right) \mu_i.<br />\]<br />From this we can use induction to prove that<br />\[<br />\mu_i = \frac{i-1}{i} n(n-1).<br />\]<br />Therefore we have<br />\[<br />\mu_n = (n-1)^2.<br />\]<br /></pre><br /><hr />Example 3:<br /><a href="http://mytechmemo.blogspot.com.au/2011/02/nodes-of-height-h-in-n-element-heap.html">http://mytechmemo.blogspot.com.au/2011/02/nodes-of-height-h-in-n-element-heap.html</a><br /><br />This is exercise 6.3-3 of CLRS version 2.<br /><br /><b>Question:</b>Show that there are at most \(\lceil n/2^{h+1} \rceil\) nodes of height \(h\) in any \(n\)-element heap.<br /><br /><b>Solution:</b> First some facts<br /><ol><li>According to exercise 6.1-7, an \(n\)-element heap has exactly \(\lceil n/2 \rceil\) leaves.</li><li>Notice that the nodes with height \(i\) become leaves after deleting all the nodes with height \(0,\cdots, i-1\).</li></ol><br />Let \(y_i\) be the total number of elements of the new tree after deleting all the nodes with height \(0,\cdots, i-1\), and let \(x_i\) be the number of leaves of the new tree. Then we have \(x_i = \lceil y_i/2 \rceil\) by fact #1 and \(y_{i+1}=y_i - x_i\) by fact #2. 
Thus we have \(y_{i+1} \leq y_i/2\), and this leads to<br /><br />$$y_i \leq n/2^i, \quad \forall n=0,1,\cdots.$$<br /><br />The final conclusion follows from the relation \(x_i = \lceil y_i/2 \rceil\).<br /><br /><hr />Raw code:<br /><br /><pre>This is exercise 6.3-3 of CLRS version 2.<br /><br /><b>Question:</b>Show that there are at most \(\lceil n/2^{h+1} \rceil\) nodes of height \(h\) in any \(n\)-element heap.<br /><br /><b>Solution:</b> First some facts<br /><ol><br /><li>According to exercise 6.1-7, an \(n\)-element heap has exactly \(\lceil n/2 \rceil\) leaves.</li><br /><li>Notice that the nodes with height \(i\) become leaves after deleting all the nodes with height \(0,\cdots, i-1\).</li><br /></ol><br />Let \(y_i\) be the total number of elements of the new tree after deleting all the nodes with height \(0,\cdots, i-1\), and let \(x_i\) be the number of leaves of the new tree. Then we have \(x_i = \lceil y_i/2 \rceil\) by fact #1 and \(y_{i+1}=y_i - x_i\) by fact #2. Thus we have \(y_{i+1} \leq y_i/2\), and this leads to<br /><br />$$y_i \leq n/2^i, \quad \forall n=0,1,\cdots.$$<br /><br />The final conclusion follows from the relation \(x_i = \lceil y_i/2 \rceil\).<br /></pre><br /><hr />Example 4:<br /><a href="http://mytechmemo.blogspot.com.au/2010/12/gift-distribution-problem.html">http://mytechmemo.blogspot.com.au/2010/12/gift-distribution-problem.html</a><br /><br />There are 20 person in a Christmas party. 
What is the probability that at least one of them gets his own gift?<br /><br />Solution: <br />Let \(A_i\) be the event that person \(i\) gets his own gift.<br />Then we want \(P(A_1 \cup A_2 \cup \cdots \cup A_n)\), <br />which is equal to <br />$$<br />\sum P(A_i)-\sum P(A_i \cap A_j)<br />+ \sum P(A_i \cap A_j \cap A_k) - \cdots<br />+ (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)<br />$$<br /><br />The general term is<br />$$<br />C(n, k)\cdot \frac{(n-k)!}{n!} = \frac{1}{k!}<br />$$<br /><br />Therefore the answer is<br />$$<br />1-\frac{1}{2!}+\frac{1}{3!}-\cdots-\frac{1}{20!}<br />$$ <br /><br /><hr />Raw code:<br /><br /><pre>There are 20 people at a Christmas party. What is the probability that at least one of them gets his own gift?<br /><br />Solution: <br />Let \(A_i\) be the event that person \(i\) gets his own gift.<br />Then we want \(P(A_1 \cup A_2 \cup \cdots \cup A_n)\), <br />which is equal to <br />$$<br />\sum P(A_i)-\sum P(A_i \cap A_j)<br />+ \sum P(A_i \cap A_j \cap A_k) - \cdots<br />+ (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)<br />$$<br /><br />The general term is<br />$$<br />C(n, k)\cdot \frac{(n-k)!}{n!} = \frac{1}{k!}<br />$$<br /><br />Therefore the answer is<br />$$<br />1-\frac{1}{2!}+\frac{1}{3!}-\cdots-\frac{1}{20!}<br />$$<br /></pre><br />...Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-62337043833412296302014-06-09T19:52:00.000-07:002014-06-09T21:26:56.241-07:00RAID++: Erasures aren't Errors<div class="tr_bq"><a href="http://stevej-lab-notes.blogspot.com/2014/04/raid-raid-0ecc.html">A previous piece in this series</a> starts as quoted below the fold, raising the question: The Berkeley group in 1987 were very smart, and Leventhal in 2009 no less smart,<i> so how did they both make the same <a href="http://en.wikipedia.org/wiki/Fundamental_attribution_error">fundamental attribution error</a></i>? 
This isn't just a <a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Various_proposals_for_further_extension">statistical "Type I" or "Type II" error</a>; it's conflating and confusing completely different sources of data loss.<br /><a name='more'></a></div><blockquote><a href="http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf">Current RAID schemes</a>, going back to the 1987/8 <a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf">Patterson, Gibson, Katz RAID paper</a>, make no distinction between transient and permanent failures: errors or dropouts versus failure. </blockquote><blockquote>This is a faulty assumption in the <a href="http://queue.acm.org/detail.cfm?id=1670144">2009 "Triple Parity RAID" </a> piece by Leventhal, an author of ZFS, a proven production quality sub-system, and not just a commentator like myself. </blockquote><blockquote>The second major error is relying on the <a href="http://www.scientificamerican.com/article/kryders-law/">2005 "Kryder's Law"</a> 40%/year growth in capacity. Capacity growth had collapsed by 2009 and will probably not even meet 10%/year from now on, because <a href="http://stevej-on-it.blogspot.com/2014/03/the-new-disruption-in-computing.html">the market can't fund the new fabrication plants needed</a>. </blockquote><blockquote>The last major error by Leventhal was that 3.5 inch disks would remain the preferred Enterprise Storage. 
<a href="http://stevej-on-it.blogspot.com/2014/03/the-new-disruption-in-computing.html">2.5 inch drives are the only logical long-term choice of form-factor</a>.</blockquote>Storage has evolved considerably since 1987:<br /><ul><li>IBM sold drives for $25,000/GB ($40,000 now), allowing EMC to sell their Symmetrix 4200 for around $10,000/GB, using 5.25" Seagate drives that sold for ~$5,000/GB.</li><ul><li>EMC stuffed 24GB into a single rack, along with 256MB of RAM and some CPU's</li><li>The 1% cache was important in achieving the high IO/sec</li><li>The real competitive advantage was size and power (and cooling). One fifth the floorspace and one fifth the cooling.</li></ul><li>disks spun at 3,600 to 5,400 RPM</li><ul><li>fast 5.25" disks achieved 12-15msec avg seek times,</li><li>commodity 3.5" drives achieved 25 - 30 msec</li><li>Raw IO/sec (IOPS) varied from 30-40 for slow drives to 60-80 for fast drives</li><li>For database applications, typically 4KB blocks, the read time, ~4msec, becomes significant.</li><ul><li>fast drives take 16.5msec to 20 msec to complete a read: 50-60 IO/sec.</li></ul></ul><li>heads read data off drives at around 10Mbps</li><ul><li>sectors were still 512by (half a Kilobyte) vs the 4KB used now.</li></ul><li>SCSI was new and ran at 5MHz, 8-bits wide (40Mbps total, shared up and down).</li><li>The largest 3.5" disks were 100MB (Conner?) and 5.25" drives were 600MB - 1,000MB (1GB)</li><li>Scan times of whole drives, allowing for seeking between tracks, were near the raw read rate:</li><ul><li>2-3 minutes for a 100MB 3.5" drive</li><li>15-20 minutes for a 1GB drive</li><li>For 1-1.25Gbps current drives of 1,000GB to 6,000GB, scan times are 6-30 hours.</li></ul><li>MTBF was 40,000 hrs for server-grade 5.25" drives and ~25,000hrs for commodity 3.5" drives, vs the 120,000 hrs for the 3380.</li><li>I'm guessing that design life, for server rooms, was the same 4-5 years. 
Compatible with the 40,000 hr MTBF of 5.25" drives (~5 years).</li><ul><li>Commodity 3.5" drives used in PC's with a low duty cycle and 25% power-on time,</li></ul></ul><ul><li>The Bit Error Rate, BER, of drives was 10<sup>-12</sup> to 10<sup>-14</sup>, now 10<sup>-14</sup> to 10<sup>-16</sup>.</li></ul><div>The Berkeley group were very clear in their view: more, smaller drives were the future of RAID.</div><div>EMC, then StorTech, didn't deliver this vision: they stayed with the largest form factors available, and continued to do so. This Theory/Practice gap will be explored elsewhere.</div><div><br /></div><div>For random IO, e.g. 4KB reads, fast drives could only achieve 4KB*40-50 IO/sec or 160-200KB/sec, under ¼ MB/sec.</div><div>Over a year (32M seconds), 5TB-6.26TB, 5-6x10<sup>12</sup> bytes, was theoretically possible, but not seen in practice.<br /><br />This upper limit assumes both 24/7 power-on and <i>100% duty-cycle,</i> far in excess of what was demanded from drives. PC's and LAN's were yet to happen. ARPAnet existed, but links were typically around 9,600bps, with "fast" being 1Mbps. Mainframes and mini-computers were connected to networks of terminals and printers and only staffed and run when the business required, modulo the after-hours batch processing and, at month-end, additional shifts.<br /><br />Airline reservation systems were the only sites commonly running 24/7, and even then demand was linked to operational hours. Bookings only happened when offices were open. Systems like SABRE had been running long enough, starting with drums, to routinely triplicate drives, now called RAID-1 or mirroring, with three drives. This allowed them to "split" a drive from the live system for maintenance, backups or testing, while keeping dual drives running in production.<br /><br />Another factor was the hang-over from batch processing and adapted COBOL programs. Even serial processing, the classic "Master File Update", didn't push disk drives hard. 
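<br /><br />The theoretical ceiling for random IO quoted earlier (4KB reads at 40-50 IO/sec, sustained for a 32M-second year) can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal kilobytes and 100% duty-cycle, and the names <code>yearly_random_io_bytes</code>, <code>BLOCK_BYTES</code> and <code>SECONDS_PER_YEAR</code> are made up for illustration.<br />
<pre>
```python
# Assumptions (not from any vendor spec): decimal KB, 24/7 power-on,
# 100% duty-cycle, and the rounded "32M seconds" year used in the text.
SECONDS_PER_YEAR = 32_000_000
BLOCK_BYTES = 4_000  # one 4KB random read


def yearly_random_io_bytes(iops, block_bytes=BLOCK_BYTES):
    """Bytes moved by one drive in a year of non-stop random reads."""
    return iops * block_bytes * SECONDS_PER_YEAR


for iops in (40, 50):
    rate_kb = iops * BLOCK_BYTES / 1_000
    total_tb = yearly_random_io_bytes(iops) / 1e12
    print(f"{iops} IO/sec: {rate_kb:.0f} KB/sec, {total_tb:.2f} TB/year")
```
</pre>
Under these assumptions the result lands close to the 5-6x10<sup>12</sup> bytes/year range given above; the exact figure shifts a little with the choice of kilobyte and year length.<br /><br />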
A reasonable guide for 100MB 3330's was to allow 3-4 hours for a <i>serial</i> scan of their contents. Even with 1KB blocks (100,000 per drive), they could average 10 IO/sec, vs the notional 40-50 IO/sec.<br /><br />An average site would power-on drives 30%-40% (2500-3200hrs/year), with duty cycles of 15%-25%. A reasonable derating from the naive rating, is 5%-10% actual utilisation.<br /><br />For mainframes, the theoretical limit of 6x10<sup>12</sup> bytes of random IO/year of drives, was under one-tenth that in practice: 5x10<sup>11</sup> bytes/year.</div><div><br /></div><div>In normal operations, drives were likely to suffer less than 1 error per year, caught by the internal CRC mechanisms.<br /><br />The IBM 3380 drives, with 4 drives per cabinet and max 4 cabinets per string, for max 16 drives per controller, had lower BER's (10<sup>-13</sup> to 10<sup>-14</sup>) and ~120,000 hrs MTBF. At 3,000 hrs/year operation, 16 drives would only see a drive failure every three years, much longer than the average stay of operators and junior Sys Progs. Most people didn't see a drive failure, though they did occur.<br /><br />In the days of removable disk packs (3330's), it was common to sequentially copy whole drives as backups. With the advent of sealed drives (3380/90) starting at ~$200,000/cabinet, this option was no longer<br /><br />A string of 16 drives might read 8TB, 8x10<sup>12</sup> bytes, per year. Read errors might be seen once in 10 years on those drives. As well, IBM scheduled regular "preventative maintenance". Their engineers probably tracked problems and replaced susceptible components, like bearings, well before they exceeded specifications and caused problems.<br /><br />These three forces, low power-on hours, low duty-cycle and active maintenance, drove down the rate of data errors experienced in mainframes.<br /><br />In the mid-1970's, Unix acquired the reputation as "being very hard on disks". 
This was because the O/S multi-tasked, ran a system-wide filesystem cache and was able to adopt simple access optimisation schemes like "The Elevator Scheduler". During their busy periods, drives were pushed to 100% duty-cycle for extended periods. For Universities with students, this would ramp up to near saturated performance 24/7 for the weeks leading up to end of term. System owners would exchange their experiences in Unix newsletters and conferences, leading to increases in sales for stronger drives.<br /><br />The commodity drive vendors were forced to address weaknesses in their designs to handle extended periods of 100% duty-cycle and 24/7 operations. Those that didn't lost sales and eventually failed.<br /><br />When LANs and workstations, and later PC's and Fileservers, came on the scene, it became standard practice to run servers, and their disks, 24/7. As RAID became more common in Fileservers and workstations, fuelled by both cheap hardware RAID cards and O/S tools, like Logical Volume Managers (LVM), able to "slice and dice" disks for sharing, drive counts increased while total installed capacity experienced explosive growth for nearly 2 decades.<br /><br />These forces have driven commodity drive vendors over decades to improve the MTBF of drives and improve error detection/correction as well.<br /><br /><b>Summary</b>:<br /><br /></div><div>In 1987, the Berkeley group's RAID paper addressed the mainframe market for the Single Large Expensive Drive (SLED).<br /><br />They were well aware of the economics, physics and operational aspects of disk drives and their analysis was correct:<br /><br /><ul><li>commodity drives at $1-2,500 each were 25% of the per-GB price IBM charged,</li></ul><ul><li>lashing together a number of commodity drives and adding extra-drive parity provided improved performance and increased MTBF beyond what was needed. 
Read errors, responsible for under 5%-10% of Data Loss events, were notionally handled by the same extra-drive parity.</li></ul><ul><li>Scaled-down drives consume far less power in aerodynamic drag (which varies with the fifth power of diameter), allowing both faster rotation rates and lower seek times due to the much smaller scale and mass of components. The 1987/8 RAID paper pitched 3.5" drives with later papers suggesting the Industry would further scale-down to single platter 1" drives.</li></ul><br />The reality of commodity drives, Storage System vendors and Computer system vendors was that fewer, larger drives were preferred. We are now in a period of transition to 2.5" drives for "performance" 10K & 15K RPM drives, but capacity drives are still 26.1mm thick, 3.5" drives, albeit variable speed ~5,900RPM.<br /><br />The "balance" of drive characteristics has shifted substantially, with MTBF's rising 25-fold (40k to 1M hrs), capacity by 5,000 times and BER's only 10-100-fold.<br /><br />Close packed storage systems now achieve 430-450 drives per rack, vs 120-150 of the 1990's.<br /><br />The combination of radically increased capacity, parity drives and extended scan times has promoted the minor problem of BER Data Loss to a "drop-dead" problem for current RAID systems: a single read error during a RAID Volume rebuild due to a drive failure is death to single-parity protection. The likelihood of these events has compounded as drive scan-times continue to increase and the volume of data per Volume Group has increased, while BER's are relatively static.<br /><br />All major vendors adopted dual-parity Data Protection by around 2005 to provide adequate reliability for their clientele. 
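<br /><br />The rebuild exposure described above can be put into rough numbers. The sketch below assumes bit errors are independent at the quoted BER, and uses a hypothetical single-parity group of 8 drives of 4TB each, so that a rebuild has to read the 7 surviving drives in full; the function name <code>p_read_error</code> and the group size are illustrative assumptions, not taken from any vendor's design.<br />
<pre>
```python
import math


def p_read_error(bytes_read, ber):
    """Probability of at least one unrecoverable bit error while reading
    bytes_read bytes, assuming independent errors at rate ber per bit."""
    bits = 8 * bytes_read
    # 1 - (1 - ber)**bits, written with log1p/expm1 so tiny ber values
    # do not vanish in floating-point rounding.
    return -math.expm1(bits * math.log1p(-ber))


# Hypothetical single-parity group: 8 drives of 4TB, one has failed,
# so the rebuild reads all 7 survivors end to end.
surviving_bytes = 7 * 4e12

for ber in (1e-14, 1e-15, 1e-16):
    p = p_read_error(surviving_bytes, ber)
    print(f"BER {ber:.0e}: P(read error during rebuild) = {p:.1%}")
```
</pre>
Under these assumptions the chance of hitting a read error during the rebuild is roughly 89% at a BER of 10<sup>-14</sup>, about 20% at 10<sup>-15</sup> and about 2% at 10<sup>-16</sup>. Real drives do not fail bit-by-bit independently, so this is an order-of-magnitude guide only.<br /><br />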
The 1-day to 1-week long RAID rebuild times, during which the Storage Array throughput is halved, seem to be accepted by clients.<br /><br />2.5" drives, which can potentially pack 5,000 drives per rack, create a whole new class of problems.<br /><br /></div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-74633681817533320352014-06-08T02:42:00.000-07:002014-06-08T15:48:55.435-07:00RAID, Archives and Tape v DiskThere's a long-raging question in I.T. Operations:<i> How best to archive data?</i> [What media to use?]<br />This question arose again for me as I was browsing a retail site.<br /><br /><i>Conclusions</i>:<br /><br /><ol><li>The break-even for 6.25TB/2.5TB tapes is 85 and 140 tapes (compressed/uncompressed), or</li><ul><li>$13,150 and $17,400 capital investment.</li></ul><li>At just 2 times data duplication, uncompressed tapes are <i>not</i> cost effective.</li><ul><li>Enterprise backups show data duplication rates of 20-50 times.</li></ul><li>Compressed tapes are cost-effective up to 5-times data duplication.</li><ul><li>If you run 10 Virtual Machines and do full backups, you've passed that threshold.</li></ul></ol><br /><a name='more'></a><br /><a href="http://www.mwave.com.au/product/hp-lto5-ultrium-3280-sas-internal-tape-drive-eh899b-ab53718">An LTO-5 internal SAS drive (HP)</a>: $2550, <a href="http://www.mwave.com.au/product/hp-lto5-ultrium-3tb-rw-data-cartridge-c7975a-ab52399">3TB tapes</a> $45, <a href="http://www.mwave.com.au/product/hp-lto6-ultrium-625tb-mp-rw-data-cartridge-c7976a-ab52403">6.25TB tapes</a>, $90.<br /><br />The 6250GB is based on 2.5:1 compression, <a href="http://www.lto.org/About/faq.html">according to LTO.ORG</a>, or 2500GB uncompressed. 
This would also affect the claimed 400MB/sec write speed.<br /><br />LTO tapes now support <a href="http://en.wikipedia.org/wiki/LTFS">LTFS, the Linear Tape File System</a>, available across multiple platforms.<br /><br />So, is this really $15/TB, or $37.50/TB for tape, vs $80/TB for 1TB USB drives?<br />Is the gap $65/TB or $40/TB?<br /><br />At a minimum, you need <i>two</i> drives for archival use because you need to read/rewrite the data on tapes regularly, both to know you still can (many parts of the chain can fail) and because tightly wound tape on spools has a habit of "print-through", where the bits get corrupted.<br /><br />You also need a SAS Host-Bus-Adaptor (HBA), to connect the two drives. One <a href="http://www.mwave.com.au/product/hp-h222-host-bus-adapter-650926b21-ab53720">from HP for ~$300</a> and <a href="http://www.mwave.com.au/product/highpoint-rocketraid-2710-sassata-pcie20-rr2710-aa20971">another for $215</a>.<br /><br />If you're supporting the compressed writes at 400MB/sec, you'll need 10Gbps ethernet interfaces. At least two for this important server, and that means a 10Gbps switch and 10Gbps HBA's in each client machine. Maybe you'll stick to 1Gbps and load the server with ethernet interfaces and provide a decent amount (5-10TB) of local buffer, allowing you to back up multiple clients at once and keep data streaming to that tape drive. If you can't keep the data flowing to the drive, it has to stop, back up, then accelerate back up to write speed. Not only is this as slow as it sounds (5-10 times slower), but it increases wear on the drive substantially. Yes, drives wear out, especially the heads that are in direct contact with the tape. 
Too many times I've worked with unmaintained hardware that finally fails, leaving 30 or more unreadable tapes in its wake.<br /><br />The server will need to have a bit of heft to run the Database for the Backup software you'll use, like Veritas, Legato, Backup Exec or Tivoli, and you'll want to run at least 4 drives in a RAID for that buffer. The last thing you can afford in backups and archives is to lose the data stored in the buffer: it may be your only copy.<br /><br />So there we have $1,000 in disk drives, $4,000 in the server, $5,000 in the two tape drives and $2-5,000 in the backup software. If you're grafting the drives and backup software onto an existing file server, you'll need more expensive software licenses (that's how they charge) and<br /><br />And you still get to manually load, unload, store and retrieve those tapes: a process that's been fraught for more than 50 years, which is why even modest sized sites have used either small "tape stackers" or robots. They're clean, fast and reliable, and work 24/7 for housekeeping.<br /><br />Did I mention your Disaster Recovery site? You'll need more drives there too, an up-to-date copy of the backup/archive database and a licensed copy of that program. And regular testing of full restores.<br /><br />What's the point of all this palaver if you don't <i>check</i> it works?<br /><br />The "gotcha" in this isn't losing the data, but the cost of having not just one copy of your archived files, but many. That's not <i>a few copies,</i> but the same data stored 20-50 times. Even then, <i>you might just lose all your data</i> if, <a href="http://en.wikipedia.org/wiki/2009_Sidekick_data_loss">like Microsoft's "Pink"/Sidekick service,</a> there's an <i>Oops!</i> with the backup catalogue, or the data that's needed only exists on a couple of tapes and there are problems finding or reading them. I've had or seen all those problems. 
If you store your backups on RAID volumes, you know exactly how well protected they are and accessing any file or folder, in real-time, is trivial.<br /><br />Enterprises over the last 15 years have moved to Virtual Tape Libraries, VTL, causing the explosion of "<a href="https://www.symantec.com/content/en/us/about/media/industryanalysts/idc-evolution-and-value-purpose-built-backup-appliances_WP_239730.en-us.pdf">Purpose Built Backup Appliances"</a>, PBBA's, into a $3 billion/year business growing at ~20%/year.<br /><br />There's a very simple problem caused by the proliferation of Virtual Machines: if you back up entire systems, you have a very large number of copies (~100) of exactly the same system files. And you absolutely need an automatic system to handle backing up all the VM images.<br /><br />I'm sure they roll some data onto tape and export it "somewhere secure". But that's a last resort.<br /><br />Unless you've got a mainframe and can afford a VTL/PBBA, your best bet at the moment is to run USB drives, or even sets of RAID protected drives. They're small, light, cheap, robust and highly portable. They don't need expensive software to manage them and you're guaranteed to be able to read them on new equipment, immediately.<br /><br />If you run OS/X, then you already have 'snapshots' to disk. The system only stores copies of new data; existing files are linked into the new backup trees. The same mechanism is available for other systems ('rsnapshot' and 'rsync' on Linux/Unix systems are free).<br /><br />This is still an unresolved argument for many people. 
The PBBA sales figures suggests that for an increasing number of firms, tapes are dead.Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-20385401258345546322014-06-05T17:30:00.001-07:002014-06-05T23:44:09.929-07:00Retail Disk prices, Enterprise drives, grouped by manufacturer & typeTable of current retail prices for various types of disk with cost-per-GB.<br />Only Internal drives, Hard Disks.<br /><br />Disclaimer: <i>This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.</i><br /><i><br /></i> Most drives are from a single manufacturer, Western Digital, to allow like-for-like comparisons.<br />Most manufacturers are close to the same pricing for the same specs.<br /><ul><li>There is ~$25 extra for SAS interface over SATA [1TB WD 'RE', SAS vs SATA]</li><li>There's ~$30/TB extra for higher spec drives [2TB & 3TB, WD SATA, NAS vs RE]</li><li>WD sell four 3.5" 1TB drives [03, 04, 26, 41]</li><ul><li>SAS vs SATA, ~$25</li><li>about double for 10,000RPM over 7,200RPM (Velociraptor vs RE)</li><li>about 25% less for the Intellipower, 'Capacity' drive</li></ul><li>While it's cheaper with Seagate to go from 15,000RPM/3.5" to 10,000RPM/2.5", there's no simple relation for the discount.</li></ul>Western Digital list these "Purchase Decision Criteria" for drives:<br /><ul><li>Capacity [GB]</li><li>Workload Capability [duty cycle or TB read/write per year]</li><li>Reliability [MTBF and BER]</li><li>Cost/GB</li><li>Performance [sustained throughput, latency or IO/sec = {RPM, seek time}]</li><li>Power used [not included by WD]</li><li>Racking density [not included by WD]</li></ul><br /><a name='more'></a><br /><br /><table frame="box"><colgroup><col span="8"></col><col span="1" style="background-color: lightgrey;"></col><col span="1"></col></colgroup> <thead><tr><th></th><th></th><th></th><th></th><th 
align="centre">05-June-2014</th></tr><tr><th></th><th></th><th></th><th colspan="3"><a href="http://www.mwave.com.au/">http://www.mwave.com.au</a></th></tr><tr><th colspan="10"><hr /></th></tr><tr><th>N</th><th>Sz</th><th>Conn</th><th>RPM</th><th>Brand/Model</th><th>Line</th><th>Cap</th><th>Cost</th><th>$/GB</th><th>GB</th></tr><tr><th colspan="10"><hr /></th></tr></thead><tbody><tr><td>01</td><td colspan="9">## WD, RE series, Durability</td></tr><tr><td>02</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd5003abyz-wd-500gb-re-35-7200rpm-sata3-hard-drive-ab54522">Western Digital WD5003ABYZ</a></td><td>RE</td><td>500GB</td><td>$88.99</td><td>0.1780</td><td>500GB</td><td></td></tr><tr><td>03</td><td>3.5"</td><td>SAS</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd1001fyyg-wd-1tb-re-35-7200rpm-sas-hard-drive-ab52518">Western Digital WD1001FYYG</a></td><td>RE</td><td>1TB</td><td>$148.99</td><td>0.1490</td><td>1000GB</td><td></td></tr><tr><td>04</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd1003fbyz-wd-1tb-re-35-7200rpm-sata3-hard-drive-ab52519">Western Digital WD1003FBYZ</a></td><td>RE</td><td>1TB</td><td>$120.99</td><td>0.1210</td><td>1000GB</td><td></td></tr><tr><td>05</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd2000fyyz-wd-2tb-re-35-7200rpm-sata3-hard-drive-ab49918">Western Digital WD2000FYYZ</a></td><td>RE</td><td>2TB</td><td>$218.99</td><td>0.1095</td><td>2000GB</td><td></td></tr><tr><td>06</td><td>3.5"</td><td>SAS</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd3001fyyg-wd-3tb-re-35-7200rpm-sas-hard-drive-ab47644">Western Digital WD3001FYYG</a></td><td>RE</td><td>3TB</td><td>$361.99</td><td>0.1207</td><td>3000GB</td><td></td></tr><tr><td>07</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a 
href="http://www.mwave.com.au/product/western-digital-wd3000fyyz-wd-3tb-re-35-7200rpm-sata3-hard-drive-ab47643">Western Digital WD3000FYYZ</a></td><td>RE</td><td>3TB</td><td>$345.99</td><td>0.1153</td><td>3000GB</td><td></td></tr><tr><td>08</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd4000fyyz-wd-4tb-re-35-7200rpm-sata3-hard-drive-ab47636">Western Digital WD4000FYYZ</a></td><td>RE</td><td>4TB</td><td>$487.99</td><td>0.1220</td><td>4000GB</td><td></td></tr><tr><th colspan="10"><hr /></th></tr><tr><td>10</td><td colspan="9">## 15k 3.5", Seagate</td></tr><tr><td>11</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st3300657ss-300gb-cheetah-15k7-35-15000rpm-sas-hard-drive-aa19841">Seagate ST3300657SS</a></td><td>Cheetah 15k.7</td><td>300GB</td><td>$280.21</td><td>0.9340</td><td>300GB</td><td></td></tr><tr><td>12</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st3450857ss-450gb-cheetah-15k7-35-15000rpm-sas-hard-drive-aa19846">Seagate ST3450857SS</a></td><td>Cheetah 15k.7</td><td>450GB</td><td>$385.87</td><td>0.8575</td><td>450GB</td><td></td></tr><tr><td>13</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st3600057ss-600gb-cheetah-15k7-35-15000rpm-sas-hard-drive-aa19838">Seagate ST3600057SS</a></td><td>Cheetah 15k.7</td><td>600GB</td><td>$477.99</td><td>0.7966</td><td>600GB</td><td></td></tr><tr><th colspan="10"><hr /></th></tr><tr><td>15</td><td colspan="9">## 10k 2.5", Seagate</td></tr><tr><td>16</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st300mm0026-300gb-savvio-10k6-25-10000rpm-sas-hard-drive-ab50595">Seagate ST300MM0026</a></td><td>Savvio 10k.6</td><td>300GB</td><td>$226.81</td><td>0.7560</td><td>300GB</td><td></td></tr><tr><td>17</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td><a 
href="http://www.mwave.com.au/product/seagate-st450mm0026-450gb-savvio-10k6-25-10000rpm-sas-hard-drive-ab52562">Seagate ST450MM0026</a></td><td>Savvio 10k.6</td><td>450GB</td><td>$366.29</td><td>0.8140</td><td>450GB</td><td></td></tr><tr><td>18</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st600mm0026-600gb-savvio-10k6-25-10000rpm-sas-hard-drive-ab52563">Seagate ST600MM0026</a></td><td>Savvio 10k.6</td><td>600GB</td><td>$404.84</td><td>0.6747</td><td>600GB</td><td></td></tr><tr><td>19</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st900mm0026-900gb-savvio-10k6-25-10000rpm-sas-hard-drive-ab51561">Seagate ST900MM0026</a></td><td>Savvio 10k.6</td><td>900GB</td><td>$531.86</td><td>0.5910</td><td>900GB</td><td></td></tr><tr><th colspan="10"><hr /></th></tr><tr><td>21</td><td colspan="9">## 10k WD, XE, 2.5" SAS</td></tr><tr><td>22</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd6001bkhg-wd-600gb-xe-25-10000rpm-sas-enterprise-hard-drive-ab49915">Western Digital WD6001BKHG</a></td><td>XE</td><td>600GB</td><td>$398.99</td><td>0.6650</td><td>600GB</td><td></td></tr><tr><td>23</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd9001bkhg-wd-900gb-xe-25-10000rpm-sas-enterprise-hard-drive-ab49916">Western Digital WD9001BKHG</a></td><td>XE</td><td>900GB</td><td>$590.99</td><td>0.6567</td><td>900GB</td><td></td></tr><tr><th colspan="10"><hr /></th></tr><tr><td>25</td><td colspan="9">## 10k WD, VelociRaptor, 3.5" SATA</td></tr><tr><td>26</td><td>3.5"</td><td>SATA3</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd1000dhtz-wd-1tb-velociraptor-35-10000rpm-sata3-hard-drive-aa19759">Western Digital 
WD1000DHTZ</a></td><td>VelociRaptor</td><td>1TB</td><td>$258.99</td><td>0.2590</td><td>1000GB</td><td></td></tr><tr><td>27</td><td>3.5"</td><td>SATA3</td><td>10000RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd5000hhtz-wd-500g-velociraptor-35-10000rpm-sata3-hard-drive-aa19758">Western Digital WD5000HHTZ</a></td><td>VelociRaptor</td><td>500GB</td><td>$150.99</td><td>0.3020</td><td>500GB</td><td></td></tr><tr><th colspan="10"><hr /></th></tr><tr><td>29</td><td colspan="9">## WD, 3.5" SATA, SE NAS (Scalability)</td></tr><tr><td>30</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd2000f9yz-wd-2tb-se-35-7200rpm-sata3-nas-hard-drive-ab49917">Western Digital WD2000F9YZ</a></td><td>SE</td><td>2TB</td><td>$166.99</td><td>0.0835</td><td>2000GB</td><td>NAS</td></tr><tr><td>31</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd3000f9yz-wd-3tb-se-35-7200rpm-sata3-nas-hard-drive-ab49913">Western Digital WD3000F9YZ</a></td><td>SE</td><td>3TB</td><td>$237.99</td><td>0.0793</td><td>3000GB</td><td>NAS</td></tr><tr><td>32</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd4000f9yz-wd-4tb-se-35-7200rpm-sata3-nas-hard-drive-ab49914">Western Digital WD4000F9YZ</a></td><td>SE</td><td>4TB</td><td>$308.99</td><td>0.0772</td><td>4000GB</td><td>NAS</td></tr><tr><th colspan="10"><hr /></th></tr><tr><td>34</td><td colspan="9">## Other</td></tr><tr><td>35</td><td>3.5"</td><td>SAS</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/western-digital-wd4001fyyg-wd-4tb-re-35-7200rpm-sas-hard-drive-ab47638">Western Digital WD4001FYYG</a></td><td>RE</td><td>4TB</td><td>$460.99</td><td>0.1152</td><td>4000GB</td><td></td></tr><tr><td>36</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a 
href="http://www.mwave.com.au/product/seagate-st4000nc000-4tb-terascale-hdd-35-7200rpm-sata3-hard-drive-ise-ab52536">Seagate ST4000NC000</a></td><td>Terascale HDD</td><td>4TB</td><td>$376.32</td><td>0.0941</td><td>4000GB</td><td></td></tr><tr><td>37</td><td>2.5"</td><td>SAS</td><td>15000RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st9300653ss-300gb-savvio-15k3-25-15000rpm-sas-hard-drive-aa19875">Seagate ST9300653SS</a></td><td>Savvio 15k.3</td><td>300GB</td><td>$489.90</td><td>1.6330</td><td>300GB</td><td></td></tr><tr><td>38</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st6000nm0024-6tb-enterprise-capacity-35-7200rpm-sata3-hard-drive-ab55308">Seagate ST6000NM0024</a></td><td>Enterprise Capacity</td><td>6TB</td><td>$908.99</td><td>0.1515</td><td>6000GB</td><td></td></tr><tr><td>39</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td><a href="http://www.mwave.com.au/product/hitachi-0b23661-ultrastar-15k600-0b23661-300gb-35-sas-hard-drive-aa19842">Hitachi 0B23661</a></td><td>UltraStar</td><td>300GB</td><td>$316.84</td><td>1.0561</td><td>300GB</td><td></td></tr><tr><td>40</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td><a href="http://www.mwave.com.au/product/seagate-st4000nm0033-4tb-constellation-es3-hdd-35-7200rpm-sata3-hard-drive-ab47965">Seagate ST4000NM0033</a></td><td>Constellation ES.3 HDD</td><td>4TB</td><td>$446.81</td><td>0.1117</td><td>4000GB</td><td></td></tr><tr><td>41</td><td>3.5"</td><td>SATA3</td><td>IntelliPower</td><td><a href="http://www.mwave.com.au/product/western-digital-wd10eurx-wd-1tb-avgp-35-intellipower-sata3-av-hard-drive-aa19751">Western Digital WD10EURX</a></td><td>AV-GP</td><td>1TB</td><td>$93.99</td><td>0.0940</td><td>1000GB</td><td>AV</td></tr></tbody> <caption align="bottom"><b>Mwave Enterprise Disk Retail Prices</b></caption></table><br />Same data, sorted on $/GB<br /><br /><table frame="box"><colgroup><col span="8"></col><col span="1" style="background-color: 
lightgrey;"></col><col span="1"></col></colgroup> <thead><tr><th></th><th></th><th></th><th></th><th align="centre">05-June-2014</th></tr><tr><th></th><th></th><th></th><th colspan="3"><a href="http://www.mwave.com.au/">http://www.mwave.com.au</a></th></tr><tr><th colspan="10"><hr /></th></tr><tr><th>N</th><th>Sz</th><th>Conn</th><th>RPM</th><th>Brand/Model</th><th>Line</th><th>Cap</th><th>Cost</th><th>$/GB</th><th>GB</th></tr><tr><th colspan="10"><hr /></th></tr></thead><tbody><tr><td>0034</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD4000F9YZ</td><td>SE</td><td>4TB</td><td>$308.99</td><td>0.0772</td><td>4000GB</td></tr><tr><td>0033</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD3000F9YZ</td><td>SE</td><td>3TB</td><td>$237.99</td><td>0.0793</td><td>3000GB</td></tr><tr><td>0032</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD2000F9YZ</td><td>SE</td><td>2TB</td><td>$166.99</td><td>0.0835</td><td>2000GB</td></tr><tr><td>0043</td><td>3.5"</td><td>SATA3</td><td>IntelliPower</td><td>Western Digital WD10EURX</td><td>AV-GP</td><td>1TB</td><td>$93.99</td><td>0.0940</td><td>1000GB</td></tr><tr><td>0038</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Seagate ST4000NC000</td><td>Terascale</td><td>4TB</td><td>$376.32</td><td>0.0941</td><td>4000GB</td></tr><tr><td>0007</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD2000FYYZ</td><td>RE</td><td>2TB</td><td>$218.99</td><td>0.1095</td><td>2000GB</td></tr><tr><td>0042</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Seagate ST4000NM0033</td><td>Constellation ES.3</td><td>4TB</td><td>$446.81</td><td>0.1117</td><td>4000GB</td></tr><tr><td>0037</td><td>3.5"</td><td>SAS</td><td>7200RPM</td><td>Western Digital WD4001FYYG</td><td>RE</td><td>4TB</td><td>$460.99</td><td>0.1152</td><td>4000GB</td></tr><tr><td>0009</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital 
WD3000FYYZ</td><td>RE</td><td>3TB</td><td>$345.99</td><td>0.1153</td><td>3000GB</td></tr><tr><td>0008</td><td>3.5"</td><td>SAS</td><td>7200RPM</td><td>Western Digital WD3001FYYG</td><td>RE</td><td>3TB</td><td>$361.99</td><td>0.1207</td><td>3000GB</td></tr><tr><td>0006</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD1003FBYZ</td><td>RE</td><td>1TB</td><td>$120.99</td><td>0.1210</td><td>1000GB</td></tr><tr><td>0010</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD4000FYYZ</td><td>RE</td><td>4TB</td><td>$487.99</td><td>0.1220</td><td>4000GB</td></tr><tr><td>0005</td><td>3.5"</td><td>SAS</td><td>7200RPM</td><td>Western Digital WD1001FYYG</td><td>RE</td><td>1TB</td><td>$148.99</td><td>0.1490</td><td>1000GB</td></tr><tr><td>0040</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Seagate ST6000NM0024</td><td>Enterprise Capacity</td><td>6TB</td><td>$908.99</td><td>0.1515</td><td>6000GB</td></tr><tr><td>0004</td><td>3.5"</td><td>SATA3</td><td>7200RPM</td><td>Western Digital WD5003ABYZ</td><td>RE</td><td>500GB</td><td>$88.99</td><td>0.1780</td><td>500GB</td></tr><tr><td>0028</td><td>3.5"</td><td>SATA3</td><td>10000RPM</td><td>Western Digital WD1000DHTZ</td><td>VelociRaptor</td><td>1TB</td><td>$258.99</td><td>0.2590</td><td>1000GB</td></tr><tr><td>0029</td><td>3.5"</td><td>SATA3</td><td>10000RPM</td><td>Western Digital WD5000HHTZ</td><td>VelociRaptor</td><td>500GB</td><td>$150.99</td><td>0.3020</td><td>500GB</td></tr><tr><td>0021</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td>Seagate ST900MM0026</td><td>Savvio 10k.6</td><td>900GB</td><td>$531.86</td><td>0.5910</td><td>900GB</td></tr><tr><td>0025</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td>Western Digital WD9001BKHG</td><td>XE</td><td>900GB</td><td>$590.99</td><td>0.6567</td><td>900GB</td></tr><tr><td>0024</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td>Western Digital 
WD6001BKHG</td><td>XE</td><td>600GB</td><td>$398.99</td><td>0.6650</td><td>600GB</td></tr><tr><td>0020</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td>Seagate ST600MM0026</td><td>Savvio 10k.6</td><td>600GB</td><td>$404.84</td><td>0.6747</td><td>600GB</td></tr><tr><td>0018</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td>Seagate ST300MM0026</td><td>Savvio 10k.6</td><td>300GB</td><td>$226.81</td><td>0.7560</td><td>300GB</td></tr><tr><td>0015</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td>Seagate ST3600057SS</td><td>Cheetah 15k.7</td><td>600GB</td><td>$477.99</td><td>0.7966</td><td>600GB</td></tr><tr><td>0019</td><td>2.5"</td><td>SAS</td><td>10000RPM</td><td>Seagate ST450MM0026</td><td>Savvio 10k.6</td><td>450GB</td><td>$366.29</td><td>0.8140</td><td>450GB</td></tr><tr><td>0014</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td>Seagate ST3450857SS</td><td>Cheetah 15k.7</td><td>450GB</td><td>$385.87</td><td>0.8575</td><td>450GB</td></tr><tr><td>0013</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td>Seagate ST3300657SS</td><td>Cheetah 15k.7</td><td>300GB</td><td>$280.21</td><td>0.9340</td><td>300GB</td></tr><tr><td>0041</td><td>3.5"</td><td>SAS</td><td>15000RPM</td><td>Hitachi 0B23661</td><td>UltraStar</td><td>300GB</td><td>$316.84</td><td>1.0561</td><td>300GB</td></tr><tr><td>0039</td><td>2.5"</td><td>SAS</td><td>15000RPM</td><td>Seagate ST9300653SS</td><td>Savvio 15k.3</td><td>300GB</td><td>$489.90</td><td>1.6330</td><td>300GB</td></tr></tbody> <caption align="bottom"><b>Mwave Enterprise Disk Retail Prices, Sorted on $ per GB</b></caption></table>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com1tag:blogger.com,1999:blog-29875143.post-72925117030571596262014-06-01T22:23:00.000-07:002014-06-06T00:44:42.412-07:00Historical External Disk Storage Data: IDC Worldwide tracking reportData from IDC's Quarterly <a href="http://www.idc.com/getdoc.jsp?containerId=prUS24716314">Worldwide External Disk Storage Systems Factory Revenue</a>s series 
(Press Releases). Multiply quarterly values by 4 for an approx yearly value. Full data not available prior to 2011.<br />For 2013: US$24.4 billion and 34.6EB (34,600PB).<br /><a name='more'></a><br />What follows is the naively calculated cost per Gigabyte of shipped "External Disk Storage", excluding NAS, iSCSI, "open SAN" and internal Direct Attached Storage (DAS).<br /><br />This is not the "Total Disk" market for Enterprise storage, but roughly Enterprise RAID arrays.<br /><br />For the Enterprise Storage reported by IDC, there are <i>no</i> small, cheap systems sold: all systems sold will have both some room for growth and be large enough to amortise the highest cost items across many drives. The calculated Price/GB isn't inflated by small systems.<br /><br />Enterprises are sophisticated purchasers of I.T. - they spend considerable time & resources on purchase and upgrade decisions, sifting through many options to select optimal price/performance for each application. They are also very aware of the vendor product offerings and their price points, opting for low-priced solutions when they are applicable. This direct-substitution effect caps the price vendors can charge. As mid-range systems grow in features, performance and capacity due to technology progression, they poach sales from high-end systems. This is the same way that lower-level platforms (mini-computers, then PC's + LANs and now tablets/portables with "Cloud" Apps) have steadily eroded the markets for high-end computers: mainframes and supercomputers.<br /><br />One of the factors not widely discussed is that the purchasing policy of Enterprises is directed by their existing equipment. Technically, there is both a <i>Barrier to Entry</i> and a <i>Barrier to Exit</i>.<br /><br />There are sound commercial and managerial reasons to limit the number of vendors on-site and the degree of change. 
It begins with access to trained staff and the learning curve involved in maximising systems throughput ('tuning'), learning troubleshooting and monitoring tools & techniques and the administration of ancillary tasks, such as snapshots, backup/restore and archives. Migrating the complete Storage pool of an enterprise from old to new equipment is a critical task, one that cannot be allowed to fail, that cannot be allowed to corrupt any data and cannot exceed the planned window. It also cannot result in any 'surprises', like incompatible block-sizes blowing out filesystem sizes or renamed/missing 'extents' creating problems for databases and filesystems.<br /><br />Enterprises only change major I.T. vendors with extreme reluctance. Hitachi Data Systems, HDS, had a detailed study of the costs of Data Migration: it was 5-10 times the cost of the Storage.<br /><br />A single vendor, even if migrating some data to a lower-cost product, is both experienced in the technique and motivated to get it right, first time. Large customers <i>will</i> flee dysfunctional and poorly performing vendors, albeit slowly. The most obvious example is Unisys. Since the mid-1980's merger of Sperry-Univac and Burroughs, its whole business has shrunk 10-fold, and mainframe sales by much more. I was once hired into a Unisys site where the customer was unhappy to the point of tearing up all contracts.<br /><br />The <a href="http://www.emc.com/collateral/analyst-reports/idc-digital-universe-2014.pdf">EMC/IDC "Digital Universe"</a> report estimates in 2013 there were 4.4 Zettabytes (4,400 Exabytes) of installed Storage Capacity (Disk, Flash, etc) at an estimated 75% utilisation.<br /><br />The data illustrates the "Storage Paradox", revenue decreases whilst capacity increases, noted in the "Digital Universe" report. 
There are at least three effects in operation:<br /><ul><li>The per-unit cost of 'capacity optimised' Enterprise drives from vendors is continuing to fall,</li></ul><ul><li>For high-performance Storage, many types of Flash memory are displacing premium-priced smaller, high-RPM drives, and</li></ul><ul><li>competition between vendors and from their own lower-cost "substitutes" is forcing Enterprise pricing down.</li></ul><div>During the 2008 Great Recession/Global Financial Crisis, sales of Enterprise storage first <i>increased</i> as large corporates invested more in Information Technology, then in late 2009, suffered a reversal when they reduced spending to achieve savings. They are presumed to have bought cheaper disk solutions, such as workgroup servers and smaller NAS appliances from the same vendors.</div><br /><br /><table border="1" cellpadding="5" style="border-collapse: collapse;"><thead><tr><th>Qtr</th><th>$ per GB</th><th>Revenue<br />(US billion)</th><th>Capacity<br />(petabytes)</th></tr><tr></tr></thead><tbody><tr><td>4Q13</td><td>$0.68</td><td>$6.9</td><td>10,200</td></tr><tr><td>3Q13</td><td>$0.68</td><td>$5.7</td><td>8,400</td></tr><tr><td>2Q13</td><td>$0.72</td><td>$5.9</td><td>8,200</td></tr><tr><td>1Q13</td><td>$0.76</td><td>$5.9</td><td>7,800</td></tr><tr><td>4Q12</td><td>$0.84</td><td>$6.7</td><td>8,000</td></tr><tr><td>3Q12</td><td>$0.83</td><td>$5.9</td><td>7,104</td></tr><tr><td>4Q11</td><td>$1.05</td><td>$6.6</td><td>6,279</td></tr><tr><td>3Q11</td><td>$1.07</td><td>$5.8</td><td>5,429</td></tr><tr><td>1Q11</td><td>$1.13</td><td>$5.6</td><td>4,956</td></tr><tr><td>4Q10</td><td>$1.19</td><td>$6.1</td><td>5,127</td></tr><tr><td>3Q10</td><td>$1.21</td><td>$5.2</td><td>4,299</td></tr><tr><td>2Q10</td><td>$1.37</td><td>$5.0</td><td>3,645</td></tr><tr><td>1Q10</td><td>$1.47</td><td>$5.0</td><td>3,397</td></tr><tr><td>2Q08</td><td>$2.87</td><td>$5.1</td><td>1,777</td></tr><tr><td>3Q07</td><td>$3.38</td><td>$4.4</td><td>1,300</td></tr><tr><td>1Q07</
td><td>$4.30</td><td>$4.3</td><td>1,000</td></tr></tbody><caption align="bottom"><b>Average installed Cost of Enterprise Storage</b></caption></table><br /><table frame="box"><colgroup><col span="1" style="background-color: lightgrey;"></col><col span="6"></col></colgroup> <thead><tr><th>Qtr</th><th>Revenue<br />Delta</th><th>Revenue<br />(US billion)</th><th>Capacity<br />(petabytes)</th><th>PB Delta<br />(year over year)</th><th>Date</th><th>Year</th></tr></thead><tbody><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>4Q13</td><td>2.4%</td><td>$6.9</td><td>10,200</td><td>26.2%</td><td>March 7</td><td>2014</td></tr><tr bgcolor="pink"><td>3Q13</td><td>-3.5%</td><td>$5.7</td><td>8,400</td><td>16.1%</td><td>December 6</td><td>2013</td></tr><tr bgcolor="pink"><td>2Q13</td><td>-0.8%</td><td>$5.9</td><td>8,200</td><td>21.5%</td><td>September 6</td><td>2013</td></tr><tr bgcolor="pink"><td>1Q13</td><td>-0.9%</td><td>$5.9</td><td>7,800</td><td>26.4%</td><td>June 7</td><td>2013</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>4Q12</td><td>2.3%</td><td>$6.7</td><td>8,000</td><td>25.3%</td><td>March 8</td><td>2013</td></tr><tr><td>3Q12</td><td>3.3%</td><td>$5.9</td><td>7,104</td><td>24.4%</td><td>December 7</td><td>2012</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>4Q11</td><td>7.7%</td><td>$6.6</td><td>6,279</td><td>22.4%</td><td>March 5</td><td>2012</td></tr><tr><td>3Q11</td><td>10.8%</td><td>$5.8</td><td>5,429</td><td>30.7%</td><td>December 2</td><td>2011</td></tr><tr><td>2Q11</td><td>12.2%</td><td>$5.6</td><td>?</td><td>?</td><td>September 2</td><td>2011</td></tr><tr><td>1Q11</td><td>13.2%</td><td>$5.6</td><td>4,956</td><td>46.3%</td><td>June 3</td><td>2011</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>4Q10</td><td>16.2%</td><td>$6.1</td><td>5,127</td><td>55.7%</td><td>March 4</td><td>2011</td></tr><tr><td>3Q10</td><td>19.0%</td><td>$5.2</td><td>4,299</td><td>65.2%</td><td>December 
3</td><td>2010</td></tr><tr><td>2Q10</td><td>20.4%</td><td>$5.0</td><td>3,645</td><td>54.6%</td><td>September 3</td><td>2010</td></tr><tr><td>1Q10</td><td>17.1%</td><td>$5.0</td><td>3,397</td><td>55.2%</td><td>June 4</td><td>2010</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr bgcolor="pink"><td>1Q09</td><td>-13.6%</td><td>$5.6</td><td>?</td><td>14.8%</td><td>June 5</td><td>2009</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>2Q08</td><td>16.7%</td><td>$5.1</td><td>1,777</td><td>43.7%</td><td>September 5</td><td>2008</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>3Q07</td><td>5.1%</td><td>$4.4</td><td>1,300</td><td>49.4% </td><td>December 10</td><td>2007</td></tr><tr><td>1Q07</td><td>?</td><td>$4.3</td><td>1,000</td><td>?</td><td>July 3</td><td>2007</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>4Q05</td><td>17.9%</td><td>?</td><td>650</td><td>54.6%</td><td>March 5</td><td>2006</td></tr><tr><td>1Q05</td><td>6.7%</td><td>$3.8</td><td>?</td><td>?</td><td>March 5</td><td>2005</td></tr><tr><td></td><td colspan="6"><hr /></td></tr><tr><td>4Q03</td><td>8.4%</td><td>$3.7</td><td>?</td><td>?</td><td>December 5</td><td>2003</td></tr></tbody><caption align="bottom"><b>IDC Worldwide External Disk Storage</b></caption></table>...Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-68203777224609970862014-05-27T21:25:00.001-07:002014-05-29T20:04:19.857-07:00"MAID" using 2.5 in drivesWhat would a current attempt at MAID look like with 2.5" drives?<br /><br /><a href="http://en.wikipedia.org/wiki/Non-RAID_drive_architectures#MAID">"MAID", Massive Array of Idle Disks</a>, was an attempt by <a href="http://web.archive.org/web/20090801180959/http://www.copansystems.com/">Copan Systems</a> (<a href="http://en.wikipedia.org/wiki/Silicon_Graphics_International">bought by SGI in 2009</a>) at near-line Bulk Storage. 
It had a novel design innovation, mounting drives vertically back-to-back in slide-out canisters (patented), and was based on an interesting design principle: off-line storage can mostly be powered down.<br /><br />It was a credible attempt, coming out of <a href="https://blog.archive.org/about/">The Internet Archive</a>, and their "<a href="https://blog.archive.org/2010/07/27/the-fourth-generation-petabox/">Petabox</a>" (<a href="https://archive.org/web/petabox.php">a more technical view</a> and on <a href="http://en.wikipedia.org/wiki/PetaBox">Wikipedia</a>). At 24 x 3.5" drives per 4RU, they contain around half the 45 drives of the <a href="http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/">Backblaze 4.0 Storage Pod</a>. The Petabox has 10Gbps uplinks and much beefier CPU's and more DRAM.<br /><br />The <a href="http://www.xyratex.com/products/clusterstor-6000">Xyratex ClusterStor</a> (now Seagate) offers another benchmark: their Scalable Storage Unit (SSU) stores 3 rows of 14 drives in 2.5RU x 450mm slide-out drawers, allowing hot-plug access to all drives. Two SSU's comprise a single 5RU unit of 84 drives, with up to 14 SSU's per rack for 1176 drives per rack, an average of 28 x 3.5" drives per Rack Unit.<br /><a name='more'></a>Key design questions:<br /><ul><li>How close to the <a href="http://stevej-lab-notes.blogspot.com.au/2014/03/storage-efficiency-measures.html">maximum drive packing density</a> of ~9,000 x 9.5mm or ~6,000 x 15mm drives could you get?</li></ul><ul><li>How do you power and cool all those drives?</li></ul><ul><li>Do you want drives fixed, like Backblaze, or in drawers or canisters like Copan and Xyratex?</li></ul><div>I'd use USB 3.0 as the power and connection method. 
Best packing density can be achieved if the drive manufacturers integrate the USB 3.0 chips onto the drive, replacing the SAS/SATA chips, in the same way that both Seagate and Western Digital have recently announced ethernet connected drives.</div><div><br /></div><div>Ethernet, with PoE, is an alternative method with drives already nearing production.</div><div><br /></div><div>Groups of drives would share a USB hub or Ethernet switch, with a single connection back to the motherboard or another level of hubs/switches.</div><div><br /></div><div>For MAID, the motherboard(s) must be able to power-down individual devices, specifically be able to turn power per-port off and on, hence either control ports on USB hubs or per-port PoE feeds on switches. Power supplies feed the USB hubs or PoE switches. Per hub/switch power monitoring is necessary to avoid overloads. Small local batteries or ultra-capacitors might provide enough head-room to allow short duration overloads.</div><div><br /></div><div>Using the Xyratex vertical mounting technique, two drawers of drives can fit in 5RU, limiting the rack to 7 x 5RU units to allow Top of Rack switches and power supplies.</div><div><br /></div><div>Alternatively 14 x 3RU units also fit in a 42RU rack, without space for PSU's or switches.</div><div><br /></div><div>Allowing for drawer sliders, 432-435mm is available internally to house drives.</div><div><br /></div>Drive Cooling: there are multiple vendors selling custom-built "Cold Plate" technology. [<a href="http://www.1-act.com/products/custom-cold-plates/">An example product</a>, <a href="http://en.wikipedia.org/wiki/Computer_cooling">Wikipedia</a>, and <a href="http://www.electronics-cooling.com/2011/12/technical-brief-design-considerations-for-high-performance-processor-liquid-cooled-cold-plates/">a 2011 article</a>].<br /><br />The required thickness of cold-plates for low-power applications isn't clear. 
With modern machining, 0.5mm channels could be reliably & precisely machined in either Copper or Aluminium blocks, resulting in 1.5mm-2.0mm thick plates, the length of the rack: 750mm-950mm.<br /><br />The critical factor for liquid cooling is the thermal coupling of plates to drives. This is well known for CPU heat-sinks: pastes, sometimes toxic, are applied to do this job. This would impact in-place drive replacement, forcing whole modules to be replaced at once, with individual drives replaced elsewhere.<br /><br />As well, the thermal coupling must also dampen vibrations, not be a source of direct mechanical coupling between drives, suggesting either a rubber/synthetic plate with metal contact pads or a metal plate with rubberised thermal coupling pads.<br /><br />The bottom connectors and top & side locators (drives cannot be screwed in) would also have to mechanically isolate each drive to avoid vibration coupling.<br /><br />Using cold-plate cooling would allow drives to be mounted front-to-back or side-to-side, unlike air-cooling where only front-to-back is possible to maintain an airflow.<br /><br />The modules, ("sleds"), holding the drives can be open-frames, not adding to the width required, but adding a little to the height. 
In both formats, there is space available for this.<br /><br />I haven't factored in the space needed for motherboards and power supplies.<br />This may reduce the effective depth to 560mm-630mm, 7 or 8 drives deep.<br /><br /><b>Single row per Sled:</b><br /><br />In this way, drives could sit without gaps along a module, taking just 70mm each:<br /><ul><li>11 drives in 770mm or 13 drives in 910mm [14 in 980mm only works for fixed drives]</li></ul><ul><li>9.5mm drives, with 1.5mm cold-plates, 11mm centres, allows 39 drives across a rack:</li><ul><li>117 platters per row</li><li>429 (11-deep) or 507 (13-deep) drives per unit.</li><li>for 6006 to 7098 drives per 42RU rack (14 units)</li><li>a 66% - 78% packing density</li></ul></ul><ul><li>15mm drives with 2mm cold-plates at 17mm centres allows 25 drives across a rack:</li><ul><li>100 platters per row</li><li>275 drives (11-deep) or 325 (13-deep)</li><li>for 3850 to 4550 drives per 42RU rack (14 units)</li><li>checked with: echo 4k 6000 sa 3850 la /p 4550 la /p|dc</li><li>a 64% to 75% packing density</li></ul></ul>At ~100gm (0.1kg) per drive, there will be ~500kg per rack in drives, requiring drawers to support 20kg each.<br /><br /><b>Dual rows per Sled:</b><br /><br />The same cold-plate can cool back-to-back disk drives.<br /><br />A 1.3-1.5mm thick plate bonded to 9.5 mm drives creates a module 20.3mm-20.5mm thick.<br /><br />Across a rack, 6 drives @ 70mm spacing (12 per module), with 12mm-15mm space for frames/couplings.<br />This allows 10 modules in 205mm, and 44 modules in 902mm [or 47 @ 20.3mm in 954.1mm]<br />A total of 528 [564] per layer.<br /><br />Along a rack, 13 drives can fit in 910mm [26 per module] and 20 modules @ 20.5mm in 410mm, or 21 modules @ 20.3mm in 426mm.<br />A total of 520 [546] per layer.<br /><br />The next smaller layout, 840mm and 12 drives, fits 480 [504] 9.5mm drives.<br /><br />The advantage of across-the-rack layout with cold-plates is that depth of rack used can be changed in 
much smaller increments.<br /><br /><i>Note: this extreme packing density is possible in a USB connected MAID device. It wouldn't be a first choice for SAS/SATA drives.</i><br /><br />For dual 15mm drives bonded to 1.8mm-2.0mm cold-plates, 31.8mm-32.0mm thick modules,<br /><br /><ul><li>28 modules take 896mm for 336 drives per layer, across the rack, and</li><li>13 modules at 910mm deep [13 drives], for 338 drives per layer along the rack,</li><ul><li>or 13 modules at 840mm deep [12 drives], for 312 drives per layer.</li></ul></ul><div><b>At 14 layers per rack, 7280 drives/rack @ 9.5mm and 4368 drives/rack @ 15mm.</b></div><div>Around an 80% packing density.</div><div><br /></div><div>For 7mm drives, double the 15mm drive figures.</div><div>As this is a MAID design, single-platter drives wouldn't be considered: we're not trying to maximise IO/sec or minimise latency, we're maximising capacity and storage density, and minimising $$/GB.</div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com1tag:blogger.com,1999:blog-29875143.post-26473125766471621042014-05-04T02:13:00.001-07:002014-05-29T22:38:27.433-07:00RAID-0 and RAID-3/4 SparesThis piece is not based on an exhaustive search of the literature. It addresses a problem that doesn't seem to have been addressed: spares for RAID-0 and the related RAID-3/4, which adds a single parity drive.<br /><br />Single parity drives seem to have been deemed impractical early on because they apparently create a deliberate system bottleneck. RAID-3/4 has no bottleneck for streaming reads/writes: for streaming writes, performance equals, not merely approaches, the raw write performance of the array, identical to RAID-0 (stripe). For random writes, the 100-150 times speed differential between sequential and random access of modern drives can be leveraged with a suitable buffer to remove the bottleneck. The larger the buffer, the more likely the pre-read of data, otherwise needed to calculate the new parity, won't be required. 
This <i>triples</i> the array throughput by avoiding the full revolution forced by the read/write-back cycle.<br /><br />Multiple copies of the parity drive (RAID-1) can be kept to mitigate the very costly failure of a parity drive: <i>all blocks on every drive must be reread to recreate a failed parity drive.</i> For large RAID groups, given the very low price of small drives, this is not expensive.<br /><br />With the availability of affordable, large SSD's, naive management of a single parity drive also removes the bottleneck for quite large RAID groups. The SSD can be backed by a log-structured recovery drive, trading on-line random IO performance for rebuild time.<br /><br />Designing Local and/or Global Spares for large (N=64..512) RAID sets is necessary to reduce overhead, improve reconstruction times and avoid unnecessary partitioning, which limits recovery options and causes avoidable data loss events.<br /><a name='more'></a><br />In 1991, <a href="http://www.google.com/patents/US5258984">IBM lodged a patent for Distributed Spares</a>, similar to the way RAID-5/6 distributes the Parity Disk(s), in diagonal stripes, across all disks in a RAID group. There is a mapping from logical to physical drives where, like RAID-5/6, for adjacent stripes, blocks from a single logical drive are on different physical drives, 'rotated' around, if you will. If a physical drive fails, all drives in the RAID-group take part in the reconstruction with the mapping of logical drives onto the rotating blocks being modified. It allows multiple spare drives.<br /><br />Holland and Gibson presented a paper in 1992, "<a href="http://www.pdl.cmu.edu/ftp/Declustering/ASPLOS.pdf">Parity Declustering for Continuous Operation in Redundant Disk Arrays</a>", intended to maximise reconstruction speed and so minimise reconstruction times. Their algorithm criteria:<br /><ol><li>Single failure correcting.</li><li>Distributed reconstruction. </li><li>Distributed parity. 
</li><li>Efficient mapping. </li><li>Large write optimization. </li><li>Maximal parallelism.</li></ol>Holland and Gibson create a smaller logical array that is mapped onto the physical array, trading the overhead of parity for faster reconstruction and shorter periods of degraded performance. This mapping is not dissimilar to the IBM distributed spare. By adding unused, 'spare', physical drives to the parity declustering algorithm, the same outcome can be achieved as IBM's distributed spares.<br /><br />The per-drive unused/spare blocks of IBM's "Distributed Spares" and the spares possible in a Parity Declustering logical/physical mapping can be collected and laid out on drives in many ways, depending on how many 'associated' blocks are stored consecutively. For both schemes, only single 'associated' blocks are used.<br /><br />On a drive with a total of M spare (or parity) blocks, the number of 'associated' blocks consecutively allocated can vary from N=1, to N=M, but with the proviso that only exact divisors of M are possible. The remainder of M/N must be zero, where N is the number of consecutive blocks.<br /><br />N=1 is the IBM and Parity Declustering case.<br />N=M is what we would now recognise as a disk <i>partition.</i><br />Values of N from 2 to M/2 we might describe as "chunks", "stripes", "extents" or allocation units.<br /><br />We end up with <i>k</i> extents, where N=M/<i>k.</i><br /><br />Microsoft in its "Storage Spaces" product uses a fixed allocation unit of 256MB. Like Logical Volume Managers, this notion of (physical) "extents" is the same as posited here.<br /><br />In RAID-3/4, the disks holding data only are exactly a RAID-0 (striped) set. For this reason, any Spare handling scheme for one will work for both, with the caveat that RAID-0 has no redundancy to recover lost data. 
The Holland/Gibson logical/physical geometry remapping is still useful in spreading reads across all drives in a reconstruction, albeit it is no longer "Parity Declustering".<br /><br />A RAID-0 (striped) set can be managed locally by a low-power controller, typified by Embedded RAID Controllers found in low-end hardware RAID cards. These RAID-0 (striped) sets can be globally managed with shared parity drives, providing much larger RAID groups. These cards are capable enough to provide:<br /><ul><li>Large block (64kB to 256kB) error detection and correction, such as Hamming Codes or Galois Field "Erasure Codes".</li><li>Management of distributed spare space</li></ul><div>By separating out and globally managing Parity Drives, additional techniques are available, such as mirroring parity drives, SSD Parity and/or log-structured updates. Larger IO buffers acting as read-back caches can significantly improve random IO performance with a degree of locality (updating the same blocks).</div><div><br /></div><div>The Embedded RAID Controllers can handle distributed spares by storing one logical drive per partition. By distributing the logical drives across many physical drives, reconstruction, when needed, can be run utilising all drives in a RAID group.</div><div><br /></div><b>Notes on RAID-3 vs RAID-4:</b><br /><br />Table 1 of Section 4.1, pg 13, of the <a href="http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf">SNIA technical definitions document</a> lays down the industry standard definitions (below) of the different primary RAID types. <a href="http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_3">Wikipedia provides an interpretation</a> of these definitions, with per 'byte' and per 'block' calculation of parity.<br /><br />RAID-3 and RAID-4 use a single parity drive, computed with XOR over all drives in a parity volume group. [SNIA uses the term "disk group", pg 12, para 3.1.4. "Parity Volume Group" is a non-standard term.] 
Additional parity, like the Q-parity used in RAID-6, could be added to recover from multiple failed drives.<br /><br /><b>RAID-3:</b> Striped array with non-rotating parity, optimized for long, single-threaded transfers. [parity computed by byte, sector or native disk block, maximising streaming performance and scheduling IO across all disks]<br /><br /><b>RAID-4: </b> Striped array with non-rotating parity, optimized for short, multi-threaded transfers. [parity is computed on larger blocks, striped across the drives, not on whole 'extents']<br /><br />What's not generally commented on is that RAID-3 and RAID-4, like all RAID types, have no read-write-back penalty for large streaming transfers. Each stripe written in full requires no pre-read of data or parity to recalculate the parity block(s). This means the usual objection to RAID-3/4, a single "hot spot" on the parity drive, does not limit write performance for streaming transfers, only for intensive random writes.<br /><br />The loss of a parity drive forces a recalculation of all parity, forcing a rescanning of all data in all drives in the affected parity volume group. The system load and impact of the rescan is very high, and while it's underway, there is no Data Protection for the Parity Volume Group, suggesting that for larger groups, mirroring the parity drive is desirable.<br /><br />For moderate parity group sizes, taking the group off-line and rebuilding in sustained streaming mode would be desirable.<br /><br />As disk size and number of drives per parity volume group increase, the volume of data to rescan increases linearly with each. Scan time for a whole array, especially in sustained streaming, may be limited by congestion on shared connections, such as Fibre Channel or SAS.<br /><br />RAID-3/4's most useful characteristic is that the data drives are a RAID-0 volume, with striping not concatenation. 
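<br /><br />The full-stripe versus small-write distinction made in these notes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any controller's implementation: the helper names (xor_blocks, full_stripe_parity, small_write_parity) and the 4-byte blocks are hypothetical, chosen only to show that a full-stripe write needs no pre-read while a partial write must read back the old data and old parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def full_stripe_parity(data_blocks):
    # Full-stripe write: parity comes straight from the new data, no pre-read.
    return xor_blocks(data_blocks)

def small_write_parity(old_data, new_data, old_parity):
    # Partial write: old data and old parity must be read back first.
    return xor_blocks([old_parity, old_data, new_data])

# Hypothetical 3-data-drive stripe with tiny 4-byte blocks.
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = full_stripe_parity(stripe)

# Updating one block by read-modify-write gives the same parity as a full recompute.
new_block = bytes([42, 42, 42, 42])
rmw_parity = small_write_parity(stripe[1], new_block, parity)
assert rmw_parity == full_stripe_parity([stripe[0], new_block, stripe[2]])

# A lost data block is rebuilt from the surviving blocks plus parity.
assert xor_blocks([stripe[0], stripe[2], parity]) == stripe[1]
```

Because the data blocks are untouched by the parity calculation, the same data drives read exactly as a RAID-0 stripe whether or not the parity drive is present.<br /><br />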
The parity volume(s) exist in isolation to the base data volume and can be stored and managed independently.Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-6896716343165667072014-05-03T21:55:00.000-07:002014-05-30T19:24:58.026-07:00Comparing consumer drives in small-systems RAIDThis stems from an email conversation with a friend: <i>why would he be interested in using 2.5" drives in RAID, not 3.5"?</i><br /><br />There are two key questions for Admins at this scale, and my friend was exceedingly sceptical of my suggestion:<br /><ul><li>Cost/GB</li><li>'performance' of 2.5" 5400RPM drives vs 7200RPM drives.</li></ul>I've used retail pricing for the comparisons. <a href="http://stevej-lab-notes.blogspot.com.au/2014/05/retail-disk-prices-as-printed.html">Pricelist</a> and <a href="http://stevej-lab-notes.blogspot.com.au/2014/05/retail-disk-prices-sorted.html">sorted pricelist</a>.<br /><a name='more'></a><br /><hr /><br /><b>Prices used</b><br /><br />4TB WD Red NAS 3.5" drive: $235,<br /> at 5.87 cents/GB<br />4TB Seagate (desktop) 3.5" drive: $189<br /> at 4.73 cents/GB<br /><br />1TB Hitachi 5400RPM 2.5"drive: $80: 4x $80 = $320 for 4TB,<br /> at 8.00 cents/GB<br />1.5TB Hitachi 5400RPM 2.5" drive: $139: 3x $139 = $417 for 4.5TB [12.5% more]<br /> at 9.27 cents/GB<br />1TB Hitachi 7200RPM 2.5"drive: $93: 4x $93 = $372 for 4TB,<br /> at 9.30 cents/GB<br /><br /><hr /><br />I’ve not said 2.5” drives are cheapest <i>now</i> on raw $$/GB. I do expect in the future they <i>could</i> be.<br /><br /><b>On drive count</b><br /><br />The minimum drives needed in RAID-3/5 is 3 (2 data + 1 parity)<br />For 4TB drives, that’s ~$700 (3x$235), giving you 8TB usable. [Or 3x$189 for $570]<br /><br />For the same capacity with 1TB drives, you only need 1 extra drive over usable capacity, ie. 
9 drives = 9x$80 = $720.<br /><br />The cost/GB for 1.5TB 2.5" drives is 13.5% more, but the capacity ratios aren’t the same.<br /><br />If you go for 5 data + 1 parity, then you get usable 7.5TB (6% less) for $834 - appreciably more expensive than 1TB drives, but with lower IO/sec.<br /><br /><i>Including a spare, hot or cold:</i><br /><br />A spare takes the 3.5” to 4x$235 = ~$940 (or $756) and 2.5” to 10x$80 = $800.<br /><br />This more conservative approach would almost never be used at minimum scale.<br />You’d use RAID-10 (RAID-0 stripes of two RAID-1 volumes).<br />It gives better performance and no RAID-parity overhead. [worked example at end]<br /><br />For 8 vs 32 drives, a more realistic comparison, you spend $1,880 ($1,512) vs $2,560.<br /><br />You’d probably want to run 2 parity drives if this was a business (6 data = 24TB) vs (30 data = 30TB)<br /><br />Price $/TB = $78.33 ($63) vs $85.33, or 9% higher,<br />while usable capacity is 25% more for 2.5” drives [caveat below on 24 vs 32 drives in a box]<br />and I’m ignoring extra costs for high-count SAS/SATA PCI cards. [possibly $750]<br /><br />You have to add in the price of power and replacement drives over a 5-year lifetime.<br />8-drives will consume ~800kWHr/year, or 4,000kWHr in a lifetime = $800 @ 20c/kWHr.<br /><br />The 2.5” drives use half that much power - a $400 saving, but not a slam dunk (we’re looking for ~$650 (or $1000) in savings).<br /><br />Over 5 years you’d expect to lose 10%-12% of drives:<br />1x3.5” and 3x2.5” = $235 vs $240.<br /><br />Close, but still no clear winner.<br /><br /><hr /><br /><b>On rebuild time after failure.</b><br /><ul><li>a 4TB drive will take 3-4 times longer to rebuild than a 1TB 2.5” drive. [only lose one drive at a time]</li><li>if the rebuild fails, you’ve lost 4TB at a time, not 1TB [or the whole volume. 
Errors are proportional to bits read.]</li><li>this will matter in an 8-drive array in heavy use:</li><ul><li>performance gets degraded hugely during rebuild, further slowing rebuild with competition for access.</li></ul></ul>What I can't calculate is actual RAID rebuild times - these are dependent on the speed of shared connections to the drives and the normal workload on the array, where competition slows down rebuilds. Some RAID arrays are documented as taking <i>more than a week</i> (200hrs) to rebuild a single failed drive.<br /><br />At the very best, a single 4TB drive will take 32,000 seconds (8+ hours) to read or write.<br />A 5400RPM 1TB drive will take around one-third of that (3 hours).<br /><br />If you're heavily dependent on your RAID, or if it runs saturated for significant periods, then reducing rebuild times is a massive win for a business: the hourly wages cost or company turn-over must be $100+/hr even for small businesses. It doesn't take much of an<br /><br />As the RAID rebuild time increases, so does the likelihood of a second drive failure or, more likely, an unrecoverable read error. 
Either results in a business disaster: the total loss of data on the RAID array, requiring recovery from backups, if they've been done.<br /><br /><hr /><br /><b>On Performance.</b><br /><b><br /></b>Yes, individually 5400RPM drives are slower than 7200RPM, but for RAID volumes, it comes down to the total number of rotations available (seek time is now much faster than rotation), especially for random I/O loads, like databases.<br /><br />Looking at IO/sec [twice the revs per second, as the wait for data is “half a rotation” on average]. Hz below, or Hertz, is "revolutions per second":<br /><br /><ul><li>7200RPM = 120Hz = 240 IO/sec per drive = $1.00 per IO/sec for 4TB NAS 3.5” drives</li><li>5400RPM = 90Hz = 180 IO/sec per drive = $0.45 per IO/sec for 1TB 2.5” drives</li></ul><br />In the 3x3.5” 8TB RAID-5, you’ve 3x240 IO/sec = 720 IO/sec read or total<br />vs 9x2.5” = 9x180 = 1620 IO/sec, 125% <i>faster</i> [2.25 times the throughput]<br /><br />In the 24TB 8x3.5” array = 8x240 = 1920 IO/sec vs 32x2.5” = 32x180 = 5760 IO/sec [3 times the throughput]<br /><br />If you’re running Databases in a few VM’s and connecting to a modest RAID-set or a NAS, 32x2.5” drives will be ~20% more expensive per GB, but give you 25% more capacity and <i>three</i> times the DB performance.<br /><br />What I've calculated is raw IO performance, not including RAID-parity overheads.<br /><br />My understanding is you only get 24x2.5” drives, not 32, in the same space as 8x3.5” drives. 
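<br /><br />The raw IO/sec arithmetic in this section can be sketched in a few lines of Python. This is only an illustration of the rule of thumb used here (one random IO costs about half a rotation, so IO/sec is twice the revolutions per second). The function names are made up for the sketch, the drive counts and prices are the ones quoted in the post, and RAID-parity overheads are deliberately left out.<br /><br />
<pre>
```python
def drive_iops(rpm):
    """Random IO/sec for one drive: twice its revolutions per second."""
    revs_per_sec = rpm / 60
    return 2 * revs_per_sec


def array_iops(drive_count, rpm):
    """Raw aggregate IO/sec; RAID-parity overheads are not modelled."""
    return drive_count * drive_iops(rpm)


def dollars_per_iops(price, rpm):
    """Purchase price of one drive divided by its random IO/sec."""
    return price / drive_iops(rpm)


# (label, drive count, RPM, retail price per drive) as quoted in the post
configs = [
    ("3 x 4TB 3.5in", 3, 7200, 235),
    ("9 x 1TB 2.5in", 9, 5400, 80),
    ("8 x 4TB 3.5in", 8, 7200, 235),
    ("32 x 1TB 2.5in", 32, 5400, 80),
]

for label, count, rpm, price in configs:
    total = array_iops(count, rpm)
    per_io = dollars_per_iops(price, rpm)
    print(f"{label}: {total:.0f} IO/sec raw, ${per_io:.2f} per IO/sec")
```
</pre>
Swapping in a 24-drive count for the 2.5” case is a one-line change to <i>configs</i>, which is handy given the caveat about how many drives actually fit in a box.<br /><br />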
Manufacturers haven’t really pushed the density of 2.5” drives yet.<br /><br />Even if you have modest performance needs, 2.5” drives are a slam-dunk - they deliver twice the IO/sec per dollar and, in RAID’s of modest size, three times the throughput for 17% more in $/GB.<br /><br /><i>So, I agree with you, at the low end, 3.5” drives are still the drive of choice.</i><br /><br />We haven’t reached the tipping point yet, but every year the gap is closing. I think that’s the big thing I’ve come to understand.<br /><br />Just like when we moved from 5.25” drives to 3.5” drives on servers: it didn’t happen overnight, but one day when it came time to replace an old server, like-for-like drives were not even considered.<br /><br /><hr /><br /><b>Addendum 1 - Calculating the random write (update) performance of RAID.</b><br /><br />This compares the <i>write</i> performance of a RAID array with 24TB usable on 7200RPM drives in a RAID-5 config (6+1) vs 24+1x1TB for 2.5” drives.<br /><br />RAID-6 has a higher compute load for the second parity, but the same IO overhead per drive. Because 3 drives, not 2, are involved in every update, each update costs 50% more IO, so array throughput (for random IO updates) is reduced by a third; read performance is unaffected.<br /><br />The problem with distributed parity is that <i>every</i> update incurs a read/write-back cycle on two drives (data + parity) <i>and</i> you force a full revolution delay.<br />Instead of 1/2 revolution for the read and 1/2 revolution for the write, you have 1/2 revolution for the read and 1 full revolution for the write:<br /> 3/2 revs per update for each drive, or 6 IOs of array capacity per update. 
[9 IOs for RAID-6].<br /><br />If you’re prepared to buffer whole stripes during streaming writes (large sequential writes), RAID-5 and RAID-6 performance can be close to the raw IO of the drives, because the read/write-back cycle can be avoided.<br /><br />This addendum is about a typical Database load, random IO in 4kB blocks, not about streaming data.<br /><br />7x3.5" drives can deliver 240 IO/sec per drive = 1680 IO/sec reads<br />and 280 IO/sec RAID-5 writes. [vs 187 IO/sec for RAID-6]<br /><br />25x2.5” drives can deliver 180 IO/sec per drive = 4500 IO/sec reads<br />and 750 IO/sec RAID-5 writes.<br /><br />RAID-1 (2 mirrored drives) delivers twice the IO/sec for reads, but the IO/sec of a single drive for writes, as both drives have to write the same data to the same block. RAID-1 isn't limited to mirroring on a pair of drives; any number of mirrors can be used. The same rule applies: read IO/sec increases 'N' times, while write IO/sec remains static.<br /><br />Which is why RAID-10 (stripes of mirrored drives) is popular. Additionally, the data on a drive can be read outside the RAID: blocks aren’t mangled or added, leaving a usable image of a filesystem or database, if it fits fully on a drive.<br /><br />For 24TB in RAID-10, you need 12 drives, an additional 4 drives over RAID-6 [~$850 (50%) more than ~$1,900]<br />read throughput is: 12x240 = 2880 IO/sec [just under two-thirds of the 25x2.5” drive RAID-5 performance]<br />write throughput is: 6x240 = 1440 IO/sec [twice the 25x2.5” drive RAID-5 performance]<br /><br /><b>Note</b>: For 12x3.5” drives delivering 24TB in RAID-10, cost is $2,868 ($2,268), with about two-thirds the read IO but twice the write IO of the 2.5” RAID-5.<br /><br />25x2.5” drives delivering 24TB as RAID-5 cost $2,000, or 30% (10%) less $$/GB for a higher IO throughput in normal 80%:20% read/write loads.<br /><br />Power required for 2.5” drives is around 25% of that for 3.5” drives (25% <i>of</i>, not 25% <i>less</i>): 30W vs 120W.<br />Over 5 years, that’s $300 vs $1,100 in electricity, another $800 saving. 
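<br /><br />The addendum's arithmetic up to this point can be checked with a short Python sketch. It is a rough model, not a benchmark: the 6 and 9 IOs per update are the figures derived above, the 20c/kWh price is the one used earlier in the post, and the helper names plus the assumption that the array draws power around the clock are mine.<br /><br />
<pre>
```python
# Drive-IOs consumed by one random update, as derived in the addendum:
# 3/2 revolutions (3 half-rotation IOs) on each drive touched.
IOS_PER_UPDATE = {"raid5": 6, "raid6": 9}


def read_iops(drives, per_drive_iops):
    """Raw random-read IO/sec: every drive can serve reads."""
    return drives * per_drive_iops


def parity_write_iops(drives, per_drive_iops, level):
    """Random-update IO/sec for a distributed-parity array."""
    return read_iops(drives, per_drive_iops) / IOS_PER_UPDATE[level]


def mirror_write_iops(drives, per_drive_iops, copies=2):
    """RAID-10 writes: each write lands on every copy in a mirror."""
    return read_iops(drives, per_drive_iops) / copies


def electricity_cost(watts, years=5, dollars_per_kwh=0.20):
    """Assumes the array draws this power 24 hours a day, all year."""
    kwh = watts * 24 * 365 * years / 1000
    return kwh * dollars_per_kwh


print("7 x 3.5in RAID-5 writes:", parity_write_iops(7, 240, "raid5"))
print("7 x 3.5in RAID-6 writes:", round(parity_write_iops(7, 240, "raid6")))
print("25 x 2.5in RAID-5 writes:", parity_write_iops(25, 180, "raid5"))
print("12 x 3.5in RAID-10 writes:", mirror_write_iops(12, 240))
print("5-year power, 120W:", round(electricity_cost(120)))
print("5-year power, 30W:", round(electricity_cost(30)))
```
</pre>
The electricity figures come out close to the rounded dollar amounts quoted above; cooling is not part of this sketch.<br /><br />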
Cooling is around 20% of direct power cost, not included.<br /><br />To do a full TCO, you need to factor in the cost of replacement drives, the labour cost of replacing them, any business impact (lost sales or extra working hours of users), the cost of the “footprint” [floor space used] as well as the power/cooling costs.<br /><br />The larger sizes of 3.5” drives increase the chances of RAID rebuilds failing and suffering a complete Data Loss event (Bit Error Rate is the same, regardless of capacity. 4 times larger drives = 4 times the chance of errors during rebuild).<br /><br />RAID-6, with its performance penalty and 12% extra cost, is mandatory for 24TB RAID to keep the chance of Data Loss during rebuilds reasonably low. The RAID-5 comparison is very generous. In Real Life, it’d be 8x3.5” RAID-6 vs 25x2.5” RAID-5, slanting more to 2.5” drives.<br /><br /><b>Note:</b> We are <i>past</i> the break-even point for RAID-1/10 of 3.5” drives vs RAID-5 of 2.5” drives.<br /><br />For 8TB (2x2 = 4x3.5” drives vs 8+1 2.5” drives),<br />raw drive costs are: $956 and $720.<br />Power consumption is: 40W vs 11W.<br />Read IO: 4x240 = 960 IO/sec vs 9x180 = 1620 IO/sec [69% <i>higher</i>]<br />Write IO: 2x240 = 480 IO/sec vs (9x180)/6 = 270 IO/sec [44% <i>lower</i>]<br /><br />For 80:20 Read/Write ratio:<br />3.5” = 0.8x960 + 0.2x480 = 864 IO/sec<br />2.5” = 0.8x1620 + 0.2x270 = 1350 IO/sec<br /><br />I’ve left out streaming IO performance because that implies data has to be streamed via the external interface.<br />If we’re talking low-end servers, they’ll have a 1Gbps LAN, at best 2x1Gbps, easily saturated by 2 drives of <i>any</i> size.<br /><br />If you’re talking 10Gbps interfaces, even SFP+ @ $50/cable and $300-$500/PCI card (guess) + 5 times the price/port for switches, you’ve got a much more expensive installation and are probably looking to maximise throughput-per-dollar, not primarily capacity.<br /><br /><hr /><br /><b>Addendum 2</b><br /><br />These are notes on RAID-3/4, 
single parity drive, not distributed parity as in RAID-5/6. Normally it’s avoided because the parity drive becomes the limiting factor.<br /><br />The <i>really</i> interesting thing with dedicated parity drives is performance: the data drives <i>are</i>, not seem to be, RAID-0 (stripes).<br /><br />You get the full raw performance of the drives delivered to you in many situations. The cost of Data Protection is the CPU and RAM for the RAID-parity and at least one drive of overhead to store it. I suggest below that two parity drives can be used for 100+ drives in a RAID-group: a small extra cost that protects against a worst-case rebuild.<br /><br />It does have a performance slowdown on random <i>updates</i> (read, then write-back), but only for isolated blocks.<br />If you’re streaming data with large sequential writes, you don’t need to read/write-back data or parity, and the parity drive keeps up with the data drives.<br /><br />You can do 3 interesting things with a dedicated parity drive for Random IO:<br /><ul><li>use an SSD for parity. Their random IO/sec is 500-1,000 times faster than a single HDD. Easy to do, highest performance, lowest overhead. SSD’s are ‘only’ about 2-3 times the cost of HDD’s, making this attractive for large RAID groups (100+), but not so for 3-8 drives.</li></ul><ul><li>use a HDD for parity and a small SSD buffer to convert Random I/O into a sequential ‘scan’ from one side of the disk to the other. You need to buffer new and old data blocks as well as parity blocks.</li><ul><li>Sequential IO is 100-150 times faster than Random IO on current drives; logging or journaling writes can leverage that difference.</li></ul></ul><ul><li>use two HDD’s for parity and a buffer. Read current parity from one drive, recalc parity and write new parity to the other drive. Then swap which drive is ‘new’ and ‘old’ (in a ‘ping-pong’ fashion) and repeat. It may take 20-30 minutes on a 2.5” drive per scan, but you can use an SSD to store changes or even log updates to a HDD. 
The number of IO’s to store, even on a very large RAID (300-500 drives), isn’t that large: 750GB for 4kB blocks for 100% write load and no overwrites.</li></ul>For more realistic loads/sizes (120 drives, 20% write, 0% overwrite), it's 75GB for 4kB blocks.<br /><br />There’s a performance benefit to this long-range buffering: what may look like a bunch of isolated updates can become “known data” over 30 minutes, meaning you avoid the read/write-back cycle.Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-59293012264358898692014-05-03T20:45:00.002-07:002014-06-05T17:29:37.576-07:00Retail Disk Prices, as printedTable of current retail prices for various types of disk with cost-per-GB.<br /><br />Disclaimer: <i>This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.</i><br /><a name='more'></a><br /><br /><br /><table frame="box"><colgroup><col span="8"></col><col span="1" style="background-color: lightgrey;"></col><col span="1"></col></colgroup> <thead><tr><th></th><th></th><th></th><th></th><th></th><th align="left">01-May-2014</th></tr><tr><th></th><th></th><th></th><th colspan="3"><a href="http://www.msy.com.au/Parts/PARTS.pdf">http://www.msy.com.au/Parts/PARTS.pdf</a></th></tr><tr><th colspan="10"><hr /></th></tr><tr><th>F-Fac</th><th>Typ</th><th>Dsk</th><th>Conn</th><th>RPM</th><th>Brand/Model</th><th>Cap</th><th>Cost</th><th>$/GB</th><th>GB</th></tr><tr><th colspan="10"><hr /></th></tr></thead><tbody><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green EZRX</td><td>1TB</td><td>64</td><td>0.0640</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green EZRX</td><td>2TB</td><td>95</td><td>0.0475</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green 
EZRX</td><td>3TB</td><td>129</td><td>0.0430</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green EZRX</td><td>4TB</td><td>185</td><td>0.0462</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>2TB</td><td>129</td><td>0.0645</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>3TB</td><td>165</td><td>0.0550</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>4TB</td><td>235</td><td>0.0587</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>1TB</td><td>95</td><td>0.0950</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>2TB</td><td>135</td><td>0.0675</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>3TB</td><td>179</td><td>0.0597</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>4TB</td><td>259</td><td>0.0648</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate NAS</td><td>2TB</td><td>125</td><td>0.0625</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate NAS</td><td>3TB</td><td>160</td><td>0.0533</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate 
NAS</td><td>4TB</td><td>229</td><td>0.0573</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>500G</td><td>55</td><td>0.1100</td><td>500GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>1TB</td><td>65</td><td>0.0650</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>2TB</td><td>95</td><td>0.0475</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>3TB</td><td>129</td><td>0.0430</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>4TB</td><td>189</td><td>0.0473</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200?</td><td>Hitachi HGST NAS</td><td>3TB</td><td>179</td><td>0.0597</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200?</td><td>Hitachi HGST NAS</td><td>4TB</td><td>249</td><td>0.0622</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA2</td><td>7200?</td><td>Hitachi HGST UltraStar</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Hitachi HGST</td><td>2TB</td><td>185</td><td>0.0925</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Hitachi HGST</td><td>3TB</td><td>270</td><td>0.0900</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Hitachi HGST</td><td>4TB</td><td>365</td><td>0.0912</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi 
HGST</td><td>320G</td><td>53</td><td>0.1656</td><td>320GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>500G</td><td>55</td><td>0.1100</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>750G</td><td>66</td><td>0.0880</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>1TB</td><td>80</td><td>0.0800</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>1.5TB</td><td>139</td><td>0.0927</td><td>1500GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Hitachi HGST</td><td>500G</td><td>64</td><td>0.1280</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Hitachi HGST</td><td>750G</td><td>85</td><td>0.1133</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Hitachi HGST</td><td>1TB</td><td>93</td><td>0.0930</td><td>1000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD LPVX</td><td>320G</td><td>53</td><td>0.1656</td><td>320GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD LPVX</td><td>500G</td><td>57</td><td>0.1140</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD BPVX</td><td>750G</td><td>64</td><td>0.0853</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD JPVX</td><td>1TB</td><td>83</td><td>0.0830</td><td>1000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>WD BPKX</td><td>500G</td><td>67</td><td>0.1340</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>WD 
BPKX</td><td>750G</td><td>78</td><td>0.1040</td><td>750GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Seagate</td><td>500G</td><td>64</td><td>0.1280</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Seagate</td><td>320G</td><td>53</td><td>0.1656</td><td>320GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Seagate</td><td>500G</td><td>56</td><td>0.1120</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSHD</td><td>SATA?</td><td>5400?</td><td>Seagate</td><td>500G</td><td>85</td><td>0.1700</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSHD</td><td>SATA?</td><td>5400?</td><td>Seagate</td><td>1TB</td><td>129</td><td>0.1290</td><td>1000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Mybook Essential</td><td>2TB</td><td>139</td><td>0.0695</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Mybook Essential</td><td>3TB</td><td>149</td><td>0.0497</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Mybook Essential</td><td>4TB</td><td>209</td><td>0.0522</td><td>4000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Element</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Element</td><td>2TB</td><td>149</td><td>0.0745</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Element</td><td>3TB</td><td>129</td><td>0.0430</td><td>3000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD 
Passport</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport</td><td>1.5TB</td><td>129</td><td>0.0860</td><td>1500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport</td><td>2TB</td><td>159</td><td>0.0795</td><td>2000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport Ultra</td><td>500G</td><td>74</td><td>0.1480</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport Ultra</td><td>1TB</td><td>104</td><td>0.1040</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport Ultra</td><td>2TB</td><td>165</td><td>0.0825</td><td>2000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>500G</td><td>62</td><td>0.1240</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>1TB</td><td>86</td><td>0.0860</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>1.5TB</td><td>115</td><td>0.0767</td><td>1500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>2TB</td><td>149</td><td>0.0745</td><td>2000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Seagate BackUp Plus</td><td>500G</td><td>88</td><td>0.1760</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Seagate BackUp Plus</td><td>1TB</td><td>99</td><td>0.0990</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Seagate 
Expansion</td><td>500G</td><td>69</td><td>0.1380</td><td>500GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate Expansion</td><td>2TB</td><td>95</td><td>0.0475</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate Expansion</td><td>3TB</td><td>139</td><td>0.0463</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate BackUp Plus</td><td>2TB</td><td>115</td><td>0.0575</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate BackUp Plus</td><td>3TB</td><td>159</td><td>0.0530</td><td>3000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB?.0</td><td>5400?</td><td>Hitachi HGST Touro Mobile</td><td>500G</td><td>56</td><td>0.1120</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB?.0</td><td>5400?</td><td>Hitachi HGST Touro Mobile</td><td>1TB</td><td>80</td><td>0.0800</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB?.0</td><td>5400?</td><td>Hitachi HGST Touro Pro</td><td>1TB</td><td>92</td><td>0.0920</td><td>1000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>60G</td><td>65</td><td>1.0833</td><td>60GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>120G</td><td>88</td><td>0.7333</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>240G</td><td>159</td><td>0.6625</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>480G</td><td>329</td><td>0.6854</td><td>480GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston 
HyperX</td><td>120G</td><td>105</td><td>0.8750</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston HyperX</td><td>240G</td><td>199</td><td>0.8292</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston SMS200s3</td><td>60G</td><td>76</td><td>1.2667</td><td>60GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston SMS200s3</td><td>120G</td><td>119</td><td>0.9917</td><td>120GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>OCZ Vertec 450</td><td>128G</td><td>97</td><td>0.7578</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Seagate 600</td><td>120G</td><td>99</td><td>0.8250</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Seagate 600</td><td>240G</td><td>159</td><td>0.6625</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Seagate 600</td><td>480G</td><td>299</td><td>0.6229</td><td>480GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Intel 530</td><td>120G</td><td>130</td><td>1.0833</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Intel 530</td><td>240G</td><td>242</td><td>1.0083</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Intel 525</td><td>120G</td><td>139</td><td>1.1583</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 730</td><td>240G</td><td>285</td><td>1.1875</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 530</td><td>120G</td><td>115</td><td>0.9583</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 
530</td><td>180G</td><td>184</td><td>1.0222</td><td>180GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 530</td><td>240G</td><td>209</td><td>0.8708</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 520</td><td>120G</td><td>104</td><td>0.8667</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel S3500</td><td>120G</td><td>164</td><td>1.3667</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel S3500</td><td>160G</td><td>209</td><td>1.3062</td><td>160GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel S3500</td><td>240G</td><td>288</td><td>1.2000</td><td>240GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5S</td><td>256G</td><td>168</td><td>0.6562</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5-PRO</td><td>128G</td><td>135</td><td>1.0547</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5-PRO</td><td>256G</td><td>179</td><td>0.6992</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5-PRO</td><td>512G</td><td>329</td><td>0.6426</td><td>512GB</td></tr><tr><td colspan="10"><hr 
/></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>64G</td><td>69</td><td>1.0781</td><td>64GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>128G</td><td>99</td><td>0.7734</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>256G</td><td>170</td><td>0.6641</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>512G</td><td>383</td><td>0.7480</td><td>512GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Ultra Plus</td><td>64G</td><td>75</td><td>1.1719</td><td>64GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Ultra Plus</td><td>128G</td><td>89</td><td>0.6953</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Ultra Plus</td><td>256G</td><td>157</td><td>0.6133</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme</td><td>240G</td><td>185</td><td>0.7708</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme II</td><td>120G</td><td>118</td><td>0.9833</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme II</td><td>240G</td><td>195</td><td>0.8125</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme II</td><td>480G</td><td>379</td><td>0.7896</td><td>480GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Samsung 840 EVO</td><td>120G</td><td>105</td><td>0.8750</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Samsung 840 
EVO</td><td>250G</td><td>182</td><td>0.7280</td><td>250GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>120G</td><td>95</td><td>0.7917</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>250G</td><td>170</td><td>0.6800</td><td>250GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>500G</td><td>329</td><td>0.6580</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>1TB</td><td>589</td><td>0.5890</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 PRO</td><td>128G</td><td>138</td><td>1.0781</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 PRO</td><td>256G</td><td>232</td><td>0.9062</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 PRO</td><td>512G</td><td>439</td><td>0.8574</td><td>512GB</td></tr></tbody> <caption align="bottom"><b>MSY SOHO Retail Disk Prices</b></caption></table>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-79565166538112183522014-05-03T20:42:00.000-07:002014-05-03T20:42:20.998-07:00Retail disk prices, sorted.Table of current retail prices for various types of disk, sorted on cost-per-GB.<br />Disclaimer: <i>This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.</i><br /><br />3.5" Internal drives are the cheapest $/GB, ranging from 4.3 cents/GB to 10-11 cents/GB. Generally, larger drives have cheaper $/GB. Higher spec drives, suitable for high duty-cycle applications, are more expensive. 
This retailer doesn't sell 10K or SAS drives.<br /><br />It's not possible to track 3.5" drives from Internal to External to arrive at a cost of packaging.<br /><br />2.5" Internal drives range from 8 to 16.5 cents/GB, generally higher than 3.5" drive costs. There seems to be little extra cost of packaging for external drives. There is a small premium in consumer drives for 7200RPM. This retailer only sells 2TB drives (15mm vs 9.5mm?) as external drives.<br /><br />There was no information in the retailer's rather compact format on the thickness (5mm, 7mm, 9.5mm, 12.5mm, 15mm) of 2.5" drives.<br /><br />Solid State Disks are 5+ times more expensive than Hard Disk Drives, at 59 cents/GB to $1.37/GB.<br />The smaller mSATA drives start at 72.8 cents/GB.<br />No supplier information on SSD specs is included: SLC/MLC, transfer rates, IO/sec and number of write cycles. <i>SSD's are <b>very</b> sensitive to wear and device selection requires very careful reading of device specifications.</i><br /><br /><br /><table frame="box"><colgroup><col span="8"></col><col span="1" style="background-color: lightgrey;"></col><col span="1"></col></colgroup> <thead><tr><th></th><th></th><th></th><th></th><th></th><th align="left">01-May-2014</th></tr><tr><th></th><th></th><th></th><th colspan="3"><a href="http://www.msy.com.au/Parts/PARTS.pdf">http://www.msy.com.au/Parts/PARTS.pdf</a></th></tr><tr><th colspan="10"><hr /></th></tr><tr><th>F-Fac</th><th>Typ</th><th>Dsk</th><th>Conn</th><th>RPM</th><th>Brand/Model</th><th>Cap</th><th>Cost</th><th>$/GB</th><th>GB</th></tr><tr><th colspan="10"><hr /></th></tr></thead><tbody><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green EZRX</td><td>3TB</td><td>129</td><td>0.0430</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>3TB</td><td>129</td><td>0.0430</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green 
EZRX</td><td>4TB</td><td>185</td><td>0.0462</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>4TB</td><td>189</td><td>0.0473</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green EZRX</td><td>2TB</td><td>95</td><td>0.0475</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>2TB</td><td>95</td><td>0.0475</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate NAS</td><td>3TB</td><td>160</td><td>0.0533</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>3TB</td><td>165</td><td>0.0550</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate NAS</td><td>4TB</td><td>229</td><td>0.0573</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>4TB</td><td>235</td><td>0.0587</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>3TB</td><td>179</td><td>0.0597</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200?</td><td>Hitachi HGST NAS</td><td>3TB</td><td>179</td><td>0.0597</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200?</td><td>Hitachi HGST NAS</td><td>4TB</td><td>249</td><td>0.0622</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate NAS</td><td>2TB</td><td>125</td><td>0.0625</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Green EZRX</td><td>1TB</td><td>64</td><td>0.0640</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS 
EFRX</td><td>2TB</td><td>129</td><td>0.0645</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>4TB</td><td>259</td><td>0.0648</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>1TB</td><td>65</td><td>0.0650</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>2TB</td><td>135</td><td>0.0675</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Red NAS EFRX</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA2</td><td>7200?</td><td>Hitachi HGST UltraStar</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Hitachi HGST</td><td>3TB</td><td>270</td><td>0.0900</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Hitachi HGST</td><td>4TB</td><td>365</td><td>0.0912</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Hitachi HGST</td><td>2TB</td><td>185</td><td>0.0925</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>WD Purple PURX Surveillance</td><td>1TB</td><td>95</td><td>0.0950</td><td>1000GB</td></tr><tr><td>3.5"</td><td>Int</td><td>HDD</td><td>SATA3</td><td>7200?</td><td>Seagate</td><td>500G</td><td>55</td><td>0.1100</td><td>500GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Element</td><td>3TB</td><td>129</td><td>0.0430</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate 
Expansion</td><td>3TB</td><td>139</td><td>0.0463</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate Expansion</td><td>2TB</td><td>95</td><td>0.0475</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Mybook Essential</td><td>3TB</td><td>149</td><td>0.0497</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Mybook Essential</td><td>4TB</td><td>209</td><td>0.0522</td><td>4000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate BackUp Plus</td><td>3TB</td><td>159</td><td>0.0530</td><td>3000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>Seagate BackUp Plus</td><td>2TB</td><td>115</td><td>0.0575</td><td>2000GB</td></tr><tr><td>3.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>7200?</td><td>WD Mybook Essential</td><td>2TB</td><td>139</td><td>0.0695</td><td>2000GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>1TB</td><td>80</td><td>0.0800</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD JPVX</td><td>1TB</td><td>83</td><td>0.0830</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD BPVX</td><td>750G</td><td>64</td><td>0.0853</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>750G</td><td>66</td><td>0.0880</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>1.5TB</td><td>139</td><td>0.0927</td><td>1500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Hitachi HGST</td><td>1TB</td><td>93</td><td>0.0930</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>WD 
BPKX</td><td>750G</td><td>78</td><td>0.1040</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>500G</td><td>55</td><td>0.1100</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Seagate</td><td>500G</td><td>56</td><td>0.1120</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Hitachi HGST</td><td>750G</td><td>85</td><td>0.1133</td><td>750GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD LPVX</td><td>500G</td><td>57</td><td>0.1140</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Hitachi HGST</td><td>500G</td><td>64</td><td>0.1280</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>Seagate</td><td>500G</td><td>64</td><td>0.1280</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>7200</td><td>WD BPKX</td><td>500G</td><td>67</td><td>0.1340</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Hitachi HGST</td><td>320G</td><td>53</td><td>0.1656</td><td>320GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>WD LPVX</td><td>320G</td><td>53</td><td>0.1656</td><td>320GB</td></tr><tr><td>2.5"</td><td>Int</td><td>HDD</td><td>SATA?</td><td>5400</td><td>Seagate</td><td>320G</td><td>53</td><td>0.1656</td><td>320GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD 
Element</td><td>2TB</td><td>149</td><td>0.0745</td><td>2000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>2TB</td><td>149</td><td>0.0745</td><td>2000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>1.5TB</td><td>115</td><td>0.0767</td><td>1500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport</td><td>2TB</td><td>159</td><td>0.0795</td><td>2000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB?.0</td><td>5400?</td><td>Hitachi HGST Touro Mobile</td><td>1TB</td><td>80</td><td>0.0800</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport Ultra</td><td>2TB</td><td>165</td><td>0.0825</td><td>2000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport</td><td>1.5TB</td><td>129</td><td>0.0860</td><td>1500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>1TB</td><td>86</td><td>0.0860</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Element</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport</td><td>1TB</td><td>89</td><td>0.0890</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB?.0</td><td>5400?</td><td>Hitachi HGST Touro Pro</td><td>1TB</td><td>92</td><td>0.0920</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Seagate BackUp Plus</td><td>1TB</td><td>99</td><td>0.0990</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport Ultra</td><td>1TB</td><td>104</td><td>0.1040</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB?.0</td><td>5400?</td><td>Hitachi HGST Touro 
Mobile</td><td>500G</td><td>56</td><td>0.1120</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Samsung</td><td>500G</td><td>62</td><td>0.1240</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Seagate Expansion</td><td>500G</td><td>69</td><td>0.1380</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>WD Passport Ultra</td><td>500G</td><td>74</td><td>0.1480</td><td>500GB</td></tr><tr><td>2.5"</td><td>Ext</td><td>HDD</td><td>USB3.0</td><td>5400?</td><td>Seagate BackUp Plus</td><td>500G</td><td>88</td><td>0.1760</td><td>500GB</td></tr><tr><td colspan="10"><hr /></td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>1TB</td><td>589</td><td>0.5890</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Ultra Plus</td><td>256G</td><td>157</td><td>0.6133</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Seagate 600</td><td>480G</td><td>299</td><td>0.6229</td><td>480GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5-PRO</td><td>512G</td><td>329</td><td>0.6426</td><td>512GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5S</td><td>256G</td><td>168</td><td>0.6562</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>500G</td><td>329</td><td>0.6580</td><td>500GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>240G</td><td>159</td><td>0.6625</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Seagate 
600</td><td>240G</td><td>159</td><td>0.6625</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>256G</td><td>170</td><td>0.6641</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 EVO</td><td>250G</td><td>170</td><td>0.6800</td><td>250GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>480G</td><td>329</td><td>0.6854</td><td>480GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Ultra Plus</td><td>128G</td><td>89</td><td>0.6953</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5-PRO</td><td>256G</td><td>179</td><td>0.6992</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Samsung 840 EVO</td><td>250G</td><td>182</td><td>0.7280</td><td>250GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>120G</td><td>88</td><td>0.7333</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>512G</td><td>383</td><td>0.7480</td><td>512GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>OCZ Vertec 450</td><td>128G</td><td>97</td><td>0.7578</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme</td><td>240G</td><td>185</td><td>0.7708</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>128G</td><td>99</td><td>0.7734</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme II</td><td>480G</td><td>379</td><td>0.7896</td><td>480GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 
EVO</td><td>120G</td><td>95</td><td>0.7917</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme II</td><td>240G</td><td>195</td><td>0.8125</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Seagate 600</td><td>120G</td><td>99</td><td>0.8250</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston HyperX</td><td>240G</td><td>199</td><td>0.8292</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 PRO</td><td>512G</td><td>439</td><td>0.8574</td><td>512GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 520</td><td>120G</td><td>104</td><td>0.8667</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 530</td><td>240G</td><td>209</td><td>0.8708</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston HyperX</td><td>120G</td><td>105</td><td>0.8750</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Samsung 840 EVO</td><td>120G</td><td>105</td><td>0.8750</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 PRO</td><td>256G</td><td>232</td><td>0.9062</td><td>256GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 530</td><td>120G</td><td>115</td><td>0.9583</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Extreme II</td><td>120G</td><td>118</td><td>0.9833</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston SMS200s3</td><td>120G</td><td>119</td><td>0.9917</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Intel 
530</td><td>240G</td><td>242</td><td>1.0083</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 530</td><td>180G</td><td>184</td><td>1.0222</td><td>180GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Plextor M5-PRO</td><td>128G</td><td>135</td><td>1.0547</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Fujitsu</td><td>64G</td><td>69</td><td>1.0781</td><td>64GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA3</td><td>-</td><td>Samsung 840 PRO</td><td>128G</td><td>138</td><td>1.0781</td><td>128GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston V300</td><td>60G</td><td>65</td><td>1.0833</td><td>60GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Intel 530</td><td>120G</td><td>130</td><td>1.0833</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>mSATA3</td><td>-</td><td>Intel 525</td><td>120G</td><td>139</td><td>1.1583</td><td>120GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>SanDisk Ultra Plus</td><td>64G</td><td>75</td><td>1.1719</td><td>64GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel 730</td><td>240G</td><td>285</td><td>1.1875</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel S3500</td><td>240G</td><td>288</td><td>1.2000</td><td>240GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Kingston SMS200s3</td><td>60G</td><td>76</td><td>1.2667</td><td>60GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel S3500</td><td>160G</td><td>209</td><td>1.3062</td><td>160GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSD</td><td>SATA?</td><td>-</td><td>Intel S3500</td><td>120G</td><td>164</td><td>1.3667</td><td>120GB</td></tr><tr><td colspan="10"><hr 
/></td></tr><tr><td>2.5"</td><td>Int</td><td>SSHD</td><td>SATA?</td><td>5400?</td><td>Seagate</td><td>1TB</td><td>129</td><td>0.1290</td><td>1000GB</td></tr><tr><td>2.5"</td><td>Int</td><td>SSHD</td><td>SATA?</td><td>5400?</td><td>Seagate</td><td>500G</td><td>85</td><td>0.1700</td><td>500GB</td></tr></tbody> <caption align="bottom"><b>Retail Disk Prices</b></caption></table>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-75697976694562886542014-04-22T16:37:00.000-07:002014-04-22T16:37:33.857-07:00RAID++: RAID-0+ECC<a href="http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf">Current RAID schemes</a>, and going back to the 1987/8 <a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf">Patterson, Gibson, Katz RAID paper</a>, make no distinction between transient and permanent failures: errors or dropouts versus failure.<br /><a name='more'></a><br /><br />This is a faulty assumption in the <a href="http://queue.acm.org/detail.cfm?id=1670144">2009 "Triple Parity RAID" </a> piece by Leventhal, an author of ZFS, a proven production quality sub-system, and not just a commentator like myself.<br /><br />The second major error is relying on the <a href="http://www.scientificamerican.com/article/kryders-law/">2005 "Kryder's Law"</a> 40%/year growth in capacity. Capacity growth had collapsed by 2009 and will probably not even meet 10%/year from now on, because <a href="http://stevej-on-it.blogspot.com/2014/03/the-new-disruption-in-computing.html">the market can't fund the new fabrication plants needed</a>.<br /><br />The last major error by Leventhal was assuming that 3.5 inch disks would remain the preferred Enterprise Storage. 
<a href="http://stevej-on-it.blogspot.com/2014/03/the-new-disruption-in-computing.html">2.5 inch drives are the only logical long-term choice of form-factor</a>.<br /><br />This is a pretty simple observation:<br /><blockquote class="tr_bq">Long-length error mitigation and correction techniques are well known and used in CD/CD-ROM and later DVD/Blu-ray. The same long-range techniques could be applied on top of the ECC techniques already used by drive controller electronics.</blockquote>With the complex geometry and techniques used in current drives, we can't know the track layout or the precise raw transfer rate. How long on a disk is a 4kB block? The current 0.7Tbit/square-inch is around 850,000 bits/inch, making a 4kB (32,000 bit) block around 1mm long. But that's wrong, disks are much more complex than that these days.<br /><br />What we can reliably deal with is transfer rates.<br />At 1Gbps, a reasonable current rate, a 4kB block transfers in ~30 micro-seconds, or around 30,000 blocks/second.<br /><br />64kB or 128kB reads take 0.5 & 1.0 milliseconds. In the context of 8-12msec per revolution, this is a reasonable overhead.<br /><br />If you're taking the trouble to detect and correct up to 4kB in a read (more is possible with higher overhead), then you need to also impose higher level error checking: MD5 or SHA1 fingerprints.<br /><br />Storing raw data on disks could be done using an error correcting technique based on Erasure Codes (Galois Fields), in super-blocks.<br /><br />For data to be portable, standards are needed. This needs:<br /><br /><ul><li>a tag for the record type</li><li>a small number of "chunk" sizes, 64kB and 128kB are already used in some file systems and SSD blocks</li><li>the convolution data</li><li>a high-level checksum, either MD5 or SHA1, of the raw data.</li></ul><div>These chunks can be converted at the drive or in a controller. 
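<br /><br />A minimal sketch of such a chunk record, to make the four fields in the list concrete. Everything named here is an assumption for illustration only: the header layout, the <code>TAG_RAW_DATA</code> value, the 4kB stripe size, and especially <code>xor_parity</code>, which is a simple XOR stand-in for the real error-correcting data and is <i>not</i> the Galois Field erasure code or CD-ROM style scheme discussed in this post.<br /><br />

```python
import hashlib
import struct

CHUNK_SIZES = (64 * 1024, 128 * 1024)  # the two chunk sizes named in the list
TAG_RAW_DATA = 1                        # hypothetical record-type tag
HEADER = struct.Struct(">BI20s")        # tag, data length, SHA1 digest (assumed layout)
STRIPE = 4096                           # hypothetical parity stripe size


def xor_parity(data: bytes) -> bytes:
    """Placeholder for the error-correcting data: XOR of all 4kB stripes.

    A real implementation would use an erasure code; this only marks
    where that data would sit in the record.
    """
    parity = 0
    for start in range(0, len(data), STRIPE):
        parity ^= int.from_bytes(data[start:start + STRIPE], "big")
    return parity.to_bytes(STRIPE, "big")


def pack_chunk(data: bytes) -> bytes:
    """Build one record: header (tag, length, SHA1), raw data, parity."""
    if len(data) not in CHUNK_SIZES:
        raise ValueError("chunk must be 64kB or 128kB")
    digest = hashlib.sha1(data).digest()
    return HEADER.pack(TAG_RAW_DATA, len(data), digest) + data + xor_parity(data)


def unpack_chunk(record: bytes) -> bytes:
    """Return the raw data after verifying the SHA1 stored in the header."""
    tag, length, digest = HEADER.unpack_from(record)
    data = record[HEADER.size:HEADER.size + length]
    if tag != TAG_RAW_DATA or hashlib.sha1(data).digest() != digest:
        raise ValueError("bad tag or checksum mismatch")
    return data
```

<br />In this sketch the checksum is computed once in <code>pack_chunk</code> and re-checked wherever <code>unpack_chunk</code> is called, whether that is at the drive, in a controller or at the consumer.<br /><br />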
The MD5/SHA1, once computed, should travel with the data over the whole data path, back to its point of consumption.</div><div><br /></div><div>I'd like to be able to make more specifications on how chunks are organised, but a first cut would be the 3-tier scheme of CD-ROM's. Test implementations could start with the 2kB blocks of CD's, the code is well known, free and well tested.</div><div><br /></div><div>Doing this and adopting 2.5" drives will go a long way to avoiding the Error Meltdown predicted by Leventhal. It also provides a very good platform for increasing the performance of other RAID levels.</div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-82374257427967871062014-04-21T20:11:00.001-07:002014-04-22T06:30:39.248-07:00Storage: Spares and Parity in large disk collectionsWhat approaches are available to deal with spare drives and RAID parity for 300-1,000 drives in a single box?<br />Will existing models scale well?<br />Do other technologies fill any gaps?<br /><a name='more'></a>There are three main variants to holding RAID Parity and four variants for spare drives in a Protected Data solution. 
JBOD solutions with no Data Protection are outside the scope of this piece.<br /><br />The meta-solution of dual- or triple-stores needs no spares and no provision for failed drives, just per-drive error correction.<br /><br />There are three general architectures used here as the context for organising large sets of disks:<br /><ul><li>single controller, one or more RAID groups (a 'backblaze' capacity-optimised configuration)</li><li>mid-scale, single main controller, multiple (single-ported) embedded RAID controllers with internal access fabric, non-switching</li><li>high-end, dual main controllers, multiple dual-ported embedded RAID controllers, switching access fabric.</li></ul><div>Spare drives & RAID parity can be avoided by using RAID 1, either at the drive or RAID-group level.</div><div>RAID 1 allows multiple replicas. For some applications, higher streaming and IO/sec can be supported by using large counts of replicas, up to the entire drive set. For these arrangements, the block mapping of logical to physical blocks on individual drives can be varied, placing a set of blocks in the outer ring of that drive giving better access times ('short-stroking'). The block scheduler can preferentially direct reads of those blocks to those drives.</div><div><b><br /></b></div><div><b>Drive to embedded RAID controller mapping.</b></div><div>Between 4 and 12 2.5" drives mounted along a carrier, or Sled. Carriers may be single or double sided.</div><div><ul><li>There may be a 1:1 or 1:M connection of carriers to an embedded RAID controller, allowing the RAID controller chip to be mounted on the carrier with the drives and share a common power supply. 
This is the simplest arrangement electrically, with fewest removable connectors, or</li><li>orthogonal mounting of drives and RAID controllers, allowing larger RAID groups and reducing count of secondary controllers:</li><ul><li>the 'Nth' drive of each carrier in a set is connected to the same RAID controller</li><li>A single carrier can be removed, or suffer a common-mode failure, and the RAID controller will compensate with data reconstructed from parity.</li><li>This requires many removable connectors, increasing sources of failure & errors.</li></ul></ul></div><div><b>Spare drives</b> can be:</div><div><ul><li>No hot spares. "Break/Fix" replacement. Increased chance of Data Loss events. [need to quantify]</li><li>1 or more global hot-spares managed by main controller.</li><li>1 hot-spare per embedded RAID controller</li><li>complete RAID-set spares.</li><ul><li>When a single drive fails in a RAID group, all data is streamed to a new, unused RAID set.</li><li>The old RAID-group can be rebuilt as a single smaller RAID-group without the failed drive.</li></ul></ul></div><div><b>RAID Parity</b>:</div><div><ul><li>RAID parity can be stored entirely within a RAID-group, managed by a single controller.</li><ul><li>With an orthogonal drive/controller mapping, larger RAID-groups are possible.</li></ul><li>RAID parity can be stored across controllers in large RAID groups, requiring inter-controller traffic for all operations, or even</li><ul><li>One subset of this is for RAID 3/4, collecting all parity devices onto specially allocated</li></ul><li>RAID parity can be stored on ancillary devices, SSD's or even in battery-backed DRAM.</li></ul></div><br /><hr /><br /><b>Recent posts:</b><br /><a href="http://stevej-lab-notes.blogspot.com.au/2014/04/storage-first-look-at-hardware-block.html">http://stevej-lab-notes.blogspot.com.au/2014/04/storage-first-look-at-hardware-block.html</a><br /><a 
href="http://stevej-lab-notes.blogspot.com.au/2014/04/storage-challenges-of-high-count-disk.html">http://stevej-lab-notes.blogspot.com.au/2014/04/storage-challenges-of-high-count-disk.html</a><br /><a href="http://stevej-lab-notes.blogspot.com.au/2014/04/storage-how-many-drives-can-be-stuffed.html">http://stevej-lab-notes.blogspot.com.au/2014/04/storage-how-many-drives-can-be-stuffed.html</a>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-54196199684328974952014-04-21T01:53:00.001-07:002014-04-21T01:53:50.992-07:00Storage: First look at Hardware block diagram<a href="http://stevej-lab-notes.blogspot.com.au/2014/04/storage-challenges-of-high-count-disk.html">Stuffing 500-1000 2.5" drives in an enclosure</a> is just the start of a design adventure.<br /><br />The simplest decision is choosing fixed or hot-plug drive mounting. There's a neat slide-out tray system for 3.5" drives that allows hot-plug access for densely vertically packed drives that could be adapted to 2.5" drives.<br /><a name='more'></a><br /><br />There seem to be roughly three hardware approaches, all of which have different performance goals and different MTDL (Mean Time to Data Loss) intents:<br /><br /><ul><li>Single board, <a href="http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/">'Backblaze'</a> style. Capacity-optimised, low external bandwidth.</li><li>High Availability, dual motherboard, dual-ported local Embedded RAID Controllers, zero contention internal fabric, high performance external interfaces, additional Flash memory, NVRAM memory</li><li>Mid-scale server. 
Single board, fast external interfaces, Embedded RAID Controllers, low contention internal fabric.</li></ul><div><b>Low-end, Capacity Focused, low bandwidth external interface.</b></div><div><br /></div><div>The <a href="http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/">Backblaze Storage Pod 4.0</a> has moved away from SATA Port Multiplexers to 40-port SAS cards (HighPoint Rocket 750's), reducing system cost and increasing system throughput, though they are still limited to 1Gbps external interfaces. They could invest $500 in dual-port twisted-pair 10Gbps NIC's, though this would require matching Top of Rack switches and higher capacity datacentre links, switches and routers. Their product offering is based on capacity, not performance, delivered over the wider Internet.</div><div><br /></div><div>The advantage of such a system is that everything is done in software, in one place.</div><div>The load on sub-systems is limited by the 1Gbps external interface.</div><div><br /></div><div>The HDD's can be organised as single or multiple RAID groups, in differing RAID levels, as desired.</div><div><br /></div><div>Backblaze don't seem to run hot-spares or hot-plug drives. They have a "Break/Fix" maintenance schedule, costing around 15 mins ($15-$25) per replacement, including a presumed device outage, so buying Enterprise drives gives them no TCO benefit. 
It's unclear if they run a daily or weekly replacement schedule.</div><div><br /></div><div>Even though Backblaze do limit external bandwidth to approximately the performance of one drive, they still benefit from a high-performance internal fabric: for data scrubbing, reconfiguration, data migration and RAID rebuild.</div><div><br /></div><div>Of the ~$10,000 Backblaze pay per box of 180TB, around 60% is spent directly on drives.</div><div>Backblaze build 25 units/month, saving 10%-30% on a one-off price.</div><div><br /></div><div><b>High-End, multi-redundant systems</b></div><div><br /></div><div>These are going to be built from high-spec items, with hot-pluggable everything and high-speed external interfaces. Expect "exotic" solutions and custom-built components and subsystems. Under half the cost of the system will be in disk drives.</div><div><br /></div><div>In the light of this cost structure, high-end systems can afford many more spare drives, even providing enough for a full lifetime's projected failures. This over-provisioning would only marginally increase retail price.</div><div><br /></div><div>A single random I/O optimised enclosure (1,000x5mm drives) achieving 250k IO/sec with 4kB blocks requires a minimum of 1GB/sec (10Gbps) on the external interface. With any degree of streaming IO, 40Gbps will easily be reached.</div><div><br /></div><div>When performance is being optimised, zero contention internal links are required. Multiple 10Gbps schemes are available, allowing 8-10 drives to share a single link. This suggests a tiered internal fabric with a switching capability, especially if dual controllers are expected to access all disks.</div><div><br /></div><div>The possible streaming performance of 300 to 500 drives, at around 1Gbps each, suggests 100Gbps external interfaces as a minimum. 
This makes for a very expensive datacentre infrastructure, but that's the price of high performance.</div><div><br /></div><div><b>Mid-scale systems</b></div><div><br /></div><div>Building fast systems with good redundancy and low contention fabrics while using mostly commodity parts will be a point of differentiation between solution providers.</div><div><br /></div><div>There is a lot of scope for novel and innovative designs for tiered storage and low contention internal fabrics.</div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-45296706431343578842014-04-20T22:47:00.001-07:002014-04-21T01:35:15.112-07:00Storage: Challenges of high-count disk enclosures<a href="http://stevej-lab-notes.blogspot.com.au/2014/04/storage-how-many-drives-can-be-stuffed.html">Stuffing 500-1,000 2.5" drives in a single enclosure</a> may be technically possible, but how do you make those drives do anything useful?<br /><br />Increasing drives per enclosure from 15-45 for 3.5" drives to 1,000 requires a deep rethink of target market, goals and design.<br /><br />Not the least is dealing with drive failures. With an <a href="http://knowledge.seagate.com/articles/en_US/FAQ/174791en?language=en_US">Annualised Failure Rate (AFR) </a>of 0.4%-0.75% now quoted by Drive Vendors, dealing with 5-15 drive failures per unit, per year is a given. In practice, failure rates are at least twice the Vendor-quoted AFR, not least because in <i>systems</i>, conditions can be harsh and other components/connectors also fail, not just drives. Drives have a design life of 5 years, with an expected duty-cycle. Consumer-grade drives aren't expected to run 24/7 like the more expensive enterprise drives. 
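<br /><br />A rough check of the failure arithmetic above, as a sketch only. The function name and the <code>field_factor</code> parameter (standing in for "at least twice the Vendor quoted AFR") are hypothetical, and the model is simply drives times AFR, with no allowance for age or duty-cycle.<br /><br />

```python
def expected_failures(drives: int, afr: float, field_factor: float = 1.0) -> float:
    """Expected drive failures per enclosure per year.

    drives       -- number of drives in the enclosure
    afr          -- vendor-quoted Annualised Failure Rate, as a fraction
    field_factor -- hypothetical multiplier for in-system conditions
    """
    return drives * afr * field_factor


if __name__ == "__main__":
    for afr in (0.004, 0.0075):          # the 0.4%-0.75% range quoted above
        for factor in (1.0, 2.0):        # vendor figure, then doubled
            n = expected_failures(1000, afr, factor)
            print(f"AFR {afr:.2%} x{factor:.0f}: {n:.1f} failures/year")
```

<br />For 1,000 drives this gives 4 to 7.5 failures a year at the quoted AFR and 8 to 15 when doubled, which brackets the 5-15 figure used above.<br /><br />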
Failure rates, when measured on large fleets in service, increase over time, and considerably so towards end of life.<br /><br />It isn't enough to say "we're trying to minimise per unit costs": all designs do that, but against different criteria.<br />What matters are the constraints you're working against or the parameters being optimised.<br /><a name='more'></a>Competing design & operational dimensions start with:<br /><ul><li>Cost (CapEx, OpEx and TCO),</li><li>Performance,</li><li>Durability, and</li><li>Scalability</li></ul>'Performance' is a multi-faceted dimension with many meanings. It's broken into Internal and External at the highest level.<br />What type and speed of External interface is needed? SAS, Fibre Channel, Infiniband, or Ethernet?<br />If Ethernet, 1Gbps, 10Gbps or 40-100Gbps? The infrastructure costs of high-bandwidth external interfaces go far beyond the price of the NICs: patch-cables, switches and routers, in the data-centre and more widely, are affected.<br />Is the Internal fabric designed to be congestion-free with zero contention access to all drives, or does it meet other criteria?<br /><br />There are at least 4 competing parameters that can be optimised: <br /><ul><li>raw capacity (TB)</li><li>(random) IO/second, both read and write.</li><li>streaming throughput</li><li>Data Protection as RAID resilience</li></ul><div>The hardware architecture can be High Availability (many redundant and hot-plug sub-systems), simple or complex, single layer or multi-layer and with or without customised components.<br /><br />Close attention to design detail is needed to reduce per-IO latency to microseconds, necessary for high random IO/sec. 
1,000x 5400 RPM HDD's can support 180k IO/sec, or around 6 microseconds per IO.<br /><br />Marketing concerns, not engineering, mostly drive these considerations.<br />This forces a discipline on the engineering design team: to know exactly their target market, what they need, what they value and what price/capability trade-offs they'll wear.<br /><br /><b>RAID resilience</b><br /><br />This falls broadly into three areas:<br /><ul><li>Spare policy and overhead,</li><li>Parity block organisation overhead, and</li><li>Rebuild time and performance degradation during rebuilds.</li></ul><div>Different choices all affect different aspects of performance:</div><div><ul><li>RAID-5 reduces write-performance by a factor of 3 for streaming and random IO.</li><li>while RAID-6 burns CPU's in the Galois Field calculations needed for the 'Q' parity blocks,</li><li>RAID-1 and RAID-10 are simple, low-CPU solutions, but cost more in capacity and don't offer protection against all dual-drive failures.</li></ul></div><div>The essential calculation is the likelihood of specific types of failure creating a Data Loss event: Mean Time to Data Loss. </div></div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-22976685886493690462014-04-20T21:59:00.000-07:002014-04-20T22:09:02.097-07:00Storage: How many drives can be stuffed in a Box?How many 2.5" drives can be stuffed into a single enclosure, allowing space for power, cooling, wiring and a single motherboard? <i>Short answer</i>: ~500-1000.<br /><br /><a name='more'></a>This assumes drives mounted vertically, thin side forward, needing 105-110mm vertically and ~75mm deep per drive. Drives are mounted on a carrier, secured in a slide, vertically is assumed.<br /><br />Drives can be mounted in a single row on a carrier, or dual rows of drives per-carrier, back-to-back. This may have advantages in construction and widening air-flow gaps. 
Testing needed to show if vibration isolation needed.<br /><br />Extra drives can be mounted horizontally in the free space. As well, drive carriers can be oriented alternately horizontally and vertically, lowering coupling of vibration between adjacent groups.<br /><br />Drives will fit in a 3RU (133mm) enclosure, requiring space behind the drives for PSU's, fans and motherboard, or in 4RU (177mm), allowing 1.5RU underneath drives for electronics & wiring.<br /><br />In a 42RU high rack, there is space for 14x3RU enclosures or 10x4RU enclosures.<br />These calculations ignore limitations of weight and power imposed by floor loading and racks.<br /> 9000 drives at 0.1kg each is 900kg.<br />At 1W/drive, 9kW power used.<br /><br />Count depends on:<br /><ul><li>The thickness (5mm-15mm) of drives used: from 76 to 25, with 40 for maximum platters.</li><li>How deep drives are stacked. @ 75mm/drive this is:</li><ul><li>450mm = 6 drives deep</li><li>600mm = 8 deep [in 3RU, allows electronics behind]</li><li>900mm = 12 deep [in 4RU, forces electronics underneath]</li></ul></ul>Totals are:<br /><ul><li>3RU, Max HDD (5mm) = 8 x 76 = 608/enclosure [300TB] = 8,512/rack @ 0.5TB/HDD</li><li>4RU, Max HDD (5mm) = 12 x 76 = 912/enclosure [450TB] = 9,120/rack</li></ul><ul><li>3RU, Max Platters (9.5mm) = 8 x 40 = 320/enclosure [320TB] = 4,480/rack @ 1TB/HDD</li><li>4RU, Max Platters (9.5mm) = 12 x 40 = 480/enclosure [480TB] = 4,800/rack</li></ul><ul><li>3RU, Max Cap drive (15mm) = 8 x 25 = 200/enclosure [400TB] = 2,800/rack @ 2TB/HDD</li><li>4RU, Max Cap drive (15mm) = 12 x 25 = 300/enclosure [600TB] = 3,000/rack</li></ul>Previous pieces on Storage with 2.5" drives:<br /><ul><li><a href="http://stevej-lab-notes.blogspot.com.au/2014/03/storage-more-capacity-calculations.html">http://stevej-lab-notes.blogspot.com.au/2014/03/storage-more-capacity-calculations.html</a></li><li><a 
href="http://stevej-lab-notes.blogspot.com.au/2014/03/storage-efficiency-measures.html">http://stevej-lab-notes.blogspot.com.au/2014/03/storage-efficiency-measures.html</a></li></ul><br /><hr /><b>Table of 2.5" Platter equivalent across 19" Rack</b><br /><br /><b>Rack Width: </b>435mm (17.125in, allows for sliders)<br /><b>Interdrive space</b> (cooling): 52.5mm<br /><b>Usable space</b>: 382.5mm<br /><b>1 Rack Unit</b>: 1.75 in or 44.45mm (Clearance of 0.5mm = 44 mm usable)<br /><b>3 Rack Unit</b>: 5.25 in or 133.35mm (Clearance of 0.5mm = 132.75 mm usable)<br /><div><b>4 Rack Unit</b>: 7.00 in or 177.80mm (Clearance of 0.5mm = 177 mm usable)</div><div><br /></div><div><br /></div><table><thead><tr><th>Drive<br />thickness</th><th>Count<br />across<br />Rack</th><th>Platters</th></tr></thead> <tbody><tr><td>15mm</td><td>25</td><td>100</td></tr><tr><td>9.5mm</td><td>40</td><td>120</td></tr><tr><td>7mm</td><td>54</td><td>108</td></tr><tr><td>5mm</td><td>76</td><td>76</td></tr></tbody> </table><br /><hr />Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-5426324345812516272014-03-23T22:23:00.001-07:002014-03-23T22:23:13.134-07:00Storage: more capacity calculationsFollowing on from <a href="http://stevej-lab-notes.blogspot.com.au/2014/03/storage-efficiency-measures.html">the previous post on Efficiency and Capacity</a>, baselining "A pile of Disks" as "100% efficient".<br /><br />Some additional considerations:<br /><ul><li>Cooling.</li><ul><li>Drives can't be abutted; there must be a gap for cooling air to circulate.</li><li>Backblaze allow 15x25.4mm drives across a 17.125in chassis, taking around 12%, 52.5mm, of space for cooling.</li><ul><li>This figure is used to calculate a per-row capacity, below.</li></ul></ul></ul><ul><li>A good metric on capacity is "Equivalent 2.5 in drive platters", table below.</li><ul><li>9.5mmx2.5" drives with 3 platters yield the highest capacity.</li><li>Better than 5 
platter, 3.5" drives.</li><li>Drive price-per-GB is still higher for 2.5" drives.</li><ul><li>This may change in time if new technologies, such as "Shingled Writes", are introduced into 2.5" drive fabrication lines and not into 3.5" lines.</li></ul></ul></ul><ul><li>Existing high density solutions, measured in "Drives per Rack Unit". SGI "Modular" units, which also support 4xCPU's per 4RU chassis, are the densest storage currently available.</li></ul><ul><ul><li>Backblaze achieve the lowest DIY cost/GB known:</li><ul><li>4RU, vertical orientation, 15 drives across, in 3 rows.</li><li>fixed drives.</li><li>45x3.5" drives = 11.25x3.5" drives per RU</li><li><b>450x3.5" drives per 42RU</b></li></ul></ul></ul><ul><ul><li>Supermicro 'cloud' server (below) achieves 12 drives, fixed, in 1RU</li><ul><li>12x3.5" drives per RU</li><li><b>504x3.5" drives per 42RU</b></li></ul></ul></ul><ul><ul><li>Supermicro High Availability server, supports 36x3.5" removable drives in 4RU</li><ul><li>9x3.5" drives per RU</li><li>360x3.5" drives per 42RU</li><li>An alternative 2.5" drive unit puts 24x2.5" removable drives in 2RU,</li><li>= 8.5x3.5" drives per RU</li><li><b>356 drives per 42RU</b></li></ul></ul></ul><ul><ul><li>Open Compute's Open Rack/Open Vault use of 21" rather than 19" racks, still in 30" floor space, allows higher disk densities:</li><ul><li>30x3.5" drives in 2RU</li><li>15x3.5" drives per RU</li><li><b>630x3.5" drives per 42RU</b></li></ul></ul></ul><ul><ul><li>SGI Modular InfiniteStorage uses modules of 3x3 3.5" drives, 3x3 2.5" 15mm and 3x6 9.5mmx2.5" drives in a 4RU chassis. 
Drives are mounted vertically.</li><ul><li>Modules are accessible from front and rear.</li><li>All modules are accessible externally and are removable.</li><li>81x3.5"drives per extension case, 72x3.5" drives per main chassis, 4 expansion cases per main chassis.</li><li><b>720 to 792x3.5" drives per 42RU (same for 2.5" 15mm drives)</b></li><li><b>1140 to 1584x 2.5" 9.5mm drives per 42RU</b></li></ul></ul></ul><ul><ul><li>SGI/COPAN "MAID" uses "Patented Canisters" to store 14x3.5" drives back-back per canister. 8 canisters per 4RU drive shelf, 112x3.5" drives per shelf. <i>These devices no longer appear on the SGI website, though have featured in a Press Release.</i></li><ul><li>MAID attempts to reduce power consumption by limiting active drives to at most half installed drives.</li><li>Up to 8 shelves per 42RU unit.</li><li>Power, CPU's and </li><li>21.33x3.5" drives per RU (28x3.5" drives per RU per shelf)</li><li><b>896x3.5" drives per 42RU</b></li></ul></ul></ul><ul><ul><li>EMC Isilon S200, X200 Nodes [2011 figures] are 2RU units</li><ul><li>EMC support 144 Nodes per cluster</li><li>24x2.5" drives and 12x3.5" drives respectively</li><li>12x2.5" drives per RU and 6x3.5" drives per RU respectively</li><li><b>504x2.5" and 252x3.5" drives per 42RU</b></li><li>5.5 racks to support maximum 144 node cluster [unchecked for 2014 config]</li></ul></ul></ul><br /><a href="http://stevej-lab-notes.blogspot.com.au/2014/03/storage-efficiency-measures.html">In the previous piece,</a> I said there were just 4 'interesting' drive orientations of 6 possible, due to "flat plate" blocking of airflow.<br /><br />If you include a constraint for uninterrupted front-back airflow, there are only two good orientations:<br /><ul><li>the drive connectors, on the shortest side, have to be to one-side (bottom or left/right)</li><ul><li>vertical, thin-side forward, 100mm high x thickness (5mm-15mm) width</li><ul><li>allows many drives across the rack (table below)</li><li>stacked drives take 
75mm depth. Allows 6 in 450mm. 900mm deep possible</li></ul><li>horizontal, thin-side forward, thickness (5mm-15mm) high x 100mm wide</li><ul><li>allows 4 drives across Rack</li><li>stack drives vertically with small separation.</li></ul></ul><li>drive connectors in-line with airflow will restrict it, eliminating horizontal & vertical end-to-end.</li></ul><br /><hr /><br /><b>Table of 2.5" Platter equivalent across 19" Rack</b><br /><br /><b>Rack Width: </b>435mm (17.125in, allows for sliders)<br /><b>Interdrive space</b> (cooling): 52.5mm<br /><b>Usable space</b>: 382.5mm<br /><br /><table><thead><tr><th>Drive<br />thickness</th><th>Count<br />across<br />Rack</th><th>Platters</th></tr></thead> <tbody><tr><td>25.4mm</td><td>15</td><td>106 (75 actual)</td></tr><tr><td>15mm</td><td>25</td><td>100</td></tr><tr><td>9.5mm</td><td>40</td><td>120</td></tr><tr><td>7mm</td><td>54</td><td>108</td></tr><tr><td>5mm</td><td>76</td><td>76</td></tr></tbody> </table><br /><hr /><br />Backblaze V4.<br /><br /><a href="http://www.zdnet.com/buy-a-180tb-array-for-6gb-7000027459/">http://www.zdnet.com/buy-a-180tb-array-for-6gb-7000027459/</a><br /><a href="http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/">http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/</a><br /><a href="http://www.highpoint-tech.com/USA_new/series_R750-Overview.htm">http://www.highpoint-tech.com/USA_new/series_R750-Overview.htm</a><br /><br />$688ea for the new SAS/SATA cards (to Backblaze in 100 Qty):<br /><br />"The Rocket 750's revolutionary HBA architecture allows each of the 10 Mini-SAS ports to support up to four SATA hard drives: A single Rocket 750 is capable of supporting up to 40 4TB 6Gb/s SATA disks,"<br /><br />$3,387.28 full chassis<br />$9,305 total 180TB [$131/drive, $5917.72 total]<br /><br />$5,403 ‘Storinator’ by Protocase.<br />$7,200 $160 per 4TB drive is $7,200 <br />$12,603 Protocase + drives<br /><br />Parts<br />$872.00 case<br />$355.99 PSU<br />~$360 motherboard, 
CPU, RAM<br />$1,376.40 SATA cards (2)<br /><br /><hr /><br />From Supermicro: scale-out storage products already exist, mostly 3.5", but some 2.5"<br /><br /><a href="http://www.supermicro.com/products/rack/scale-out_storage.cfm">http://www.supermicro.com/products/rack/scale-out_storage.cfm</a><br />- 360 3.5” drives in 42RU. 4Ux36-bay SSG<br /><br />And for ‘Hadoop’, they go a little denser.<br /><a href="http://www.supermicro.com/products/rack/hadoop.cfm">http://www.supermicro.com/products/rack/hadoop.cfm</a><br /><br />Supermicro have multiple innovative designs [below], with 9-12x3.5” drives/RU, 12x2.5” drives/RU and their MicroBlade & MicroCloud servers with proprietary motherboards & high-bandwidth.<br /><br />e.g. Hadoop, 1RU, fixed:<br /><br />12x3.5” in a 1RU rack. 43mm x 437mm x 908mm (H, W, D)<br />- 2 full length columns (3 drives, fans, 2 drives)<br />- 1 short column (fans, 2 drives)<br />- PSU, m’board, Addon-card (PCI on riser) and front panel on one side of chassis<br />- AddOnCard w/ 8x LSI 2308 SAS2 ports and 4x SATA2/3 ports<br />- dual 1Gbps ethernet<br />- m’board 216mm x 330mm, LGA 1155/Socket H2, 4xDDR3 slots<br />- 650W<br /><a href="http://www.supermicro.com/products/system/1U/5017/SSG-5017R-iHDP.cfm?parts=SHOW">http://www.supermicro.com/products/system/1U/5017/SSG-5017R-iHDP.cfm?parts=SHOW</a><br /><a href="http://www.supermicro.com/products/motherboard/Xeon/C202_C204/X9SCFF-F.cfm">http://www.supermicro.com/products/motherboard/Xeon/C202_C204/X9SCFF-F.cfm</a><br /><a href="http://www.supermicro.com/products/accessories/addon/AOC-USAS2-L8i.cfm?TYP=I">http://www.supermicro.com/products/accessories/addon/AOC-USAS2-L8i.cfm?TYP=I</a><br /><br />Front and back Hot-swap, 4RU, 36 drives:<br /><br />- 178mm x 437mm x 699mm (H, W, D)<br />- dual CPU, 4x1Gbps ethernet<br />- 2x1280W Redundant Power Supplies<br />- 24xDDR3 slots<br />- LSI 2108 SAS2 RAID AOC (BBU optional), Hardware RAID 0, 1, 5, 6, 10, 50, 60<br />- 2x JBOD Expansion Ports<br />- 
BPN-SAS2-826EL1 826 backplane with single LSI SAS2X28 expander chip<br />- BPN-SAS2-846EL1 Backplane supports upto 24 SAS/SATA<br /><br /><a href="http://www.supermicro.com/products/system/4U/6047/SSG-6047R-E1R36N.cfm">http://www.supermicro.com/products/system/4U/6047/SSG-6047R-E1R36N.cfm</a><br /><a href="http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-LN4F_.cfm">http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-LN4F_.cfm</a><br /><br />Alt. system, 2RU, 24x2.5” hot-plug:<br /><br />- 89mm x 437mm x 630mm (H, W, D)<br />- 12Gbps SAS 3.0<br />- no CPU specified.<br /><a href="http://www.supermicro.com/products/chassis/2U/216/SC216BE1C-R920LP.cfm">http://www.supermicro.com/products/chassis/2U/216/SC216BE1C-R920LP.cfm</a><br /><br /><hr /><br />Open Vault/Open Rack.<br /><br /><a href="http://www.opencompute.org/projects/open-vault-storage/">http://www.opencompute.org/projects/open-vault-storage/</a><br /><a href="http://www.opencompute.org/projects/open-rack/">http://www.opencompute.org/projects/open-rack/</a><br /><br />The Open Vault is a simple and cost-effective storage solution with a modular I/O topology that’s built for the Open Rack.<br />The Open Vault offers high disk densities, holding 30 drives in a 2U chassis, and can operate with almost any host server.<br />Its innovative, expandable design puts serviceability first, with easy drive replacement no matter the mounting height.<br /><br />Open Rack<br /><a href="http://en.wikipedia.org/wiki/19-inch_rack#Open_Rack">http://en.wikipedia.org/wiki/19-inch_rack#Open_Rack</a><br /><br />Open Rack is a mounting system designed by Facebook's Open Compute Project that has the same outside dimensions as typical 19-inch racks (e.g. 
600 mm width), but supports wider equipment modules of 537 mm or about 21 inches.<br /><br /><hr /><br />SGI® Modular InfiniteStorage™ <br /><br /><a href="http://www.sgi.com/products/storage/modular/index.html">http://www.sgi.com/products/storage/modular/index.html</a><br /><a href="http://www.sgi.com/products/storage/modular/server.html">http://www.sgi.com/products/storage/modular/server.html</a><br /><a href="http://www.sgi.com/products/storage/modular/jbod.html">http://www.sgi.com/products/storage/modular/jbod.html</a><br /><br />Image: <a href="http://www.sgi.com/products/storage/images/mis_jbod.jpg">http://www.sgi.com/products/storage/images/mis_jbod.jpg</a> [whole unit]<br />Image: <a href="http://www.sgi.com/products/storage/images/mis_brick.jpg">http://www.sgi.com/products/storage/images/mis_brick.jpg</a> [module: 3x3, vertical mount]<br /><br />Extreme density is achieved with the introduction of modular drive bricks that can be loaded with either nine 3.5 inch SAS or SATA drives, or 18 2.5 inch SAS or SSD drives.<br /><br />SGI® Modular InfiniteStorage™ JBOD<br /><br />(SGI MIS JBOD) is a high-density expansion storage platform, designed for maximum flexibility and the ability to be tuned to specific customer requirements.<br />Whether as a standalone dense JBOD solution, or combined with SGI Modular InfiniteStorage Server (SGI MIS Server), SGI MIS JBOD provides unparalleled versatility for IT managers while also dramatically reducing the amount of valuable datacenter real estate required to accommodate rapidly-expanding data needs.<br /><br />Up to 81 3.5" or 2.5" Drives in 4U<br />up to 3.2PB of disk capacity can be supplied within a single 19" rack footprint.<br /><br />SGI MIS JBOD shares the same innovative dense design with SGI MIS Server, which can be configured with up to<br />81 3.5" or 2.5" SAS, SATA SSD drives.<br />This enables SGI MIS JBOD to have up to 324TB in 4U.<br /><br />SGI MIS JBOD comes with a SAS I/O module, which can accommodate 
four quad port connectors or 16 lanes.<br />An additional SAS/IO module can be added as an option for increased availability.<br /><br /><hr /><br />SGI Modular InfiniteStorage Platform Specifications<br /><br /><a href="http://www.sgi.com/pdfs/4344.pdf">http://www.sgi.com/pdfs/4344.pdf</a><br /><br />Servers are hot pluggable, and can be serviced without impacting the rest of the chassis or the other server.<br />Through an innovative rail design, the chassis can be accessed from the front or rear, enabling drives and other components to be non disruptively replaced.<br />RAIDs 0, 1, 5, 6 and 10 can be deployed in the same chassis simultaneously for total data protection.<br />Battery backup is used to allow for cache de-staging for an orderly shutdown in the event of power disruptions.<br /><br />Connectivity Up to 4 SGI MIS JBODs per SGI MIS Server enclosure<br />Rack Height 4U<br />Height 6.94” (176 mm)<br />Width 16.9” (429.2 mm)<br />Depth 36” (914.4 mm)<br />Max weight 250 lbs. (113kg)<br />Internal Storage<br />Up to 72 X 3.5” or 2.5” 15mm drives (max 288TB)<br />Up to 144 x 2.5” 9.5mm drives.<br />RAID or SAS Controllers<br />Single server: Up to four 8 ports cards<br />Dual server: Up to two 8 ports card per server mother board (four per enclosure)<br />External Storage Attachment Up to 4 SGI MIS JBOD chassis per server enclosure<br /><br />JBOD modules:<br />Connectivity Four quad port SAS standard. 
Eight quad port SAS optional<br />Internal Storage<br />Up to 81 X 3.5” drives (max 324TB)<br />Up to 162 x 2.5” 9.5mm drives<br /><br /><hr /><br />SGI® COPAN™ 400M Native MAID Storage<br /><div><br /></div><a href="https://web.archive.org/web/20130402000726/http://www.sgi.com/pdfs/4212.pdf">https://web.archive.org/web/20130402000726/http://www.sgi.com/pdfs/4212.pdf</a><br /><br /><ul><li>up to 2.7PB raw of data in a compact storage footprint.</li><li>8x4RU, 8xcanisters ea 4RU, 112 drives/4RU.</li><li>2x4RU power, cache and management</li><li>Up to 6,400 MB/s (23TB/hr) of disk-based throughput</li><li>idling: power consumption of the storage system by up to 85%.</li><li>Patented Disk Aerobics® Software</li><li>Patented Power Managed RAID® Software. Provides full RAID 5 data protection and helps lower energy costs as a maximum of 25% or 50%</li></ul><br /><br />Capacity: 224TB to 2688TB per cabinet - 1 shelf = 112x2TB, = 14 drives/module, 8 canisters/shelf<br />Shelves: 1–8<br />Connectivity: Up to eight 8-Gbps Fibre Channel ports [later docs: 16x8Gbps FCAL]<br /><br />Max Spinning Drives at Full Operation up to 50%<br />Spare Drives 5 per shelf for a maximum of 40<br />Disk Drive 2TB & 3TB SATA<br />Dimensions 30” (76.2 cm) w x 48” (121.9 cm) d x 87” (221 cm) h<br />Clearances Front–40” (101.6 cm), Rear–36” (91.4 cm), Side–0”<br />Weight Maximum 3,193 lbs. 
(1,447 kg)<br /><br />Power Consumption @ Standby (min/max) 426/2,080 watts<br />Power Consumption @ 25% power (min/max) 649/3,819 watts<br />Power Consumption @ 50% power (min/max) 940/6,554 watts<br /><br />Storage tiering software SGI Data Migration Facility (DMF)<br />D2D backup IBM® TSM®, CommVault® Simpana® Quantum® StorNext®<br /><br /><hr /><br />SGI® COPAN™ 400M Native MAID<br /><br /><a href="https://web.archive.org/web/20130401184152/http://www.sgi.com/products/storage/maid/400M/specifications.html">https://web.archive.org/web/20130401184152/http://www.sgi.com/products/storage/maid/400M/specifications.html</a><br />Specs:<br />Connectivity Up to sixteen 8-Gbps Fibre Channel ports<br /><br /><hr /><br />MAID Platforms<br />A New Approach to Data Backup, Recovery and Archiving<br /><br /><a href="https://web.archive.org/web/20130405012939/http://www.sgi.com/products/storage/maid">https://web.archive.org/web/20130405012939/http://www.sgi.com/products/storage/maid</a><br /><br />COPAN products are all based on an Enterprise MAID (Massive Array of Idle Disks) platform, which is ideally suited to cost effectively address the long-term data storage requirements of write-once/read-occasionally (WORO) data.<br /><br />Solutions<br /><br /><ul><li>Data Archiving</li><li>Data Protection: Backup & Recovery</li><li>Storage Tiering</li><li>Power Efficient Storage</li></ul><br />For backup, recovery and archiving of persistent data:<br /><br />Unprecedented reliability - six times more reliable than traditional spinning disk solutions<br /><br /><ul><li>Massive scalability - from 224 TB to 2688 TB raw capacity</li><li>High Density - 268 TB per ft.² (2688 TB per .93 m²)</li><li>Small Footprint - 10 ft.²</li><li>Energy Efficiency - up to 85% more efficient than traditional, always spinning disk solutions</li></ul><br />COPAN technology simplifies your long-term data storage,<br />drastically lowers your utility costs, and<br />frees up valuable data center floor 
space.<br /><br /><ul><li>Lowest Cost Solution</li><ul><li>Savings in operational costs and capital expenses</li></ul><li>Smallest Disk-Based Storage Footprint</li><ul><li>268 TB per square foot or 2688 TB per .93 m²</li></ul><li>High Performance</li><ul><li>Fast Restores up to 23 TB/hour system</li></ul><li>Breakthrough Energy Efficiency</li><ul><li>Save up to 85% on power and cooling costs</li></ul></ul><br />COPAN Patented Cannister Technology<br /><br /><ul><li>Patented mounting scheme to eliminate "rotational vibration" within a storage shelf</li><li>Canister technology enables efficient and quick servicing of 14 disk drives</li><li>Data is striped across canisters with a shelf in 3+1 RAID sets</li></ul><br /><br /><table><thead><tr><th>Storage<br />Environment Factors</th><th>Tape</th><th>Traditional<br />Disk-Based Storage</th><th>COPAN Systems'<br />Enterprise MAID</th></tr></thead> <tbody><tr><td>Quick Data Recovery</td><td>X</td><td>√</td><td>√</td></tr><tr><td>Cost per GB</td><td>√</td><td>X</td><td>√</td></tr><tr><td>Operating Expense</td><td>X</td><td>X</td><td>√</td></tr><tr><td>Scalability</td><td>√</td><td>X</td><td>√</td></tr><tr><td>Small Footprint</td><td>√</td><td>X</td><td>√</td></tr><tr><td>Power & Cooling Efficiency</td><td>√</td><td>X</td><td>√</td></tr><tr><td>Ease of Management</td><td>X</td><td>√</td><td>√</td></tr><tr><td>Built for Long-Term Data Storage</td><td>X</td><td>X</td><td>√</td></tr></tbody> </table><br /><br /><a href="https://web.archive.org/web/20120619124024/http://www.sgi.com/pdfs/4223.pdf">https://web.archive.org/web/20120619124024/http://www.sgi.com/pdfs/4223.pdf</a><br /><br />MAID is designed for Write Once Read Occasionally (WORO) applications.<br />Six times more reliable than traditional SATA drives<br /><br />Disaster Recovery Replication Protection:<br />Three-Tier System Architecture<br />• Simplifies system management of persistent data<br />• Scales performance with capacity<br />• Enables industry-leading, 
high density, storage capacity in a single footprint<br />• Enhances drive reliability with unique disk packaging, cooling and vibration management<br /><br />Patented Canister Technology<br />• Patented mounting scheme eliminates “rotational vibration” within a storage shelf<br />• Canister technology enables efficient and quick servicing of the 14 disk drives<br />• Data is striped across canisters with a shelf in 3+1 RAID sets<br /><br /><hr /><br />MONDAY, MAY 16, 2011<br /><br />EMCWorld: Part2 - Isilon Builds on Last Months Announcements with Support for 3TB Drives and a 15PB FileSystem<br /><br /><a href="http://www.storagetopics.com/2011/05/emcworld-part2-isilon-builds-on-last.html">http://www.storagetopics.com/2011/05/emcworld-part2-isilon-builds-on-last.html</a><br /><br />With list pricing starting at $57,569 ($4.11/GB) for the S200 <br />the value metric is not the <br />traditional capacity view but<br />$/IOP and <br />$/MBs<br />i.e. $6/IOP and $97/MBs respectively for the S200.<br />With a starting price of $27,450/Node the X200 comes in at nearly <br />$13/IOP and<br />$110/MBs<br />but has an attractive starting price of<br />$1.14/GB,<br />even more attractive when their 80% utilization claim is factored in. <br /><br />They doubled their IOP number for<br />a maximum cluster size of 144 nodes<br />to 1.4M IOPs and<br />doubled their maximum throughput to 85GB/s.<br />It is not just the power of Intel (Westmere/Nehalem upgrades) that has driven this performance increase but also<br />the intelligent use of SSD’s.<br />By supporting HDD’s and SSD’s in the same enclosure and by placing the file metadata on SSD, performance gets a significant boost.<br />The IOP number has not yet been submitted to SpecFS so the performance number is still “unofficial”. 
<br /><br />The latest announcement last week at EMCWorld increased the maximum supported single file system,<br />single volume to 15PB plus support for 3TB HDD’s on their capacity platform, the NL-108.<br />Worth noting that this impressive scalability is only for the NL108 configured with 3TB drives.<br />In comparison the higher performance X200 scalability tops out at 5.2PB. <br /><br /><table><thead><tr><th>Feature</th><th>S200</th><th>X200</th></tr></thead> <tbody><tr><td>Form Factor</td><td>2U</td><td>2U</td></tr><tr><td>Maximum Drives</td><td>24</td><td>12</td></tr><tr><td>Drive Types</td><td>2.5” SAS, SSD</td><td>3.5” SATA, SSD</td></tr><tr><td>Maximum Node Capacity</td><td>14TB</td><td>24TB</td></tr><tr><td>Max Memory</td><td>96GB</td><td>48GB</td></tr><tr><td>Global Coherent Cache</td><td>14TB</td><td>7TB</td></tr><tr><td>Max Cluster Size</td><td>144</td><td>144</td></tr><tr><td>Protocols</td><td colspan="2">NFS, CIFS, FTP, HTTP, iSCSI (NFS 4.0, Native CIFS and Kerberized NFSv3 supported)</td></tr><tr><td>Maximum IO/s</td><td>1,414,944 IO/s</td><td>309,312 IO/s</td></tr><tr><td>Maximum Throughput</td><td>84,960 MB/s</td><td>35,712 MB/s</td></tr><tr><td>List Price Starting at;</td><td>$57,569/node</td><td>$27,450/node</td></tr></tbody> </table><br /><br />Front and center in Isilon’s promotional pitches are the advantages of scale-out namely, scalability, efficiency, ease of use and availability and are positioning themselves as the scale-out architecture that is integrated with capabilities that elevate it to enterprise class. This they believe serves them well in both their traditional space as well positioning them to penetrate the commercial HPC, Big Data space. 
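<br /><br />The quoted $/IOP, $/MBs and $/GB figures can be cross-checked against the table above. This is a minimal Python sketch only: it assumes (my assumption, not stated in the source) that "Maximum IO/s" and "Maximum Throughput" are totals for a full 144-node cluster and divide evenly per node, and that 1TB = 1,000GB. The names <code>NODES</code>, <code>MODELS</code> and <code>value_metrics</code> are illustrative.<br /><br />

```python
# Cross-check of the quoted S200/X200 value metrics.
# Assumption: the table's maximum IO/s and MB/s are totals for a full
# 144-node cluster and divide evenly across nodes.
NODES = 144

# model: (list price per node in $, cluster IO/s, cluster MB/s, node capacity in TB)
MODELS = {
    "S200": (57569, 1414944, 84960, 14),
    "X200": (27450, 309312, 35712, 24),
}


def value_metrics(price, cluster_iops, cluster_mbs, capacity_tb):
    """Return ($ per IO/s, $ per MB/s, $ per GB) for a single node."""
    iops_per_node = cluster_iops / NODES
    mbs_per_node = cluster_mbs / NODES
    return (
        price / iops_per_node,
        price / mbs_per_node,
        price / (capacity_tb * 1000),  # 1TB taken as 1,000GB
    )


for name, spec in MODELS.items():
    per_iop, per_mbs, per_gb = value_metrics(*spec)
    print(f"{name}: ${per_iop:.2f}/IOP, ${per_mbs:.2f}/MBs, ${per_gb:.2f}/GB")
```

<br />Under those assumptions the S200 works out at about $5.86/IOP, $97.57/MBs and $4.11/GB, and the X200 at about $12.78/IOP, $110.69/MBs and $1.14/GB, which agrees with the rounded figures quoted from the article.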
<br /><br /><hr /><br />WEDNESDAY, MAY 18, 2011<br /><br />EMCWorld; Part 3, The Final installment; VNXe, ATMOS and VMWare<br /><br /><a href="http://www.storagetopics.com/2011/05/emcworld-part-3-final-installment-vnxe.html">http://www.storagetopics.com/2011/05/emcworld-part-3-final-installment-vnxe.html</a><br /><br />VNX Series: As you all are probably well aware the VNX series is the EMC mid-tier, unified storage offering that is in the process of replacing the CLARiiON and Celerra lines.<br />It was launched back in January and continues to evolve as these announcements suggest:<br /><br />1. FLASH 1st is the VNX SSD strategy which incorporates FAST, FASTCache and soon to be available server side cache code named project lightening.<br />On this feature I must admit I became a bit of a convert, see my comments in my earlier blog.<br />2. A Cloud Tiering Appliance designed to offload cold unstructured data from the VMX to the cloud was introduced.<br />This device can also operate as a migration tool to siphon data from other devices such as NetApp.<br />This announcement really resonated with me, more coverage in my earlier blog.<br />3. A ruggedized version of the SMB version of the VNXe was introduced.<br />It was mentioned a couple of times in the presentations that EMC have not done well in the federal space.<br />This is an obvious attempt to help fix that deficiency.<br />Napolitano also mentioned that 50% of the customers who have purchased VNXe were new to EMC and during the 1st quarter EMC signed 1100 new VNXe partners. <br />4. SSD support for the VNXe.<br />Another reinforcement of EMC’s commitment to solid state storage.<br />5. VAAI support for NFS and block enhancements including thin provisioning.<br />No surprise here - a deeper integration with VMWare which all storage vendors should be doing.<br />EMC just happens to have a bit of an advantage.<br />6. 
A Google Search Appliance was introduced.<br />This device enables updated files to be searched sooner and comes in two flavors: the 7007 supporting up to 10M files and the 9009 supporting up to 50M files.<br />Clever announcement; in the world of big data findability (my word) is valuable currency.<br />7. A high density disk enclosure supporting 60, 3.5” SAS, NL-SAS or Flash drives.<br />GB/RU is one of today’s metrics and this helps EMC’s capacity positioning big time.<br />8. Doubled bandwidth performance with a high bandwidth option that triples the 6Gb SAS backend ports.<br />Bandwidth, IOPS & capacity: an interesting balancing act, particularly when you throw in cost.<br /><br /><br />ATMOS: I first started to write about Atmos when Hulk was the star of the rumor mill; boy how time flies.<br />Hulk is still there in its evolved instantiation but its role has most certainly moved to that of a back-up player in the chorus line.<br />The lead player, ATMOS 2.0, featured in the announcement with the declaration of a significant performance boost.<br />The claim is a 5x increase in performance with a current ability to handle up to 500M objects per day.<br />By also changing their protection scheme they can increase storage efficiency by 65%.<br />Change is probably the wrong word, they continue to support the multiple copy approach but have added their new object segmentation approach.<br /><br />Previously data protection was achieved by the creation of multiple copies that were distributed within the Atmos cloud.<br />The EMC Geo Parity as it is called is similar to the<br />Cleversafe approach where rather than storing multiple copies of a complete object<br />it breaks the object into segments (12) with four segments being parity,<br />analogous to a RAID group.<br />These segments are then distributed throughout the cloud with the data protected with a tolerance to multiple failures.<br /><br />VMWare: Not much in terms of announcement but some of the adoption 
stats was interesting.<br /><br />• VM migration (vMotion) has increased from 53% to 86%<br />• High availability use has increased from 41% to 74%<br />• Dynamic Resource Scheduling (DRS) has increased from 37% to 64%<br />• Dynamic migration (storage vMotion) has increased from 27% to 65%<br />• Fault tolerant use has grown from zero to 23%<br /><br /><hr /><br />IBM Delivers Technology to Help Clients Protect and Retain "Big Data"<br /><br /><a href="http://www-03.ibm.com/press/us/en/pressrelease/34452.wss">http://www-03.ibm.com/press/us/en/pressrelease/34452.wss</a><br /><br />Introduces industry-first tape library technology capable of storing nearly 3 exabytes of data -- enough to store almost 3X the mobile data in U.S. in 2010<br /><br />ARMONK, N.Y., - 09 May 2011: IBM (NYSE: IBM) today announced new tape storage and enhanced archiving, deduplication offerings designed to help clients efficiently store and extract intelligence from massive amounts of data.<br /><br />At the same time, demand for storage capacity worldwide will continue to grow at a compound annual growth rate of 49.8 percent from 2009-2014, according to IDC (1). Clients require new technologies and ways to capitalize on the growing volume, variety and velocity of information known as "Big Data."<br /><br />IBM System Storage™ TS3500 Tape Library is enabled by a new, IBM-developed shuttle technology -- a mechanical attachment that connects up to 15 tape libraries to create a single, high capacity library complex at a lower cost. 
The TS3500 offers 80 percent more capacity than a comparable Oracle tape library and is the highest capacity library in the industry, making it ideal for the world's largest data archives (3).<br /><br /><hr />Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-6981542140560447942014-03-20T23:38:00.000-07:002014-03-20T23:38:05.688-07:00Storage: Efficiency measuresIn 2020 we can expect bigger disk drives and hence Petabyte stores. Price per bit will come at a premium; it won't track capacity as it does now: larger capacity drives will cost more per unit.<br /><br />What are the theoretical limits on which Storage solution "efficiency" can be judged?<br /><br />We're slowly approaching what could be the last factor-10 improvement, to 10Tbits/in², in rotational 2-D magnetic recording technologies of Hard Disk Drives. Jim Gray (<a href="http://research.microsoft.com/en-us/um/people/gray/">~2000</a>) and Mark Kryder (<a href="http://www.dssc.ece.cmu.edu/research/pdfs/After_Hard_Drives.pdf">2009</a>) suggested 7TB/platter for 2.5" disk drives by 2020, assuming a 40%/yr capacity growth.<br /><br />Rosenthal et al (<a href="http://www.unesco.org/new/fileadmin/MULTIMEDIA/HQ/CI/CI/pdf/mow/VC_Rosenthal_et_al_27_B_1330.pdf">2012</a>) suggest that, like CPU-speed "Moore's Law", disk capacity growth rates have slowed, suggesting 100Tbits/in² may be possible in the far future. They predict 1.8 Tbits/in² commercially available in 2020, vs 0.6-0.7Tb/in² currently.<br /><br />Three platter 2.5" drives are normally 12.5mm thick, but <a href="http://www.storagenewsletter.com/rubriques/hard-disk-drives/hgst-travelstar-5k1500/">are available in 9.5mm drives in 2013</a> (HGST, 1.5TB). 
Four platter 2.5" drives are 12.5mm or 15mm usually, <a href="http://www.tomshardware.com/news/seagate-samsung-hdd-spinpoint-m9t,24997.html">according to Seagate</a>, with three 667GB platters in 9.5mm for 2TB total (using 2.3W for read/write).<br /><br />Slim-line 7mm and 5mm 2.5" drives are on the market. <a href="http://www.storagenewsletter.com/rubriques/hard-disk-drives/wd-blue-hdd/">7mm drives are two platter</a>.<br /><br />In 2020, the 2.5" disk drive market will differentiate by both thickness (5, 7, 9.5, 12.5, 15mm) and number of platters, from 1 to 4. Laptop and ultrabook manufacturers will determine if 7mm replaces 9.5mm as the standard consumer portable form factor, giving them a volume production price advantage.<br /><br />Per-platter, we can expect 1.5TB-2TB, or total 1TB-6TB in 2.5" drives [vs 5 platter 3.5" drives at 15TB].<br /><br />Storage system builders will be able to select drive combinations, not just SSD + HDD, based on:<br /><ul><li>Cost per GB</li><li>GB per cubic-inch</li><li>Watts per GB, and</li><li>spindles per TB, setting maximum IO/sec and streaming IO performance.</li></ul><b>Questions:</b><br /><ul><li>How many drives can fit in a single rack?</li><ul><li>How much raw capacity?</li></ul><li>How much power would they use? [and how much cooling]</li><li>How much does it all weigh? 
[can the floor hold it up?]</li><li>Time to back it up?</li><ul><li>Dependent on external ports and interface speeds.</li></ul><li>Performance:</li><ul><li>How many IO/sec?</li><li>Aggregate internal streaming throughput?</li><li>Normalised multi-media transactions/sec: 1MB Object requests/sec?</li><li>Scan Time for searching, data mining, disk utilities & RAID rebuild?</li></ul></ul><br />Disk drives have 3 different dimensions: WxDxH and 3 different 'faces', WxD, WxH, DxH<br />For 2.5" drives, approx 70mm x 101mm x 9.5mm<br />For 3.5" drives, 101.6mm (4 inch) x 146mm (6 inch) x 19-26.1mm (nominally "1 inch")<br /><br />Drives can be placed with any of the 3 faces down and rotated about a vertical axis, giving potentially 6 orientations.<br />In practice, the thinnest cross-section has to face forward, into the airflow, to allow effective cooling.<br />This gives just 4 orientations: 2 'flat' and 2 'vertical'.<br /><br /><b>19 inch racks are "mostly standard":</b><br /><ul><li>19 inches across the faceplate, posts are each 5/8 inch, fasteners & holes are well defined.</li><ul><li>But need extra space either side for cabling and airflow, increasing external rack dimension.</li></ul><li>17.75 inches internal clearance (450mm). With sliders: 17.25 inches internal. 
(435mm)</li><li>1RU (Rack Unit) = 1.75 inches high</li><li>convention is 42RU high = 73.5 inches of usable space</li><ul><li>Allow for plinth, first usable RU is off the floor</li><li>Allow for head piece, plate + structural rails, fans and cable organisers on top</li></ul><li>Depth varies with use:</li><ul><li>600mm (24 inch) common in Telecoms</li><li>966mm (38 inch) common in IT.</li><li>Need extra space front and rear for doors, cabling, power strips, ...</li></ul><li>External dimensions: 30in x 48in x 87in (WxDxH)</li><ul><li>Notionally, a single rack uses ten square feet (1 square meter) of floor space.</li><li>side clearance of zero: racks bolt together to stabilise the structure.</li><li>Front and rear clearances, often 40" and 30", are needed to open doors and load/unload parts.</li><li>Aisles are needed between rows to allow work and access.</li><ul><li>In many facilities, need to open two doors at once, 50" minimum.</li></ul><li>"Hot Aisle": exhaust adjacent rows into the one sealed area with extractor fan.</li></ul><li>Floor space in server rooms</li><ul><li>Only around 33%-50% of the available floor space can be used for racks.</li><li>Racks are best organised in rows parallel to long dimension of room</li><li>long rooms need breaks in rows, creating cross-aisles</li><li>Additional clearance is needed around walls of rooms</li><li>Entrance doors need to be double and handle shipping pallets</li><li>Extra spare space is needed around doors for staging equipment in, and storing packing waste before removal</li><li>Dedicated space is needed for "Air handling units" (at least two), power distribution boards and fire control systems. 
These need clearance for servicing and removal/replacement.</li><li>In-room UPS units need space and cooling (No-break power supplies)</li><ul><li>lead-acid battery banks of any capacity need to be housed in separate, spark-proof rooms with additional fire control and sprinklers.</li></ul></ul></ul><br /><b>Stacking 3.5 inch drives,</b> no allowance for cooling, wiring, power or access<b>:</b><br /><br />3.5" drives, at 4 inches wide, can be stacked flat, 4 abreast in a rack.<br />6 drives will fit end-to-end in a 36"-38" cabinet, for <b>24 drives in a layer</b>.<br />Alternatively, 17 drives can be stood on their sides across a rack, in 4" tall layers,<br /><b>giving 102 drives/layer and 1,836 drives/rack.</b><br /><br />For nominal 1" thick drives, 73 layers can be stacked, <b>giving 1,752 drives per rack.</b><br />With 15TB 3.5" drives, <b>26PB/rack.</b><br />With 4TB 3.5" drives, 7PB/rack.<br />7200 RPM 3.5" drives consume 8W-10W, or <b>14kW-16kW per rack.</b><br />7200 RPM, 120Hz, drives are capable each of 240 IO/sec, <b>for 400k IO/sec aggregate.</b><br />3.5" drives weigh ~600grams each, for a load of about <b>1 ton (or 1,000kg/m²)</b><br />15TB 3.5" drives will stream at around 2Gbps, for <b>3.5Tbps aggregate internal bandwidth.</b><br /><br /><b>Stacking 2.5" drives</b>, at 5400 RPM (90Hz)<br /><table><tbody><tr><td>5mm</td><td>17,800 drives</td><td>3,204k IO/sec</td><td>@ 0.5TB 9PB/rack</td><td>@ 1.5TB 27PB/rack</td></tr><tr><td>7mm</td><td>12,714 drives</td><td>2,288k IO/sec</td></tr><tr><td>9.5mm</td><td>9,368 drives</td><td>1,686k IO/sec</td><td>@ 1TB 9.5PB/rack</td><td>@ 3TB 30PB/rack</td></tr><tr><td>12.5mm</td><td>7,120 drives</td><td>1,281k IO/sec</td></tr><tr><td>15mm</td><td>5,933 drives</td><td>1,069k IO/sec</td><td>@ 2TB 12PB/rack</td><td>@ 6TB 36PB/rack</td></tr></tbody></table><br />Power consumption at 1.2W for <b>9.5mm drives of 8kW,</b> around half the power needed for 3.5" drives.<br /><br />Aggregate internal bandwidth is higher, even though 
the per-drive streaming rate is up to 25% lower, 1.5Gbps.<br />For 9.5mm drives, <b>14Tbps aggregate internal bandwidth</b> (3TB drives).<br /><br />5mm drives weigh around 95grams each and 15mm drives 200grams, the same weight ± 15% as 3.5" drives.<br /><br />Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-35537664397492370032013-09-15T23:09:00.000-07:002013-09-15T23:53:28.701-07:00Reasonably Trustworthy Messaging (RTM)<a href="http://www.groklaw.net/article.php?story=20130818120421175">PJ of Groklaw got spooked by a Lavabit founder</a> responding to PRISM by saying "if you knew what I did, you wouldn't use the Internet/email".<br /><br />This is the start of a design for a <i>reasonably</i> trustworthy messaging system, in the same way that PGP was only <i>pretty good</i> privacy.<br /><br />I'd like to combine 3 tools/concepts on top of the obvious measures.<br /><br /><ul><li>SFTP as a file delivery mechanism.</li></ul><ul><li>ACSnet (later MHSnet) was a Store & Forward system that separated content and control while passing files to a content-handler on the far end.</li></ul><ul><li>Bespoke classified message systems contain two useful concepts:</li><ul><li>Messages have an urgency and separate security classification.</li><ul><li>these are used in routing & queuing decisions.</li></ul><li>Every message is tracked & acquitted via multiple sequence numbers:</li><ul><li>Per-link or channel sequence number</li><li>end-end sequence numbers</li></ul></ul></ul><div>Obvious measures:</div><div><ul><li>File splitting</li><ul><li>Everything is a 2Kb or 4Kb block</li><li>I'd prefer an "M of N" redundancy system to allow some data to be lost. 
</li><ul><li><a href="http://en.wikipedia.org/wiki/Parchive">Parchive</a> & friends do this</li></ul><li>There's a "batch reassembly" file implied by this.</li><li>SFTP needs to have some means of grouping batched files together.</li><ul><li>Either a named control file, or</li><li>a directory name, a sequence number in clear or encrypted.</li></ul></ul></ul><ul><li>PGP/GPG encryption</li><ul><li>Encrypt everything</li><li>Compress first, for text files especially.</li><ul><li>Ideally, force different coding tables per file.</li></ul></ul></ul><ul><li>Content-based file naming</li><ul><li>Files only referred to by content: its hash-sum, SHA-1 or MD5 (old)</li><li>Queued blocks can be jumbled and transmitted in any order, being reassembled into correct order later.</li><ul><li>This implies "message headers" contain the hash-keys of contents.</li></ul></ul></ul><div>There are old programs, like "fetchmail" (or 'fdm' now), that know how to pull mail from many sources and present to the Mail User Agent in a form it can handle, like the traditional Unix mailbox format.</div></div><div><br /></div><div>The ACSnet software was able to accept emails from "sendmail", so a simple SMTP daemon is also needed here to accept outbound messages from Mail Clients, allowing </div><div><br /></div><div>An extra layer of obfuscation and aggregation, like ToR or VPN's, makes the task of matching inbound/outbound packets harder. </div><div><br /></div><div>One of the central notions is no system, apart from the end-points, ever has the unencrypted files/messages.</div><div><ul><li>Messages are encrypted with the public-key of the next-hop destination before sending.</li></ul><ul><li>Messages queued to a link/channel should be kept as-received (encrypted for current system), only being decrypted before transmission.</li></ul><ul><li>Small blocks mean encryption can be performed in-memory, without any intermediate results touching a disk. 
Preventing swapping and VM page-writes may be a challenge, but only one layer of encryption is lost.</li></ul><ul><li>SFTP/SSH use per-session keys to encrypt transfers.</li><ul><li>With content re-encrypted per hop, there is no common plain-text allowing keys to be compromised.</li></ul></ul><div><b>Solution</b></div></div><div><br /></div><div>That's what I'm aiming for, a low user-intervention system that users can reasonably trust:</div><div><ul><li>Messages and files sent, when delivered, can be shown to be intact and complete.</li></ul><ul><li>No messages/files can be read "in clear" on the wire or on any intermediate system.</li><ul><li>Messages/files only appear unencrypted on the source and destination system.</li><li>Simple tools, like USB drives, can be used to transfer files to/from off-line systems.</li></ul></ul><ul><li>Users can get confirmations of delivery and/or acceptance back from each step along the way.</li></ul><ul><li>Data is sufficiently obscured so traffic analysis yields minimal useful metadata:</li><ul><li>Using SFTP in a Store+Forward mode removes the normal headers </li><li>Any SFTP service can be used as a relay, if the per-hop encryption can't be organised on the server and exposing one layer of encryption is an acceptable risk.</li></ul></ul><div><b>Use Case</b></div></div><div><br /></div><div>Alice and Bob want to exchange Company-Confidential information.</div><div>They both have access to the same SFTP server, or preferably a service that also supports re-encryption.</div><div>Or they each have access to SFTP servers hosted by co-operating trustworthy providers.</div><div>Alice and Bob both have off-line computers that they load/download encrypted files onto.</div><div>Alice and Bob both have internet-connected computers with the RTM App, PGP/GPG & SFTP installed, plus the transfer host PGP/GPG private keys.</div><div>Alice and Bob have exchanged their email Public PGP/GPG keys, both only have their private email keys on their off-line 
system.</div><div>Alice and Bob have exchanged their internet-host PGP/GPG public keys with their upstream server.</div><div><br /></div><div>Alice, on her <i>off-line system</i>, writes one or more messages to Bob (and others) from her usual Mail Client and possibly queues some file transfers with another App.</div><div><br /></div><div>The RTM software creates a directory of encrypted files which Alice then copies to a USB drive.</div><div><br /></div><div>On her on-line system, Alice loads the USB drive and starts the RTM software, which uses SFTP to transfer files to their shared server. The RTM software creates a control file for the batch and encrypts all files with the public key of the next hop.</div><div><br /></div><div>The next-hop system receives the files and decrypts the control file, queuing blocks for whichever next hop link is to be used and creates & encrypts control files for batches.</div><div><br /></div><div>After zero or more hops, Bob's SFTP server receives blocks batched for him from Alice and others and queues them for Bob, ready to encrypt them with his transfer host public key.</div><div><br /></div><div>Bob, in his own time, starts RTM on his transfer system and downloads the blocks queued for him, decrypting them to copy to his USB drive. These blocks were encrypted for Bob by Alice and others using the email public key he shared with them. They cannot be decrypted on the transfer host.</div><div><br /></div><div>Bob takes his USB drive to his off-line system and starts RTM, which uploads the encrypted blocks, decrypts them, checks hash-keys, reassembles the sent files and then delivers them to the appropriate 'handler': email, file transfer or other specific tool.</div><div><br /></div><div>Bob can then check email on his off-line machine using his favourite Mail Client. 
He can choose to save/distribute the decrypted files transferred by whatever means he needs.</div><div><br /></div><div>Alice and Bob do not have to use off-line systems with air-gaps. If they accept the risk, they can run both the transfer host and "in-clear" system as Virtual Machines in the same system. A private, internal network can be used to transfer encrypted blocks between the two VM's.</div><div><br /></div><div>If the system hosting the two VM's is compromised, the most an attacker can do is monitor the display.</div><div><br /></div><div>I haven't discussed the various passwords/pass-phrases that would be needed in operation.</div><div>The system should be simple to install and configure and be mostly self-administering.</div><div><br /></div><div><br /></div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-31275334957553133002013-02-04T14:08:00.002-08:002013-02-04T14:48:22.427-08:00Storage: New Era Taxonomies<div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">There are 3 distinct consumer facing market segments that must integrate seamlessly:</div></div><div><ul><li>portable/mobile devices: "all flash".</li><li>Desktop (laptop) and Workstations</li><li>Servers and associated Storage Arrays.</li></ul></div>We're heading into new territory in Storage:<br /><ul><li>"everything" is migrating on-line. Photos, Documents, Videos and messages.</li><ul><li>but we don't yet have archival-grade digital storage media.</li><ul><li>Write to a portable drive and data retention is 5 years: probable, 10 years: unlikely. 
That's a <i>guess</i> on my part; real life may well be much worse.</li></ul><li>Currently householders don't understand the problem.</li><ul><li>Flash drives are not (nearly) permanent or error-free.</li><li>Most people have yet to experience catastrophic loss of data.</li><li>"Free" cloud storage may only be worth what you pay for it.</li></ul></ul><li>Disk Storage (magnetic HDD's) is entering its last factor-10 increase.</li><ul><li>We should expect 5-10TB/platter for 2.5" drives as an upper bound.</li><li>Unsurprisingly, the rate of change has gone from "doubling every year" to 35%/year to 14%/year.</li><li>As engineers approach hard limits, the rate of improvement is slower and side-effects increase.</li><li>Do we build the first maximum-capacity HDD in 2020 or a bit later?</li></ul><li>Flash memory is getting larger, cheaper and faster to access, but itself is entering an end-game.</li><ul><li>but retention is declining, whilst wear issues may have been addressed, at least for now.</li><li>PCI-attached Flash, the minimum latency config, is set to become standard on workstations and servers.</li><ul><li>How do we use it and adequately deal with errors?</li></ul></ul><li>Operating Systems, General Business servers and Internet-scale Datacentres and Storage Services have yet to embrace and support these new devices and constraints. </li></ul><div>David Patterson, author with Gibson & Katz of the 1988 landmark paper on "RAID", noted that every storage layer trades cost-per-byte against throughput/latency.<br />When a layer is no longer cheaper than a faster layer, consumers discard it. Tapes were once the only high-capacity long-term storage option.<br /><br />My view of FileSystems and Storage:</div><ul><li>high-frequency, low-latency: PCI-flash.</li><li>high-throughput, large-capacity: read/write HDD.</li><li>Create-Once Read-Maybe snapshot and archival: non-update-in-place HDD.</li><ul><li>'Create' not 'Write'-Once. 
Because latent errors can only be discovered actively, one of the tasks of Archival Storage systems is regularly reading and rewriting all data.</li></ul></ul>What size HDD will become the norm for read/write and Create-once HDD's?<br />I suspect 2.5" because:<br /><br /><ul><li>Watts-per-byte are low because aerodynamic drag increases with nearly the fifth power of platter diameter and around the cube of rotational speed.</li><ul><li>A 3.5" disk has to spin at around 1700rpm to match a 5400rpm 2.5" drive in power/byte, and ~1950rpm to match a 7200rpm 2.5" drive.</li><li>All drives will use ~2.4 times the power to spin at 7200rpm vs 5400rpm.</li><li>Four 2.5" drives provide around the same capacity as a single 3.5" drive</li><ul><li>The area of a 2.5" platter is half that of a 3.5" platter.</li><li>2.5" drives are half the thickness of 3.5" drives (25.4mm)</li><li>3.5" drives may squeeze in 5 platters, 25% better than 2.5".</li></ul></ul><li>Drives are cheap, but four smaller drives will always be more expensive than a single larger drive.</li><ul><li>Four sets of heads will always provide:</li><ul><li>higher aggregate throughput</li><li>lower-latency</li><li>more diversity, hence more resilience and recovery options</li><li>"fewer eggs in one basket". The impact of a failure is limited to a single drive.</li></ul><li>In raw terms, the cheapest, slowest, most error-prone storage will always be 3.5" drives. 
But admins build <i>protected</i> storage, not raw.</li><ul><li>With 4TB 3.5" drives, 6 drives will provide 16TB in a RAID-6 config.</li><ul><li>Note the lack of hot-spares.</li></ul><li>With 1TB 2.5" drives, RAID-5 is still viable.</li><ul><li>24 drives as two sets of 11 drives + hot-spare, provide 20TB.</li><li>For protected storage, 3.5" drives only offer at best 3.2 times the density and many-fold less throughput and latency.</li></ul></ul></ul></ul><br /><br /><br />Here are some of my take-aways from <a href="http://linux.conf.au/">LCA 2013</a> in Canberra.<br /><br />* We're moving towards 1-10TB of PCI-Flash or other Storage Class Memory being affordable and should expect it to be a 'normal' part of desktop & server systems. (Fusion IO now 'high-end')<br /><br /> - Flash isn't that persistent, does fade (is that with power on?).<br /> - How can that be managed to give decade long storage?<br /> - PCI-Flash/SCM could be organised as one or all of these:<br /> - direct slow-access memory [need a block-oriented write model]<br /> - fast VM page store<br /> - File System. 
Either as<br /><span class="Apple-tab-span" style="white-space: pre;"> </span>- 'tmpfs' style separate file system<br /><span class="Apple-tab-span" style="white-space: pre;"> </span>- seamlessly integrated & auto-managed, like AAPL's Fusion LVM<br /><span class="Apple-tab-span" style="white-space: pre;"> </span>- massive write-through cache (more block-driver?)<br /><br /> - there was a talk on Checkpoint/Restart in the Kernel, especially for VirtMach, it allows live migration and the potential for live kernel upgrades of some sort...<br /> - we might start seeing 4,000 days uptime.<br /> - PCI-Flash/SCM would be the obvious place to store CR's and as source/destination for copies<br /> - nobody is talking about error-detection and data preservation for this new era: essential to explicitly detect/correct and auto-manage.<br /><br /> - But handling read-errors and memory corruption wasn't talked about..<br /> - ECC won't be enough to *detect* let alone correct large block errors.<br /> - Long up-times means we'll want H/A CPU's as well to detect compute errors.<br /> Eg. triplicated CPU paths with result voting.<br /><br /> - the 'new era' approach to resilience/persistence has been whole-system replication and 'network' (ethernet/LAN) connection, and away from expensive internal replication for H/A.<br /><br /><br />==> As we require more and more of 'normal' systems, they start to need more and more Real-time and H/A components.<br /><br />==> For "whole-system" replication, end-end error detection of all storage transfers starts to look necessary. i.e. an MD5 or other checksum generated by the drive or Object store and passed along the<br />chain into PCI-Flash and RAM: and maybe kept for rechecking.<br /><br />==> With multiple levels of storage with latency between them and very high compute rates in CPU's, we're heading into the same territory that Databases addressed (in the 80's?) 
with Transactions and ACID.<br /><br /><br />* Log-structured File Systems are a perfect fit for large Flash-based datastores.<br /> - but log-structured FS may also be perfect for:<br /> - write-once data, like long-term archives (eg. git repos)<br /> - shingled write (no update-in-place) disks, effectively WORM.<br /><br />==> I think we need an explicit Storage Error Detect/Correct layer between disks and other storage to improve the bit error rate (BER) from 1 in 10^14 or 10^16 bits to more like 1 in 10^25 or 10^30. [I need to calculate what numbers are actually needed.] Especially as everything gets stored digitally and people expect digital archives to be like paper and "just work" over many decades.<br /><br />Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-53356010891763500022013-01-17T16:31:00.000-08:002013-01-18T21:57:05.662-08:00Storage: FileSystems, Block/Object Storage and Physical Disk Management in 21st Century SystemsThe central social contract filesystem and storage layers have with users is:<br /><ul><li><i>Don't lose data</i></li><li>Make it easy to get data in and out, preferably verifiably correct.</li><li>Performance is nice, but can <i>never</i> take precedence over preserving data and replaying it correctly.</li></ul>The approaches and paradigms that worked for Unix in 1970 won't work now. 
It was a world of 5-10MB drives @ 1-10Mbps, 1MHz CPU's without cache and "off-line" storage was 6250bpi 9-track 0.5in tape (2400' ~120MB).<br /><br />Even nearly 20 years later in 1988, the year of the Patterson/Gibson/Katz RAID paper, streaming the full contents of a drive for a rebuild took ~100 seconds for 100MB 5.25" SCSI drives and ~1000 seconds for the 1GB 8" Fujitsu Eagle drives preferred by the first Storage Arrays.<br /><br />What's changed is the relative capacity and speeds of storage devices, the demands of "average users" and some additional layers of storage, like cache and Flash memory.<br /><br />The old approaches are creaking and becoming more & more complex in attempts to handle performance (rate), volume and size. One "fast" filesystem, ReiserFS, was popular for a time but notorious among users for corrupting disks and losing data. Breaking the contract loses users...<br /><br />The 10 TB/platter 2.5" drives expected by 2020 will only read 2-3 times faster than current 1TB drives (250-400MB/sec). That's 40,000 seconds to stream the whole drive: 10-12 hours. Increasingly, Jim Gray's millennial observation, "Tape is Dead, Disk is the new Tape" (meaning disks are good at streaming, poor at random I/O), is driving Storage designs. 
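<br /><br />The streaming times quoted above are simple capacity-over-rate divisions. A minimal Python sketch reproduces them, assuming decimal units (1TB = 10^12 bytes) and a sustained sequential rate; the helper name <i>stream_time_seconds</i> is hypothetical, and the 1MB/sec rate for the 1988 drives is only inferred from the "100MB in ~100 seconds" figure:<br /><br />

```python
MB = 10**6
GB = 10**9
TB = 10**12


def stream_time_seconds(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    """Seconds needed to read a whole drive sequentially at a sustained rate."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return capacity_bytes / rate_bytes_per_sec


# Capacities and rates are the figures quoted in the text; the 1988 rate of
# 1MB/sec is an assumption inferred from "100MB in ~100 seconds".
examples = {
    "1988 100MB SCSI @ 1MB/s": stream_time_seconds(100 * MB, 1 * MB),
    "1988 1GB Eagle @ 1MB/s": stream_time_seconds(1 * GB, 1 * MB),
    "2020 10TB @ 250MB/s": stream_time_seconds(10 * TB, 250 * MB),
    "2020 10TB @ 400MB/s": stream_time_seconds(10 * TB, 400 * MB),
}

for label, seconds in examples.items():
    print(f"{label}: {seconds:,.0f} s = {seconds / 3600:.1f} h")
```

At 250MB/sec this gives the 40,000 seconds (about 11 hours) quoted; at the optimistic 400MB/sec it would be 25,000 seconds, about 7 hours. Either way a full-drive scan grows from minutes in 1988 to the best part of a working day.<br /><br />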
Enterprise Class Storage Arrays cannot compete with Flash memory for random I/O and, to cover increasingly long drive rebuild times (4-150 hours), have adopted slower, more inefficient/complex parity schemes.<br /><br />We now have chips with 3 levels of cache, soon with on-chip DRAM, on-board DDR3 DRAM, PCIe Flash, SATA/SAS Flash and HDD drives and soon "no update-in-place" Shingled-Write drives.<br />SCM, Storage Class Memories, like Flash are hoped to provide the path to higher capacity devices, but to date, there are no obvious commercial technologies.<br /><br />This is in the context of at least 4 types of compute devices, each with different demands for Storage and data recovery and protections.<br /><ul><li>Mobile: smartphones and tablets. Not usually "content creators" but "viewers". Software from Firmware and vendor App Stores. Auto-sync config and data to "Cloud" or desktop.</li><li>Laptop/low-end Desktop: Limited "content creation". Restores via vendor products, erratic/random backups and data protection.</li><li>"Power user" Workstations: Professional platforms for content creation. Dedicated Storage Appliances, with problematic & erratic data protection.</li><li>Servers:</li><ul><li>SOHO/SME, small ISP: single servers or small farms. Nil or problematic data protection.</li><li>SMP servers, business server farms: SAN's + Storage Arrays, H/A, multiple-sites, fail-over, ...</li><li>Clusters and large arrays: special filesystems, lots of storage, fast networks.</li><li>Internet-scale Data Centres: purpose built hardware and storage solutions.</li></ul></ul>On my average Mac desktop there are over 2M files after 5 years, a scale not anticipated by the original Unix Directory-Inode-Links-Blocks design.<br /><br />It's now possible and feasible for individuals to follow <a href="http://research.microsoft.com/en-us/projects/mylifebits/">Gordon Bell</a> and digitally record their entire lives. 
This is more than storing random snaps from smartphones, but creating a usable, accessible store.<br />In 10M recording seconds per year, individuals can create 100k files/year, 1TB at low data rates (100KB/sec) and view 1-10M files/web-pages.<br /><br />This load for even the current 1-2B smartphone users (not the 6B cell phone services), whilst potentially being a boon for Network Operators & Storage vendors, requires new services and new approaches. Especially:<br /><ul><li>Strong User Identification with many roles per individual, for work, interests and personal life.</li><li>Single Federated views of individual-Identity storage.</li><li>New Search, Indexing, tagging and annotation tools.</li><li>Integrated "point-in-time" file browsing and scanning.</li><li>Internet-scale data de-duplication and peer-peer Storage.</li></ul>We already have definitive solutions for "point-in-time" recovery of:<br /><ul><li>Text files via Version Control Systems like SVN, CVS, RCS, ...</li><li>Relational Databases with full-DB snapshot and "roll-forward" transaction logs,</li><li>but other important binary data types, {DB's, images, videos, sound, PDF-docs, geo-data, machine control, ...}, aren't born with verifiable digital signatures, nor their own change logs.</li></ul>Metadata, both system generated like timestamps, Geo-locn and user GUID, and user-supplied data, like tags and text, are as important as the data changes.<br /><br />Backups and Version Control Systems typically offer 3 sorts of versioning. A combination of these methodologies will be used at various levels:<br /><br /><ul><li>Full Backup. 
100% replication of all bits.</li><li>Incremental: Store only the bits changed since last Incremental.</li><ul><li>Notionally, the minimum storage required.</li><li>Slowest to recover: all Incrementals must be applied sequentially, in order.</li><li>Most prone to error and data loss.</li><ul><li>If one delta-file is deleted or corrupted, the entire set is useless.</li></ul></ul><li>Differential: Store all bits changed since last Full Backup.</li><ul><li>Each differential is larger than the last, potentially up to the size of a Full Backup.</li><li>Fastest to recover.</li><li>Simplest to manage</li><li>Robust against errors and deletions, if the dataset was stored.</li></ul></ul><br />Work on non-Relational Databases is occurring, but there are important challenges for relational Databases in providing a continuous-timeline view of storage, more than the current transactional/data-warehouse duality/conversion:<br /><ul><li>limited data storage formats can be supported, "importing and conversion" </li><li>indexing of data is a separate activity and stored/accessed differently.</li><li>Schemas and Database names have to survive changes.</li><li>Semantics of individual fields are as important as</li></ul>The Wayback Machine, a.k.a. The Internet Archive, gives us a working model and informs us that people can tolerate a) retrieval delays, b) some datasets unavailable and c) some data loss.<br /><br />It costs a <i>lot</i> less for "Best Effort" rather than "Guaranteed" storage services, suggesting multiple approaches, cost structures and service offerings in the marketplace. 
Hopefully consumers won't be inveigled into over-paying or complacently relying on inappropriate low-cost providers.<br /><br />Will current Consumer Protection laws need to be extended to this area?<br />If you share data within a group (Family and Friends) and some people don't maintain their part of the archive - losing data for people that rely on them, do current laws apply or will new law be needed?<br /><br />Will this lead to new businesses of "Archive Auditing"?<br /><br />There are currently three "drop-dead" problems for these services, ignoring the current "unsupported file format" and "ancient system & run-time" issues:<br /><ul><li>Currently, there is <i>no</i> archival quality digital media.<br /> Hard Disks, Flash memory and CD/DVD's have limited lifetimes. They cannot be left on a shelf and be expected to work a couple of decades on... Data must be constantly scanned, rebuilt and migrated to new storage systems.</li><ul><li>Acid-free paper and microforms will store documents for over 100 years.</li><li>Colour film is still the only archival medium for movies and still images.</li><li>No good magnetic media exist for medium-long term storage of sound recordings.</li></ul></ul><ul><li>Vendor longevity and professional misconduct or negligence, even systemic corruption.</li><ul><li>When an Archival Storage Service goes bust, how do the owners of the data recover their data? Not over network links and, if the facilities are locked and powered-off by administrators or sheriffs, not physically either.</li><li>Around the world, there are just a few Telcos or Power Utilities that are 100 years old. Can we really expect profitable Storage businesses to start now and last 5 times longer than Google without any commercial upsets? 
I'd argue "no".</li><li>Rogue admins and managers are the least of the problem, though they'll exist and cause problems.</li><li>Expecting ordinary, fallible owners, workers and managers to always resist temptation, bribery and sloth/negligence is more than naive and simplistic. Mistakes will happen, security breaches will occur and ordinary folk doing boring jobs <i>will</i> take shortcuts.</li><li>Valuable resources will always attract those wishing to steal them. These sorts of facilities must begin by never storing anything of value. Organised crime's only access must be via the individual users' system/device, not in a single, centralised resource.</li></ul></ul><ul><li>Legal access issues: a whole new area of lucrative International Law awaits us...</li><ul><li>Who has the right to look at data?</li><li>Can data "in default" (unpaid fees) be sold? To whom? At what price?</li><li>Can a Vendor move data from the Jurisdiction of origin, with or without permission?</li><li>Can Vendors share data across facilities in different Jurisdictions?</li><li>Can Storage custodians be forced to grant local Law Enforcement Officers access to individual or bulk data?</li></ul></ul>There are now three distinct views of the filesystem provided in the abstract model for user applications:<br /><ul><li>Current files</li><li>snapshots</li><li>archives</li></ul><div>The O/S has to provide these services for each of those dimensional slices through the storage:</div><div><ul><li>map names (paths) to inodes. 
Subsumes a "mount device/mount-point" model.</li><li>inodes (the immutable file, with metadata)</li><li>datablock link map, which reduces to start/end for contiguous allocation.</li><li>data blocks and free block list</li><li>Physical drive management, like LVM.</li></ul><div>Systems have to address four different aspects of real-world storage access:</div><div><ul><li>availability and connection paths</li><li>errors and rereads</li><li>erasures and failures</li><li>durability and longevity of data sets (protection and archive)</li></ul><div>Overlaid on this are 4-5 distinct access patterns, similar to metal-working "temperatures":</div></div><div><ul><li>"white-hot" region: read/write access on-board (RAM and PCIe Flash)</li><li>"red-hot" region: read/write access to direct-connect updatable HDD's</li><li>cool region: write once access to Big, Slow HDD's, probably non-update-in-place.</li><li>"blue" (cold) region: write once, seldom read HDD's. No update-in-place, append-only.</li><li>"black" (frozen) region: remote and archival storage. 
Rarely Accessed, Critical when needed.</li></ul><div>There is a direct correspondence between different temperature regions and the filesystem abstraction they are providing.</div><div><ul><li>Archives are read-only and live only in cool, cold and frozen regions.</li><li>Snapshots may be in a "red-hot" region, but otherwise in cool and cold regions.</li><ul><li>Files are only ever moved to Archive from the Snapshot areas.</li></ul><li>Current files will be migrated from, or cached into, the high-speed read/write regions on demand.</li><ul><li>The link between Snapshots and Current files is: Snapshot[0] == Current filesystem.</li></ul></ul></div><div>My thesis is that the traditional Unix filesystem and O/S structure of Directories-Inodes-Block_maps-Data_blocks cannot serve all these demands well, <b>but</b> that we already have very good tools to handle them.</div></div><div><br /></div><div>Schemes to handle inodes, Block_Maps & Linking and Block access for each "temperature" storage can be designed well for the specific trade-offs and performance expectations.</div><div><br /></div><div>The major problem appears to me to be mapping File Names to Inodes:</div><div><ul><li>It either requires very high performance and low-latency for the hottest I/O region, or</li><li>requires very large namespaces for snapshots and archives.</li><li>Indexes for Current & Snapshot views may be stored in low-latency storage, but the volume of names stored in long-term Archives means they cannot.</li></ul><div>Neither of these is well served by the traditional "directory in a block" model, backed by the O/S cache.</div></div><div>But both are robustly handled by Database systems, albeit differently organised, indexed and tuned.<br /><br />What is missing in normal systems is:<br /><br /><ul><li>Filesystem or storage layer of "What's Changed?" 
(Deltas) via md5sums or change messages.</li><li>Swapping snapshot views between "Delta" and "Full" filesystem views:</li><ul><li>'rsync' identifies changed files, but users have to create full filesystem images themselves.</li><li>Apple's TimeMachine creates a full filesystem image at a point-in-time, but provides no "Delta" interface beyond a single file or directory.</li></ul></ul></div><div>There are two implications that fall out of this analysis:<br /><br /><ul><li>Consumers will demand "Open" storage standards allowing them to swap devices, systems and Storage Vendors, not be locked into Proprietary standards, especially single-vendor solutions, and</li><li>a software solution model based on the Apache web-server or Linux kernel: co-operative Open Source backed by the GNU license. This allows all vendors to avoid license and patent issues, share work, leverage prior work, support and develop common standards, whilst also allowing market-differentiation by offering specific tools or hardware/software combinations.</li></ul><div>The current Unix-like approaches of filesystems, O/S supported directory scanning (name to inode mapping), LVM handling {data protection, logical and physical volumes}, independent snapshot/archive facilities, independent hot-plug media and manual setup and operation of Archival stores cannot provide an Identity-keyed Federated Storage & Archive system.<br /><br />Not all data stores or vendors will provide the same grade of service. Features can be borrowed from:<br /><br /><ul><li>NTP (Network Time Protocol): stratum level of server. Just how good are they?</li><li>IP Routing: "cost of routes". 
Preferentially choose the faster, cheaper services.</li></ul></div><div>The main features required in an Identity-keyed Federated Storage & Archive system are:</div><div><ul><li>Data access limited by Identity (data privacy as part of "Security")</li><ul><li>Multiple Identities per user, based on role or use.</li><li>Multiple Users and Identities per Device.</li><li>Master Identity access to specified data, for work and families.</li></ul><li>Automatic implementation of Policies</li><li>Addition and management of user-managed hot-plug media</li><li>Automatic integration across all single-Identity devices of local disk, local network storage, peer storage and multiple Vendor services</li><li>Policies set as targets:</li><ul><li>Cost</li><li>Maximum size of store</li><li>Maximum data recovery time</li><li>Minimum and Maximum times between recovery points:</li><ul><li>every minute for the last 36 hours</li><li>every hour for the last fortnight</li><li>every day for the last year</li><li>every week for the last decade</li><li>every month after that</li></ul><li>normal performance: access rate, I/O per sec</li><li>By datatype, Data Resilience and Longevity (Probability Data Loss per period, Maximum data loss event size)</li><li>Warnings, Alerts and Alarms.</li><li>Default and specified Data Destruction dates</li></ul></ul></div></div></div>Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-32472438626619749672013-01-02T14:25:00.002-08:002013-01-02T14:25:56.010-08:00Storage: Specifying Data Resilience and Data Protection<br /><br />In Communications theory, there are two distinct concepts:<br /><br /><ul><li>Errors [you get a signal, but noise has induced one or more symbol errors], and</li><li>Erasures [you lost the signal for a time or it was swamped by noise]</li></ul><br />Erasures are often in "bursts", so techniques are needed to not just recover/correct a small number of symbols, but<br /><br />This 
is the theory behind the Reed-Solomon [Galois Field] encoding for CD's and later DVD's.<br />It uses redundant symbols to recreate the data, needing twice as many symbols to correct errors as to recreate erasures. A [24,28] RS code encodes 24 symbols into 28, with 4 symbols/bytes of redundancy. This can be used to correct up to 2 errors (2*2 symbols used), or 1 error plus 2 erasures, or 4 erasures.<br /><br />The innovation in CD's was applying 2 R-S codes successively, but between them using Cross Interleaving to handle burst errors by spreading a single L1 [24,28] frame across a whole 2352by sector [86?,98]. Only 1 byte of an erased L1 frame would appear in any single L2 sector.<br /><br />DVD's use a related but different form of combining two R-S codes: Internal/External Parity.<br /><br /><a href="http://www.hometheaterhifi.com/the-dvd-benchmark/the-dvd-benchmark/dvd-benchmark-part-3-functionality.html">CD-ROM's apply an L3 R-S Product Code</a> on top of the L1&L2 RS codes + CIRC to get more acceptable Bit Error Rates (BER's) of ~1 in 10^15, vs 1 in 10^9. 
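<br /><br />The error/erasure trade-off for Reed-Solomon codes described above can be checked with a few lines of Python. This is only a sketch of the standard counting rule (an error costs two redundant symbols, an erasure costs one), not a decoder; the function names <code>rs_can_correct</code> and <code>max_errors</code> are made up for illustration.<br /><br />

```python
def rs_can_correct(n: int, k: int, errors: int, erasures: int) -> bool:
    """Counting rule for an RS code that encodes k data symbols into n.

    Each error (unknown position) uses up two redundant symbols,
    each erasure (known position) uses up one.
    """
    redundancy = n - k
    return redundancy >= 2 * errors + erasures


def max_errors(n: int, k: int, erasures: int = 0) -> int:
    """Most errors still correctable once the erasures have been paid for."""
    spare = n - k - erasures
    return max(spare, 0) // 2


if __name__ == "__main__":
    # The CD frame code discussed above: 24 symbols encoded into 28.
    print(rs_can_correct(28, 24, errors=2, erasures=0))
    print(rs_can_correct(28, 24, errors=1, erasures=2))
    print(rs_can_correct(28, 24, errors=2, erasures=1))
    print(max_errors(28, 24, erasures=2))
```

The same rule applies to any (n, k) pair, so it can be reused for the outer code or for a RAID-like layout where failed drives count as erasures.<br /><br />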
Data per sector goes down to 2048by (2KB) from 2352by.<br /><br />With Hard Disks, and Storage in general, the last two big advances were:<br /><br /><ul><li>RAID [1988/9, Patterson, Gibson, Katz]</li><li>Snapshots [Network Appliance, early 1990's]</li></ul><br />RAID-3/4/5 was notionally about catering for <i>erasures</i> caused by the failure of a whole drive or component, such as a controller or cable.<br />This was done with low overhead by using the computationally cheap and fast XOR operation to calculate a single parity block.<br /><br />But in use, the abilities to correct errors and to recover erasures with parity blocks have been conflated...<br /><br />RAID-3/4/5 is now generally thought to be about <i>Error</i> Correction, not Failure Protection.<br /><br />The usual metrics quoted for HDD's & SSD's are:<br /> - MTBF (~1M hours) or Annualised Failure Rate (AFR) 0.6-0.7%<br /> - BER (unrecoverable Bit Error Rate) 1 in 10^15<br /> - Size, Avg seek time, max/sustained transfer rate.<br /><br />Operational Questions, Drive Reliability:<br /><br /> - For a fleet, per 1000 drives, average drives fail per year?<br /> [1 year = ~8760 hrs, = ~8.8M drive-hours/year/1000 drives; at 1M hrs MTBF = ~8.8 drive<br />fails/year]<br /> Alternatively, AFR: 0.6-0.7% * 1000, = 6-7 drives/1000/year<br /><br /> - What's the minimum wall-clock time to rebuild a full drive?<br /> [Size / sustained transfer rate: 4TB @ 150MB/sec write = 7.5Hrs ]<br /><br /> - what's the likelihood of a drive fail during a rebuild?<br /> 7.5 hrs / 1M hrs = 0.001% [???] per drive.<br /> - for RAID-set of 10, (7.5/1M)*10 = 0.01%<br /><br /> - probability data loss in rebuild (N = 10):<br /> Transfer * BER: 4TB * 10 drives = 32 * 10^12 * 10 bits =<br /> 3.2 * 10^14 bits; 3.2 * 10^14 / 10^15<br /> = .32 expected unrecoverable errors, or about a 27% chance of at least one (1 - e^-0.32) [suggests further protection is needed against data loss]<br /><br />Data Protection questions. 
I don't know how to address these...<br /><br /> - If we store data in RAID-6/7 units of 10-drive-equivalents<br /> with a lifetime of 5 years per set:<br /> - In a "lifetime" (60 years = 12 sets),<br /> what's the probability of Data Loss?<br /><br /> - How many geographically separated replicas do we need to<br /> store data 100 years?<br /><br /><br />I think I know how to specify Data Protection: the same way (%) as AFR.<br /><br />What you have to build for is Mean-Years-Between-Dataloss<br />and I guess that implies the degree of Dataloss: 1-by, 1-block (4KB), 1MB?<br />As well as complete failure of a dataset-copy.<br /><br />Typical AFR's are 0.4%-0.7%, as quoted by drive manufacturers based on<br />accelerated testing.<br /><br />We know from those 2008(?) studies of large cohorts of drives that this is<br />optimistic by an order of magnitude...<br /><br />An AFR of 1 in 10^6 results in 99.99% surviving 100 years (a 0.01% 100YR-F-R).<br />(1 - .000001) ^ 100<br /><br />AFR of 1 in 10^5 is 99.9% surviving, a 0.1% 100YR-FR (CFR? Century Failure Rate)<br /><br /><br />AFR of 1 in 10^4 is 99.0% surviving, a 1.0% CFR.<br /><br />So we have to estimate a few more probabilities:<br /> - site suffering natural disaster or fire etc.<br /> - site suffering war damage or intentional attack<br /> - country or economy crumbling [ every 40-50 yrs a depression ]<br /> - company surviving (Kodak lasted 100yrs)<br /> - admins doing their job competently and fully.<br /> - managers not scamming (selling disks, not providing service)<br /><br />Are there more?<br />Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0tag:blogger.com,1999:blog-29875143.post-9260172004662422822012-12-17T21:17:00.000-08:002012-12-17T21:17:08.357-08:00Storage: Active spares in RAID volumesIf you have a spare HDD in a chassis powered-up and spinning, then the best use of the power you're burning is to <i>use</i> the drive.Steve Jenkinhttps://plus.google.com/109924516161100161245noreply@blogger.com0