Monday, December 22, 2014

Disk / Storage Timeline

First cut at a timeline of significant events in Disk and Storage, ignoring "historical" devices like floppies and bubble memory. Edward Grochowski's 2012 "Flash Memory Summit" talk tracks capacity, price & technology for multiple storage types from 1990.

First commercial computers were built in 1950 and 1951: LEO[UK], Zuse[DE] and UNIVAC[US].
LEO claim the first working Application in 1951.
 [1949: BINAC built by the Eckert–Mauchly Computer Corporation for Northrop]

Ignored technologies include:
Tapes: used in the first computers as large, cheap linear access storage.
Drums: in use a little later and continued for some time, often in specialist roles (paging).

Friday, July 04, 2014

OS/X Time Machine, performance comparison to command line tools.

A performance comparison for Mac Owners:

Q: Just how quick is Apple’s Time Machine?
A: Way faster than you can do with OS/X command line tools.

The headline is that command line tools take 80 minutes to do what Time Machine does in 3-10 mins.

Wednesday, June 18, 2014

RAID-1: Errors and Erasures calculations

RAID-1 Overheads (treating RAID-1 and RAID-10 as identical)

N = number of drives mirrored. N=2 for duplicated
G = number of drive-sets in a Volume Group.
\(N \times G\) is the total number of drives in Volume Group.
An array may be composed of many Volume Groups.

Per-Disk:
  • Effective Capacity
    • N=2. \( 1 \div 2 = 50\% \) [duplicated]
    • N=3. \(1 \div 3 = 33.3\% \) [triplicated]
  • I/O Overheads & scaling
    • Capacity Scaling: linear to max disks.
    • Random Read: \(N \times G \rm\ of\ rawdisk = N \times G \rm\ singledrive = RAID-0\)
    • Random Write: \(1 \times G \rm\ of\ rawdisk = 100\% \rm\ singledrive\)
    • Streaming Read: \(N \times G \rm\ of\ rawdisk = N \times G \rm\ singledrive = RAID-0\)
    • Streaming Write: \(1 \times G \rm\ of\ rawdisk = 100\% \rm\ singledrive\)
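
The per-disk figures above can be sketched in a few lines of Python. This is a minimal illustration only: the class name `Raid1Group` and its property names are hypothetical, and the scaling factors are simply the N and G expressions from the list, expressed in "single drive" units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid1Group:
    """Hypothetical model of one RAID-1/RAID-10 Volume Group."""
    n: int  # N: number of drives mirrored (2 = duplicated, 3 = triplicated)
    g: int  # G: number of drive-sets in the Volume Group

    def __post_init__(self):
        if self.n < 2 or self.g < 1:
            raise ValueError("need N >= 2 mirrored drives and G >= 1 drive-sets")

    @property
    def total_drives(self) -> int:
        # N x G drives in the Volume Group
        return self.n * self.g

    @property
    def effective_capacity(self) -> float:
        # Fraction of raw capacity that holds unique data: 1 / N
        return 1 / self.n

    @property
    def read_scale(self) -> int:
        # Reads can be served by any copy: N x G single-drive equivalents
        return self.n * self.g

    @property
    def write_scale(self) -> int:
        # Every copy must be written: 1 x G single-drive equivalents
        return self.g


if __name__ == "__main__":
    for n in (2, 3):
        vg = Raid1Group(n=n, g=4)
        print(f"N={n} G=4: drives={vg.total_drives} "
              f"capacity={vg.effective_capacity:.1%} "
              f"read x{vg.read_scale} write x{vg.write_scale}")
```

The same factors apply to both the random and streaming cases in the list, so the sketch does not distinguish them.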

Thursday, June 12, 2014

mathjax test & Demo

MathJax setup in Blogger:
http://mytechmemo.blogspot.com.au/2012/02/how-to-write-math-formulas-in-blogger.html

MathJax Examples

Note:
  1. I had to hunt for the "HTML/Javascript" gadget, a ways down the list.
  2. I ended up putting the gadget in as a footer.
  3. You'll have to add that gadget to all blogs you want it to work for.
  4. Preview and Edit mode don't compute the TeX. You need to save the doc, then view the post.
  5. In compose "Options", "Line Breaks", I'm using 'Press "Enter" for line breaks'.
  6. The "MyTechMemo" author doesn't use the exact code he suggests, though it works for me. His actual gadget is:
Powered by <a href="http://www.mathjax.org/docs/1.1/start.html">MathJax</a>

<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
Alternate Hub Config in gadget, replace just first line.
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } },
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    }
});

Using "all", numbers all equations.
"AMS" numbers only specified equations.
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: {autoNumber: "all"} }
});
</script>

Monday, June 09, 2014

RAID++: Erasures aren't Errors

A previous piece in this series starts as quoted below the fold, raising the question: The Berkeley group in 1987 were very smart, and Leventhal in 2009 no less smart, so how did they both make the same fundamental attribution error? This isn't just a statistical "Type I" or "Type II" error, it's conflating and confusing completely different sources of data loss.

Sunday, June 08, 2014

RAID, Archives and Tape v Disk

There's a long-raging question in I.T. Operations: How best to archive data? [What media to use?]
This question arose again for me as I was browsing a retail site.

Conclusions:

  1. The break-even for 2.5TB/6.25TB tapes is 85 and 140 tapes (compressed/uncompressed), or
    • $13,150 and $17,400 capital investment.
  2. At just 2 times data duplication, uncompressed tapes are not cost effective.
    • Enterprise backups show data duplication rates of 20-50 times.
  3. Compressed tapes are cost-effective up to 5-times data duplication.
    • If you run 10 Virtual Machines and do full backups, you've passed that threshold.
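
A rough sketch of how a break-even of this kind can be computed. The model is my assumption, not the spreadsheet behind the conclusions above: tape needs a one-off drive purchase plus a cost per cartridge, disk is bought per TB, and de-duplication on disk shrinks the disk capacity needed by the duplication factor. The function name and every price in the example are hypothetical placeholders, not the retail prices used in the post.

```python
import math
from typing import Optional


def break_even_tapes(drive_cost: float,
                     tape_cost: float,
                     tape_tb: float,
                     disk_cost_per_tb: float,
                     duplication: float = 1.0) -> Optional[int]:
    """Smallest number of tapes at which tape costs no more than disk.

    Returns None when tape never breaks even under this model.
    """
    # Cost of the disk needed to hold what one tape holds,
    # after de-duplication has removed the repeated data.
    disk_cost_per_tape = tape_tb * disk_cost_per_tb / duplication
    saving_per_tape = disk_cost_per_tape - tape_cost
    if saving_per_tape <= 0:
        return None  # each extra tape costs at least as much as the disk
    # The tape drive's up-front cost must be recovered by per-tape savings.
    return math.ceil(drive_cost / saving_per_tape)


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    for dup in (1, 2, 3):
        n = break_even_tapes(drive_cost=3000, tape_cost=50, tape_tb=2.5,
                             disk_cost_per_tb=60, duplication=dup)
        print(f"duplication x{dup}: break-even tapes = {n}")
```

Under this model the break-even point climbs quickly as the duplication factor rises and then disappears altogether, which is the shape of the argument in the conclusions.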

Thursday, June 05, 2014

Retail Disk prices, Enterprise drives, grouped by manufacturer & type

Table of current retail prices for various types of disk with cost-per-GB.
Only Internal drives, Hard Disks.

Disclaimer: This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.

Most drives are from a single manufacturer, Western Digital, to allow like-for-like comparisons.
Most manufacturers are close to the same pricing for the same specs.
  • There is ~$25 extra for SAS interface over SATA [1TB WD 'RE', SAS vs SATA]
  • There's ~$30/TB extra for higher spec drives [2TB & 3TB, WD SATA, NAS vs RE]
  • WD sell four 3.5" 1TB drives [03, 04, 26, 41]
    • SAS vs SATA, ~$25
    • about double for 10,000RPM over 7,200RPM (Velociraptor vs RE)
    • about 25% less for the Intellipower, 'Capacity' drive
  • While it's cheaper with Seagate to go from 15,000RPM/3.5" to 10,000RPM/2.5", there's no simple relation for the discount.
Western Digital list these "Purchase Decision Criteria" for drives:
  • Capacity [GB]
  • Workload Capability [duty cycle or TB read/write per year]
  • Reliability [MTBF and BER]
  • Cost/GB
  • Performance [sustained throughput,  latency or IO/sec = {RPM, seek time}]
  • Power used [not included by WD]
  • Racking density [not included by WD]

Sunday, June 01, 2014

Historical External Disk Storage Data: IDC Worldwide tracking report

Data from IDC's Quarterly Worldwide External Disk Storage Systems Factory Revenues series (Press Releases). Multiply quarterly values by 4 for an approx yearly value. Full data not available prior to 2011.
For 2013: US$24.4 billion and 34.6PB.

Tuesday, May 27, 2014

"MAID" using 2.5 in drives

What would a current attempt at MAID look like with 2.5" drives?

"MAID", Massive Array of Idle Disks, was an attempt by Copan Systems (bought by SGI in 2009) at near-line Bulk Storage. It had a novel design innovation, mounting drives vertically back-to-back in slide-out canisters (patented), and was based on an interesting design principle: off-line storage can mostly be powered down.

It was a credible attempt, coming out of The Internet Archive, and their "Petabox" (a more technical view and on Wikipedia).  At 24 x 3.5" drives per 4RU, they contain around half the 45 drives of the Backblaze 4.0 Storage Pod. The Petabox has 10Gbps uplinks and much beefier CPUs and more DRAM.

The Xyratex ClusterStor (now Seagate) offers another benchmark: their Scalable Storage Unit (SSU) stores 3 rows of 14 drives in 2.5RU x 450mm slide-out drawers, allowing hot-plug access to all drives. Two SSUs comprise a single 5RU unit of 84 drives, with up to 14 SSUs per rack for 1176 drives per rack, an average of 28 x 3.5" drives per Rack Unit.

Sunday, May 04, 2014

RAID-0 and RAID-3/4 Spares

This piece is not based on an exhaustive search of the literature. It addresses a problem that doesn't seem to have been addressed: spares for RAID-0 and the related RAID-3/4, which uses a single parity drive.

Single parity drives seem to have been deemed impractical early on because they apparently constitute a deliberate system bottleneck. RAID-3/4 has no bottleneck for streaming reads or writes: for streaming writes, performance equals, not merely approaches, the raw write performance of the array, identical to RAID-0 (stripe). For random writes, the 100-150 times speed differential between sequential and random access of modern drives can be leveraged with a suitable buffer to remove the bottleneck. The larger the buffer, the more likely it is that the pre-read of old data, needed to calculate the new parity, can be avoided. This triples the array throughput by avoiding the full revolution forced by the read/write-back cycle.
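
To make the pre-read concrete, here is a minimal sketch of single-parity arithmetic, assuming plain XOR parity over equal-sized blocks. The function names are hypothetical. A small write needs the old data and old parity (the pre-read) to compute the new parity, whereas a full stripe held in a buffer yields the parity with no reads at all.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity when the whole stripe is in the buffer: no pre-read needed."""
    return xor_blocks(*data_blocks)


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity for a small write: needs the pre-read of old data and old parity."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = full_stripe_parity(stripe)

    # Overwrite the second block via the read-modify-write path.
    new_block = b"\xaa\xbb"
    rmw = updated_parity(parity, stripe[1], new_block)

    # Same result as recomputing from a fully buffered stripe.
    buffered = full_stripe_parity([stripe[0], new_block, stripe[2]])
    print(rmw == buffered)
```

The same XOR also recovers a lost data block from the parity and the surviving blocks, which is why rebuilding a failed parity drive means rereading every drive.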

Multiple copies of the parity drive (RAID-1) can be kept to mitigate the very costly failure of a parity drive: all blocks on every drive must be reread to recreate a failed parity drive. With large RAID groups and the very low price of small drives, this is not expensive.

With the availability of affordable, large SSDs, naive management of a single parity drive also removes the bottleneck for quite large RAID groups. The SSD can be backed by a log-structured recovery drive, trading on-line random IO performance for rebuild time.

Designing Local and/or Global Spares for large (N=64..512) RAID sets is necessary to reduce overhead, improve reconstruction times and avoid unnecessary partitioning, limiting recovery options and causing avoidable data loss events.

Saturday, May 03, 2014

Comparing consumer drives in small-systems RAID

This stems from an email conversation with a friend: why would he be interested in using 2.5" drives in RAID, not 3.5"?

There are two key questions for Admins at this scale, and my friend was exceedingly sceptical of my suggestion:
  • Cost/GB
  • 'performance' of 2.5" 5400RPM drives vs 7200RPM drives.
I've used retail pricing for the comparisons. Pricelist and sorted pricelist.

Retail Disk Prices, as printed

Table of current retail prices for various types of disk with cost-per-GB.

Disclaimer: This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.

Retail disk prices, sorted.

Table of current retail prices for various types of disk, sorted on cost-per-GB.
Disclaimer: This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.

3.5" Internal drives are the cheapest $/GB, ranging from 4.3 cents/GB to 10-11 cents/GB. Generally, larger drives have cheaper $/GB. Higher spec drives, suitable for high duty-cycle applications, are more expensive. This retailer doesn't sell 10K or SAS drives.

It's not possible to track 3.5" drives from Internal to External to arrive at a cost of packaging.

2.5" Internal drives range 8 to 16.5 cents/GB, generally higher than 3.5" drive costs. There seems to be little extra cost of packaging for external drives. There is a small premium in consumer drives for 7200RPM. This retailer only sells 2TB drives (15mm vs 9.5mm?) as external drives.

There was no information in the retailer's rather compact format on the thickness (5mm, 7mm, 9.5mm, 12.5mm, 15mm) of 2.5" drives.

Solid State Disks are 5+ times more expensive than Hard Disk Drives, at 59 cents/GB to $1.37/GB.
The smaller mSATA drives start at 72.8 cents/GB.
No supplier information on SSD specs is included: SLC/MLC, transfer rates, IO/sec and number of write cycles. SSDs are very sensitive to wear and device selection requires very careful reading of device specifications.
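
The $/GB column is simply cost divided by capacity in GB. A minimal sketch of that arithmetic and the sort, using three rows taken from the price list; the function names are hypothetical and the row layout (name, cost, GB) is just a convenience for the example.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (description, cost in dollars, capacity in GB)


def cost_per_gb(cost_dollars: float, capacity_gb: float) -> float:
    """Dollars per GB, rounded to four places as in the table."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return round(cost_dollars / capacity_gb, 4)


def sort_by_cost_per_gb(rows: List[Row]) -> List[Row]:
    """Cheapest $/GB first."""
    return sorted(rows, key=lambda row: cost_per_gb(row[1], row[2]))


if __name__ == "__main__":
    sample_rows = [
        ("Samsung 840 EVO 1TB (2.5in SSD)", 589, 1000),
        ("Hitachi HGST 1TB (2.5in HDD, 5400RPM)", 80, 1000),
        ("WD Green EZRX 3TB (3.5in HDD)", 129, 3000),
    ]
    for name, cost, gb in sort_by_cost_per_gb(sample_rows):
        print(f"{cost_per_gb(cost, gb):.4f} $/GB  {name}")
```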


01-May-2014
http://www.msy.com.au/Parts/PARTS.pdf

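Form factor | Int/Ext | Disk type | Connector | RPM | Brand/Model | Capacity | Cost ($) | $/GB | GB

3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 3TB | 129 | 0.0430 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate | 3TB | 129 | 0.0430 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 4TB | 185 | 0.0462 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate | 4TB | 189 | 0.0473 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 2TB | 95 | 0.0475 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate | 2TB | 95 | 0.0475 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate NAS | 3TB | 160 | 0.0533 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 3TB | 165 | 0.0550 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate NAS | 4TB | 229 | 0.0573 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 4TB | 235 | 0.0587 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 3TB | 179 | 0.0597 | 3000GB
3.5" | Int | HDD | SATA? | 7200? | Hitachi HGST NAS | 3TB | 179 | 0.0597 | 3000GB
3.5" | Int | HDD | SATA? | 7200? | Hitachi HGST NAS | 4TB | 249 | 0.0622 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate NAS | 2TB | 125 | 0.0625 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 1TB | 64 | 0.0640 | 1000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 2TB | 129 | 0.0645 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 4TB | 259 | 0.0648 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate | 1TB | 65 | 0.0650 | 1000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 2TB | 135 | 0.0675 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 1TB | 89 | 0.0890 | 1000GB
3.5" | Int | HDD | SATA2 | 7200? | Hitachi HGST UltraStar | 1TB | 89 | 0.0890 | 1000GB
3.5" | Int | HDD | SATA3 | 7200? | Hitachi HGST | 3TB | 270 | 0.0900 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | Hitachi HGST | 4TB | 365 | 0.0912 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | Hitachi HGST | 2TB | 185 | 0.0925 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 1TB | 95 | 0.0950 | 1000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate | 500G | 55 | 0.1100 | 500GB

3.5" | Ext | HDD | USB3.0 | 7200? | WD Element | 3TB | 129 | 0.0430 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate Expansion | 3TB | 139 | 0.0463 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate Expansion | 2TB | 95 | 0.0475 | 2000GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Mybook Essential | 3TB | 149 | 0.0497 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Mybook Essential | 4TB | 209 | 0.0522 | 4000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate BackUp Plus | 3TB | 159 | 0.0530 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate BackUp Plus | 2TB | 115 | 0.0575 | 2000GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Mybook Essential | 2TB | 139 | 0.0695 | 2000GB

2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 1TB | 80 | 0.0800 | 1000GB
2.5" | Int | HDD | SATA? | 5400 | WD JPVX | 1TB | 83 | 0.0830 | 1000GB
2.5" | Int | HDD | SATA? | 5400 | WD BPVX | 750G | 64 | 0.0853 | 750GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 750G | 66 | 0.0880 | 750GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 1.5TB | 139 | 0.0927 | 1500GB
2.5" | Int | HDD | SATA? | 7200 | Hitachi HGST | 1TB | 93 | 0.0930 | 1000GB
2.5" | Int | HDD | SATA? | 7200 | WD BPKX | 750G | 78 | 0.1040 | 750GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 500G | 55 | 0.1100 | 500GB
2.5" | Int | HDD | SATA? | 5400 | Seagate | 500G | 56 | 0.1120 | 500GB
2.5" | Int | HDD | SATA? | 7200 | Hitachi HGST | 750G | 85 | 0.1133 | 750GB
2.5" | Int | HDD | SATA? | 5400 | WD LPVX | 500G | 57 | 0.1140 | 500GB
2.5" | Int | HDD | SATA? | 7200 | Hitachi HGST | 500G | 64 | 0.1280 | 500GB
2.5" | Int | HDD | SATA? | 7200 | Seagate | 500G | 64 | 0.1280 | 500GB
2.5" | Int | HDD | SATA? | 7200 | WD BPKX | 500G | 67 | 0.1340 | 500GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 320G | 53 | 0.1656 | 320GB
2.5" | Int | HDD | SATA? | 5400 | WD LPVX | 320G | 53 | 0.1656 | 320GB
2.5" | Int | HDD | SATA? | 5400 | Seagate | 320G | 53 | 0.1656 | 320GB

2.5" | Ext | HDD | USB3.0 | 5400? | WD Element | 2TB | 149 | 0.0745 | 2000GB
2.5" | Ext | HDD | USB3.0 | 5400? | Samsung | 2TB | 149 | 0.0745 | 2000GB
2.5" | Ext | HDD | USB3.0 | 5400? | Samsung | 1.5TB | 115 | 0.0767 | 1500GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport | 2TB | 159 | 0.0795 | 2000GB
2.5" | Ext | HDD | USB?.0 | 5400? | Hitachi HGST Touro Mobile | 1TB | 80 | 0.0800 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport Ultra | 2TB | 165 | 0.0825 | 2000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport | 1.5TB | 129 | 0.0860 | 1500GB
2.5" | Ext | HDD | USB3.0 | 5400? | Samsung | 1TB | 86 | 0.0860 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Element | 1TB | 89 | 0.0890 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport | 1TB | 89 | 0.0890 | 1000GB
2.5" | Ext | HDD | USB?.0 | 5400? | Hitachi HGST Touro Pro | 1TB | 92 | 0.0920 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | Seagate BackUp Plus | 1TB | 99 | 0.0990 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport Ultra | 1TB | 104 | 0.1040 | 1000GB
2.5" | Ext | HDD | USB?.0 | 5400? | Hitachi HGST Touro Mobile | 500G | 56 | 0.1120 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | Samsung | 500G | 62 | 0.1240 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | Seagate Expansion | 500G | 69 | 0.1380 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport Ultra | 500G | 74 | 0.1480 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | Seagate BackUp Plus | 500G | 88 | 0.1760 | 500GB

2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 1TB | 589 | 0.5890 | 1000GB
2.5" | Int | SSD | SATA? | - | SanDisk Ultra Plus | 256G | 157 | 0.6133 | 256GB
2.5" | Int | SSD | SATA? | - | Seagate 600 | 480G | 299 | 0.6229 | 480GB
2.5" | Int | SSD | SATA? | - | Plextor M5-PRO | 512G | 329 | 0.6426 | 512GB
2.5" | Int | SSD | SATA? | - | Plextor M5S | 256G | 168 | 0.6562 | 256GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 500G | 329 | 0.6580 | 500GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 240G | 159 | 0.6625 | 240GB
2.5" | Int | SSD | SATA? | - | Seagate 600 | 240G | 159 | 0.6625 | 240GB
2.5" | Int | SSD | SATA? | - | Fujitsu | 256G | 170 | 0.6641 | 256GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 250G | 170 | 0.6800 | 250GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 480G | 329 | 0.6854 | 480GB
2.5" | Int | SSD | SATA? | - | SanDisk Ultra Plus | 128G | 89 | 0.6953 | 128GB
2.5" | Int | SSD | SATA? | - | Plextor M5-PRO | 256G | 179 | 0.6992 | 256GB
2.5" | Int | SSD | mSATA3 | - | Samsung 840 EVO | 250G | 182 | 0.7280 | 250GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 120G | 88 | 0.7333 | 120GB
2.5" | Int | SSD | SATA? | - | Fujitsu | 512G | 383 | 0.7480 | 512GB
2.5" | Int | SSD | SATA? | - | OCZ Vertec 450 | 128G | 97 | 0.7578 | 128GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme | 240G | 185 | 0.7708 | 240GB
2.5" | Int | SSD | SATA? | - | Fujitsu | 128G | 99 | 0.7734 | 128GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme II | 480G | 379 | 0.7896 | 480GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 120G | 95 | 0.7917 | 120GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme II | 240G | 195 | 0.8125 | 240GB
2.5" | Int | SSD | SATA? | - | Seagate 600 | 120G | 99 | 0.8250 | 120GB
2.5" | Int | SSD | SATA? | - | Kingston HyperX | 240G | 199 | 0.8292 | 240GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 PRO | 512G | 439 | 0.8574 | 512GB
2.5" | Int | SSD | SATA? | - | Intel 520 | 120G | 104 | 0.8667 | 120GB
2.5" | Int | SSD | SATA? | - | Intel 530 | 240G | 209 | 0.8708 | 240GB
2.5" | Int | SSD | SATA? | - | Kingston HyperX | 120G | 105 | 0.8750 | 120GB
2.5" | Int | SSD | mSATA3 | - | Samsung 840 EVO | 120G | 105 | 0.8750 | 120GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 PRO | 256G | 232 | 0.9062 | 256GB
2.5" | Int | SSD | SATA? | - | Intel 530 | 120G | 115 | 0.9583 | 120GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme II | 120G | 118 | 0.9833 | 120GB
2.5" | Int | SSD | SATA? | - | Kingston SMS200s3 | 120G | 119 | 0.9917 | 120GB
2.5" | Int | SSD | mSATA3 | - | Intel 530 | 240G | 242 | 1.0083 | 240GB
2.5" | Int | SSD | SATA? | - | Intel 530 | 180G | 184 | 1.0222 | 180GB
2.5" | Int | SSD | SATA? | - | Plextor M5-PRO | 128G | 135 | 1.0547 | 128GB
2.5" | Int | SSD | SATA? | - | Fujitsu | 64G | 69 | 1.0781 | 64GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 PRO | 128G | 138 | 1.0781 | 128GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 60G | 65 | 1.0833 | 60GB
2.5" | Int | SSD | mSATA3 | - | Intel 530 | 120G | 130 | 1.0833 | 120GB
2.5" | Int | SSD | mSATA3 | - | Intel 525 | 120G | 139 | 1.1583 | 120GB
2.5" | Int | SSD | SATA? | - | SanDisk Ultra Plus | 64G | 75 | 1.1719 | 64GB
2.5" | Int | SSD | SATA? | - | Intel 730 | 240G | 285 | 1.1875 | 240GB
2.5" | Int | SSD | SATA? | - | Intel S3500 | 240G | 288 | 1.2000 | 240GB
2.5" | Int | SSD | SATA? | - | Kingston SMS200s3 | 60G | 76 | 1.2667 | 60GB
2.5" | Int | SSD | SATA? | - | Intel S3500 | 160G | 209 | 1.3062 | 160GB
2.5" | Int | SSD | SATA? | - | Intel S3500 | 120G | 164 | 1.3667 | 120GB

2.5" | Int | SSHD | SATA? | 5400? | Seagate | 1TB | 129 | 0.1290 | 1000GB
2.5" | Int | SSHD | SATA? | 5400? | Seagate | 500G | 85 | 0.1700 | 500GB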
Retail Disk Prices

Tuesday, April 22, 2014

RAID++: RAID-0+ECC

Current RAID schemes, going back to the 1987/8 Patterson, Gibson, Katz RAID paper, make no distinction between transient and permanent failures: errors or dropouts versus failure.

Monday, April 21, 2014

Storage: Spares and Parity in large disk collections

What approaches are available to deal with spare drives and RAID parity for 300-1,000 drives in a single box?
Will existing models scale well?
Do other technologies fill any gaps?

Storage: First look at Hardware block diagram

Stuffing 500-1000 2.5" drives in an enclosure is just the start of a design adventure.

The simplest decision is choosing fixed or hot-plug drive mounting. There's a neat slide-out tray system for 3.5" drives that allows hot-plug access for densely, vertically packed drives, and it could be adapted to 2.5" drives.

Sunday, April 20, 2014

Storage: Challenges of high-count disk enclosures

Stuffing 500-1,000 2.5" drives in a single enclosure may be technically possible, but how do you make those drives do anything useful?

Increasing drives per enclosure from 15-45 for 3.5" drives to 1,000 requires a deep rethink of target market, goals and design.

Not the least is dealing with drive failures. With an Annualised Failure Rate (AFR) of 0.4%-0.75% now quoted by Drive Vendors, dealing with 5-15 drive failures per unit, per year is a given. In practice, failure rates are at least twice the Vendor-quoted AFR, not least because conditions in systems can be harsh and other components and connectors also fail, not just drives. Drives have a design life of 5 years, with an expected duty-cycle. Consumer-grade drives aren't expected to run 24/7 like the more expensive enterprise drives. Failure rates, when measured on large fleets in service, increase over time, and considerably towards end of life.
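
The expected number of failures is just drive count times AFR. A minimal sketch, assuming a constant failure rate over the year and independent failures; the function name and the `field_multiplier` parameter (standing in for the "at least twice" factor above) are hypothetical.

```python
def expected_failures_per_year(drives: int,
                               afr: float,
                               field_multiplier: float = 1.0) -> float:
    """Expected drive failures per year for one enclosure.

    drives           -- number of drives in the enclosure
    afr              -- vendor-quoted Annualised Failure Rate, as a fraction
    field_multiplier -- factor applied to the quoted AFR for in-service conditions
    """
    if drives < 0 or afr < 0 or field_multiplier < 0:
        raise ValueError("inputs must be non-negative")
    return drives * afr * field_multiplier


if __name__ == "__main__":
    for afr in (0.004, 0.0075):
        quoted = expected_failures_per_year(1000, afr)
        in_field = expected_failures_per_year(1000, afr, field_multiplier=2.0)
        print(f"AFR {afr:.2%}: quoted {quoted:.1f}/yr, doubled {in_field:.1f}/yr")
```

For 1,000 drives this gives 4 to 7.5 failures a year at the vendor-quoted rates, and 8 to 15 with the doubling applied.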

It isn't enough to say "we're trying to minimise per unit costs": all designs do that, but for different criteria.
What matters are the constraints you're working against or the parameters being optimised.

Storage: How many drives can be stuffed in a Box?

How many 2.5" drives can be stuffed into a single enclosure, allowing space for power, cooling, wiring and a single motherboard? Short answer: ~500-1000.

Sunday, March 23, 2014

Storage: more capacity calculations

Following on from the previous post on Efficiency and Capacity, baselining "A pile of Disks" as "100% efficient".

Some additional considerations:

Thursday, March 20, 2014

Storage: Efficiency measures

In 2020 we can expect bigger disk drives and hence Petabyte stores. Price per bit will come at a premium: it won't track capacity as it does now, and larger capacity drives will cost more per unit.

What are the theoretical limits on which Storage solution "efficiency" can be judged?

We're slowly approaching what could be the last factor-10 improvement, to 10Tbits/in², in rotational 2-D magnetic recording technologies of Hard Disk Drives. Jim Gray (~2000) and Mark Kryder (2009) suggested 7TB/platter for 2.5" disk drives by 2020, assuming a 40%/yr capacity growth.

Rosenthal et al (2012) suggest that, like CPU-speed "Moore's Law", disk capacity growth rates have slowed, suggesting 100Tbits/in² may be possible in the far future. They predict 1.8 Tbits/in² commercially available in 2020, vs 0.6-0.7Tb/in² currently.
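
The growth figures above are compound-growth arithmetic, which a short sketch makes easier to compare. The function names are hypothetical. The example assumes a starting density of 0.65 Tb/in² (the midpoint of the quoted range, my choice) and six years from 2014 to 2020.

```python
import math


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years needed to grow by `factor` at a compound `annual_growth` rate."""
    if factor <= 0 or annual_growth <= 0:
        raise ValueError("factor and annual_growth must be positive")
    return math.log(factor) / math.log(1 + annual_growth)


def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("inputs must be positive")
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # A factor-10 improvement at the 40%/yr growth assumed by Gray and Kryder.
    print(f"x10 at 40%/yr: {years_to_multiply(10, 0.40):.1f} years")

    # Growth rate implied by 1.8 Tb/in² in 2020, from an assumed 0.65 Tb/in² in 2014.
    rate = implied_annual_growth(0.65, 1.8, 6)
    print(f"implied growth to 1.8 Tb/in²: {rate:.1%} per year")
```

On those assumptions the Rosenthal prediction implies annual growth of a little under 20%, roughly half the 40%/yr rate, which is the slowdown being described.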