Monday, December 22, 2014

Disk / Storage Timeline

First cut at a timeline of significant events in Disk and Storage, ignoring "historical" devices like floppies and bubble memory. Edward Grochowski's 2012 "Flash Memory Summit" talk tracks capacity, price & technology for multiple storage types from 1990.

The first commercial computers were built in 1950 and 1951: LEO [UK], Zuse [DE] and UNIVAC [US].
LEO claims the first working application, in 1951.
 [1949: BINAC, built by the Eckert–Mauchly Computer Corporation for Northrop]

Ignored technologies include:
Tapes: used in the first computers as large, cheap linear access storage.
Drums: in use a little later and continued for some time, often in specialist roles (paging).

Friday, July 04, 2014

OS X Time Machine: performance comparison to command line tools.

A performance comparison for Mac Owners:

Q: Just how quick is Apple’s Time Machine?
A: Way faster than you can do with OS X command line tools.

The headline is that command line tools take 80 minutes to do what Time Machine does in 3-10 mins.

Wednesday, June 18, 2014

RAID-1: Errors and Erasures calculations

RAID-1 Overheads (treating RAID-1 and RAID-10 as identical)

N = number of drives mirrored. N=2 for duplicated
G = number of drive-sets in a Volume Group.
\(N \times G\) is the total number of drives in a Volume Group.
An array may be composed of many Volume Groups.

  • Effective Capacity
    • N=2. \( 1 \div 2 = 50\% \) [duplicated]
    • N=3. \(1 \div 3 = 33.3\% \) [triplicated]
  • I/O Overheads & scaling
    • Capacity Scaling: linear to max disks.
    • Random Read: \(N \times G \rm\ of\ rawdisk = N \times G \rm\ singledrive = RAID-0\)
    • Random Write: \(1 \times G \rm\ of\ rawdisk = 100\% \rm\ singledrive\)
    • Streaming Read: \(N \times G \rm\ of\ rawdisk = N \times G \rm\ singledrive = RAID-0\)
    • Streaming Write: \(1 \times G \rm\ of\ rawdisk = 100\% \rm\ singledrive\)
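The overheads above reduce to a few lines of arithmetic; a minimal Python sketch (the function name and dict layout are mine, not from any RAID library), treating one drive's throughput as the unit of measure:

```python
# RAID-1/RAID-10 overheads: N-way mirrors, G drive-sets per Volume Group.

def raid1_overheads(n, g):
    """Effective capacity and I/O scaling for an N-way mirrored set of G groups."""
    return {
        "total_drives": n * g,
        "effective_capacity": 1 / n,   # N=2 -> 50%, N=3 -> 33.3%
        "random_read": n * g,          # every spindle can serve reads, like RAID-0
        "random_write": 1 * g,         # each mirror must write the same block
        "streaming_read": n * g,
        "streaming_write": 1 * g,
    }

duplicated = raid1_overheads(n=2, g=4)
print(duplicated)  # 8 drives, 50% capacity, reads scale x8, writes x4
```

For triplicated data (N=3), reads scale three times faster than writes on the same drive count, which is why read-heavy workloads tolerate the capacity penalty.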

Thursday, June 12, 2014

MathJax test & demo

MathJax setup in Blogger:

MathJax Examples

  1. I had to hunt for the "HTML/Javascript" gadget, a ways down the list.
  2. I ended up putting the gadget in as a footer.
  3. You'll have to add that gadget to all blogs you want it to work for.
  4. Preview and Edit mode don't compute the TeX. You need to save the doc, then view the post.
  5. In compose "Options", "Line Breaks", I'm using 'Press "Enter" for line breaks'.
  6. The "MyTechMemo" author doesn't use the exact code he suggests, though it works for me. His actual gadget is:
Powered by <a href="">MathJax</a>

<script type="text/javascript" src="">
Alternate Hub Config in gadget, replace just first line.
        TeX: { equationNumbers: { autoNumber: "AMS" } },
         tex2jax: {
                    inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
                   processEscapes: true }

Using "all", numbers all equations.
"AMS" numbers only specified equations.
<script type="text/x-mathjax-config">
TeX: { equationNumbers: {autoNumber: "all"} }

Monday, June 09, 2014

RAID++: Erasures aren't Errors

A previous piece in this series starts as quoted below the fold, raising the question: The Berkeley group in 1987 were very smart, and Leventhal in 2009 no less smart, so how did they both make the same fundamental attribution error? This isn't just a statistical "Type I" or "Type II" error, it's conflating and confusing completely different sources of data loss.

Sunday, June 08, 2014

RAID, Archives and Tape v Disk

There's a long-running question in I.T. Operations: How best to archive data? [What media to use?]
This question arose again for me as I was browsing a retail site.


  1. The break-even for 2.5TB/6.25TB tapes is 85 and 140 tapes (compressed/uncompressed), or
    • $13,150 and $17,400 capital investment.
  2. At just 2 times data duplication, uncompressed tapes are not cost effective.
    • Enterprise backups show data duplication rates of 20-50 times.
  3. Compressed tapes are cost-effective up to 5-times data duplication.
    • If you run 10 Virtual Machines and do full backups, you've passed that threshold.
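The break-even idea above is a simple crossover calculation; a hedged Python sketch. The prices used here (tape drive, media, disk $/TB) are illustrative assumptions of mine, not the figures behind the $13,150/$17,400 quoted in the post:

```python
# Find the tape count at which tape's total $/TB drops below disk's $/TB.
# Assumed prices: $12,000 tape drive, $60 per 2.5TB (native) cartridge,
# disk at $60/TB -- all hypothetical, for illustration only.

def break_even_tapes(drive_cost, tape_cost, tape_tb, disk_cost_per_tb):
    """Smallest number of tapes where (drive + media) cost per TB <= disk cost per TB."""
    n = 1
    while (drive_cost + n * tape_cost) / (n * tape_tb) > disk_cost_per_tb:
        n += 1
    return n

print(break_even_tapes(12_000, 60, 2.5, 60))  # -> 134 tapes under these assumptions
```

The fixed cost of the tape drive dominates: media is far cheaper per TB than disk, so the answer is almost entirely "how many tapes amortise the drive".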

Thursday, June 05, 2014

Retail Disk prices, Enterprise drives, grouped by manufacturer & type

Table of current retail prices for various types of disk with cost-per-GB.
Only Internal drives, Hard Disks.

Disclaimer: This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.

Most drives are from a single manufacturer, Western Digital, to allow like-for-like comparisons.
Most manufacturers are close to the same pricing for the same specs.
  • There is ~$25 extra for SAS interface over SATA [1TB WD 'RE', SAS vs SATA]
  • There's ~$30/TB extra for higher spec drives [2TB & 3TB, WD SATA, NAS vs RE]
  • WD sell four 3.5" 1TB drives [03, 04, 26, 41]
    • SAS vs SATA, ~$25
    • about double for 10,000RPM over 7,200RPM (Velociraptor vs RE)
    • about 25% less for the Intellipower, 'Capacity' drive
  • While it's cheaper with Seagate to go from 15,000RPM/3.5" to 10,000RPM/2.5", there's no simple relation for the discount.
Western Digital list these "Purchase Decision Criteria" for drives:
  • Capacity [GB]
  • Workload Capability [duty cycle or TB read/write per year]
  • Reliability [MTBF and BER]
  • Cost/GB
  • Performance [sustained throughput,  latency or IO/sec = {RPM, seek time}]
  • Power used [not included by WD]
  • Racking density [not included by WD]
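The Cost/GB criterion above is just price divided by capacity; a small sketch ranking three drives using prices taken from the retail tables later in this archive:

```python
# Rank drives by $/GB (name, retail price $, capacity GB).
drives = [
    ("WD Green EZRX 3TB", 129, 3000),
    ("WD Red NAS EFRX 3TB", 165, 3000),
    ("WD Purple PURX 3TB", 179, 3000),
]

for name, price, gb in sorted(drives, key=lambda d: d[1] / d[2]):
    print(f"{name}: {price / gb:.4f} $/GB")
# WD Green EZRX 3TB: 0.0430 $/GB
# WD Red NAS EFRX 3TB: 0.0550 $/GB
# WD Purple PURX 3TB: 0.0597 $/GB
```

Note this single metric ignores the other criteria in the list (workload capability, reliability), which is exactly why the higher-spec drives look "worse" on it.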

Sunday, June 01, 2014

Historical External Disk Storage Data: IDC Worldwide tracking report

Data from IDC's Quarterly Worldwide External Disk Storage Systems Factory Revenues series (Press Releases). Multiply quarterly values by 4 for an approx yearly value. Full data not available prior to 2011.
For 2013: US$24.4 billion and 34.6PB.

Tuesday, May 27, 2014

"MAID" using 2.5 in drives

What would a current attempt at MAID look like with 2.5" drives?

"MAID", Massive Array of Idle Disks, was an attempt by Copan Systems (bought by SGI in 2009) at near-line Bulk Storage. It had a novel design innovation, mounting drives vertically back-to-back in slide-out canisters (patented), and was based on an interesting design principle: off-line storage can mostly be powered down.

Another credible attempt came out of The Internet Archive: their "Petabox" (a more technical view and on Wikipedia). At 24 x 3.5" drives per 4RU, Petabox units hold around half the 45 drives of the Backblaze 4.0 Storage Pod, but have 10Gbps uplinks, much beefier CPUs and more DRAM.

The Xyratex ClusterStor (now Seagate) offers another benchmark: their Scalable Storage Unit (SSU) stores 3 rows of 14 drives in 2.5RU x 450mm slide-out drawers, allowing hot-plug access to all drives. Two SSU's comprise a single 5RU unit of 84 drives, with up to 14 SSU's per rack for 1176 drives per rack, an average of 28 x 3.5" drives per Rack Unit.

Sunday, May 04, 2014

RAID-0 and RAID-3/4 Spares

This piece is not based on an exhaustive search of the literature. It addresses a problem that doesn't seem to have been addressed for RAID-0 and the related RAID-3/4: a single parity drive.

Single parity drives seem to have been deemed impractical early on, because the parity drive apparently constitutes a deliberate system bottleneck. Yet RAID-3/4 has no bottleneck for streaming reads and writes: for streaming writes, performance becomes, not merely approaches, the raw write performance of the array, identical to RAID-0 (stripe). For random writes, the 100-150 times speed differential between sequential and random access on modern drives can be leveraged with a suitable buffer to remove the bottleneck. The larger the buffer, the more likely the pre-read of old data, needed to calculate the new parity, won't be required. This triples array throughput by avoiding the full revolution forced by the read/write-back cycle.
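The pre-read/write-back cycle mentioned above is the classic single-parity small-write update; a sketch of the XOR arithmetic:

```python
# RAID-4 small-write parity update: XOR out the old data block, XOR in the new.

def update_parity(old_parity, old_data, new_data):
    """New parity without re-reading the other data drives."""
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# Parity over three data blocks:
a, b, c = b"\x0f\x0f", b"\xf0\xf0", b"\xaa\xaa"
parity = bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

# Overwrite block b; only the old b and old parity need to be read:
new_b = b"\x55\x55"
parity = update_parity(parity, b, new_b)
assert parity == bytes(x ^ y ^ z for x, y, z in zip(a, new_b, c))
```

The buffer argument in the text is precisely about avoiding the two reads (old data, old parity) that this update otherwise requires per random write.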

Multiple copies of the parity drive (RAID-1) can be kept to mitigate the very costly failure of a parity drive: all blocks on every drive must be re-read to recreate a failed parity drive. For large RAID groups, and given the very low price of small drives, this is not expensive.

With the availability of affordable, large SSD's, naive management of a single parity drive also removes the bottleneck for quite large RAID groups. The SSD can be backed by a log-structured recovery drive, trading on-line random IO performance for rebuild time.

Designing Local and/or Global Spares for large (N=64..512) RAID sets is necessary to reduce overhead, improve reconstruction times and avoid unnecessary partitioning, limiting recovery options and causing avoidable data loss events.

Saturday, May 03, 2014

Comparing consumer drives in small-systems RAID

This stems from an email conversation with a friend: why would he be interested in using 2.5" drives in RAID, not 3.5"?

There are two key questions for Admins at this scale, and my friend was exceedingly sceptical of my suggestion:
  • Cost/GB
  • 'performance' of 2.5" 5400RPM drives vs 7200RPM drives.
I've used retail pricing for the comparisons. Pricelist and sorted pricelist.

Retail Disk Prices, as printed

Table of current retail prices for various types of disk with cost-per-GB.

Disclaimer: This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.

Retail disk prices, sorted.

Table of current retail prices for various types of disk, sorted on cost-per-GB.
Disclaimer: This table is for my own point-in-time reference, does not carry any implicit or explicit recommendations or endorsement for the retailer, vendor or technologies.

3.5" Internal drives are the cheapest $/GB, ranging from 4.3 cents/GB to 10-11 cents/GB. Generally, larger drives have cheaper $/GB. Higher spec drives, suitable for high duty-cycle applications, are more expensive. This retailer doesn't sell 10K or SAS drives.

It's not possible to track 3.5" drives from Internal to External to arrive at a cost of packaging.

2.5" Internal drives range 8 to 16.5 cents/GB, generally higher than 3.5" drive costs. There seems to be little extra cost of packaging for external drives. There is a small premium in consumer drives for 7200RPM. This retailer only sells 2TB drives (15mm vs 9.5mm?) as external drives.

There was no information in the retailer's rather compact format on the thickness (5mm, 7mm, 9.5mm, 12.5mm, 15mm) of 2.5" drives.

Solid State Disks are 5+ times more expensive than Hard Disk Drives, at 59 cents/GB to $1.37/GB.
The smaller mSATA drives start at 72.8 cents/GB.
No supplier information on SSD specs is included: SLC/MLC, transfer rates, IO/sec or number of write cycles. SSDs are very sensitive to wear, and device selection requires very careful reading of device specifications.



Size | Mount | Type | Interface | RPM | Model | Capacity | Price ($) | $/GB | GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 3TB | 129 | 0.0430 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 4TB | 185 | 0.0462 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 2TB | 95 | 0.0475 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate NAS | 3TB | 160 | 0.0533 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 3TB | 165 | 0.0550 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate NAS | 4TB | 229 | 0.0573 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 4TB | 235 | 0.0587 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 3TB | 179 | 0.0597 | 3000GB
3.5" | Int | HDD | SATA? | 7200? | Hitachi HGST NAS | 3TB | 179 | 0.0597 | 3000GB
3.5" | Int | HDD | SATA? | 7200? | Hitachi HGST NAS | 4TB | 249 | 0.0622 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | Seagate NAS | 2TB | 125 | 0.0625 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Green EZRX | 1TB | 64 | 0.0640 | 1000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 2TB | 129 | 0.0645 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 4TB | 259 | 0.0648 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 2TB | 135 | 0.0675 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Red NAS EFRX | 1TB | 89 | 0.0890 | 1000GB
3.5" | Int | HDD | SATA2 | 7200? | Hitachi HGST UltraStar | 1TB | 89 | 0.0890 | 1000GB
3.5" | Int | HDD | SATA3 | 7200? | Hitachi HGST | 3TB | 270 | 0.0900 | 3000GB
3.5" | Int | HDD | SATA3 | 7200? | Hitachi HGST | 4TB | 365 | 0.0912 | 4000GB
3.5" | Int | HDD | SATA3 | 7200? | Hitachi HGST | 2TB | 185 | 0.0925 | 2000GB
3.5" | Int | HDD | SATA3 | 7200? | WD Purple PURX Surveillance | 1TB | 95 | 0.0950 | 1000GB

Size | Mount | Type | Interface | RPM | Model | Capacity | Price ($) | $/GB | GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Element | 3TB | 129 | 0.0430 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate Expansion | 3TB | 139 | 0.0463 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate Expansion | 2TB | 95 | 0.0475 | 2000GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Mybook Essential | 3TB | 149 | 0.0497 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Mybook Essential | 4TB | 209 | 0.0522 | 4000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate BackUp Plus | 3TB | 159 | 0.0530 | 3000GB
3.5" | Ext | HDD | USB3.0 | 7200? | Seagate BackUp Plus | 2TB | 115 | 0.0575 | 2000GB
3.5" | Ext | HDD | USB3.0 | 7200? | WD Mybook Essential | 2TB | 139 | 0.0695 | 2000GB

Size | Mount | Type | Interface | RPM | Model | Capacity | Price ($) | $/GB | GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 1TB | 80 | 0.0800 | 1000GB
2.5" | Int | HDD | SATA? | 5400 | WD JPVX | 1TB | 83 | 0.0830 | 1000GB
2.5" | Int | HDD | SATA? | 5400 | WD BPVX | 750G | 64 | 0.0853 | 750GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 750G | 66 | 0.0880 | 750GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 1.5TB | 139 | 0.0927 | 1500GB
2.5" | Int | HDD | SATA? | 7200 | Hitachi HGST | 1TB | 93 | 0.0930 | 1000GB
2.5" | Int | HDD | SATA? | 7200 | WD BPKX | 750G | 78 | 0.1040 | 750GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 500G | 55 | 0.1100 | 500GB
2.5" | Int | HDD | SATA? | 7200 | Hitachi HGST | 750G | 85 | 0.1133 | 750GB
2.5" | Int | HDD | SATA? | 5400 | WD LPVX | 500G | 57 | 0.1140 | 500GB
2.5" | Int | HDD | SATA? | 7200 | Hitachi HGST | 500G | 64 | 0.1280 | 500GB
2.5" | Int | HDD | SATA? | 7200 | WD BPKX | 500G | 67 | 0.1340 | 500GB
2.5" | Int | HDD | SATA? | 5400 | Hitachi HGST | 320G | 53 | 0.1656 | 320GB
2.5" | Int | HDD | SATA? | 5400 | WD LPVX | 320G | 53 | 0.1656 | 320GB

Size | Mount | Type | Interface | RPM | Model | Capacity | Price ($) | $/GB | GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Element | 2TB | 149 | 0.0745 | 2000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport | 2TB | 159 | 0.0795 | 2000GB
2.5" | Ext | HDD | USB?.0 | 5400? | Hitachi HGST Touro Mobile | 1TB | 80 | 0.0800 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport Ultra | 2TB | 165 | 0.0825 | 2000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport | 1.5TB | 129 | 0.0860 | 1500GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Element | 1TB | 89 | 0.0890 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport | 1TB | 89 | 0.0890 | 1000GB
2.5" | Ext | HDD | USB?.0 | 5400? | Hitachi HGST Touro Pro | 1TB | 92 | 0.0920 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | Seagate BackUp Plus | 1TB | 99 | 0.0990 | 1000GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport Ultra | 1TB | 104 | 0.1040 | 1000GB
2.5" | Ext | HDD | USB?.0 | 5400? | Hitachi HGST Touro Mobile | 500G | 56 | 0.1120 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | Seagate Expansion | 500G | 69 | 0.1380 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | WD Passport Ultra | 500G | 74 | 0.1480 | 500GB
2.5" | Ext | HDD | USB3.0 | 5400? | Seagate BackUp Plus | 500G | 88 | 0.1760 | 500GB

Size | Mount | Type | Interface | RPM | Model | Capacity | Price ($) | $/GB | GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 1TB | 589 | 0.5890 | 1000GB
2.5" | Int | SSD | SATA? | - | SanDisk Ultra Plus | 256G | 157 | 0.6133 | 256GB
2.5" | Int | SSD | SATA? | - | Seagate 600 | 480G | 299 | 0.6229 | 480GB
2.5" | Int | SSD | SATA? | - | Plextor M5-PRO | 512G | 329 | 0.6426 | 512GB
2.5" | Int | SSD | SATA? | - | Plextor M5S | 256G | 168 | 0.6562 | 256GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 500G | 329 | 0.6580 | 500GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 240G | 159 | 0.6625 | 240GB
2.5" | Int | SSD | SATA? | - | Seagate 600 | 240G | 159 | 0.6625 | 240GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 250G | 170 | 0.6800 | 250GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 480G | 329 | 0.6854 | 480GB
2.5" | Int | SSD | SATA? | - | SanDisk Ultra Plus | 128G | 89 | 0.6953 | 128GB
2.5" | Int | SSD | SATA? | - | Plextor M5-PRO | 256G | 179 | 0.6992 | 256GB
2.5" | Int | SSD | mSATA3 | - | Samsung 840 EVO | 250G | 182 | 0.7280 | 250GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 120G | 88 | 0.7333 | 120GB
2.5" | Int | SSD | SATA? | - | OCZ Vertex 450 | 128G | 97 | 0.7578 | 128GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme | 240G | 185 | 0.7708 | 240GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme II | 480G | 379 | 0.7896 | 480GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 EVO | 120G | 95 | 0.7917 | 120GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme II | 240G | 195 | 0.8125 | 240GB
2.5" | Int | SSD | SATA? | - | Seagate 600 | 120G | 99 | 0.8250 | 120GB
2.5" | Int | SSD | SATA? | - | Kingston HyperX | 240G | 199 | 0.8292 | 240GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 PRO | 512G | 439 | 0.8574 | 512GB
2.5" | Int | SSD | SATA? | - | Intel 520 | 120G | 104 | 0.8667 | 120GB
2.5" | Int | SSD | SATA? | - | Intel 530 | 240G | 209 | 0.8708 | 240GB
2.5" | Int | SSD | SATA? | - | Kingston HyperX | 120G | 105 | 0.8750 | 120GB
2.5" | Int | SSD | mSATA3 | - | Samsung 840 EVO | 120G | 105 | 0.8750 | 120GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 PRO | 256G | 232 | 0.9062 | 256GB
2.5" | Int | SSD | SATA? | - | Intel 530 | 120G | 115 | 0.9583 | 120GB
2.5" | Int | SSD | SATA? | - | SanDisk Extreme II | 120G | 118 | 0.9833 | 120GB
2.5" | Int | SSD | SATA? | - | Kingston SMS200s3 | 120G | 119 | 0.9917 | 120GB
2.5" | Int | SSD | mSATA3 | - | Intel 530 | 240G | 242 | 1.0083 | 240GB
2.5" | Int | SSD | SATA? | - | Intel 530 | 180G | 184 | 1.0222 | 180GB
2.5" | Int | SSD | SATA? | - | Plextor M5-PRO | 128G | 135 | 1.0547 | 128GB
2.5" | Int | SSD | SATA3 | - | Samsung 840 PRO | 128G | 138 | 1.0781 | 128GB
2.5" | Int | SSD | SATA? | - | Kingston V300 | 60G | 65 | 1.0833 | 60GB
2.5" | Int | SSD | mSATA3 | - | Intel 530 | 120G | 130 | 1.0833 | 120GB
2.5" | Int | SSD | mSATA3 | - | Intel 525 | 120G | 139 | 1.1583 | 120GB
2.5" | Int | SSD | SATA? | - | SanDisk Ultra Plus | 64G | 75 | 1.1719 | 64GB
2.5" | Int | SSD | SATA? | - | Intel 730 | 240G | 285 | 1.1875 | 240GB
2.5" | Int | SSD | SATA? | - | Intel S3500 | 240G | 288 | 1.2000 | 240GB
2.5" | Int | SSD | SATA? | - | Kingston SMS200s3 | 60G | 76 | 1.2667 | 60GB
2.5" | Int | SSD | SATA? | - | Intel S3500 | 160G | 209 | 1.3062 | 160GB
2.5" | Int | SSD | SATA? | - | Intel S3500 | 120G | 164 | 1.3667 | 120GB

Retail Disk Prices

Tuesday, April 22, 2014


Current RAID schemes, and going back to the 1987/8 Patterson, Gibson, Katz RAID paper, make no distinction between transient and permanent failures: errors or dropouts versus failure.

Monday, April 21, 2014

Storage: Spares and Parity in large disk collections

What approaches are available to deal with spare drives and RAID parity for 300-1,000 drives in a single box?
Will existing models scale well?
Do other technologies fill any gaps?

Storage: First look at Hardware block diagram

Stuffing 500-1000 2.5" drives in an enclosure is just the start of a design adventure.

The simplest is choosing fixed or hot-plug drive mounting. There's a neat slide-out tray system for 3.5" drives that allows hot-plug access to densely, vertically packed drives, and could be adapted to 2.5" drives.

Sunday, April 20, 2014

Storage: Challenges of high-count disk enclosures

Stuffing 500-1,000 2.5" drives in a single enclosure may be technically possible, but how do you make those drives do anything useful?

Increasing drives per enclosure from 15-45 for 3.5" drives to 1,000 requires a deep rethink of target market, goals and design.

Not the least is dealing with drive failures. With an Annualised Failure Rate (AFR) of 0.4%-0.75% now quoted by drive vendors, dealing with 5-15 drive failures per unit, per year is a given. In practice, failure rates are at least twice the vendor-quoted AFR, not least because in systems, conditions can be harsh and other components/connectors also fail, not just drives. Drives have a design life of 5 years, with an expected duty-cycle. Consumer-grade drives aren't expected to run 24/7 like the more expensive enterprise drives. Failure rates, when measured on large fleets in service, increase over time, and considerably so towards end of life.
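The expected-failures arithmetic above is worth making explicit; a quick Python check of the "5-15 drive failures per unit, per year" claim:

```python
# Expected annual drive failures: fleet size x AFR.

def expected_failures(drives, afr):
    """Expected failures per year for `drives` drives at annualised failure rate `afr`."""
    return drives * afr

for drives in (500, 1000):
    for afr in (0.004, 0.0075):          # vendor-quoted 0.4% and 0.75% AFR
        print(drives, afr, expected_failures(drives, afr))
# 500 drives at 0.4% -> 2/yr; 1000 drives at 0.75% -> 7.5/yr.
# At the doubled field rates the post suggests, that's roughly 4-15 per year.
```

This is the expectation only; for maintenance planning the year-to-year spread (approximately Poisson around these means) matters as much as the mean.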

It isn't enough to say "we're trying to minimise per-unit costs"; all designs do that, but against different criteria.
What matters is the constraints you're working against, or the parameters being optimised.

Storage: How many drives can be stuffed in a Box?

How many 2.5" drives can be stuffed into a single enclosure, allowing space for power, cooling, wiring and a single motherboard? Short answer: ~500-1000.

Sunday, March 23, 2014

Storage: more capacity calculations

Following on from the previous post on Efficiency and Capacity, baselining "A pile of Disks" as "100% efficient".

Some additional considerations:
  • Cooling.
    • Drives can't be abutted, there must be a gap for cooling air to circulate.
    • Backblaze allow 15x25.4mm drives across a 17.125in chassis, taking around 12%, 52.5mm, of space for cooling.
      • This figure is used to calculate a per-row capacity, below.
  • A good metric on capacity is "Equivalent 2.5 in drive platters", table below.
    • 9.5mmx2.5" drives with 3 platters, yield highest capacity.
    • Better than 5 platter, 3.5" drives.
    • Drive price-per-GB is still higher for 2.5" drives.
      • This may change in time if new technologies, such as "Shingled Writes" are introduced into 2.5" drive fabrication lines and not into 3.5" lines.
  • Existing high density solutions, measured in "Drives per Rack Unit". SGI "Modular" units, which also support 4xCPU's per 4RU chassis, are the most dense storage currently available.
    • Backblaze achieve the lowest DIY cost/GB known:
      • 4RU, vertical orientation, 15 drives across, in 3 rows.
      • fixed drives.
      • 45x3.5" drives = 11.25x3.5" drives per RU
      • 450x3.5" drives per 42RU
    • Supermicro 'cloud' server (below) achieves 12 drives, fixed, in 1RU
      • 12x3.5" drives per RU
      • 504x3.5" drives per 42RU
    • Supermicro High Availability server, supports 36x3.5" removable drives in 4RU
      • 9x3.5" drives per RU
      • 360x3.5" drives per 42RU
      • An alternative 2.5" drive unit puts 24x2.5" removable drives in 2RU,
      • = 8.5x3.5" drives per RU
      • 356 drives per 42RU
    • Open Compute's Open Rack/Open Vault use of 21", not 19" racks still in 30" floor space, allows higher disk densities:
      • 30x3.5" drives in 2RU
      • 15x3.5" drives per RU
      • 630x3.5" drives per 42RU
    • SGI Modular InfiniteStorage uses modules of 3x3 3.5" drives, 3x3 2.5" 15mm and 3x6 9.5mmx2.5" drives in a 4RU chassis. Drives are mounted vertically.
      • Modules are accessible from front and rear.
      • All modules are accessible externally and are removable.
      • 81x3.5"drives per extension case, 72x3.5" drives per main chassis, 4 expansion cases per main chassis.
      • 720 to 792x3.5" drives per 42RU (same for 2.5" 15mm drives)
      • 1140 to 1584x 2.5" 9.5mm drives per 42RU
    • SGI/COPAN "MAID" uses "Patented Canisters" to store 14x3.5" drives back-back per canister. 8 canisters per 4RU drive shelf, 112x3.5" drives per shelf. These devices no longer appear on the SGI website, though have featured in a Press Release.
      • MAID attempts to reduce power consumption by limiting active drives to at most half installed drives.
      • Up to 8 shelves per 42RU unit.
      • Power, CPU's and 
      • 21.33x3.5" drives per RU (28x3.5" drives per RU per shelf)
      • 896x3.5" drives per 42RU
    • EMC Isilon S200, X200 Nodes [2011 figures] are 2RU units
      • EMC support 144 Nodes per cluster
      • 24x2.5" drives and 12x3.5" drives respectively
      • 12x2.5" drives per RU and 6x3.5" drives per RU respectively
      • 504x2.5" and 252x3.5" drives per 42RU
      • 5.5 racks to support maximum 144 node cluster [unchecked for 2014 config]
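All the per-RU and per-rack figures in the list above follow one pattern; a small Python helper reproducing a few of them:

```python
# Storage density: drives per Rack Unit, and pro-rata drives per 42RU rack.

def density(drives, rack_units, rack=42):
    """Return (drives per RU, drives per `rack` RU, pro-rata)."""
    per_ru = drives / rack_units
    return per_ru, int(per_ru * rack)

print(density(45, 4))   # Backblaze pod: 11.25/RU; pro-rata 472 (10 whole pods = 450)
print(density(12, 1))   # Supermicro 1RU 'cloud' server: 12/RU, 504 per 42RU
print(density(30, 2))   # Open Vault: 15/RU, 630 per 42RU
```

The pro-rata figure slightly exceeds the whole-chassis counts in the list whenever 42 isn't a multiple of the chassis height, as with the 4RU Backblaze pod.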

In the previous piece, I said there were just 4 'interesting' drive orientations of 6 possible, due to "flat plate" blocking of airflow.

If you include a constraint for uninterrupted front-back airflow, there are only two good orientations:
  • the drive connectors, on the shortest side, have to be to one side (bottom, or left/right)
    • vertical, thin-side forward, 100mm high  x thickness (5mm-15mm) width
      • allows many drives across the rack (table below)
      • stacked drives take 75mm depth. Allows 6 in 450mm. 900mm deep possible
    • horizontal, thin-side forward, thickness (5mm-15mm) high x 100mm wide
      • allows 4 drives across Rack
      • stack drives vertically with small separation.
  • drive connectors in-line with airflow will restrict it, eliminating horizontal & vertical end-to-end.

Table of 2.5" Platter equivalent across 19" Rack

Rack Width: 435mm (17.125in, allows for sliders)
Interdrive space (cooling): 52.5mm
Usable space: 382.5mm

Thickness | Drives across | 2.5" platter equivalents
25.4mm | 15 | 106 (75 actual)
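The drives-across count is just the usable rack width divided by drive thickness; a sketch covering the common 2.5" thicknesses as well as the 25.4mm (3.5") row above:

```python
# Drives across a 19" rack: 435mm usable width minus the 52.5mm cooling gap.
USABLE_MM = 435 - 52.5   # = 382.5mm, per the figures above

for thickness in (5, 7, 9.5, 12.5, 15, 25.4):
    print(f"{thickness}mm: {int(USABLE_MM // thickness)} drives across")
# 5mm: 76, 7mm: 54, 9.5mm: 40, 12.5mm: 30, 15mm: 25, 25.4mm: 15
```

This is the raw geometric count only; connector clearances and tray metal will shave a few drives off each row in practice.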

Backblaze V4.

$688ea for the new SAS/SATA cards (to Backblaze in 100 Qty):

"The Rocket 750's revolutionary HBA architecture allows each of the 10 Mini-SAS ports to support up to four SATA hard drives: A single Rocket 750 is capable of supporting up to 40 4TB 6Gb/s SATA disks,"

$3,387.28 full chassis
$9,305 total for 180TB [$131/drive, $5,917.72 drives total]

$5,403 ‘Storinator’ by Protocase.
$7,200 drives: 45 x $160 per 4TB drive
$12,603 Protocase + drives

$872.00 case
$355.99 PSU
~$360 motherboard, CPU, RAM
$1,376.40 SATA cards (2)
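The two build-outs above reduce to a simple cost-per-GB comparison (figures as quoted in this post, 45 x 4TB = 180TB per pod):

```python
# $/GB for the two 180TB pod builds quoted above.
builds = {
    "Backblaze pod + drives": (9_305, 180_000),
    "Storinator + drives":    (12_603, 180_000),
}

for name, (dollars, gb) in builds.items():
    print(f"{name}: {dollars / gb:.4f} $/GB")
# roughly 5.2 and 7.0 cents/GB respectively
```

Both are well under the ~4.3 cents/GB bare-drive floor plus chassis overhead you'd expect from the retail tables earlier, which is the point of the Backblaze design.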

From SuperMicro: scale-out storage products already exist, mostly 3.5", but some 2.5"
- 360 3.5” drives in 42RU. 4Ux36-bay SSG

And for ‘Hadoop’, they go a little denser.

Supermicro have multiple innovative designs [below], with 9-12x3.5” drives/RU, 12x2.5” drives/RU, and their MicroBlade & MicroCloud servers with proprietary motherboards & high bandwidth.

e.g. Hadoop, 1RU, fixed:

12x3.5” in a 1RU rack. 43mm x 437mm x 908mm (H, W, D)
- 2 full length columns (3 drives, fans, 2 drives)
- 1 short column (fans, 2 drives)
- PSU, m’board, Addon-card (PCI on riser) and front panel on one side of chassis
- AddOnCard w/ 8x LSI 2308 SAS2 ports and 4x SATA2/3 ports
- dual 1Gbps ethernet
- m’board 216mm x 330mm, LGA 1155/Socket H2, 4xDDR3 slots
- 650W

Front and back Hot-swap, 4RU, 36 drives:

- 178mm x 437mm x 699mm (H, W, D)
- dual CPU, 4x1Gbps ethernet
- 2x1280W Redundant Power Supplies
- 24xDDR3 slots
- LSI 2108 SAS2 RAID AOC (BBU optional), Hardware RAID 0, 1, 5, 6, 10, 50, 60
- 2x JBOD Expansion Ports
- BPN-SAS2-826EL1 826 backplane with single LSI SAS2X28 expander chip
- BPN-SAS2-846EL1 Backplane supports up to 24 SAS/SATA

Alt. system, 2RU, 24x2.5” hot-plug:

- 89mm x 437mm x 630mm (H, W, D)
- 12Gbps SAS 3.0
- no CPU specified.

Open Vault/Open Rack.

The Open Vault is a simple and cost-effective storage solution with a modular I/O topology that’s built for the Open Rack.
The Open Vault offers high disk densities, holding 30 drives in a 2U chassis, and can operate with almost any host server.
Its innovative, expandable design puts serviceability first, with easy drive replacement no matter the mounting height.

Open Rack

Open Rack is a mounting system designed by Facebook's Open Compute Project that has the same outside dimensions as typical 19-inch racks (e.g. 600 mm width), but supports wider equipment modules of 537 mm or about 21 inches.

SGI® Modular InfiniteStorage™

Image: [whole unit]
Image: [module: 3x3, vertical mount]

Extreme density is achieved with the introduction of modular drive bricks that can be loaded with either nine 3.5 inch SAS or SATA drives, or 18 2.5 inch SAS or SSD drives.

SGI® Modular InfiniteStorage™ JBOD

(SGI MIS JBOD) is a high-density expansion storage platform, designed for maximum flexibility and the ability to be tuned to specific customer requirements.
Whether as a standalone dense JBOD solution, or combined with SGI Modular InfiniteStorage Server (SGI MIS Server), SGI MIS JBOD provides unparalleled versatility for IT managers while also dramatically reducing the amount of valuable datacenter real estate required to accommodate rapidly-expanding data needs.

Up to 81 3.5" or 2.5" Drives in 4U
up to 3.2PB of disk capacity can be supplied within a single 19" rack footprint.

SGI MIS JBOD shares the same innovative dense design with SGI MIS Server, which can be configured with up to
81 3.5" or 2.5" SAS, SATA SSD drives.
This enables SGI MIS JBOD to have up to 324TB in 4U.

SGI MIS JBOD comes with a SAS I/O module, which can accommodate four quad port connectors or 16 lanes.
An additional SAS/IO module can be added as an option for increased availability.

SGI Modular InfiniteStorage Platform Specifications

Servers are hot pluggable, and can be serviced without impacting the rest of the chassis or the other server.
Through an innovative rail design, the chassis can be accessed from the front or rear, enabling drives and other components to be non disruptively replaced.
RAIDs 0, 1, 5, 6 and 10 can be deployed in the same chassis simultaneously for total data protection.
Battery backup is used to allow for cache de-staging for an orderly shutdown in the event of power disruptions.

Connectivity Up to 4 SGI MIS JBODs per SGI MIS Server enclosure
Rack Height 4U
Height 6.94” (176 mm)
Width 16.9” (429.2 mm)
Depth 36” (914.4 mm)
Max weight 250 lbs. (113kg)
Internal Storage
Up to 72 X 3.5” or 2.5” 15mm drives (max 288TB)
Up to 144 x 2.5” 9.5mm drives.
RAID or SAS Controllers
Single server: up to four 8-port cards
Dual server: up to two 8-port cards per server motherboard (four per enclosure)
External Storage Attachment Up to 4 SGI MIS JBOD chassis per server enclosure

JBOD modules:
Connectivity Four quad port SAS standard. Eight quad port SAS optional
Internal Storage
Up to 81 X 3.5” drives (max 324TB)
Up to 162 x 2.5” 9.5mm drives

SGI® COPANTM 400M Native MAID Storage

  • up to 2.7PB raw of data in a compact storage footprint.
  • 8x4RU, 8xcanisters ea 4RU, 112 drives/4RU.
  • 2x4RU power, cache and management
  • Up to 6,400 MB/s (23TB/hr) of disk-based throughput
  • idling: power consumption of the storage system by up to 85%.
  • Patented Disk Aerobics® Software
  • Patented Power Managed RAID® Software. Provides full RAID 5 data protection and helps lower energy costs by limiting spinning drives to a maximum of 25% or 50%

Capacity: 224TB to 2688TB per cabinet (1 shelf = 112x2TB; 14 drives/canister, 8 canisters/shelf)
Shelves: 1–8
Connectivity: Up to eight 8-Gbps Fibre Channel ports [later docs: 16x8Gbps FCAL]

Max Spinning Drives at Full Operation: up to 50%
Spare Drives: 5 per shelf, for a maximum of 40
Disk Drives: 2TB & 3TB SATA
Dimensions: 30” (76.2 cm) W x 48” (121.9 cm) D x 87” (221 cm) H
Clearances: Front 40” (101.6 cm), Rear 36” (91.4 cm), Side 0”
Weight: maximum 3,193 lbs. (1,447 kg)

Power Consumption @ Standby (min/max) 426/2,080 watts
Power Consumption @ 25% power (min/max) 649/3,819 watts
Power Consumption @ 50% power (min/max) 940/6,554 watts

Storage tiering software: SGI Data Migration Facility (DMF)
D2D backup: IBM® TSM®, CommVault® Simpana®, Quantum® StorNext®

SGI® COPAN™ 400M Native MAID
Connectivity Up to sixteen 8-Gbps Fibre Channel ports

MAID Platforms
A New Approach to Data Backup, Recovery and Archiving

COPAN products are all based on an Enterprise MAID (Massive Array of Idle Disks) platform, which is ideally suited to cost effectively address the long-term data storage requirements of write-once/read-occasionally (WORO) data.


  • Data Archiving
  • Data Protection: Backup & Recovery
  • Storage Tiering
  • Power Efficient Storage

For backup, recovery and archiving of persistent data:

Unprecedented reliability - six times more reliable than traditional spinning disk solutions

  • Massive scalability - from 224 TB to 2688 TB raw capacity
  • High Density - 268 TB per ft.² (2688 TB per .93 m²)
  • Small Footprint - 10 ft.²
  • Energy Efficiency - up to 85% more efficient than traditional, always spinning disk solutions

COPAN technology simplifies your long-term data storage,
drastically lowers your utility costs, and
frees up valuable data center floor space.

  • Lowest Cost Solution
    • Savings in operational costs and capital expenses
  • Smallest Disk-Based Storage Footprint
    • 268 TB per square foot or 2688 TB per .93 m²
  • High Performance
    • Fast Restores up to 23 TB/hour system
  • Breakthrough Energy Efficiency
    • Save up to 85% on power and cooling costs

COPAN Patented Canister Technology

  • Patented mounting scheme to eliminate "rotational vibration" within a storage shelf
  • Canister technology enables efficient and quick servicing of 14 disk drives
  • Data is striped across canisters within a shelf in 3+1 RAID sets

Environment Factors | Disk-Based Storage | COPAN Systems' Enterprise MAID
Quick Data Recovery | | X
Cost per GB | | X
Operating Expense | X | X
Small Footprint | | X
Power & Cooling Efficiency | | X
Ease of Management | | X
Built for Long-Term Data Storage | X | X

MAID is designed for Write Once Read Occasionally (WORO) applications.
Six times more reliable than traditional SATA drives

Disaster Recovery Replication Protection:
Three-Tier System Architecture
• Simplifies system management of persistent data
• Scales performance with capacity
• Enables industry-leading, high density, storage capacity in a single footprint
• Enhances drive reliability with unique disk packaging, cooling and vibration management


MONDAY, MAY 16, 2011

EMCWorld: Part 2 - Isilon Builds on Last Month's Announcements with Support for 3TB Drives and a 15PB FileSystem

With list pricing starting at $57,569 ($4.11/GB) for the S200, the value metric is not the traditional capacity view but $/IOP and $/MB/s, i.e. $6/IOP and $97/MB/s respectively for the S200. With a starting price of $27,450/node, the X200 comes in at nearly $13/IOP, but has an attractive starting price, even more attractive when their 80% utilization claim is factored in.

They doubled their IOP number, for a maximum cluster size of 144 nodes, to 1.4M IOPs, and doubled their maximum throughput to 85GB/s. It is not just the power of Intel (Westmere/Nehalem upgrades) that has driven this performance increase, but also the intelligent use of SSDs. By supporting HDDs and SSDs in the same enclosure, and placing the file metadata on SSD, performance gets a significant boost. The IOP number has not yet been submitted to SpecFS, so the performance number is still “unofficial”.

The latest announcement last week at EMCWorld increased the maximum supported single file system, single volume to 15PB, plus support for 3TB HDDs on their capacity platform, the NL108. Worth noting that this impressive scalability is only for the NL108 configured with 3TB drives. In comparison, the higher performance X200 scalability tops out at 5.2PB.

                          S200                  X200
Form Factor               2U                    2U
Maximum Drives            24                    12
Drive Types               2.5” SAS, SSD         3.5” SATA, SSD
Maximum Node Capacity     14TB                  24TB
Max Memory                96GB                  48GB
Global Coherent Cache     14TB                  7TB
Max Cluster Size          144                   144
Protocols                 NFS, CIFS, FTP, HTTP, iSCSI (NFS 4.0, Native CIFS and Kerberized NFSv3 supported)
Maximum IO/s              1,414,944 IO/s        309,312 IO/s
Maximum Throughput        84,960 MB/s           35,712 MB/s
List Price Starting at    $57,569/node          $27,450/node
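The $/IOP and $/MBs figures quoted for the S200 and X200 can be reproduced from this table; a quick sketch of the arithmetic, taking per-node performance as the maximum cluster figure divided by 144 nodes:

```python
# Derive per-node $/IOP and $/(MB/s) from list price and max-cluster performance.
def value_metrics(list_price, cluster_iops, cluster_mbs, nodes=144):
    node_iops = cluster_iops / nodes
    node_mbs = cluster_mbs / nodes
    return list_price / node_iops, list_price / node_mbs

s200_iop, s200_mbs = value_metrics(57_569, 1_414_944, 84_960)
x200_iop, _ = value_metrics(27_450, 309_312, 35_712)
print(f"S200: ${s200_iop:.0f}/IOP, ${s200_mbs:.0f}/MBs")  # ~$6/IOP, ~$98/MBs
print(f"X200: ${x200_iop:.0f}/IOP")                       # ~$13/IOP
```

The $/MBs figure comes out at $97.6, which the post truncates to $97.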

Front and center in Isilon’s promotional pitches are the advantages of scale-out, namely scalability, efficiency, ease of use and availability. They are positioning themselves as the scale-out architecture integrated with capabilities that elevate it to enterprise class. This, they believe, serves them well both in their traditional space and in positioning them to penetrate the commercial HPC, Big Data space.


EMCWorld: Part 3, The Final Installment; VNXe, ATMOS and VMware

VNX Series: As you all are probably well aware the VNX series is the EMC mid-tier, unified storage offering that is in the process of replacing the CLARiiON and Celerra lines.
It was launched back in January and continues to evolve as these announcements suggest:

1. FLASH 1st is the VNX SSD strategy, which incorporates FAST, FAST Cache and a soon-to-be-available server-side cache, code-named Project Lightning.
On this feature I must admit I became a bit of a convert, see my comments in my earlier blog.
2. A Cloud Tiering Appliance, designed to offload cold unstructured data from the VNX to the cloud, was introduced.
This device can also operate as a migration tool to siphon data from other devices such as NetApp.
This announcement really resonated with me, more coverage in my earlier blog.
3. A ruggedized version of the VNXe, the SMB member of the family, was introduced.
It was mentioned a couple of times in the presentations that EMC have not done well in the federal space.
This is an obvious attempt to help fix that deficiency.
Napolitano also mentioned that 50% of the customers who have purchased VNXe were new to EMC and during the 1st quarter EMC signed 1100 new VNXe partners.
4. SSD support for the VNXe.
Another reinforcement of EMC’s commitment to solid state storage.
5. VAAI support for NFS and block enhancements including thin provisioning.
No surprise here - a deeper integration with VMWare which all storage vendors should be doing.
EMC just happens to have a bit of an advantage.
6. A Google Search Appliance was introduced.
This device enables updated files to be searched sooner and comes in two flavors: the 7007, supporting up to 10M files, and the 9009, supporting up to 50M files.
Clever announcement; in the world of big data findability (my word) is valuable currency.
7. A high density disk enclosure supporting 60 3.5” SAS, NL-SAS or Flash drives.
GB/RU is one of today’s metrics and this helps EMC’s capacity positioning big time.
8. Doubled bandwidth performance, with a high bandwidth option that triples the 6Gb SAS backend ports.
Bandwidth, IOPS & capacity: an interesting balancing act, particularly when you throw in cost.

ATMOS: I first started to write about Atmos when Hulk was the star of the rumor mill; boy, how time flies.
Hulk is still there in its evolved instantiation, but its role has most certainly moved to that of a back-up player in the chorus line.
The lead player, ATMOS 2.0, featured in the announcement with the declaration of a significant performance boost.
The claim is a 5x increase in performance, with a current ability to handle up to 500M objects per day.
They have also changed their protection scheme, claiming it can increase storage efficiency by 65%.
Change is probably the wrong word: they continue to support the multiple copy approach, but have added their new object segmentation approach.

Previously data protection was achieved by the creation of multiple copies that were distributed within the Atmos cloud.
EMC Geo Parity, as it is called, is similar to the Cleversafe approach: rather than storing multiple copies of a complete object, it breaks the object into 12 segments, four of them parity, analogous to a RAID group. These segments are then distributed throughout the cloud, protecting the data with tolerance to multiple failures.
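The storage-efficiency gain of segmenting over full copies is simple arithmetic; a minimal sketch of generic erasure-coding overhead (not EMC's actual Geo Parity implementation):

```python
# Raw-to-usable overhead: replication stores r full copies of an object;
# a k+m erasure code stores (k+m)/k of the object size,
# while tolerating the loss of any m segments.
def replication_overhead(r):
    return float(r)

def erasure_overhead(k, m):
    return (k + m) / k

# Geo Parity style: 12 segments, 4 of them parity (8 data + 4 parity).
print(replication_overhead(3))  # 3.0 -> 300% of object size on disk
print(erasure_overhead(8, 4))   # 1.5 -> 150%, tolerating any 4 lost segments
```

Going from three full copies (3.0x raw) to an 8+4 layout (1.5x raw) halves the raw storage needed; the announced 65% figure presumably depends on which replication baseline EMC measured against.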

VMware: Not much in terms of announcements, but some of the adoption stats were interesting.

• VM migration (vMotion) has increased from 53% to 86%
• High availability use has increased from 41% to 74%
• Dynamic Resource Scheduling (DRS) has increased from 37% to 64%
• Dynamic migration (storage vMotion) has increased from 27% to 65%
• Fault tolerant use has grown from zero to 23%

IBM Delivers Technology to Help Clients Protect and Retain "Big Data"

Introduces industry-first tape library technology capable of storing nearly 3 exabytes of data -- enough to store almost 3X the mobile data in U.S. in 2010

ARMONK, N.Y., - 09 May 2011: IBM (NYSE: IBM) today announced new tape storage and enhanced archiving, deduplication offerings designed to help clients efficiently store and extract intelligence from massive amounts of data.

At the same time, demand for storage capacity worldwide will continue to grow at a compound annual growth rate of 49.8 percent from 2009-2014, according to IDC (1). Clients require new technologies and ways to capitalize on the growing volume, variety and velocity of information known as "Big Data."

IBM System Storage™ TS3500 Tape Library is enabled by a new, IBM-developed shuttle technology -- a mechanical attachment that connects up to 15 tape libraries to create a single, high capacity library complex at a lower cost. The TS3500 offers 80 percent more capacity than a comparable Oracle tape library and is the highest capacity library in the industry, making it ideal for the world's largest data archives (3).

Thursday, March 20, 2014

Storage: Efficiency measures

In 2020 we can expect bigger disk drives and hence Petabyte stores. Price per bit will come at a premium, it won't track capacity as it does now: larger capacity drives will cost more per unit.

What are the theoretical limits on which Storage solution "efficiency" can be judged?

We're slowly approaching what could be the last factor-10 improvement, to 10Tbits/in², in rotational 2-D magnetic recording technologies of Hard Disk Drives. Jim Gray (~2000) and Mark Kryder (2009) suggested 7TB/platter for 2.5" disk drives by 2020, assuming a 40%/yr capacity growth.

Rosenthal et al (2012) suggest that, like CPU-speed "Moore's Law", disk capacity growth rates have slowed, though 100Tbits/in² may be possible in the far future. They predict 1.8Tbits/in² commercially available in 2020, vs 0.6-0.7Tbits/in² currently.

Three platter 2.5" drives are normally 12.5mm thick, but 9.5mm versions became available in 2013 (HGST, 1.5TB). Four platter 2.5" drives are usually 12.5mm or 15mm, according to Seagate, who also fit three 667GB platters in 9.5mm for 2TB total (using 2.3W for read/write).

Slim-line 7mm and 5mm 2.5" drives are on the market. 7mm drives are two platter.

In 2020, the 2.5" disk drive market will differentiate by both thickness (5, 7, 9.5, 12.5, 15mm) and number of platters, from 1 to 4. Laptop and ultrabook manufacturers will determine if 7mm replaces 9.5mm as the standard consumer portable form factor, giving them a volume production price advantage.

Per-platter, we can expect 1.5TB-2TB, or total 1TB-6TB in 2.5" drives [vs 5 platter 3.5" drives at 15TB].

Storage system builders will be able to select drive combinations not just on SSD + HDD, but on:
  • Cost per GB
  • GB per cubic-inch
  • Watts per GB, and
  • spindles per TB, setting maximum IO/sec and streaming IO performance.
  • How many drives can fit in a single rack?
    • How much raw capacity?
  • How much power would they use? [and how much cooling]
  • How much does it all weigh? [can the floor hold it up?]
  • Time to back it up?
    • Dependent on external ports and interface speeds.
  • Performance:
    • How many IO/sec?
    • Aggregate internal streaming throughput?
    • Normalised multi-media transactions/sec: 1MB Object requests/sec?
    • Scan Time for searching, data mining, disk utilities & RAID rebuild?
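Scan time, the last metric above, follows directly from capacity and streaming rate; a sketch using the drive figures assumed later in this post (15TB at ~2Gbps, and a 1.5Gbps 2.5" drive, both assumptions rather than measured numbers):

```python
# Time to read an entire drive end-to-end (best case: pure streaming,
# no seeks), which bounds full scans and RAID rebuilds.
def scan_time_hours(capacity_tb, stream_gbps):
    bits = capacity_tb * 1e12 * 8        # decimal TB -> bits
    return bits / (stream_gbps * 1e9) / 3600

print(f"{scan_time_hours(15, 2):.1f} h")   # 15TB at 2Gbps: ~16.7 hours
print(f"{scan_time_hours(4, 1.5):.1f} h")  # 4TB at 1.5Gbps: ~5.9 hours
```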

Disk drives have 3 different dimensions: WxDxH and 3 different 'faces', WxD, WxH, DxH
For 2.5" drives, approx 70mm x 101mm x 9.5mm
For 3.5" drives, 101.6mm (4 inch) x 146mm (6inch) x 19-26.1mm (nominally "1 inch")

Drives can be placed with any of the 3 faces down and rotated about a vertical axis, giving potentially 6 orientations.
In practice, the thinnest cross-section has to face forward, into the airflow, to allow effective cooling.
This gives just 4 orientations: 2 'flat' and 2 'vertical'.

19 inch racks are "mostly standard":
  • 19 inches across the faceplate, posts are each 5/8 inch, fasteners & holes are well defined.
    • But need extra space either side for cabling and airflow, increasing external rack dimension.
  • 17.75 inches internal clearance (450mm). With sliders: 17.25 inches internal. (435mm)
  • 1RU (Rack Unit) = 1.75 inches high
  • convention is 42RU high = 73.5 inches of usable space
    • Allow for plinth, first usable RU is off the floor
    • Allow for head piece, plate + structural rails, fans and cable organisers on top
  • Depth varies on use:
    • 600mm (24 inch) common in Telecoms
    • 966mm (38 inch) common in IT.
    • Need extra space front and rear for doors, cabling, power strips, ...
  • External dimensions: 30in x 48in x 87in (WxDxH)
    • Notionally, a single rack uses ten square feet (1 square meter) of floor space.
    • side clearance of zero: racks bolt together to stabilise the structure.
    • Front and rear clearance, often 40" and 30" are needed to open doors and load/unload parts.
    • Aisles are needed between rows to allow work and access.
      • In many facilities, need to open two doors at once, 50" minimum.
    • "Hot Aisle": exhaust adjacent rows into the one sealed area with extractor fan.
  • Floor space in server rooms
    • Only around 33%-50% of the available floor space can be used for racks.
    • Racks are best organised in rows parallel to long dimension of room
    • long rooms need breaks in rows, creating cross-aisles
    • Additional clearance is needed around walls of rooms
    • Entrance doors need to be double and handle shipping pallets
    • Extra spare space is needed around doors for staging equipment in, and storing packing waste before removal
    • Dedicated space is needed for "Air handling units" (at least two), power distribution boards and fire control systems. These need clearance for servicing and removal/replacement.
    • In room UPS units need space and cooling (No-break power supplies)
      • lead-acid battery banks of any capacity need to be housed in separate, spark-proof rooms with additional fire control and sprinklers.

Stacking 3.5 inch drives, no allowance for cooling, wiring, power or access:

3.5" drives, at 4 inches wide can be stacked flat, 4 abreast in a rack.
6 drives will fit end-to-end in a 36"-38" cabinet, for 24 drives in a layer.
Alternatively, 17 drives can be stood on their sides across a rack, 4" tall layers.
With 102 drives/layer and 1836 drives/rack.

For nominal 1" thick drives, 73 layers can be stacked, giving 1,752 drives per rack.
With 15TB 3.5" drives, 26PB/rack.
With 4TB 3.5" drives, 7PB/rack.
7200 RPM 3.5" drives consume 8W-10W, or 14kW-16kW per rack.
7200 RPM, 120Hz, drives are capable each of 240 IO/sec, for 400k IO/sec aggregate.
3.5" drives weigh ~600 grams each, for a load of about 1 ton (or 1,000kg/m²)
15TB 3.5" drives will stream at around 2Gbps, for 3.5Tbps aggregate internal bandwidth.
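The rack arithmetic above can be checked in a few lines; a sketch under the same assumptions (flat-stacked drives, 24 per layer, nominal 1" drive height, 42RU of usable space, no allowance for cooling, wiring or power):

```python
import math

# Rack-fill arithmetic for flat-stacked 3.5" drives.
USABLE_INCHES = 42 * 1.75                  # 42RU x 1.75" = 73.5" usable
drives = math.floor(USABLE_INCHES / 1.0) * 24

print(drives)                                  # 1752 drives per rack
print(drives * 15 / 1000, "PB @ 15TB/drive")   # ~26PB/rack
print(drives * 9 / 1000, "kW @ ~9W/drive")     # ~16kW/rack
print(drives * 240 / 1000, "k IO/sec")         # ~420k IO/sec (text rounds to 400k)
print(drives * 0.6, "kg")                      # ~1 tonne
```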

Stacking 2.5" drives, at 5400 RPM (90Hz):

Thickness   Drives/rack   Aggregate IO/sec   Capacity/rack
5mm         17,800        3,204k             9PB @ 0.5TB; 27PB @ 1.5TB
7mm         12,714        2,288k
9.5mm       9,368         1,686k             9.5PB @ 1TB; 30PB @ 3TB
12.5mm      7,120         1,281k
15mm        5,933         1,069k             12PB @ 2TB; 36PB @ 6TB
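The drive counts appear to scale inversely with drive thickness from the 9.5mm baseline, with 180 IO/sec per 5400 RPM drive (2 x the 90Hz rotation rate, mirroring the 240 IO/sec used for 7200 RPM above). This is a model reconstructed from the figures, not the author's stated method:

```python
# Scale drive counts from the 9.5mm baseline; 180 IO/sec per 5400 RPM drive.
BASE_DRIVES, BASE_MM = 9368, 9.5

for mm in (5, 7, 9.5, 12.5, 15):
    drives = round(BASE_DRIVES * BASE_MM / mm)
    # Matches the table to within rounding (17,799 vs 17,800 at 5mm).
    print(f"{mm:>4}mm: {drives:>6} drives, {drives * 180 / 1000:.0f}k IO/sec")
```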

Power consumption for 9.5mm drives, at 1.2W each, is around 8kW per rack, roughly half the power needed for 3.5" drives.

Aggregate internal bandwidth is higher, even though the per-drive streaming rate, 1.5Gbps, is up to 25% lower.
For 9.5mm drives, 14Tbps aggregate internal bandwidth (3TB drives).

5mm drives weigh around 95 grams each and 15mm drives 200 grams, the same rack weight ±15% as 3.5" drives.