Tuesday, May 27, 2014

"MAID" using 2.5 in drives

What would a current attempt at MAID look like with 2.5" drives?

"MAID", Massive Array of Idle Disks, was an attempt by Copan Systems (bought by SGI in 2009) at near-line Bulk Storage. It had a novel design innovation, mounting drives vertically back-to-back in slide-out canisters (patented), and was based on an interesting design principle: off-line storage can mostly be powered down.

Another credible attempt came out of The Internet Archive: their "Petabox" (there is a more technical view, and it is on Wikipedia). At 24 x 3.5" drives per 4RU, they hold around half as many drives as the 45-drive Backblaze 4.0 Storage Pod, but the Petabox has 10Gbps uplinks, much beefier CPUs and more DRAM.

The Xyratex ClusterStor (now Seagate) offers another benchmark: their Scalable Storage Unit (SSU) holds 3 rows of 14 drives in 2.5RU x 450mm slide-out drawers, allowing hot-plug access to all drives. Two SSUs make up a single 5RU unit of 84 drives, with up to 14 SSUs per rack for 1176 drives per rack, an average of 28 x 3.5" drives per Rack Unit.
Key design questions:
  • How do you power and cool all those drives?
  • Do you want drives fixed, like Backblaze, or in drawers or canisters like Copan and Xyratex?
I'd use USB 3.0 as the power and connection method. The best packing density would come from drive manufacturers integrating the USB 3.0 interface onto the drive, replacing the SAS/SATA interface, in the same way both Seagate and Western Digital have recently announced Ethernet-connected drives.

Ethernet, with PoE, is an alternative power-and-connection method, with such drives already nearing production.

Groups of drives would share a USB hub or Ethernet switch, with a single connection back to the motherboard or another level of hubs/switches.

For MAID, the motherboard(s) must be able to power down individual devices, specifically to turn per-port power off and on, hence either controllable ports on USB hubs or per-port PoE feeds on switches. Power supplies feed the USB hubs or PoE switches. Per-hub/switch power monitoring is necessary to avoid overloads. Small local batteries or ultra-capacitors might provide enough head-room to allow short-duration overloads.
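As a rough illustration, here's a minimal Python sketch of that control logic, assuming a fixed power budget per hub and per-port switching. The wattages, port counts and the set_port_power() call are all hypothetical placeholders, not a real API: on real hardware the switching would be a USB hub PORT_POWER request or a per-port PoE enable on a switch.

    import time

    SPINUP_WATTS = 5.5        # assumed worst-case 2.5" drive spin-up draw
    IDLE_WATTS = 1.8          # assumed draw once the drive is spun up
    HUB_BUDGET_WATTS = 30.0   # assumed budget per hub / PoE switch feed

    def set_port_power(hub_id, port, on):
        # Placeholder for the real per-port control (USB hub PORT_POWER
        # feature or per-port PoE enable on a switch).
        print(f"hub {hub_id} port {port} -> {'on' if on else 'off'}")

    class Hub:
        def __init__(self, hub_id, ports=8):
            self.hub_id = hub_id
            self.state = {p: "off" for p in range(ports)}

        def draw(self):
            # Worst-case draw currently committed on this hub.
            return sum(SPINUP_WATTS if s == "spinup" else
                       IDLE_WATTS if s == "on" else 0.0
                       for s in self.state.values())

        def power_on(self, port):
            # Only enable the port if the spin-up surge fits the budget.
            if self.draw() + SPINUP_WATTS > HUB_BUDGET_WATTS:
                return False                 # defer; caller retries later
            self.state[port] = "spinup"
            set_port_power(self.hub_id, port, True)
            time.sleep(8)                    # allow the drive to spin up
            self.state[port] = "on"
            return True

        def power_off(self, port):
            set_port_power(self.hub_id, port, False)
            self.state[port] = "off"

On this model a rack controller powers ports up one or a few at a time, so spin-up surges never exceed what the shared supply (or a small local battery/ultra-capacitor) can cover.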

Using the Xyratex vertical mounting technique, two drawers of drives can fit in 5RU, limiting the rack to 7 x 5RU units to leave room for Top of Rack switches and power supplies.

Alternatively, 14 x 3RU units also fit in a 42RU rack, but without space for PSUs or switches.

Allowing for drawer sliders, 432mm-435mm of internal width is available to house drives.

Drive Cooling: there are multiple vendors selling custom-built "Cold Plate" technology [an example product, Wikipedia, and a 2011 article].

The required thickness of cold-plates for low-power applications isn't clear. With modern machining, 0.5mm channels could be reliably and precisely machined in either Copper or Aluminium blocks, resulting in 1.5mm-2.0mm thick plates running the length of the rack: 750mm-950mm.

The critical factor for liquid cooling is the thermal coupling of plates to drives. This is a well-known problem for CPU heat-sinks: thermal pastes, sometimes toxic, are applied to do the job. It would complicate in-place drive replacement, forcing whole modules to be replaced at once, with individual drives swapped out elsewhere.

As well as conducting heat, the thermal coupling must damp vibrations and not become a source of direct mechanical coupling between drives, suggesting either a rubber/synthetic plate with metal contact pads or a metal plate with rubberised thermal coupling pads.

The bottom connectors and top & side locators (drives cannot be screwed in) would also have to mechanically isolate each drive to avoid vibration coupling.

Using cold-plate cooling would allow drives to be mounted front-to-back or side-to-side, unlike air-cooling where only front-to-back is possible to maintain an airflow.

The modules, ("sleds"), holding the drives can be open-frames, not adding to the width required, but adding a little to the height. In both formats, there is space available for this.

I haven't factored in the space needed for motherboards and power supplies.
This may reduce the effective depth to 560mm-630mm, 7 or 8 drives deep.

Single row per Sled:

In this way, drives could sit without gaps along a module, taking just 70mm each:
  • 11 drives in 770mm or 13 drives in 910mm [14 in 980mm only works for fixed drives]
  • 9.5mm drives with 1.5mm cold-plates, on 11mm centres, allow 39 drives across a rack:
    • 117 platters per row
    • 429 (11-deep) or 507 (13-deep) drives per unit.
    • for 6006 to 7098 drives per 42RU rack (14 units)
    • a 66% - 78% packing density
  • 15mm drives with 2mm cold-plates, on 17mm centres, allow 25 drives across a rack:
    • 100 platters per row
    • 275 drives (11-deep) or 325 (13-deep)
    • for 3850 to 4550 drives per 42RU rack (14 units)
    • a 64% to 75% packing density
At ~100g (0.1kg) per drive, there will be ~500kg per rack in drives, requiring drawers to support 20kg each.
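A quick back-of-envelope check of those single-row figures, as a Python sketch (the 432mm usable width, 14 units per rack and ~0.1kg per drive are the numbers used above; the language choice is mine):

    RACK_WIDTH_MM = 432       # usable width after drawer sliders
    UNITS_PER_RACK = 14

    for drive_mm, plate_mm in [(9.5, 1.5), (15.0, 2.0)]:
        pitch = drive_mm + plate_mm              # drive centres across the rack
        across = int(RACK_WIDTH_MM // pitch)     # 39 @ 11mm, 25 @ 17mm
        for deep in (11, 13):                    # drives along the drawer
            per_unit = across * deep
            per_rack = per_unit * UNITS_PER_RACK
            print(f"{drive_mm}mm: {across} across x {deep} deep = "
                  f"{per_unit}/unit, {per_rack}/rack, "
                  f"~{per_rack * 0.1:.0f}kg of drives")

This reproduces the per-unit and per-rack counts quoted above, and shows the corresponding drive weight for each layout.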

Dual rows per Sled:

The same cold-plate can cool back-to-back disk drives.

A 1.3mm-1.5mm thick plate bonded between a pair of 9.5mm drives creates a module 20.3mm-20.5mm thick.

Across a rack, 6 drives @ 70mm spacing (12 per module), with 12mm-15mm space for frames/couplings.
This allows 10 modules in 205mm, and 44 modules in 902mm [or 47 @ 20.3mm in 954.1mm].
A total of 528 [564] drives per layer.

Along a rack, 13 drives can fit in 910mm [26 per module] and 20 modules @ 20.5mm in 410mm, or 21 modules @ 20.3mm in 426mm.
A total of 520 [546] per layer.

The next smaller layout, 840mm and 12 drives, fits 480 [504] 9.5mm drives.

The advantage of across-the-rack layout with cold-plates is that depth of rack used can be changed in much smaller increments.

Note: this extreme packing density is possible in a USB connected MAID device. It wouldn't be a first choice for SAS/SATA drives.

For dual 15mm drives bonded to 1.8mm-2.0mm cold-plates, giving 31.8mm-32.0mm thick modules:

  • 28 modules take 896mm for 336 drives per layer, across the rack, and
  • 13 modules at 910mm deep [13 drives], for 338 drives per layer along the rack,
    • or 13 modules at 840mm deep [12 drives], for 312 drives per layer.
At 14 layers per rack, 7280 drives/rack @ 9.5mm and 4368 drives/rack @ 15mm.
Around an 80% packing density.
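And a matching Python sketch for the dual-row figures, reproducing the per-layer and per-rack counts above (the 15mm line uses the 840mm-deep, 12-drive variant, which is where the 4368 figure comes from):

    LAYERS_PER_RACK = 14

    def dual_row(modules_across, drives_deep):
        # Each module is two drives bonded back-to-back on one cold-plate.
        per_layer = modules_across * drives_deep * 2
        return per_layer, per_layer * LAYERS_PER_RACK

    # 9.5mm drives, 20.5mm modules, 13 drives deep along the rack:
    print(dual_row(20, 13))   # -> (520, 7280)
    # 15mm drives, 32mm modules, 12 drives deep (840mm variant):
    print(dual_row(13, 12))   # -> (312, 4368)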

For 7mm drives, roughly double the 15mm drive figures.
As this is a MAID design, single-platter drives wouldn't be considered: we're not trying to maximise IO/sec or minimise latency, we're maximising capacity and storage density, and minimising $/GB.
