Tuesday, December 13, 2011

Revolutions End: Computing in 2020

We haven't reached the end of the Silicon Revolution yet, but "we can see it from here".

Why should anyone care? Discussed at the end.

There are two expert commentaries that point the way:
  • David Patterson's 2004 HPEC Keynote, "Latency vs Bandwidth", and
  • Mark Kryder's 2009 paper in IEEE Transactions on Magnetics, "After Hard Drives—What Comes Next?"
    [no link]
Kryder projected the expected limits of magnetic recording technology in 2020 (2.5": 7TB/platter) and how another dozen storage technologies will compare, but there's no guarantee. Some unanticipated problem might derail Kryder's Law (disk capacity doubles every year) before then, just as happened with CPUs.
We will get an early "heads-up": Kryder expects 7TB/platter to be demonstrated by 2015.

This "failure to fulfil the roadmap" has happened before: In 2005 Herb Sutter pointed out that 2003 marked the end of Moore's Law for single-core CPU's in "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software". Whilst Silicon fabrication kept improving, CPU's hit a "Heat Wall" limiting the clock-frequency, spawning a new generation of "multi-core" CPUs.

IBM, with its 5.2GHz Z-series processors, and gamers "over-clocking" standard x86 CPUs showed part of the problem was a "Cooling Wall". This is still to play out fully with servers and blades.
Back to water-cooling, anyone?
We can't "do a Cray" anymore and dunk the whole machine in a vat of Freon (a CFC refrigerant, now banned).

Patterson examines the evolution of four computing technologies over 25 years from ~1980 and the increasing disparity between "latency" (like disk access time) and "bandwidth" (throughput):
  • Disks
  • Memory (RAM)
  • LANs (local Networking)
  • CPUs
He neglects "backplanes", PCI etc., graphics sub-systems/video interfaces and non-LAN peripheral interconnects.

He argues there are 3 ways to cope with "Latency lagging Bandwidth":
  • Caching (substitute different types of capacity)
  • Replication (leverage capacity)
  • Prediction (leverage bandwidth)
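To make the first of those, caching, concrete: a minimal read-through cache sketched in Python (my illustration, not from Patterson), where a little fast capacity is substituted for repeated trips across a slow, high-latency path.

    # A minimal read-through cache: a little fast capacity substituted for
    # repeated trips across a slow, high-latency path. 'slow_fetch' stands
    # in for any high-latency source (a disk seek, a network round-trip).
    cache = {}

    def slow_fetch(key):
        # placeholder for the expensive, high-latency read
        return "value-for-" + key

    def cached_read(key):
        if key not in cache:           # miss: pay the full latency once
            cache[key] = slow_fetch(key)
        return cache[key]              # hit: served from fast local capacity

    print(cached_read("block-42"))     # miss: goes to the slow source
    print(cached_read("block-42"))     # hit: answered from the cache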
Whilst Patterson doesn't attempt to forecast the limits of technologies as Kryder does, he provides an extremely important and useful insight:
If everything improves at the same rate, then nothing really changes.
When rates vary, real innovation is required.
In this new milieu, Software and System designers will have to step up to build systems that are effective and efficient; any speed improvements will only come from better software.

There is an effect that will dominate bandwidth improvement, especially in networking and interconnects (backplanes, video, CPU/GPU and peripheral interconnects):
the bandwidth-distance product.
This affects both copper and fibre-optic links. Using a single technology, a 10-times speed-up shortens the effective distance 10-times, a result well known from transmission-line theory.
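A back-of-envelope sketch of that trade-off, assuming a constant bandwidth-distance product; the 1Gbps/100m baseline is illustrative only, not a cable spec:

    # Bandwidth-distance product: for a single cable technology the product
    # of link speed and usable length is roughly constant, so a 10x speed-up
    # costs roughly 10x in reach. The baseline figures are illustrative
    # assumptions, not vendor specs.
    BASELINE_GBPS = 1.0       # assumed link speed
    BASELINE_METRES = 100.0   # assumed usable cable length at that speed

    product = BASELINE_GBPS * BASELINE_METRES

    for gbps in (1, 10, 40, 100):
        reach = product / gbps
        print("%5.0f Gbps -> roughly %5.1f m of reach" % (gbps, reach))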

For LANs to go from 10Mbps to 100Mbps to 1Gbps, higher-spec cable (Cat 4, Cat 5, Cat 5e/6) had to be used. Although 40Gbps and 100Gbps Ethernet have been agreed and ratified, I expect these speeds will only ever be fibre-optic. Copper versions will either be very limited in length (1-3m) or use very bulky, heavy and expensive cables: worse in every dimension than fibre.

See the "International Technology Roadmap for Semiconductors" for the expert forecasts of the underlying Silicon Fabrication technologies, currently out to 2024. There is a lot of detail in there.

The one solid prediction I have is Kryder's 7TB/platter:
a 32-times increase in areal bit density, or 5 doublings of capacity.
Because there is no point in increasing rotational speed, the transfer rate only scales with the linear (along-track) bit density, roughly the square root of the areal density, so it should increase 5-6 times, to roughly 8Gbps. That is faster than "SATA 3.0" (6Gbps) but within the current cable limits. Maintaining the current "headroom" would require a 24Gbps spec, needing a new generation of cable. The SATA Express standard/proposal of 16Gbps might work.
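The arithmetic behind the 5-6 times, as a quick sketch; the ~1.4Gbps current media rate is my assumed baseline, not Kryder's figure:

    import math

    # Kryder's projected 32x increase in areal bit density = 5 doublings.
    AREAL_INCREASE = 32.0

    # Transfer rate scales with the linear (along-track) bit density,
    # i.e. with the square root of the areal density, because the
    # rotational speed stays fixed.
    linear_increase = math.sqrt(AREAL_INCREASE)    # ~5.7x

    # Assumed current media transfer rate (my baseline, not Kryder's).
    BASELINE_RATE_GBPS = 1.4

    projected_gbps = BASELINE_RATE_GBPS * linear_increase
    print("linear density up %.1fx -> media rate roughly %.0f Gbps"
          % (linear_increase, projected_gbps))     # ~8 Gbps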

There are three ways disk connectors could evolve:
  • SATA/SAS (copper) at 10-20Gbps
  • Fibre Optic
  • Thunderbolt (already 2 * 10Gbps)
Which type comes to dominate will be determined by the industry, particularly the major Vendors.

The disk "scan time" (to fully populate a drive) at 1GB/sec, will be about 2hours/platter. Or 6 hours for a 20Tb laptop drive, or 9 hours for a 30Tb server class drive. [16 hours if 50TB drives are packaged in 3.5" (25.4mm thick) enclosures].  Versus the ~65 minutes for a 500Gb drive now.

There is one unequivocal outcome:
Populating a drive using random I/O, as we now do via filesystems, is not an option. Random I/O is 10-100 times slower than streaming/sequential I/O. It's not good enough to take a month or two to restore a single drive when 1-24 hours is the real business requirement.
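Putting rough numbers on that, using the 10-100 times penalty above and the 20TB drive and 1GB/sec streaming rate from the scan-time figures:

    # Restoring a 20TB drive: streaming vs random I/O, using the 10x-100x
    # random-I/O penalty quoted above. Capacity and streaming rate are
    # carried over from the scan-time sketch; the real penalty depends on
    # I/O size and seek behaviour.
    CAPACITY_GB = 20000.0
    STREAM_GB_PER_SEC = 1.0

    stream_hours = CAPACITY_GB / STREAM_GB_PER_SEC / 3600.0
    print("Streaming restore:       ~%.0f hours" % stream_hours)

    for penalty in (10, 100):
        random_days = stream_hours * penalty / 24.0
        print("Random I/O, %3dx slower: ~%.0f days" % (penalty, random_days))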

Also, laptops and workstations with large drives (SSD or HDD) will require 10Gbps networking as a minimum, since a drive streaming at ~1GB/sec is already ~8Gbps. This may be Ethernet or the much smaller, already-available Thunderbolt.

A caveat: this piece isn't "Evolution's End", but "(Silicon) Revolution's End". Hardware Engineers are really smart folk; they will keep innovating and providing Bigger, Faster, Better hardware. Just don't expect the rates of increase to be nearly as fast. Moore's Law didn't get repealed in 2003; the rate of doubling changed...


"Why should anyone care?" is really: "Who should care?"

If you're a consumer of technology or a mid-tier integrator, very little of this will matter, in the same way that, when buying a motor vehicle, you don't care about the particular technologies under the hood, just what it can do versus your needs and budget.

People designing software and systems, the businesses selling those technologies and services, and the Vendors supplying the parts, components, hardware or software that others build upon will be intimately concerned with the changes wrought by Revolution's End.

One example is provided above:
backing up and restoring disks can no longer be an ordinary filesystem copy. New techniques are required.
