Friday, November 22, 2013

Nostalgic Look at Cray

This colorful monster used to be the face of high-performance computing.  Nowadays we use massive parallelism -- clusters built from relatively ordinary standalone architectures connected by "fabrics" of high-speed networks.

Cray's design features were enviable in the 1990s, but are now considered decidedly passé.  Cray's designers concentrated on single-CPU, single-pipeline machines, using exotic circuitry and cooling to eke the highest performance out of that limited design.  The industry went in a different direction, using relatively cheap commercial off-the-shelf (COTS) processors and main boards.

The Cray CPU was a collection of bipolar transistor logic modules immersed in a proprietary liquid coolant called Fluorinert.  The circular cross section of the CPU cabinet was meant to shorten the wiring runs between modules.  Bipolar electronics -- not to be confused with manic-depressive circuits -- are insanely fast, but have the undesirable property of consuming electrical current constantly in order to maintain state.  Modern CMOS electronics draw most of their current only when they change state, so electrical power needs fluctuate with actual use.  And the Fluorinert coolant was expensive and messy; the only advantage was the integrated seating offered by elements of the cooling system surrounding the cabinet.  We still use liquid cooling, but we rely on cold plates, water blocks, and other less messy heat transfer setups.

Let's be honest: the sheer "geek chic" factor of computers immersed in liquid coolant was enough to make us drool over Crays.  The colorful, minimalist cabinets with their integrated lighting and cylindrical, towering forms were almost literal quotes from the mythical Krell architecture of the landmark 1956 science fiction film Forbidden Planet.

But it was their programming model that really shone at the time.  Piles of general-purpose registers, all 64-bit.  Another glistening pile of vector registers, with compilers for Fortran that would vectorize the inner loops.  With a few cycles of setup, each vector element operation took only one CPU cycle.  And then an even larger pile of "secondary" registers -- essentially an L1 cache under explicit programmer control.  Data could be exchanged with cache registers in a single cycle.
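
For anyone who never wrote for one of these machines, here is a rough Python sketch of what "vectorizing the inner loop" amounts to.  It assumes 64-element vector registers, as on the Cray-1, and chops a long loop into register-sized strips.  The function names and the five-cycle setup figure are made up for illustration; they are not real Cray timings or anything the Fortran compiler actually exposed.

```python
VECTOR_LENGTH = 64  # assumed vector register size (64 elements, as on the Cray-1)


def saxpy_strip_mined(a, x, y, vector_length=VECTOR_LENGTH):
    """Compute a*x[i] + y[i], one vector-register-sized strip at a time."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = []
    for start in range(0, len(x), vector_length):
        # Each slice stands in for one vector load into a register.
        vx = x[start:start + vector_length]
        vy = y[start:start + vector_length]
        # One "vector instruction" then processes the whole strip.
        result.extend(a * xi + yi for xi, yi in zip(vx, vy))
    return result


def estimated_cycles(n, setup_cycles=5, vector_length=VECTOR_LENGTH):
    """Toy cost model: each strip pays setup once, then one cycle per element.

    setup_cycles=5 is a hypothetical placeholder, not a measured value.
    """
    strips = -(-n // vector_length)  # ceiling division
    return strips * setup_cycles + n
```

Under this toy model a 100-element loop splits into two strips and costs 110 cycles, so the setup overhead is only about ten percent of the total.  The longer the loop, the closer you get to the one-cycle-per-element figure.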

While we achieve astonishing performance with the IBM SP and Intel x86 architectures in SIMD and MIMD designs, we have to genuflect to the sheer elegance -- in all aspects -- of the legacy Cray design.  From the color of the case to the orthogonal Zen of its programming model, it was a champion.
