Friday, November 22, 2013

Nostalgic Look at Cray

This colorful monster used to be the face of high-performance computing.  Nowadays we use massive parallelism -- clusters built from relatively ordinary standalone architectures connected by "fabrics" of high-speed networks.

Cray's design features were enviable in the 1990s, but are now considered decidedly passé.  Cray concentrated on designs built around a small number of very fast vector processors, using exotic circuitry and cooling to eke the highest possible performance out of that limited parallelism.  The industry went in a different direction, using relatively cheap COTS processors and main boards.

The Cray CPU was a collection of bipolar transistor logic modules immersed in a proprietary liquid coolant called Fluorinert.  The circular cross section of the CPU cabinet was chosen to reduce transmission distances for the wiring.  Bipolar electronics -- not to be confused with manic-depressive circuits -- are insanely fast, but have the undesirable property of consuming electrical current constantly in order to maintain state.  Modern CMOS electronics draw significant current only when they change state, so electrical power needs fluctuate with actual use.  And the Fluorinert coolant was expensive and messy; the only advantage was the integrated seating offered by the elements of the cooling system surrounding the cabinet.  We still use liquid cooling, but we rely on cold plates, water blocks, and other less messy heat transfer setups.

Let's be honest: the sheer "geek chic" factor of computers immersed in liquid coolant was enough to make us drool over Crays.  The colorful, minimalist cabinets with their integrated lighting and cylindrical, towering forms were almost literal quotes from the mythical Krell architecture of Forbidden Planet, the landmark 1950s science fiction film.

But it was their programming model that really shone at the time.  Piles of general-purpose registers, all 64-bit.  Another glistening pile of vector registers, with compilers for Fortran that would vectorize the inner loops.  With a few cycles of setup, each vector element operation took only one CPU cycle.  And then an even larger pile of "secondary" registers -- essentially an L1 cache under explicit programmer control.  Data could be exchanged with cache registers in a single cycle.
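
If you never wrote for one of these machines, a rough modern analogue may help.  The sketch below is Python with NumPy rather than Cray Fortran -- take it as an illustration of the idea, not the genuine article.  The explicit inner loop is the kind a vectorizing Fortran compiler would recognize; the whole-array form is roughly how the code reads once each element operation maps onto a vector register.

    import numpy as np

    # A classic vectorizable inner loop: y[i] = a*x[i] + y[i].
    # This is the shape of loop a vectorizing Fortran compiler would map
    # onto vector registers -- one element operation per cycle after a
    # few cycles of setup.
    def saxpy_loop(a, x, y):
        for i in range(len(x)):
            y[i] = a * x[i] + y[i]
        return y

    # The whole-array equivalent: one logical operation over the vectors.
    def saxpy_vector(a, x, y):
        return a * x + y

    x = np.arange(1_000_000, dtype=np.float64)
    y = np.ones_like(x)
    assert np.allclose(saxpy_loop(2.0, x.copy(), y.copy()),
                       saxpy_vector(2.0, x, y))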

While we achieve astonishing performance with the IBM SP and Intel x86 architectures in SIMD and MIMD designs, we have to genuflect to the sheer elegance -- in all aspects -- of the legacy Cray design.  From the color of the case to the orthogonal Zen of its programming model, it was a champion.

Errors matter

"If your error_log is bigger than your access_log, you might be a bad webmaster."
--Bill the Web Hosting Engineer

True and wise words.  Most of the problems web designers run into, and escalate to the hosting provider, could be solved simply by paying attention to the logs produced by the web server, the database server, and the language runtime.

The PHP runtime tells you when you've committed a language faux pas such as referencing undefined variables.  While you, the original programmer, might understand that the reference in question is safe, the subsequent maintenance programmer may not.  Write your code to be transparent, so that its intent and correctness are evident by inspection.
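
The language in question here is PHP, but the habit translates anywhere.  Here is a minimal sketch in Python (the names are made up) of the difference between code that leans on a variable maybe having been set somewhere, and code whose intent is evident by inspection:

    # Opaque: leans on 'discount' having been defined somewhere earlier.
    # PHP would log an "undefined variable" notice and carry on with an
    # empty value; Python refuses outright with a NameError.
    def total_opaque(price):
        return price - discount      # where does 'discount' come from?

    # Transparent: the default is explicit, so both the maintenance
    # programmer and the runtime can see that the reference is safe.
    def total_transparent(price, discount=0.0):
        return price - discount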

Web server logs tell you many things, such as which broken links are invoking your potentially heavyweight HTTP 404 error handler.
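
As a quick sketch of what "paying attention" looks like -- assuming an Apache-style combined log format and a log path that will certainly differ on your host -- a dozen lines of Python will tell you exactly which broken links are getting hit:

    from collections import Counter

    ACCESS_LOG = "/var/log/apache2/access.log"    # path varies by host

    # Tally requests that returned 404, grouped by the requested path.
    # Combined-format lines look like:
    #   1.2.3.4 - - [date] "GET /old-page HTTP/1.1" 404 512 "ref" "agent"
    missing = Counter()
    with open(ACCESS_LOG) as log:
        for line in log:
            parts = line.split('"')
            if len(parts) < 3:
                continue                          # malformed line; skip it
            request = parts[1].split()            # ['GET', '/old-page', 'HTTP/1.1']
            status = parts[2].split()             # ['404', '512']
            if len(request) > 1 and status and status[0] == "404":
                missing[request[1]] += 1

    for path, count in missing.most_common(10):
        print(f"{count:6d}  {path}")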

MySQL "slow logs" give you insight into which queries are problematic.  A log full of them is a sign that you need to think carefully about which columns need indexes.  The ultimate example, I think, came from a client who reported a MySQL error indicating a full disk.  Investigation showed that the MySQL slow log itself had grown to over 9 gigabytes.  That also explained all of his end users' complaints about site performance: his queries were searching an unindexed table containing every commercial transaction his business had performed since it started -- some 3 million rows.

"So you're saying indexing the columns would make the site faster?"
--the client

Yes, we can make the point that proper data modeling and implementation are part of the job.  But anyone who writes code for the web needs to realize that a properly rendered site in the browser is not the final (or even the most important) metric of how successful that product is.  Learn what makes the web work, and learn how to read the indications of success or failure.
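
To make the indexing point concrete, here is a minimal sketch.  It uses SQLite standing in for MySQL, and a hypothetical transactions table with made-up columns, so read it as an illustration of the technique rather than that client's actual schema:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE transactions (
                      id INTEGER PRIMARY KEY,
                      customer_id INTEGER,
                      amount REAL,
                      created TEXT)""")

    query = "SELECT * FROM transactions WHERE customer_id = 42"

    # Without an index, the planner scans every row in the table.
    print(db.execute("EXPLAIN QUERY PLAN " + query).fetchall())
    # ... 'SCAN transactions'

    # One statement later, the same query becomes an index lookup.
    db.execute("CREATE INDEX idx_transactions_customer "
               "ON transactions (customer_id)")
    print(db.execute("EXPLAIN QUERY PLAN " + query).fetchall())
    # ... 'SEARCH transactions USING INDEX idx_transactions_customer ...'

MySQL's own EXPLAIN serves the same purpose, and the slow log tells you which queries to feed it.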

Thursday, November 21, 2013

Don't be a dick

Ars Technica reports that LG smart TVs are working over your home network in NSA mode.

If your home network is like mine, it puts to shame pretty much any office LAN of 20 years ago.  Just a brief glance at the Untangled DHCP assignments reveals my growing collection of laptops, a small handful of media servers, a few desktop workstations, and a regiment of mobile devices all my friends bring with them.  That's dozens of devices, some of which probably have accessible folders.

Yes, you should lock down your network and any of the devices on it.  But let's face it: not all my friends are as tech-savvy as me and my evil genius roommate, and they won't necessarily know how to do this.  And honestly, no one wants the paranoia of wondering whether a friend's appliances are spying on him.

"The doll's trying to kill me and the toaster's been laughing at me!"
--Homer Simpson

If you program embedded systems, don't do this.  Don't send all kinds of intrusive information back to your company.  Don't wander aimlessly over whatever network you find yourself on.  It's just impolite.

Let me illustrate.  Years ago I worked on next-generation satellite television systems.  I mostly worked on the spacecraft-integration end.  But along the way we came up with the idea of using the emerging on-demand features of the medium to tailor advertising to the viewing habits of the end user.  We had good motives.  Most of us were single men, and had a fervent desire never to see feminine hygiene advertisements.

But that meant storing viewing preferences -- programs watched, etc.  And it also meant transmitting that information to edge servers that could deliver the tailored content.  Even though the association was only to the device ID, we had moral reservations.  We firmly believed that one's viewing habits were a matter of individual privacy, and we had no desire to facilitate whatever nefarious use someone else might make of that information later.

(Yes, Netflix unabashedly does this now.  We were angels back then.)

But let's take it a step further.  You might actually be incurring legal liability by snooping on private networks and sending the data off-network.  Most networks are built around a concept of trust among well-defined peers.  This means your desktop might provide more lenient access to other hosts on the local network, simply by virtue of their being on the network.  If your embedded appliance code blindly transmits things like medical or financial records off-network to your company servers, where they suddenly become accessible to your data managers, then that is a clear breach of trust and ethics.

You run a terrible privacy risk by covertly tracking your users' habits, even for your own business purposes.  You have no business intruding into other parts of their lives, or their networks.

Who and why?

This blog has been a long time in coming.  I'm an engineer and a computer scientist.  My field is computational tools for engineering, which means I see a lot of beautiful things like the 787 Dreamliner taking flight.  (I worked on the computational fluid dynamics that drove its design.)  Lately my engineering duties have grown to encompass systems engineering for consumer Internet service.

Through it all I have seen the best and worst of computer programming.  This blog focuses on good and bad software practice.  As software becomes more responsible for how our world works, and as more of it devolves to the purview of "web designers," it becomes more important to maintain a high standard of qualification and competence.