Tuesday, December 3, 2013

What's in a name?

What's a developer? A programmer? A software engineer?  I think it matters, because how we describe ourselves says a lot about how we approach our jobs.  And that's not a value judgment: the industry is vast, and it necessarily encompasses a wide range of approaches.  Here's my take.

Computer Scientist

This describes the field in academic terms.  For decades "computer science" referred to the abstract principles by which computing happens, and it's what went on the diplomas, regardless of what graduates actually went on to do.  People who call themselves computer scientists want you to see them as theorists and brainiacs, even if they also write code.  They tend toward the analytical side of the industry and will produce more paper than code if you let them.  They are notorious for setting aside "good enough" in the pursuit of "best."

However, lately the industry has suffered from a lack of computer science among its practitioners -- many of whom enter with a paucity of formal training.  Early Fortran is an example of what you get without some good theory behind you -- a disorganized mess.  Sound computation theory gave us such abstractions as map-reduce (the genius behind the Google search engine's scalability) and useful technologies such as face recognition.  Closer to home, practitioners with a sound theoretical vocabulary are more apt to understand the more esoteric factors that contribute to the success of a software project.

Software Engineer

Full disclosure:  I'm an engineer who frequently writes software.  That's different in many respects from being a software engineer, as I will explain.  The term "engineer" was thrown around far too often in the 1990s and 2000s, used to describe practically any technical job.  I bristled when the term "sales engineer" entered the business vocabulary.

True engineering is a licensed profession.  The road to certification is long and rigorous, and requires passing the Core Engineering curriculum in college.  All engineers, regardless of specialization, must do this.  Now, very few software jobs require an expert understanding of thermodynamics and differential equations, but that's not the point.  The core engineering curriculum teaches a specific, rigorous approach to analyzing and solving problems.  It also teaches the important discipline of balancing technical requirements against schedules, budgets, and the limits nature imposes.

Computer scientists learn this too, but not in nearly as helpful and rigorous a way.  Engineering is not just building machines -- it's how to think about all the things that contribute to a problem and its various solutions.  It's about creating methods that keep you safe and honest, and sticking to them.  It's about knowing as much as you can before you need to know it.  Engineers ask questions like "What happens if this piece breaks?"  "Can this be made simpler?"  "How are we going to build and test this?"  "Is this component good enough?"

The fact remains that, as engineers use the term, there is very little engineering in software engineering.  But a "software engineer" is someone who wants you to know that he's producing a software product on a prescribed budget, within a predictable schedule, with the functional properties that were specified, and with as few parasitic behaviors as possible.  Well-engineered software is everything you need and nothing you don't, with no surprises -- software engineers are the people who know how to get that.

Software Architect

This term just needs to go away.  And I say that from the perspective of someone who does a lot of what this title is meant to convey -- the high-level, overarching design of software systems.  I embrace the activity, but I eschew the abstractionist overtone.

The title connotes that a certain person is responsible for the high-level or critical design elements of large-scale software, while the detailed component designs are left to others and do not concern the architect.  My father and sister are architects of the brick-and-mortar variety, and they assure me the title is misplaced.  In physical architecture, your high-level designs are not credible if they gloss over the problems that arise in the details.  So it is with software.

This truth is confirmed by my personal experience with some of the so-called architects of systems and software.  I have seen so many high-level designs fail miserably because of naive assumptions about what the components are, what they will do, and most importantly, how they will interact.  System behavior is not generally determined by the elegance of the high-level design.  Nor is it determined by the individual behavior of components -- although this is sometimes true, as in the case of a mis-tuned database.  System behavior is most visibly the product of the interactions among components, most of which were not foreseen or accounted for by the architect.  The ability to see these interactions and explore their behavior before building the system is what constitutes good architecture.

This isn't a rant against high-level design, which obviously has to occur and be done correctly.  It's a harangue against those who practice only this type of design and consider it an appropriate division of labor.  Good design simultaneously considers the system at multiple levels of abstraction and implementation, and as proper subsystems having their own unique holistic behavior.  Insisting upon the title "software architect" conveys to me a dangerous indifference toward where the success of a software design truly lies, and improperly aggrandizes the role of high-level design.

Programmer / Coder / Analyst

I've long considered these to be mostly obsolete terms.  They're ironically the most straightforward, at least in terms of their common-sense meanings.

In decades of doing this kind of work, I've never figured out what an "analyst" is, either in software or in engineering.  In my experience, analysis (in the dictionary sense) is something everyone needs to do.  "Requirements analysis" is, at first, just reading comprehension.  Breaking down requirements into actionable tasks may elude some people, but really it's part of everyone's job.  Everyone learns to do it, so there isn't any need for a dedicated analyst.

Programmer used to be a broad, general term.  But now it seems synonymous with "coder," which in turn seems to suggest the foot soldier of the software industry.  Again, this is part of the job of everyone who has a hand in the design and production of software, so no one should ever be just a coder.

In a more amusing sense, I consider this part of some 1970s ideal of software workflow in which necktie-wearing geniuses wrote software on paper pads -- yes, I still have a pad of "coding forms" -- and turned them over to talented typists who punched them onto manila cards without understanding what they meant.

Developer

I think "developer" is coming to the fore as a replacement for "engineer," and I applaud the change.  Not because I'm an elitist engineer, but because I think the development of software encompasses a range of activities that doesn't fit any of the earlier pigeonholes well.

Classical engineering operates in a world dominated by small design vocabularies and rigid silos of activity running from design to test to manufacturing and operation.  It is successful in this pattern, but the pattern doesn't always fit software well.  The competent software producer must have a broader scope of activity than the classical engineer, and therefore a broader scope of competency.

Calling yourself a software developer tells me you can work in requirements analysis, design, programming, unit and integration testing, and deployment with relatively equal success.  While the production of the actual finished code is obviously paramount, the software developer has a larger perspective.

More so than the others, this title lends itself to specialized prefixes.  People introduce themselves to me, for example, as "Windows developers" or "firmware developers," and this tells me something important.  However, the term "web developer" has come to encompass an astounding range of talent, from highly skilled and highly trained practitioners who apply their talents to products for the Web, to self-proclaimed (and often astonishingly incompetent) folks who simply read a book on PHP and now offer their services to a paying public.

Lest elitism rear its ugly head again, some of the worst program code I've ever seen has been written by electrical engineers and academic scientists -- in other words, by people with appreciable intellect and training (albeit in other fields).  "Web developers" most often arise from the ranks of graphic designers and other people skilled in visual or literary arts, who discover suddenly that computer programming is now part of their jobs.  The smarter and richer ones hire competent software developers to produce the technology behind their web sites.  But the height of pretense, in my book, is to install Wordpress, upload some content, and call yourself a "web developer."

That's probably going to be an unpopular opinion.  Production pressures and values dictate that this is probably how ninety percent of all web sites have been and will be produced.  And that's as it probably should be, since honestly I'd rather have the visual artists concentrating on what they do best, and because gear-heads are notorious for producing very ugly web sites.  But equating "web development" with "software engineering" or "computer science," when what you are doing is content management, is a bit unfair to the technically savvy whose title you're borrowing.

So there's the cast of characters as I see them, in late 2013.  Imagine what we'll be calling ourselves in ten years.

Friday, November 22, 2013

Nostalgic Look at Cray

This colorful monster used to be the face of high-performance computing.  Nowadays we use massive parallelism -- clusters built from relatively ordinary standalone architectures connected by "fabrics" of high-speed networks.

Cray's design features were enviable in the 1990s, but are now considered decidedly passé.  They concentrated on single-CPU, single-pipeline designs, using exotic circuitry and cooling to eke the highest performance out of that limited design.  The industry went a different direction, using relatively cheap COTS processors and main boards.

The Cray CPU was a collection of bipolar transistor logic modules immersed in a proprietary liquid coolant called Fluorinert.  The circular cross section of the CPU cabinet was meant to shorten the wiring runs.  Bipolar electronics -- not to be confused with manic-depressive circuits -- are insanely fast, but have the undesirable property of consuming electrical current constantly in order to maintain state.  Modern electronics consume current mainly when they change state, so electrical power needs fluctuate with actual use.  And the Fluorinert coolant was expensive and messy; the only fringe benefit was the integrated seating offered by the elements of the cooling system surrounding the cabinet.  We still use liquid cooling, but we rely on cold plates, water blocks, and other less messy heat-transfer arrangements.

Let's be honest: the sheer "geek chic" factor of computers immersed in liquid coolant was enough to make us drool over Crays.  The colorful, minimalist cabinets with their integrated lighting and cylindrical, towering forms were almost literal quotations of the mythical Krell architecture from Forbidden Planet, the landmark 1950s science-fiction film.

But it was their programming model that really shone at the time.  Piles of general-purpose registers, all 64-bit.  Another glistening pile of vector registers, with Fortran compilers that would vectorize the inner loops.  With a few cycles of setup, each vector element operation took only one CPU cycle.  And then an even larger pile of "secondary" registers -- essentially an L1 cache under explicit programmer control.  Data could be exchanged with these secondary registers in a single cycle.

While we achieve astonishing performance with the IBM SP and Intel x86 architectures in SIMD and MIMD designs, we have to genuflect to the sheer elegance -- in all aspects -- of the legacy Cray design.  From the color of the case to the orthogonal Zen of its programming model, it was a champion.

Errors matter

"If your error_log is bigger than your access_log, you might be a bad webmaster."
--Bill the Web Hosting Engineer

True and wise words.  Most of the problems web designers run into, and escalate to the hosting provider, could be solved simply by paying attention to the logs produced by the web server, the database server, and the language runtime.

The PHP runtime tells you when you've committed a language faux pas such as referencing undefined variables.  While you, the original programmer, might understand that the reference in question is safe, the subsequent maintenance programmer may not.  Write your code to be transparent, so that its intent and correctness are evident by inspection.
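
To make the point concrete, here is a minimal PHP sketch -- the variable and default names are hypothetical -- contrasting code that leans on the runtime's notice with code whose intent is evident by inspection:

    <?php
    // Opaque: if 'page' is absent from the query string, PHP emits an
    // "undefined index" notice to error_log, and the reader must guess
    // whether the reference is actually safe.
    $page = $_GET['page'];

    // Transparent: the default is explicit, nothing is logged, and the
    // maintenance programmer can verify the intent at a glance.
    $page = isset($_GET['page']) ? $_GET['page'] : 'home';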

Web server logs tell you many things, such as broken links that invoke potentially heavyweight HTTP 404 error handlers.
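
If broken links are chewing up server time, the 404 path doesn't have to boot the whole application.  Here's a rough sketch of a lightweight handler -- the file name and messages are illustrative, and it assumes the web server is configured to route 404s to it -- that records enough context to fix the offending link:

    <?php
    // notfound.php -- a deliberately small 404 handler.
    // It logs the broken URL and its referrer, then returns a static page
    // instead of invoking the full CMS for every bad link.
    http_response_code(404);
    $referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : 'none';
    error_log('404: ' . $_SERVER['REQUEST_URI'] . ' (referrer: ' . $referrer . ')');
    header('Content-Type: text/html; charset=utf-8');
    echo '<html><body><h1>Page not found</h1></body></html>';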

MySQL "slow logs" give you insight into which queries are problematic.  Lots of them indicate that you need to think carefully about what needs indexes.  The ultimate example, I think, came from a client who reported a MySQL error that indicated a full disk.  Investigation showed it was the MySQL slow log that had grown to over 9 gigabytes.  Thus explained all his own end users' complaints about site performance.  His queries were searching an unindexed table containing every commercial transaction his business had performed since it started -- some 3 million rows.

"So you're saying indexing the columns would make the site faster?"
--the client

Yes, we can make the point that proper data modeling and implementation is part of the job.  But anyone who writes code for the web needs to realize that a properly rendered site in the browser is not the final (or even most important) measure of how successful that product is.  Learn what makes the web work, and learn how to read the indications of success or failure.

Thursday, November 21, 2013

Don't be a dick

Ars Technica reports that LG smart TVs are rifling through your home network, NSA-style.

If your home network is like mine, it puts to shame pretty much any office LAN of 20 years ago.  Just a brief glance at Untangle's DHCP assignments reveals my growing collection of laptops, a small handful of media servers, a few desktop workstations, and a regiment of mobile devices that my friends bring with them.  That's dozens of devices, some of which probably have accessible shared folders.

Yes, you should lock down your network and any of the devices on it.  But not all my friends are as tech-savvy as me and my evil-genius roommate; they won't necessarily know how to do this.  And let's face it: no one wants the paranoia of wondering whether a friend's appliances are spying on him.
"The doll's trying to kill me and the toaster's been laughing at me!"
-- Homer Simpson

If you program embedded systems, don't do this.  Don't send all kinds of intrusive information back to your company.  Don't wander aimlessly over whatever network you find yourself on.  It's just impolite.

Let me illustrate.  Years ago I worked on next-generation satellite television systems.  I mostly worked on the spacecraft-integration end.  But along the way we came up with the idea of using the emerging on-demand features of the medium to tailor advertising to the viewing habits of the end user.  We had good motives.  Most of us were single men, and had a fervent desire never to see feminine hygiene advertisements.

But that meant storing viewing preferences -- programs watched, etc.  And it also meant transmitting that information to edge servers that could deliver the tailored content.  Even though the association was only to the device ID, we had moral reservations.  We firmly believed that one's viewing habits were a matter of individual privacy, and we had no desire to facilitate whatever nefarious purpose someone else might want to make of that information later.

(Yes, Netflix unabashedly does this now.  We were angels back then.)

But let's take it a step further.  You might actually be incurring legal liability by snooping on private networks and sending the data off-network.  Most networks operate on a notion of trust among well-defined peers.  This means your desktop might grant more lenient access to other hosts on the local network, simply by virtue of their being on the network.  If your embedded appliance code blindly transmits things like medical or financial records off-network to your company's servers, where they suddenly become accessible to your data managers, that is a clear breach of trust and ethics.

You run a terrible privacy risk by covertly tracking your users' habits even for your own business purposes.  You have no business intruding into other parts of their lives, or their networks.

Who and why?

This blog has been a long time in coming.  I'm an engineer and a computer scientist.  My field is computational tools for engineering, which means I see a lot of beautiful things like the 787 Dreamliner taking flight.  (I worked on the computational fluid dynamics that drove its design.)  Lately my engineering duties have grown to encompass systems engineering for consumer Internet service.

Through it all I have seen the best and worst of computer programming.  This blog focuses on good and bad software practice.  As software becomes more responsible for how our world works, and as more of it devolves to the purview of "web designers," it becomes more important to maintain a high standard of qualification and competence.