Tuesday, August 29, 2017

How Codebases Rot

Frank Deford
A lot of guys in my line of work have no use for sports.  But the passing, a few months ago, of the great sports writer and commentator Frank Deford means I'm going to talk about sports.  Or at least about sports writing.  On NPR every Wednesday morning for decades, in what The New Yorker's Nicholas Dawidoff calls a "breezy vernacular," Frank told a long-form sports story that made me feel like I -- a non-athlete and professional nerd -- understood what sports was about.

Frank was up front with Dawidoff about his secret -- Chekhov's rifle.  Chekhov, the great Russian playwright, said that if there's a rifle above the fireplace in Act One, by the end of the play someone had better take it down and use it.  Deford always opened his reports with some interesting sports tidbit, then spun a seemingly unrelated tale that always managed to find its way back to home plate and that opening tidbit.  Unconsciously you knew he was going to do this, so you listened.

Despite years of listening to Frank Deford, I'm no good at sports tidbits.  But I do know history-of-computing tidbits.  The venerable Fred Brooks authored one of the earliest and still-best books on software engineering project management, The Mythical Man-Month, and left us with Brooks' Law:
How does a large software project get to be one year late? One day at a time!
The notion of incremental crapulence shows up in many ways beyond schedule slippage.  I'm going to focus on one of those other ways, with an example.

Code Rot

"Code rot" is industry jargon for the effects of various forces that tend, over time, to make code more buggy, less maintainable, and less reusable.  We use the metaphor of decay to illustrate that this seems to happen without deliberate effort.  Code that sits largely unchanged will still start to "smell," giving off odors such as API skew, abandoned idioms, obsolete patches, and the like.  Especially when its original developers move on to other companies or tasks, then rotten stanzas of code whose original purpose is known only to the departed author begin to fester and beget bugs.

Let me tell you a typical story of rot.

There once was a great software stack for a best-selling appliance.  It incorporated everything, as we say, "from sand to pixels" -- custom hardware and drivers all the way through the operating system and middleware applications to an attractive web GUI.  But as with many such stacks it had evolved in places, stagnated in others, and was about as rotten as a codebase could get while still being profitable.  It had originally been written by a small startup and arrived at a major corporation through a series of acquisitions and reorganizations.  The OS was ten years old (although security-patched).  The build had splintered into a half-dozen incompatible tool chains, the legacy of more than a decade of unbridled developer preference.

The newly minted head of the fifty-strong development team immediately prescribed a refresh.  The OS was swapped out for a brand-new commercial-grade Linux distro.  The build was normalized to a single compiler and set of libraries.  A handful of the best and brightest on this team worked six months to refresh the entire system from top to bottom.  All the old code rot was gone; the system gleamed and hummed like a lovingly restored Shelby Mustang.

Everything new is old again

All was fine until a developer told me he'd been chastised for making some needed changes in a particular application, which we'll call Whiskey.  We'll call the developer Mike.

Whiskey was the newest member of the software stack.  Prior to the great refresh, it had been held up as a glimpse of the future.  It used the latest tools and adhered to the C++11 standard, with aspirations to move to C++14.  After the refresh, it was almost entirely free of rot.

Almost.  Soon after, the lead developer -- we'll call him Peter -- had checked into the Whiskey code base a new and safer database layer and used it in one of Whiskey's modules.  Mike had made some changes in that module, and Peter was furious.  So we called a meeting of all the stakeholders, including George, the system architect.

"Mike didn't understand the workings of this code," began Peter.  "It uses a generic framework and relies on generated modules to do the detailed work."  Mike had unknowingly made his changes in generated code.  Okay, generic frameworks and modules are good.  Automatic generation of tedious code is good.  Should be easy to get everyone on the same page.

"All right, lets start at the beginning," I said.  "Where did this framework come from?"  Come to find out it had been copy-pasted from elsewhere in the system and modified to interface with Whiskey's tool chain and reporting APIs.  "So we have our first problem.  We have two slightly different versions of the same code being compiled and run in different apps.  That's undesirable from a maintenance perspective."  Our refresh team had spent considerable effort collecting and centralizing all pseudo-shared and pseudo-reused code.

Turns out this "generic" framework wasn't as generic as Peter had advertised.  It had to be modified for use in each application, and Peter hadn't bothered to backport or refactor it to achieve genuine reuse.  As usual, schedule pressure had persuaded him to add that to the tech-debt pile and move on.

"I saw the code generator," said Mike.  "But it wasn't being invoked in the build.  There was just this file in the SCM repository and that's what needed to be changed.  So that's where I made the change."  That surely merited a follow-up.

"Wait -- so the generated code was checked into the SCM?"

"Yeah, it had to be," said Peter.  "The generated code needed some rework in order to fit into Whiskey, so I manually ran the code generator, made the edits, and then checked in the edited, generated code."  George was starting to look a little worried.

I pressed the question.  "So it sounds like Mike was only following the policy you yourself put into place."  Manually-generated, hand-patched code was part of the rot we'd worked so hard to eradicate.  It was bothersome for developers unfamiliar with parts of the system to have to figure out the "secret sauce" for building the parts that weren't automated.  "And that's not a good policy, certainly not one we want to embrace moving forward."

Of course Peter knew that.  You didn't get to be a lead developer without knowing what best practices were supposed to be.  You were supposed to go back and fix the code generator so that the code it generated worked wherever it was supposed to.  You were supposed to incorporate the code generation step into the build so that developers would know to make their changes to the generator's prototypes.  But schedule pressure and momentum had overcome Peter's desire to do the right thing instead of the expedient thing, and now there was Mike lying there under the bus and George shaking his head at seeing the athlete's foot start to grow on his brand-new creation.  (See, we're still sort of talking about sports.)
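
Here's roughly what the right policy looks like in practice.  This is only a sketch -- the generator script, the prototype file, and the output paths below are hypothetical stand-ins, not Whiskey's actual tooling -- but it captures the rule we wanted: the prototype and the generator live in the SCM, the build invokes the generator every time, and the generated code lands in the build output tree where nobody is tempted to hand-edit it or check it in.

#!/usr/bin/env python3
# generate_db_layer.py -- hypothetical example, not Whiskey's real generator.
# Invoked by the build, never run and patched by hand.  It reads a prototype
# file that IS checked into the SCM and writes generated C++ into the build
# output tree, which is NOT checked in.
import sys
from pathlib import Path

TEMPLATE = """// GENERATED FILE -- DO NOT EDIT.
// Change the prototype ({proto}) or the generator itself and rebuild.
#include "db_layer.h"

DbTable make_{table}_table() {{
    return DbTable("{table}");
}}
"""

def main() -> int:
    if len(sys.argv) != 3:
        print("usage: generate_db_layer.py <prototype> <output-dir>", file=sys.stderr)
        return 1
    proto = Path(sys.argv[1])      # e.g. src/db_layer.proto (lives in the repo)
    out_dir = Path(sys.argv[2])    # e.g. build/generated (never committed)
    out_dir.mkdir(parents=True, exist_ok=True)
    for table in proto.read_text().split():
        (out_dir / f"{table}_table.cpp").write_text(
            TEMPLATE.format(proto=proto.name, table=table))
    return 0

if __name__ == "__main__":
    sys.exit(main())

Wire a step like that into whatever build system the project uses and the "secret sauce" evaporates: the only things left for a developer like Mike to edit are the prototype and the generator itself, which is exactly where his changes belonged.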

Happy ending, of course.  Mike committed to learning the ins and outs of the code generator so that he could backport his fixes.  Peter committed to fixing the code generator and integrating it into the Whiskey build, pulling his one-off changes out of the SCM.

Jay's Corollary to Brooks' Law

So here's my corollary to Brooks:
How do codebases rot?  One shortcut at a time.
Keep in mind that everything still worked after the shortcuts.  The code built and ran correctly.  Out of the 400 or so source files that made up Whiskey, only one needed this special handling.  But that's only at the outset.  Incremental crapulence is, well, incremental.  A shortcut never seems so intolerable at the time you take it.  The first file to become a special case is the first spore of rot.  People normalize to it and begin to work that way habitually.  The Peters help us meet important deadlines.  The Mikes do their best to keep rot at bay.  And the Georges keep us profitable.

Fred Brooks didn't really write a book about how to manage a software development project.  Brooks used software development as a framework to describe how people behave in groups when given a task to accomplish.  And you either come to grips with that group behavior or you go under, no matter what your task.  One of Brooks' insights -- the self-organizing team -- is still the core of nearly all of today's Agile methods.  Some people are the short-cut developers who get to the milestones quickly, and others are the housekeeper developers who keep things tidy and maintainable.  It's like having the heavy hitters who can knock one out of the park, and the clean-up batters who can hit consistently to some neglected corner of the infield, if not always for a home run.  You need all of those to make a team, so that their strengths compound and their weaknesses cancel out.

There, a baseball metaphor.  I've fired Chekhov's rifle and brought us back to what Frank Deford was doing all those years.  Deford wasn't really writing about sports.  Like Chekhov and Brooks, he was using what he knew to frame important insights about people.  And codebases rot because people are people, and they take one seemingly insignificant shortcut at a time.  It's management's job to make sure the clean-up batters on the software team not only hold key positions in the lineup but are also allowed to do what they do to save the team.  Thanks, Frank.  I'll miss you.

Wednesday, March 29, 2017

Why the passage of HJR86 is so bad

Let's say one thing up front.  When the U.S. House of Representatives passed HJR86 yesterday, sending it to President Trump's desk, nothing actually changed.  The FCC rule -- the one that requires ISPs to get your consent before they sell your private information -- had not yet gone into effect.  The rhetoric in the blogosphere leading up to the vote, however, made it sound like people believed their private data was already protected.  It's a no-brainer that people to whom you entrust such intimately personal information as what you do online ought to get your permission before they sell it to third parties for their profit.  During the debate on the House floor, one opponent of the bill to abolish the rule urged its supporters to leave Capitol Hill for five minutes and try to find three normal people who didn't want the opt-in consent requirement.  Even worse, since Congress nixed the rule via the "nuclear option" of the Congressional Review Act, no similar rule can be made except by explicit act of Congress.  A future FCC or FTC is now forbidden to regulate ISPs in the way Americans overwhelmingly want.

It gets worse.

To understand the real impact of this bill, we need to dissect the argument by which big corporate ISPs fought to have it passed.  Social networks like Facebook and portals like Google are allowed to sell the information they obtain when you use their services.  If you like a post on Facebook, Facebook is allowed to record that you did that and use it for its own marketing purposes, or to sell it to partners for whatever they want to use it for.  It's part of the terms of service.  If you don't like it, don't use Facebook.  (I don't like it, so I don't use Facebook.)

Big ISPs like Comcast argued they should be able to do the same thing.  They persuaded the more business-friendly factions of Congress and the FCC that they were being treated unfairly and demanded a level playing field in order to compete for marketing-data dollars with other major players.  Facebook is able to build up profiles of its users based on their activity.  Turns out those profiles are worth quite a bit of money in the marketplace of attention.  People looking to promote goods and services want to efficiently target their ad dollars.  They'd rather pay Facebook for a list of web sites you visit than shotgun their promotions to everyone hoping to interest even just a tiny fraction.

That's a great lesson in market forces until you realize that Comcast isn't at all like Facebook for this particular regulatory purpose.

I control the information I put out on social media.  The social media provider may indeed sell that, but it can only sell what I voluntarily provide.  Facebook doesn't know that you also lurk anonymously on Brony forums unless you explicitly tell them so.  The same is generally true for any service endpoint.  What they can know about you is generally limited to what you have to share with them within the confines of the service they provide to you.  The profile my travel booking site has on me is limited to the travel habits they can infer from my use of their service.  My pizza-ordering habits are known to the local pizzeria and not generally to anyone else.  They can sell my pizza profile to the highest bidder, but there's a limit to how much such a thing is worth.

My ISP sees everything.  That puts them in a unique position to build up a much more comprehensive profile than any one service endpoint could achieve.  That's immediately alarming because it enables metadata analysis.

The practice of analyzing metadata rocketed to public attention when Edward Snowden revealed that American intelligence services were routinely collecting the metadata from the communications of millions of unsuspecting Americans.  Phone call metadata includes the number dialed and how long the call lasted, but not the content of the conversation.  Thanks to aging legal doctrine, that information isn't covered by the Fourth Amendment, just like what's written on the outside of an envelope.  You have a right to privacy in what was said, but you don't have a right to privacy in the fact that the conversation took place via a voluntarily-contracted third-party service.

Metadata analysis attempts to infer useful information from those facts, without having to delve into content.  And we do it because it works.  Not only does it work, it works very well.  Everyone knows about the intelligence-gathering applications.  But they don't necessarily know that even ordinary, commercially-available cybersecurity solutions use it.  ISPs and enterprise businesses rely heavily on metadata analysis systems to protect their networks from intrusions and exploits.  They know how valuable metadata is.

And it works better the more comprehensive the metadata.  All someone could learn from my pizza-ordering habits is that I don't like anchovies or chewy crust.  That profile has limited value because it comes from only one sector of my daily activity.  What if the metadata profile were able to aggregate information from several unconnected sectors?  ISPs know how much more valuable their particular metadata is.

And worst of all, what if an ISP could do this regardless of any privacy agreements I have with the endpoint providers?

The analogy that's going around the net today is to the phone company.  Let's say the county health department calls me up to give me the results of an anonymous test.  Let's say it's bad news.  So I call up my doctor and discuss the diagnosis and treatment.  Then I call my mom to tell her what's up.  Individually, each of those phone calls is protected by prior agreements of client privilege and privacy.  My doctor isn't selling my medical records in order to make a buck on the side.  But the phone company is in a unique position to know that I had phone calls, in rapid succession, with (1) a health monitoring facility, (2) a doctor, and (3) a close relative.  That information alone might be very interesting to my insurance company or employer because of what can be easily inferred from it.  And this is a fairly on-the-nose example.  In real life, metadata analysis is able to infer an astonishing amount of correct information from even more nebulous connections.
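
If it sounds like it would take a sophisticated system to make that inference, it wouldn't.  Here's a toy sketch -- the call records, the categories, and the two-hour window are all invented for illustration, and real metadata analysis is enormously more capable -- but it shows how little machinery it takes to turn bare facts about who called whom into a guess about my health.

#!/usr/bin/env python3
# Toy illustration only: invented call metadata (no content, just who and when),
# and a crude rule that flags a likely health event when calls to a health
# facility, a doctor, and a family member occur within a short window.
from datetime import datetime, timedelta

# (timestamp, category of the number dialed) -- the kind of thing a carrier
# or ISP can see without ever hearing a word of the conversation.
calls = [
    (datetime(2017, 3, 29, 9, 5),  "county health department"),
    (datetime(2017, 3, 29, 9, 40), "physician's office"),
    (datetime(2017, 3, 29, 10, 2), "family member"),
    (datetime(2017, 3, 29, 18, 0), "pizzeria"),
]

WINDOW = timedelta(hours=2)
PATTERN = {"county health department", "physician's office", "family member"}

def flags_health_event(records):
    """True if every category in PATTERN shows up within one WINDOW."""
    for i, (start, _) in enumerate(records):
        seen = {cat for ts, cat in records[i:] if ts - start <= WINDOW}
        if PATTERN <= seen:
            return True
    return False

print(flags_health_event(calls))   # True -- and no one listened to anything

Three timestamps and three categories are enough.  Now imagine the same kind of logic running over every connection an ISP carries for you.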

As a matter of policy, the phone company doesn't sell that sort of information.  But that's exactly what ISPs can do.  They can sell to anyone for any reason a comprehensive profile of you that has been acquired using their comparatively godlike powers of observation over all facets of your life.  That comprehensive perspective is why they aren't like social media or other limited forums to which they insist they should be compared.  The voluntary and limited information from which Facebook has to infer its profile of you is precisely what justifies its being allowed to build it.

Internet service isn't an optional novelty these days.  You don't have the luxury of just not using the Internet.  Even for the most disadvantaged Americans, access to services such as low-cost healthcare and public assistance requires Internet access just to manage a case.  While we're not yet to the point of the Internet being a mandatory service, we're close.  Close enough to regulate ISPs as providers of a service that people cannot easily choose not to have.  And in most markets there isn't meaningful competition for broadband access.  That's why one of the rules that HJR86 eliminated would have prevented broadband ISPs operating practically as monopolies in a market from insisting that you opt into their data-sharing program as a condition of service.

Comcast and others insist they just want a level playing field.  But it's not level; it's tilted steeply in their favor compared to the companies they designate as competitors.  They insist they should be allowed merely to innovate like all the others.  But their ability to see everything you do gives them the power to create a profile none of their competitors can hope to match.  Now you see why they're antsy to enter the market for your private data.  It would certainly be "innovative" for my doctor to sell all his patient data to pharmaceutical firms.  They'd be better able to target their ads and he'd be able to realize a new revenue stream.  But we instantly recognize that would be immoral.  And the new FCC, Comcast, and Congress don't want to talk about whether or not their policy toward ISPs is moral.  Businesses make money, therefore it must be good.