Friday, March 19, 2010

Go Green: Reduce, Reuse, Refactor

Today's columnist is Christopher Sean Morrison from BRL-CAD. Sean writes:

I recently had an epiphany. Go Green. At least, that was what initially came to mind. I wasn't thinking about recycling soda cans, planting trees, preserving rain forests, or reducing my carbon footprint -- however honorable and beneficial those activities may be. I was thinking about open source. Open source needs to Go Green.

Before I lose you on the metaphor, let me rewind and set the stage a little. A few weeks ago, I found myself doing what I love most, in an intense coding session working on one of my favourite open source projects. I was "in the zone", working in code for hours and hours on end without any distractions. Intense maybe isn't the right word for it, though. Painful. Yes, painful and frustrating. That fits much better. You see, instead of writing something cool and awesome during my insular zone time, I was hunting down a rather elusive bug. No epiphany to be had there.

It was the sort of frustrating grunt work that one loves to hate. I can usually isolate a bug in source code pretty quickly, but this bug was not cooperating at all. Quite rude. In fact, after many hours of hunting across dozens of debugger sessions, recompiles, and testing, I eventually threw in the towel on a direct approach.

Despite the vast array of debugging tools and experience at my fingertips, I found myself manually hunting through commit history in a binary-search fashion so that I could at least pinpoint the change that caused the bug, in hopes that the cause would then be evident. After about four hours of hunting, I had finally narrowed it down to within a mere 1000 commits. Add a couple more hours and I'd finally found the exact problematic commit, isolated the bug, and had a fix in place by the end of the day. Still no epiphany.
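For readers who haven't done this kind of hunt, here is a minimal sketch of the binary-search idea in Python. It is not what I actually ran; the function name `first_bad_commit`, the `is_bad` callback, and the made-up revision names are all hypothetical. It assumes the commits are ordered oldest to newest, that the oldest is known good and the newest is known bad, and that the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumptions (hypothetical setup): commits[0] is known good,
    commits[-1] is known bad, and the bug never disappears once present.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug already present: look earlier
        else:
            lo = mid  # still good: look later
    return commits[hi]


if __name__ == "__main__":
    # Made-up history of 1000 revisions; pretend the bug arrived in r637.
    history = [f"r{n}" for n in range(1000)]
    checked = []

    def is_bad(commit: str) -> bool:
        checked.append(commit)  # stands in for "check out, build, run the test"
        return int(commit[1:]) >= 637

    print(first_bad_commit(history, is_bad), "found after", len(checked), "checks")
```

Each check halves the remaining range, so roughly ten checks cover a thousand commits. The hours go into each build-and-test cycle, not into the search itself. Tools such as `git bisect` automate the bookkeeping if your version control system offers something similar.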

I originally avoided manually searching history as I knew it would be tedious, boring, time-consuming, and most importantly, I knew that this non-critical bug had been there for a while. Our regression test suite caught the bug the moment it happened and -- while I'm sure there was a perfectly reasonable justification at the time -- it was duly ignored by all developers and it stayed that way for many months.

No epiphany necessary there either. It's obvious in hindsight that the bug shouldn't have been ignored. It would have probably been less costly to fix had it been addressed the moment it was detected during regression testing. Kaner, Bach, and Pettichord write about how bugs are cheaper to fix the earlier they are found in Lessons Learned in Software Testing: A Context-Driven Approach. McConnell strongly reinforces this notion in his book Code Complete. That decision blunder isn't the highlight, though.

A couple of days later, as I was tidying up the loose ends, finishing bug fix verification, and updating documentation for our release notes, I was flabbergasted by what I found. Not only was the bug already documented, there were notes on the cause, details on why it wasn't immediately handled, and thoughts on a proposed fix, all conveniently offered in plain sight exactly where they were supposed to be. Not only that, the commit log added insult to injury: I was the one who documented it.

You see, this particular open source project is a large, complex software suite with more than a million lines of code, dozens of libraries, and hundreds of tools. There is a lot of documentation. So much so that when some contributors started translating some of our documentation to another language, I was astonished to discover that we had somewhere between half a million and a million words. That's at least a few thousand pages even without pictures! Even still, what is written is considered insufficient and many features are still not documented despite our efforts.

Epiphany realization begins.

I started thinking hard about documentation and software complexity. If I had just overlooked that little piece of developer documentation that was succinct, easily accessible, in the right place, and even written by me, how awful might the situation be for our users who have to wade through a sea of unfamiliar documentation in a myriad of formats and locations? The organization, clarity, and complexity of the documentation directly mirror those of the software itself. This is a software ecosystem problem.

The ecosystem software developers deal with is a complex world where toxic and eco-friendly practices for managing software share common ground. The structures we erect and fields we sow are impacted by ingrained complexity of features and complexity of implementation, which in turn affect the manner in which they are managed. The more complex the environment, the harder it is to make easily navigable by others, the more overhead and infrastructure are required to provide basic services, and the more costly it is to maintain.

Applying similar concepts of environmental responsibility, going "green" is about making real and lasting changes to the way software is managed. A few basic guidelines can be applied to pretty much all software projects, including open source, with relatively minimal effort. Eco-friendly developer activities strive to minimize complexity, maximize value, and optimize efficiency. This philosophical conservancy is aimed directly at improving the environment of all open source through Reduce, Reuse, Refactor.

Reduce Complexity

Complexity is acquired in open source projects in many ways but none as harmful (to users) as through the proliferation of options. If there's a request that can be solved with one more checkbox, another menu option, or just one more tool, there's generally a developer willing to implement it. The user gets their feature. Everyone else pays the price.

Source code complexity increases to implement the feature. The complexity of the user interface increases to present the feature. Documentation complexity (if even updated) increases to annotate how it works. All of those increases have implicit long-term maintenance and associated support costs.

In the commercial world, there are often managers, corporate culture, designers, and marketing teams to keep the proliferation of options in check. Apple has consistently demonstrated a strong capacity to produce relatively simple interfaces that expose a relatively minimal subset of options. While most open source thrives on freedom and choice, that does not mean that every option under the sun has to be presented to the user or that usability should be an afterthought. Ubuntu Linux is a great example within open source where usability is a primary focus.

Pay more attention to usability. Your users will thank you. The OpenUsability initiative specifically focuses on improving the usability of open source software by helping pair usability experts and designers with open source software developers. Open source projects need to organize their information, categorize features, and carefully consider the impact of exposing every feature to the user. Get rid of functionality that provides marginal value to users.

Reuse Components

Consolidate and collaborate. Instead of reinventing the wheel and writing things from scratch, spend that extra time trying to make someone else's code work. Most open source developers hate to hear this, thinking they can write what they need from scratch faster than they could integrate someone else's work. The truth of the matter is, though, that it's simply not nearly as much fun to read code as it is to write it yourself.

Reusing components doesn't just refer to other people's code. Reuse your own code and make it modular, well documented, and reusable by others. Joel Spolsky of Fog Creek Software wrote a fantastic article on this very subject entitled Things You Should Never Do, Part I where he writes about not being delusional that code written from scratch is inherently any better than code that already exists.

Even if a component is not directly usable or might take longer to integrate than it would to write from scratch (a claim that generally proves to be a fallacy in the long term), there is more aggregate value added through reuse. In other words, you are being socially responsible to the open source environment by helping improve the existing landscape rather than merely adding to it. Collaborate with others.

Refactor Functionality

There are now several excellent books and an abundance of online resources that cater to code refactoring. Kerievsky talks about "bad smells" in his book Refactoring to Patterns, where he points out a series of common source code issues that are indicative of potential problems. Martin Fowler and other authors provide a compendium of about 70 different improvements that can be made in Refactoring: Improving the Design of Existing Code. Particularly with regard to code duplication, adhere to the principle of "Don't Repeat Yourself" (also known as "Duplication is Evil"). Hunt and Thomas articulate this and other concepts in their book The Pragmatic Programmer, where they characterize how to make code flexible, easy to adapt, and reusable.

From a practical developer perspective, there is a plethora of basic guidelines that will help future development and maintainability. Eliminate duplicate code. Break large complex functions up into smaller simpler functions. Remove classes and structures that don't provide value. Simplify complicated design patterns. Use clear, consistent, and simple naming conventions. For open source projects, this is an optimization-minimization problem. Refactor to improve usability. Refactor to encourage reuse. Refactor to improve maintainability.
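To make the first two guidelines concrete, here is a small before-and-after sketch in Python. The functions and the unit-conversion scenario are invented purely for illustration and do not come from any particular project. The "before" pair repeats the same conversion inline; the "after" pair moves it into one named helper and one named constant.

```python
# Before: the inch-to-millimetre conversion is repeated in every function.
def box_volume_before(w_in, h_in, d_in):
    return (w_in * 25.4) * (h_in * 25.4) * (d_in * 25.4)


def box_surface_before(w_in, h_in, d_in):
    w, h, d = w_in * 25.4, h_in * 25.4, d_in * 25.4
    return 2 * (w * h + w * d + h * d)


# After: one constant, one small helper, and no duplicated conversion logic.
MM_PER_INCH = 25.4


def inches_to_mm(length_in):
    """Convert a length in inches to millimetres."""
    return length_in * MM_PER_INCH


def box_dimensions_mm(w_in, h_in, d_in):
    """Convert all three box dimensions in one place."""
    return tuple(inches_to_mm(x) for x in (w_in, h_in, d_in))


def box_volume_mm3(w_in, h_in, d_in):
    w, h, d = box_dimensions_mm(w_in, h_in, d_in)
    return w * h * d


def box_surface_mm2(w_in, h_in, d_in):
    w, h, d = box_dimensions_mm(w_in, h_in, d_in)
    return 2 * (w * h + w * d + h * d)
```

The behaviour is unchanged, but the refactored version has one place to fix if the conversion ever turns out to be wrong, and the names now say what the units are.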
