Solve The Real Problem

Discussions about professional software development with a focus on building real, solid, performant, reliable, monitorable large-scale online services in asynchronous C++. You know, good solid breakfast food.

Tuesday, April 20, 2010

Are we really still debating comments?

A colleague pointed me to a list of reasons why comments are bad (with a sigh):

They do not change with the code.
Only if you don't change them when you change the code, which is irresponsible. They're right there next to the code. They helped you remember the intent of the code (why it was trying to do something, and how it was going about it). If you don't update the comments for the next guy, or the next instance of you, you haven't finished the job.
A wrong comment is worse than horribly complex non-commented code.
This is an excellent reason to write proper comments and change them when the code changes. It is not a reason to omit them altogether.
A comment almost always expresses an absolute thing, where a well named variable or method can express an abstract concept. Comments can actually tightly couple your code. Take the example above where int i = 5 had a comment stating that 5 was the default. DEFAULT_PRODUCTION_WIDGET_COUNT communicates the same information, but at a higher level of abstraction.
This reasoning is flawed. The presented argument (via the example) is a false choice between comments and good variable names. Choose both.
Comments don’t show up in stack traces.
Stack traces are the poor man's error handler or log message. And besides, if a stack trace means something to you, you are already familiar with the code, and thus you have access to it and its commented intent.
Reading comments is optional. Reading code is mandatory.

Understanding the intent of code is mandatory. The vehicles for achieving that include reading the comments, whose primary purpose is to tell the "why" behind the "what" that is expressed in code.

This list sounds like rationalization for not doing the entire job, and not doing a good job. The presumption that one's code is so good, and so clear, and so well-suited to its purpose is naive and bordering on pompous, since it doesn't take into account the reality of multi-developer, multi-year, multi-site, multi-purpose, multiply-reworked and repaired code. Not commenting your code may be good enough if you are its only consumer, but otherwise, it's amateurish and a red flag. Stay away.
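
To make the "choose both" answer above concrete, here is a tiny, invented example: the name carries the "what", and the comment carries the "why". Both the constant's name and the rationale behind its value are made up purely for illustration.

#include <iostream>

// Why 5: (hypothetical rationale, for illustration only) each production
// line keeps one spare widget per shift and we schedule five shifts, so a
// fresh install should start with five. The name records the meaning; the
// comment records the reasoning.
constexpr int kDefaultProductionWidgetCount = 5;

int main() {
    std::cout << "Starting with " << kDefaultProductionWidgetCount
              << " widgets by default\n";
    return 0;
}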

Wednesday, February 17, 2010

More Giants' Shoulders

The 2010 issue of Core arrived the other day, and a statement by Steve Blank, one of the contributors, caught my eye:

Each generation assumes it is inventing the future, with no recollection that it's already been done.

To me, that's a reminder that there are very few problems that are truly new. Look for lessons from those who have gone before and see how they can apply to what you're doing now.

For example, I've also been reading Classic Feynman, which truly is "all the adventures of a curious character", as the subtitle says. The stories Feynman tells about his life are fascinating, and the tone and frankness are fun to read. Near the end of the book, there's a reproduction of his (well-known) appendix to the report from the Presidential Commission that looked into the 1986 Challenger disaster. There are two paragraphs that stood out to me. They appear together in the book, but I've commented on each separately below.

The usual way that such engines are designed (for military or civilian aircraft) may be called the component system, or bottom-up design. First it is necessary to thoroughly understand the properties and limitations of the materials to be used (for turbine blades, for example), and tests are begun in experimental rigs to determine those. With this knowledge larger component parts (such as bearings) are designed and tested individually. As deficiencies and design errors are noted they are corrected and verified with further testing. Since one tests only parts at a time, these tests and modifications are not overly expensive. Finally one works up to the final design of the entire engine, to the necessary specifications. There is a good chance, by this time, that the engine will generally succeed, or that any failures are easily isolated and analyzed because the failure modes, limitations of materials, etc., are so well understood. There is a very good chance that the modifications to the engine to get around the final difficulties are not very hard to make, for most of the serious problems have already been discovered and dealt with in the earlier, less expensive, stages of design.

Replace engines with software systems, turbines and fan blades with modularized libraries of components, test rigs with automated unit tests, and you have the basic recipe for successful software development. Start with a good foundation of components that you understand (in terms of both capabilities and limitations), and build on it in a structured, methodical manner to produce more and more capabilities without being swamped by complexity, and thus bugs. This is the philosophy that I and the other founders of Starscale share, and we've seen it succeed in several environments.
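
As a minimal sketch of testing the parts before the whole: the component and its behaviour below are invented for illustration, and a real project would use a proper unit-test framework rather than bare asserts.

#include <cassert>

// A small, self-contained component with well-understood limits...
class RetryPolicy {
public:
    explicit RetryPolicy(int max_attempts) : max_attempts_(max_attempts) {}

    bool should_retry(int attempts_so_far) const {
        return attempts_so_far < max_attempts_;
    }

private:
    int max_attempts_;
};

int main() {
    // ...exercised in its own little "test rig" before anything larger
    // is built on top of it.
    RetryPolicy policy(3);
    assert(policy.should_retry(0));
    assert(policy.should_retry(2));
    assert(!policy.should_retry(3));
    return 0;
}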

The subsequent paragraph is as follows.

The Space Shuttle Main Engine was handled in a different manner—top down, we might say. The engine was designed and put together all at once with relatively little detailed preliminary study of the material and components. Then when troubles are found in the bearings, turbine blades, coolant pipes, etc., it is more expensive and difficult to discover the causes and make changes. For example, cracks have been found in the turbine blades of the high pressure oxygen turbopump. Are they caused by flaws in the material, the effect of the oxygen atmosphere on the properties of the material, the thermal stresses of startup or shutdown, the vibration and stresses of steady running, or mainly at some resonance at certain speeds, etc.? How long can we run from crack initiation to crack failure, and how does this depend on power level? Using the completed engine as a test bed to resolve such questions is extremely expensive. One does not wish to lose an entire engine in order to find out where and how failure occurs. Yet, an accurate knowledge of this information is essential to acquire a confidence in the engine reliability in use. Without detailed understanding, confidence cannot be attained.

This so accurately describes the bad software systems I've seen (and in some cases, replaced) that I was taken aback. Think of the over-featured, under-tested, mis-designed, over-generalized, poorly-implemented, bug-ridden monsters you've worked on, and what happens when someone finds a bug that uncovers a fundamental flaw in the design of some subsystem or the entire thing. It's a case of uninformed design, where classic errors have been made, such as presuming everything will work out, and the details can be dealt with later and can't possibly influence the overall system's shape. How can you design something if you don't know what it's even made of?

With software, we can make free copies when testing it as a whole (unlike the engine), but instead we pay the cost of seeing through the complexity to actually diagnose a failure. Most often, poorly designed systems have poorly designed diagnostics, and tracking down a failure is a whole lot harder when you don't have something as obvious as a cracked fan blade to start with. And the big risk is the same: that what we've built is no good. In both cases, throwing away the design is far more expensive than throwing away a copy.

If we design with components that we understand, we can attain confidence that it will work and it will work well.

Wednesday, August 12, 2009

Real Progress

I've got nothing significant to say at the moment, except that I happened across The Secret of Success: Suck Less, and yes, I think he's on to something.  I've had similar experiences using software that used to be great but then accrued a pile of new useless-to-me features, until it got to that point where it just sucks.

I think this happens a lot with the "shiny thing" method of product development.  Too often, people can be easily distracted by something shiny and new.  If those people are driving a software product forward, that can lead to a loss of focus on the core goodness that makes people want it.  It's especially bad when shiny features distract effort away from quality improvements (as Maz says in the referenced post).  Of course, some whizbang features are great additions to the software . . . especially when your users have told you it would suck a lot less if you'd implement them. :)

Oh, and while you're there, give Simplicity and Security a read, too.

Wednesday, October 17, 2007

Embarrassing Code

I've started reading Beautiful Code and the first three chapters are just as I expected: veterans talking about specific problems, more to give you insight into how they think than to help you fully understand the problem. They've been good.

Now, however, I've read the fourth chapter, by Tim Bray, and I'm embarrassed for him. The article reads like biased Ruby evangelism, giving religious reasons for why Ruby is great, in stark contrast to the previous chapters' focus on making good design and implementation choices. I don't know Tim and have nothing against him, but his chapter in this book bothered me enough to write about it.

Tim's style of prose is a little too informal, and the chapter feels like it is targeted at beginner programmers. However, it's the content that really bothered me. For example, he states the following:

This example (and most of the other examples in this chapter) is in the Ruby programming language because I believe it to be, while far from perfect, the most readable of languages.

If you don't know Ruby, learning it will probably make you a better programmer. In Chapter 29, the creator of Ruby, Yukihiro Matsumoto (generally known as "Matz"), discusses some of the design choices that have attracted me and so many other programmers to the language.

He then follows that with his first example program, which he elaborates on later to explore the problem domain he has chosen: showing the ten most popular articles on his personal blog (which he of course refers to by its URL). The program he shows is:

1 ARGF.each_line do |line|
2   if line =~ %r{GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+ }
3     puts line
4   end
5 end

Tim puts himself in religious territory right away by declaring his belief that Ruby is not just a readable language, or among the most readable languages, but is actually the most readable of languages. If this were an isolated statement, it would just be a poor choice of words, but much of the article is rife with Ruby praise. And where Ruby falls short (such as the need for the Pascal-like verbose block terminator "end"), he acknowledges this, but in a dismissive way that makes you feel like he's waving his hands in front of the warts. I found this especially distracting since he had to hand-wave away two of the five lines of his first program example.

The start of the second paragraph in his program's preamble almost offended me. It seems very presumptuous for Tim to declare that I'm not as good a programmer as I could (should?) be because I don't know Ruby. The sentiment is compounded by the next sentence which smacks of "all the other kids are doing it". I'm prepared to be surprised and disturbed by facts and even anecdotes in a book like this, but not by judgements and peer pressure.

So, after presenting the reader with this five-line program in "the most readable of languages", Tim then takes an entire page to describe what it does, line by line. Clearly, he needs to explain some of the unique, custom syntax (both of those adjectives tend to fall outside of my personal beauty bucket, by the way) used by the program. I'm not sure why he needs this line-by-line explanation if the language and program are so readable and beautiful. This isn't something any of the previous chapters' authors felt the need to do, and one of them was performing an incremental complexity analysis of Quicksort. Ironically, the line numbers in the example appear to have been added by Tim to aid readability.

Much of the praise in this early section of the chapter is devoted to regular expressions, and that is justified. There seems to be an implication that the praise is somehow attributable to Ruby, but that is probably more the fault of the mood Tim sets than of the actual text.

Notice also that this program is equivalent to:

egrep "GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+ "

Given the application domain of reporting on log files, grep seems a more suitable solution so far. Tim expands the example after an overly detailed and unnecessarily exotic explanation of associative data structures (which he refers to as "Content-Addressable Storage") to instead count each article reference. I think it's important to see his expanded example for context.

counts = {}
counts.default = 0

ARGF.each_line do |line|
  if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
    counts[$1] += 1
  end
end

keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
keys_by_count[0 .. 9].each do |key|
  puts "#{count[key]}: #{key}"
end

From what I can tell, it becomes equivalent to:

egrep -o "/ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) " \
  |sort |uniq -c |sort -r -n |head -10

Approaches similar to the command line above seem acceptable to Tim, because later, when discussing more complex problems, he talks about doing the processing as a series of separate Ruby programs that produce intermediate files (although not in a pipeline).

Doing the same took another six lines of code in Ruby, and all of the useful syntax appears to be directly borrowed from Perl. In uncompressed first-draft Perl, you can write that as:

while(<>) { 
  ++$counts{$1} if m!GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) !; 
} 
@sorted = sort { $b->[1] <=> $a->[1] } map { [$_, $counts{$_}] } keys %counts; 
for $i (0..9) { 
  print "$sorted[$i]->[0] $sorted[$i]->[1]\n"; 
}

For my logs on my system (with a different regex, of course), the command line runs in ~2.4 seconds and the Perl runs in ~0.76 seconds. Tim's Ruby version processes my logs in ~2.0 seconds. My Perl took me longer to write than the command line, of course, but it didn't take more than a couple of minutes.

Tim says he wondered if his Ruby program was slow, so he wrote a Perl version to compare it to. But he doesn't include his Perl version in the article, nor does he draw any comparisons between the two implementations. Given his assertions about how much more readable Ruby is, I would like to have seen Tim do a direct comparison, especially since an alternative implementation had already been prepared for performance comparisons anyway. Its omission leaves his claims looking unfounded and tenuous.

The chapter does raise good points about when to spend time optimizing, and there are a few paragraphs that explicitly credit the "scanning lines of text with regexes" approach to awk, but they are in a different typeface, and at first I wondered if they'd been inserted by the editors (they weren't; Tim wrote them himself as a sidebar). The discussion of binary search and its Java implementation is just a textbook case with a small amount of advice, but nothing you wouldn't guess or know already. The chapter ends with an out-of-place discussion about the large Internet search engines that has nothing to do with design or code that I could see.

The point is that there is no great improvement in programmer efficiency, performance, or readability demonstrated here with the use of Ruby, so why does Tim focus on the language instead of the application domain and how its problems can be solved elegantly? Why did he choose such a simplistic problem that any experienced programmer can solve with a one-line UNIX command or a few lines of a common log-processing language (Perl)? I don't like being fed an evangelist message that comes without respect or substance, but that's what much of this chapter feels like. Given Tim Bray's experience and expertise in information systems, I was looking forward to something really interesting and worth exploring.

Tuesday, January 23, 2007

On Simplicity

I had a discussion today with someone about software, and when I was trying to describe some of the key qualities I seek when building systems, I listed off the usual set of descriptors: scalable, fault-tolerant, simple, monitorable, maintainable, and so on. What surprised me was that the other person picked right up on the word "simple". As soon as I said "simple", they understood exactly what I was getting at, and they could see the kinds of benefits that simplicity could have. The surprising parts were that they aren't a software expert by any means, and I almost didn't even include "simple" in the list.

Why was it an afterthought? Because, for me, it has become so obvious, so necessary, that it almost literally goes without saying. Which is funny, because if I had to choose one word, and only one word, to describe what I aim for, "simple" would be it. I'm reminded of a quote I read in the official Bell Labs history:

"Cognitive engineering" is what [Joe] Condon called it, "...that the black box should be simple enough such that when you form the model of what's going on in the black box, that's in fact what is going on in the black box."

This is an interesting way to think about design. Through appropriate uses of abstraction, even well-designed complex systems can be seen as a collection of simple "black boxes" that, at a high level, just do what you think they do. Their complexity comes out of the number of abstraction layers and the number of abstractions that interact throughout those layers, and not from within any of the individual abstractions. Any individual black box has a clear, simple function. It might need several inner black boxes in order to perform that function, but those, again, are clear and simple. This self-similarity forms a very organized chaos, which, if you tried to view it all at once with all the walls of the boxes taken away, would be an incomprehensible mess. It is the power of abstraction, the power of the black box, in turn enabled by the power of the well-defined interface, that turns complexity into simplicity.
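
To make the black-box idea concrete, here's a minimal sketch (the class names and the storage scenario are invented for illustration): from the outside, the outer box has one clear, simple job, and it performs that job using two inner boxes that are just as simple from where it stands.

#include <string>
#include <utility>
#include <vector>

// Inner box: turns text into bytes. How it does that is hidden.
class Compressor {
public:
    std::vector<unsigned char> compress(const std::string& text) const {
        return std::vector<unsigned char>(text.begin(), text.end());  // stand-in for a real codec
    }
};

// Inner box: keeps named blobs. How it keeps them is hidden.
class BlockStore {
public:
    void put(const std::string& key, std::vector<unsigned char> bytes) {
        blocks_.emplace_back(key, std::move(bytes));
    }

private:
    std::vector<std::pair<std::string, std::vector<unsigned char>>> blocks_;
};

// Outer box: the entire outside view is "store a named document". The model
// a caller forms of this box is exactly what the box does.
class Archive {
public:
    void store(const std::string& name, const std::string& contents) {
        store_.put(name, compressor_.compress(contents));
    }

private:
    Compressor compressor_;
    BlockStore store_;
};

int main() {
    Archive archive;
    archive.store("notes.txt", "solve the real problem");
    return 0;
}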

So, as a systems designer and builder, I will frequently be faced with a complex problem that needs to be solved. By identifying the individual "real problems" within the overall problem space, I am applying the tools of abstraction to impose an order, piece by piece. Whether I do this top-down, bottom-up, or any which way, the point is that I start building the walls of the black boxes and start defining the interfaces between them at a variety of levels. I try to use as many existing boxes as I can (sometimes with renovations), and, in fact, my experience is that this process itself exposes more opportunities to use the same kinds of boxes in different problem areas. If someone can look at the end result and see simplicity, then I have been successful in designing a flexible, trustworthy solution. That's when we know we've done well.

Sunday, December 17, 2006

Responsibility and Optimization

I was reviewing an external large-scale system architecture the other day, and suffice it to say the documents made a point of avoiding referential integrity (i.e. "no foreign keys"), stored procedures, and so on, in their databases, all in the name of scale and performance. Now, from what my database-expert friends tell me, sometimes you can drop certain key constraints in production once you're confident you don't have many (any?) bugs in that area, and you can squeeze some more performance out that way, but that was not looking like the case here.

In fact, it sounds like the same misguided approach to scale I've seen on occasion even in large-scale organizations like the mothership. To me, being the C++ programmer I am, it's analogous to saying you're going to use C-strings instead of std::string because there's less overhead. And then you miss the point of high-level optimizations entirely. While clearly you need fast basics, we all know that high-level optimizations are where most of the win is. Lose one high-level op and you save billions of operations. Lose one memcpy and you save hundreds, or maybe thousands.

Then take the fact that higher-level constructs (created by the imposition of the same constraints that cause things to be "slower") give you a better optimization framework. Sticking with the string example, just think about how much time and memory you can save if you take advantage of std::string's (usual) copy-on-write semantics. Try doing that while still letting every libc string function and every assembly-level C string "optimization" be legal and safe. You can't, because you end up with essentially the same constraints imposed on you as are imposed on the higher-level std::string construct . . . for the same reasons. You need rules so that you know what can't happen, and thus you can make better assumptions about what can, and thus make better optimizations.
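
As an illustration of that point, here is a minimal, single-threaded copy-on-write sketch. The class is invented for this post and is nothing like any particular std::string implementation; the point is that the sharing trick is only legal because the interface never hands out a writable pointer into the buffer, which is exactly the kind of aliasing a raw C-string API would permit.

#include <cassert>
#include <memory>
#include <string>

// Toy copy-on-write text class (illustrative only, not thread-safe).
class SharedText {
public:
    explicit SharedText(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // Reads never expose the underlying buffer, so sharing it is safe.
    std::size_t size() const { return data_->size(); }
    char at(std::size_t i) const { return data_->at(i); }

    // Writes go through the interface, so we know exactly when a private
    // copy is needed; callers can never scribble on a shared buffer.
    void append(const std::string& tail) {
        detach();
        *data_ += tail;
    }

private:
    void detach() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);  // copy on write
    }

    std::shared_ptr<std::string> data_;
};

int main() {
    SharedText a("hello");
    SharedText b = a;      // cheap copy: both objects share one buffer
    b.append(", world");   // b detaches and gets its own buffer
    assert(a.size() == 5);
    assert(b.size() == 12);
    return 0;
}

None of that bookkeeping would be possible if every caller were allowed to hold a char* into the middle of the buffer.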

Constraints and interface limitations yield implementation freedom within that black box, which equals optimization freedom. Without clean interfaces and clear lines of responsibility, you can't have a practically optimizable system, because you don't get to change your mind when your requirements change or you learn new things: you're stuck with your initial optimization "guess" because it's baked into your design by way of your lack of good interfaces. And then it just boils down to premature optimization, and we all know what that means.

On the flip side, constraints and interface limitations split the system into human-manageable pieces, which is incidentally one of the original goals of C++ (see A History of C++: 1979 - 1991). This simplifies the system from the designers' perspectives and often reveals (very-)high-level optimizations through that clarity of understanding. Plus, we can then take a system with clean interfaces and divide the work up amongst domain experts who further analyze and optimize within a clear area of responsibility. This saves us runtime cost and improves the efficiency of the production pipeline by parallelizing the development effort. Of course, one of the keys to doing this right is to avoid producing six thousand pages of useless documentation in the process, but to instead have tools that let you write your interfaces in human-readable form first, yet still serve as the canonical form for actually implementing the interface. After all, if you can keep only one kind of documentation, it is interface documentation that you keep. But try to live better than that!

For me, it is always structure and organization first towards a correct and maintainable system, and then optimization within that space. You break the rules only when you prove to yourself that you must.

Friday, November 03, 2006

A Little Deconstructive Criticism

So Tim sends me email yesterday:

Ok, I'm not trolling here, I really want to learn something.

Came across http://scienceblogs.com/goodmath/2006/11/the_c_is_efficient_language_fa.php which states among other things that C/C++ are no good for numerical computing.

The quote that got me wondering was "In fact, the fundamental design of them makes it pretty much impossible to make really good, efficient code in C/C++".

He quotes an experiment he did where his OCaml implementation of some algorithm beat his C implementation and his C++ implementation of the same algorithm. (Aside: the Java and Python implementations were crushed by these first three.)

Thoughts?

Clearly that's gibberish, because C is basically a macro assembler and can do anything. This reminds me of when Keith Lea said Java was faster than C++ . . . because he wrote all of his C++ naively, with the most obvious approach instead of the most efficient one. He was open to having an email discussion, however, but IIRC he seemed determined that I was unfairly making the C++ faster or equal because I was a good programmer.

In this most recent example, Mark makes a similar error in his analysis. His sample code loop exploits limitations of the particular optimizer he was using, not of the language. It sounds like OCaml is an array- and matrix-aware language that has high-level constructs for such things. And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing.

Nobody in their right mind writes C++ code with raw arrays as primitives. And this leads to The Eternal Flaw in language comparisons as I always see them. People are always saying, "Language X is better than Language Y because of [insert library feature here]." Just because a language's standard distribution does not come with a library piece you need to do your thing well does not mean that language is no good. Get the right tools for the job, people. Some specialized domains can benefit from the expressive power of a specially-tuned language. Some languages (like C++) have the ability to be effectively extended by letting libraries overload their native notational forms and thus offer what looks like a specialized language but actually isn't (Boost.Spirit being the most extreme example I know of).
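
To give a tiny (and invented) taste of what I mean by overloading native notational forms, nowhere near the scale of something like Boost.Spirit: a small vector library can make numeric code read like the math it represents, without any special-purpose language.

#include <cstdio>

// A three-component vector with the arithmetic notation you'd expect.
struct Vec3 {
    double x, y, z;
};

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

int main() {
    Vec3 velocity{1.0, 2.0, 3.0};
    Vec3 wind{0.5, 0.0, -1.0};
    Vec3 combined = velocity + 0.5 * wind;  // reads like the underlying math
    std::printf("|combined|^2 = %f\n", dot(combined, combined));
    return 0;
}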

As an aside, most of the problems I have with Java are to do with the fact that it is not as extensible, and, more importantly, that it tends to force a paradigm (classic OO, typically) instead of offering tools. To me, Java is a framework and C++ is a toolbox.

I was disappointed to see that the poster of the comment Tim referred to lists himself as a Computer Scientist, but doesn't seem to have applied the scientific method. It certainly looks like he's saying "I did one experiment on one case and it proves C sucks." I would say the weight of his argument can't exceed the strength of his evidence (i.e., none).