Solve The Real Problem: A Little Deconstructive Criticism

Friday, November 03, 2006

A Little Deconstructive Criticism

So Tim sends me email yesterday:

Ok, I'm not trolling here, I really want to learn something.

Came across http://scienceblogs.com/goodmath/2006/11/the_c_is_efficient_language_fa.php which states among other things that C/C++ are no good for numerical computing.

The quote that got me wondering was "In fact, the fundamental design of them makes it pretty much impossible to make really good, efficient code in C/C++".

He quotes an experiment he did where his OCaml implementation of some algorithm beat his C implementation and his C++ implementation of the same language. (Aside, the Java and Python implementations were crushed by these first three).

Thoughts?

Clearly that's gibberish because C is basically a macro assembler and can do anything. This reminds me of when Keith Lea that said Java was faster than C++ . . . because he wrote all of his C++ naively with the most obvious approach instead of the most efficient. He was open to the having an email discussion, however, but IIRC he seemed determined that I was unfairly making the C++ faster or equal because I was a good programmer.

In this most recent example, Mark makes a similar error in his analysis. His sample code loop exploits limitations of the particular optimizer he was using and not the language. It sounds like this OCaml is just an array and matrix aware language that has high-level constructs for such. And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing.

Nobody in their right mind writes C++ code with raw arrays as primitives. And this leads to The Eternal Flaw in language comparisons as I always see them. People are always saying, "Language X is better than Language Y because of [insert library feature here]." Just because a language's standard distribution does not come with a library piece you need to do your thing well does not mean that language is no good. Get the right tools for the job, people. Some specialized domains can benefit from the expressive power of a specially-tuned language. Some languages (like C++) have the ability to be effectively extended by letting libraries overload their native notational forms and thus offer what looks like a specialized language but actually isn't (Boost.Spirit being the most extreme example I know of).

As an aside, most of the problems I have with Java are to do with the fact that it is not as extensible, and, more importantly, tends to force a paradigm (classic OO, typically) instead of offer tools. To me, Java is a framework and C++ is a toolbox.

I was disappointed to see that the poster of the comment Tim refered to lists himself as a Computer Scientist, but doesn't seem to have applied the scientific method. It certainly looks like he's saying "I did one experiment on one case and it proves C sucks." I would say the weight of his argument is outweighed by the strength of his evidence (i.e. it has none).

2 Comments:

At 9:45 AM, Anonymous said…

"And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing."

This is wrong and for the following three reasons:
- O'Caml is implemented in O'Caml not in C.
- Even if O'Caml would be implemented in C then it still is possible to generate better code than C. Because it can exploit certain features of the language to optimize the compiled code (static type checking to name one).
- By your reasoning: Since language X and C are turing complete you can implement any C compiler in X. So clearly X, given the right optimizer (in this case C), performs on an equal footing... The point I'm trying to make: it is irrelevant in which language a compiler is written when discussing the code it generates.
At 1:34 PM, Brad Spencer said…

[Note: The comment above is by a Blogger user named "Joop", but that information seems to have been lost in the Blogger migration to "beta".]

About "And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing."

You can write the exact same algorithm that takes the same number of instructions in C (or C++, or any decent language) as the running O'Caml program. It's a ridiculous pain and inconvenience and is almost certainly impractical for regular use, but it can be done. "So what?", you might say. That's perhaps the whole point: Why would anyone write in C? It's so much more painful to get the same result if you even can. Of course, but let me explain the perspective I have from my background of iterative design and implementation.

Let's imagine you've written the same series of instructions as the O'Caml program in C. Maybe it took you thirty years, but you finally did it and it works. Now, let's imagine that we look at what we did and we identify some patterns in it in terms of algorithms and data structures. So we decide to package those up into a set of tools that we can use to solve the next problem in less time with a higher-level view.

Say we keep iterating and keep adding high-level views that are easier and more direct to use. If we go far enough, we'll probably end up with O'Caml or something similar to it (if the goals are the same). So what we have is a continuum of models of abstraction. We've got a hard-to-write, impractical C mess one end and a high-level clean O'Caml program on the other end.

Now, there are other languages (and I'm thinking primarily of C++ here) that have better (baked-in) tools for creating your own (not-baked-in but added-on) high-level constructs. To contrast, C has (added-on) tools for "extending" the language (even if they have to resort to code generation), but they're a little clunky. C++ has built-in tools for extending the language within the scope of the compiler, and thus they have the ability to participate in initial compilation and optimization, and they use the same syntax (or they effectively "pseudo-extend" the syntax) as the rest of the language, so they're easy to use. O'Caml has the particular high-level constructs we're talking about baked-in so they're presumably the easiest to use here. It may or may not (I simply don't know) have the ability to add-on different kinds of constructs that are available with as much ease, but that's completely reasonable because that is not its goal.

Let me diverge to a parallel example for a moment involving regular expressions. ANSI C has no notion of regular expressions, but there have been external libraries supporting them for ages. C++98 doesn't either, but C++0x will (using today's Boost.Regex, essentially). Perl has regular expressions built right into its syntax. We would never say that any of these languages "don't have regular expressions" and mean anything useful by it. The easy-of-use continuum tends to follow the baked-in continuum (although C++0x regexs are only marginally "harder" to use than Perl's, and then only really if you are counting characters typed; there's no extra intellectual load on the programmer). Is there a need for the C regular expressions to be slower than C++ or Perl because of this? Pretending for a moment a true full Perl compiler, then on the tiny scale of function inlining and call overhead yes, but on the real bottlenecks of actually running the regular expressions, no. In fact, there's nothing (except for some modern system security measures perhaps) preventing the C or C++ (or Perl) regular expression compiler from generating code at runtime if that could be shown to be an advantage.

So, getting back to O'Caml and my statement, when I said "the right optimizer", I was referring to the compiler's optimizer. But I also mean the right tools, the right abstractions, the right libraries, the right exposure to the toolchain. My point is that there is not "straight ANSI C" and "straight out of the box O'Caml". There is:

- ANSI C
- C with library X
- C with library X and Y
- C with library X and Y and code generator Z
- ...etc...
- C++98
- C++ with library A (maybe Boost.uBLAS)
- C++ with library A and B
- ...etc...
- O'Caml
- O'Caml with library M
- ...etc...

So, by my reasoning, comparing implementations in two languages without the appropriate tools and expertise applied to both is inaccurate and will lead to poor results that don't really tell us much about real programs.

I advocate choosing the right tool for the job, and I would probably never chose C for numerical work. But I also advocate knowledgeable decision making, and if the case for O'Caml is strong and the language is a good one and it is powerful, then it should stand on its own merits without making false claims against other languages. Regardless of what you think of my multi-level tools-oriented perspective, my main point still stands about the claims in the original article, to which you have offered no rebuttal. I'm not claiming C (or C++) is faster, nor am I claiming it is easier to use than O'Caml. I am claiming that it is possible to use other tools to achieve the same results, and that it is possible to encapsulate some of the effort required to do so and form other add-on tools from it (and some of those tools exist now and must be used to perform comparisons). I'm claiming that the article is not properly comparing tools and the conclusions it leaps to are false.

Specifically, how can the C and C++ tests possibly have significantly different timings unless the tools (classes and algorithms) chosen for C++ were completely inappropriate? They are effectively the same languages, except one has more options. Getting slower results with more options available merely proves that the wrong options were chosen. Getting slower results with a single example does not prove that C++ is slower. That's basic scientific method.

Solve The Real Problem

Friday, November 03, 2006

A Little Deconstructive Criticism

2 Comments:

About Me

Previous Posts