Solve The Real Problem

Discussions about professional software development with a focus on building real, solid, performing, reliable, monitorable large-scale online services in asynchronous C++. You know, good solid breakfast food.

Friday, November 03, 2006

A Little Deconstructive Criticism

So Tim sends me email yesterday:

Ok, I'm not trolling here, I really want to learn something.

Came across http://scienceblogs.com/goodmath/2006/11/the_c_is_efficient_language_fa.php which states among other things that C/C++ are no good for numerical computing.

The quote that got me wondering was "In fact, the fundamental design of them makes it pretty much impossible to make really good, efficient code in C/C++".

He quotes an experiment he did where his OCaml implementation of some algorithm beat his C implementation and his C++ implementation of the same language. (Aside, the Java and Python implementations were crushed by these first three).

Thoughts?

Clearly that's gibberish because C is basically a macro assembler and can do anything. This reminds me of when Keith Lea that said Java was faster than C++ . . . because he wrote all of his C++ naively with the most obvious approach instead of the most efficient. He was open to the having an email discussion, however, but IIRC he seemed determined that I was unfairly making the C++ faster or equal because I was a good programmer.

In this most recent example, Mark makes a similar error in his analysis. His sample code loop exploits limitations of the particular optimizer he was using and not the language. It sounds like this OCaml is just an array and matrix aware language that has high-level constructs for such. And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing.

Nobody in their right mind writes C++ code with raw arrays as primitives. And this leads to The Eternal Flaw in language comparisons as I always see them. People are always saying, "Language X is better than Language Y because of [insert library feature here]." Just because a language's standard distribution does not come with a library piece you need to do your thing well does not mean that language is no good. Get the right tools for the job, people. Some specialized domains can benefit from the expressive power of a specially-tuned language. Some languages (like C++) have the ability to be effectively extended by letting libraries overload their native notational forms and thus offer what looks like a specialized language but actually isn't (Boost.Spirit being the most extreme example I know of).

As an aside, most of the problems I have with Java are to do with the fact that it is not as extensible, and, more importantly, tends to force a paradigm (classic OO, typically) instead of offer tools. To me, Java is a framework and C++ is a toolbox.

I was disappointed to see that the poster of the comment Tim refered to lists himself as a Computer Scientist, but doesn't seem to have applied the scientific method. It certainly looks like he's saying "I did one experiment on one case and it proves C sucks." I would say the weight of his argument is outweighed by the strength of his evidence (i.e. it has none).

3 Comments:

  • At 9:45 AM, Anonymous Anonymous said…

    "And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing."

    This is wrong and for the following three reasons:
    - O'Caml is implemented in O'Caml not in C.
    - Even if O'Caml would be implemented in C then it still is possible to generate better code than C. Because it can exploit certain features of the language to optimize the compiled code (static type checking to name one).
    - By your reasoning: Since language X and C are turing complete you can implement any C compiler in X. So clearly X, given the right optimizer (in this case C), performs on an equal footing... The point I'm trying to make: it is irrelevant in which language a compiler is written when discussing the code it generates.

     
  • At 1:34 PM, Blogger Brad Spencer said…

    [Note: The comment above is by a Blogger user named "Joop", but that information seems to have been lost in the Blogger migration to "beta".]

    About "And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing."

    You can write the exact same algorithm that takes the same number of instructions in C (or C++, or any decent language) as the running O'Caml program. It's a ridiculous pain and inconvenience and is almost certainly impractical for regular use, but it can be done. "So what?", you might say. That's perhaps the whole point: Why would anyone write in C? It's so much more painful to get the same result if you even can. Of course, but let me explain the perspective I have from my background of iterative design and implementation.

    Let's imagine you've written the same series of instructions as the O'Caml program in C. Maybe it took you thirty years, but you finally did it and it works. Now, let's imagine that we look at what we did and we identify some patterns in it in terms of algorithms and data structures. So we decide to package those up into a set of tools that we can use to solve the next problem in less time with a higher-level view.

    Say we keep iterating and keep adding high-level views that are easier and more direct to use. If we go far enough, we'll probably end up with O'Caml or something similar to it (if the goals are the same). So what we have is a continuum of models of abstraction. We've got a hard-to-write, impractical C mess one end and a high-level clean O'Caml program on the other end.

    Now, there are other languages (and I'm thinking primarily of C++ here) that have better (baked-in) tools for creating your own (not-baked-in but added-on) high-level constructs. To contrast, C has (added-on) tools for "extending" the language (even if they have to resort to code generation), but they're a little clunky. C++ has built-in tools for extending the language within the scope of the compiler, and thus they have the ability to participate in initial compilation and optimization, and they use the same syntax (or they effectively "pseudo-extend" the syntax) as the rest of the language, so they're easy to use. O'Caml has the particular high-level constructs we're talking about baked-in so they're presumably the easiest to use here. It may or may not (I simply don't know) have the ability to add-on different kinds of constructs that are available with as much ease, but that's completely reasonable because that is not its goal.

    Let me diverge to a parallel example for a moment involving regular expressions. ANSI C has no notion of regular expressions, but there have been external libraries supporting them for ages. C++98 doesn't either, but C++0x will (using today's Boost.Regex, essentially). Perl has regular expressions built right into its syntax. We would never say that any of these languages "don't have regular expressions" and mean anything useful by it. The easy-of-use continuum tends to follow the baked-in continuum (although C++0x regexs are only marginally "harder" to use than Perl's, and then only really if you are counting characters typed; there's no extra intellectual load on the programmer). Is there a need for the C regular expressions to be slower than C++ or Perl because of this? Pretending for a moment a true full Perl compiler, then on the tiny scale of function inlining and call overhead yes, but on the real bottlenecks of actually running the regular expressions, no. In fact, there's nothing (except for some modern system security measures perhaps) preventing the C or C++ (or Perl) regular expression compiler from generating code at runtime if that could be shown to be an advantage.

    So, getting back to O'Caml and my statement, when I said "the right optimizer", I was referring to the compiler's optimizer. But I also mean the right tools, the right abstractions, the right libraries, the right exposure to the toolchain. My point is that there is not "straight ANSI C" and "straight out of the box O'Caml". There is:

    - ANSI C
    - C with library X
    - C with library X and Y
    - C with library X and Y and code generator Z
    - ...etc...
    - C++98
    - C++ with library A (maybe Boost.uBLAS)
    - C++ with library A and B
    - ...etc...
    - O'Caml
    - O'Caml with library M
    - ...etc...

    So, by my reasoning, comparing implementations in two languages without the appropriate tools and expertise applied to both is inaccurate and will lead to poor results that don't really tell us much about real programs.

    I advocate choosing the right tool for the job, and I would probably never chose C for numerical work. But I also advocate knowledgeable decision making, and if the case for O'Caml is strong and the language is a good one and it is powerful, then it should stand on its own merits without making false claims against other languages. Regardless of what you think of my multi-level tools-oriented perspective, my main point still stands about the claims in the original article, to which you have offered no rebuttal. I'm not claiming C (or C++) is faster, nor am I claiming it is easier to use than O'Caml. I am claiming that it is possible to use other tools to achieve the same results, and that it is possible to encapsulate some of the effort required to do so and form other add-on tools from it (and some of those tools exist now and must be used to perform comparisons). I'm claiming that the article is not properly comparing tools and the conclusions it leaps to are false.

    Specifically, how can the C and C++ tests possibly have significantly different timings unless the tools (classes and algorithms) chosen for C++ were completely inappropriate? They are effectively the same languages, except one has more options. Getting slower results with more options available merely proves that the wrong options were chosen. Getting slower results with a single example does not prove that C++ is slower. That's basic scientific method.

     
  • At 8:14 AM, Blogger Kerberos said…

    [Note: I'm Joop but apparently I had another account and couldn't figure out my other account name/pass.]

    "And obviously, OCaml is implemented in C. So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing."

    My interpretation of the quoted part is as follows:
    "It is possible to write in C (somehow!) any algorithm with equal performance than O'Caml."

    However, I pointed out that this may not be the case because the compiler could exploit certain features of a language to generate better code. So no matter how smart the C compilers are it could simply be intractable to generate as efficient code as O'Caml. Because the O'Caml compiler can extract more information easily from the source code than the C compiler.

    "You can write the exact same algorithm that takes the same number of instructions in C (or C++, or any decent language) as the running O'Caml program."

    Wat exactly do you mean with instructions?

    You ARE capable of writing the exact same algorithm in ANY turing complete language, which inludes C and O'Caml. However, this is not something I was arguing about. So the algorithm can be the same but the generated code can be completely different. That was what I was arguing about.

    "It's a ridiculous pain and inconvenience and is almost certainly impractical for regular use, but it can be done. "So what?", you might say. [...]"

    The C compiler may still not be able the generate as optimal as O'Caml (I assume you don't resort to inline ASM, since this is not C code).

    "Say we keep iterating and keep adding high-level views that are easier and more direct to use. If we go far enough, we'll probably end up with O'Caml or something similar to it (if the goals are the same). So what we have is a continuum of models of abstraction. We've got a hard-to-write, impractical C mess one end and a high-level clean O'Caml program on the other end."

    Even if you end up with a language like O'Caml using high level tools then it STILL does NOT imply the C compiler will be capable of generating code as fast (or better) code than O'Caml for some algorithms. The C compiler may still not be capable of extracting enough information because it only sees C code. It is _very_ unlikely that it can use these 'high-level views' to optimise the generated code UNLIKE the O'Caml compiler. Does this mean that it is completely impossible that for some algorithms to generate as fast (or better) code than the O'Caml compiler using a C compiler? No, but generating as fast or better code may be an intractable problem for the C compiler (for some algorithms).


    "Let me diverge to a parallel example for a moment involving regular expressions. ANSI C has no notion of regular expressions, but there have been external libraries supporting them for ages. C++98 doesn't either, but C++0x will (using today's Boost.Regex, essentially). Perl has regular expressions built right into its syntax. We would never say that any of these languages "don't have regular expressions" and mean anything useful by it. The easy-of-use continuum tends to follow the baked-in continuum (although C++0x regexs are only marginally "harder" to use than Perl's, and then only really if you are counting characters typed; there's no extra intellectual load on the programmer). Is there a need for the C regular expressions to be slower than C++ or Perl because of this? Pretending for a moment a true full Perl compiler, then on the tiny scale of function inlining and call overhead yes, but on the real bottlenecks of actually running the regular expressions, no. In fact, there's nothing (except for some modern system security measures perhaps) preventing the C or C++ (or Perl) regular expression compiler from generating code at runtime if that could be shown to be an advantage."

    IIRC Perl's regular expressions are implemented in C. Besides I'm talking about the more general case here not a specific example.

    "So, getting back to O'Caml and my statement, when I said "the right optimizer", I was referring to the compiler's optimizer. But I also mean the right tools, the right abstractions, the right libraries, the right exposure to the toolchain. My point is that there is not "straight ANSI C" and "straight out of the box O'Caml"."

    The thing is that these libraries may be implemented in another language. :)

    "So, by my reasoning, comparing implementations in two languages without the appropriate tools and expertise applied to both is inaccurate and will lead to poor results that don't really tell us much about real programs."

    Agreed.

    "Regardless of what you think of my multi-level tools-oriented perspective, my main point still stands about the claims in the original article, to which you have offered no rebuttal."

    I agree with that you can't draw conclusions based on one example when it comes to comparing compilers. So, why should I offer a rebuttal to that?

    "I'm not claiming C (or C++) is faster, nor am I claiming it is easier to use than O'Caml."
    ...
    "So clearly C, given the right optimizer (in this case OCaml), performs on an equal footing."

    You claimed that C can offer the same given the right optimizer. Implementing a compiler for language X in C is not about using 'the right optimizer' anymore. By doing this you effectively do not rely on C at all! Because the source code of your new compiler is language X and not C (assuming that you compile to ASM/machine code like O'Caml). As I said it is irrelevant in which language the compiler is implemented. Because all turing complete languages are equal in terms of what they can do, but not in terms of speed. There may be constant or even polynomial difference IIRC.

    "I am claiming that it is possible to use other tools to achieve the same results, and that it is possible to encapsulate some of the effort required to do so and form other add-on tools from it (and some of those tools exist now and must be used to perform comparisons)."

    The problem is that these tools can be implemented in another language and compiled with another compiler. So be carefull with the tools you pick if you want to compare compilers and languages.

    "I'm claiming that the article is not properly comparing tools and the conclusions it leaps to are false."

    I did not argue with this.

    "Getting slower results with a single example does not prove that C++ is slower. That's basic scientific method."

    Agreed, I wasn't arguing this either.

     

Post a Comment

<< Home