Solve The Real Problem

Discussions about professional software development with a focus on building real, solid, performing, reliable, monitorable large-scale online services in asynchronous C++. You know, good solid breakfast food.

Sunday, December 17, 2006

Responsibility and Optimization

I was reviewing an external large-scale system architecture the other day, and suffice it to say the documents made a point of avoiding referential integrity (i.e. "no foreign keys"), stored procedures, and so on, in their databases, all in the name of scale and performance. Now, from what my database-expert friends tell me, sometimes you can drop certain key constraints in production once you're confident you don't have many (any?) bugs in that area, and you can squeeze some more performance out that way, but that was not looking like the case here.

In fact, it sounds like the same misguided approach to scale I've seen on occasion even in large-scale organizations like the mothership. To me, being the C++ programmer I am, it's analogous to saying you're going to use C-strings instead of std::string because there's less overhead, and then missing the point of high-level optimizations entirely. While clearly you need fast basics, we all know that high-level optimizations are where most of the win is. Lose one high-level op and you save billions of operations. Lose one memcpy and you save hundreds, or maybe thousands.
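To make the C-string comparison concrete, here is a minimal sketch (the function names are mine, purely for illustration, and the pieces are assumed to contain no embedded NUL characters). Both functions produce the same result, but the strcat version has to rescan the destination on every append to find its end, so the total work grows quadratically with the output length. The std::string version carries its length with it and just writes at the end.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// C-string style: std::strcat walks the destination from the start on every
// call to find the terminating NUL, so n appends rescan the growing prefix.
std::string join_with_strcat(const std::vector<std::string>& pieces) {
    std::size_t total = 0;
    for (const auto& piece : pieces) total += piece.size();

    std::vector<char> buffer(total + 1, '\0');  // room for the final NUL
    for (const auto& piece : pieces) {
        std::strcat(buffer.data(), piece.c_str());
    }
    return std::string(buffer.data());
}

// std::string style: the object knows its own length, so each append
// goes straight to the end without rescanning what is already there.
std::string join_with_string(const std::vector<std::string>& pieces) {
    std::string result;
    for (const auto& piece : pieces) result += piece;
    return result;
}
```

The per-call overhead of std::string is real, but the thing you lose by dropping it (knowing the length) costs far more once the strings get long.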

Then take the fact that higher-level constructs (created by the imposition of the same constraints that cause things to be "slower") give you a better optimization framework. Sticking with the string example, just think about how much time and memory you can save if you take advantage of std::string's (usual) copy-on-write semantic. Try doing that while still letting every libc string function and every assembly-level C string "optimization" be legal and safe. You can't, because you end up with essentially the same constraints imposed on you as are on the higher-level std::string construct . . . for the same reasons. You need rules so that you know what can't happen, and thus you can make better assumptions about what can, and thus better optimizations.
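Here is a rough sketch of why the constraint is what buys you the optimization. CowString is a hypothetical class of my own, not how any particular std::string is implemented (and note that C++11 and later effectively rule out copy-on-write for std::string itself). The point is that sharing a buffer between copies is only safe because callers cannot get a writable char* into it: every mutation has to go through the class, which gets the chance to detach first. The sketch assumes single-threaded use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string, for illustration only.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // Copying shares the buffer: no allocation, no character copy.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Read-only access never needs a private copy.
    const std::string& view() const { return *data_; }
    std::size_t size() const { return data_->size(); }

    // The only way to modify the text; it detaches before writing.
    void append(const std::string& suffix) {
        detach();
        data_->append(suffix);
    }

    // True while two objects still share one buffer (exposed for the demo).
    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // Make a private copy only if somebody else can still see this buffer.
    void detach() {
        assert(data_ != nullptr);
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};
```

Hand out a mutable pointer into the buffer, the way a C-string does by definition, and the class can no longer know when a write happens, so the sharing trick stops being safe.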

Constraints and interface limitations yield implementation freedom within the black box, which equals optimization freedom. Without clean interfaces and clear lines of responsibility, you can't have a practically optimizable system, because you don't get to change your mind when your requirements change or you learn new things: you're stuck with your initial optimization "guess" 'cause it's baked into your design by way of your lack of good interfaces. And then it just boils down to premature optimization, and we all know what that means.
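A small sketch of what "getting to change your mind" looks like in practice. Everything here (NameIndex, the two implementations, greet) is a made-up example, and it assumes C++17 for std::optional. Callers are limited to insert and lookup, so the first storage guess can be swapped for a different one later without touching any calling code.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical interface: callers may only insert and look up by id.
class NameIndex {
public:
    virtual ~NameIndex() = default;
    virtual void insert(int id, std::string name) = 0;
    virtual std::optional<std::string> lookup(int id) const = 0;
};

// Initial guess: a sorted vector, compact and fine for small data sets.
class SortedVectorIndex : public NameIndex {
    using Entry = std::pair<int, std::string>;
    static bool before(const Entry& entry, int key) { return entry.first < key; }

public:
    void insert(int id, std::string name) override {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), id, before);
        if (pos != entries_.end() && pos->first == id) {
            pos->second = std::move(name);  // overwrite an existing id
        } else {
            entries_.insert(pos, Entry{id, std::move(name)});
        }
        assert(std::is_sorted(entries_.begin(), entries_.end()));  // debug-only check
    }

    std::optional<std::string> lookup(int id) const override {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), id, before);
        if (pos != entries_.end() && pos->first == id) return pos->second;
        return std::nullopt;
    }

private:
    std::vector<Entry> entries_;
};

// Later change of mind: a hash table, once the data set grows.
class HashIndex : public NameIndex {
public:
    void insert(int id, std::string name) override { entries_[id] = std::move(name); }

    std::optional<std::string> lookup(int id) const override {
        auto pos = entries_.find(id);
        if (pos == entries_.end()) return std::nullopt;
        return pos->second;
    }

private:
    std::unordered_map<int, std::string> entries_;
};

// Calling code is written against the interface and never changes.
std::string greet(const NameIndex& index, int id) {
    return "hello, " + index.lookup(id).value_or("stranger");
}
```

If callers had instead been handed the vector directly and allowed to iterate it in order, the hash-table swap would break them, and the first guess would be baked in.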

On the flip side, constraints and interface limitations split the system into human-manageable pieces, which is incidentally one of the original goals of C++ (see A History of C++: 1979 - 1991). This simplifies the system from the designers' perspectives and often informs us of (very-)high-level optimizations that can be made through this clarity of understanding. Plus, we can then take a system with clean interfaces and divide the work up amongst domain experts, who further analyze and optimize within a clear area of responsibility. This saves us runtime effort and improves the efficiency of the production pipeline by parallelizing the development effort. Of course, one of the keys to doing this right is to avoid producing six thousand pages of useless documentation in the process. Instead, have tools that let you write your interfaces in human-readable form first and then use that same text as the canonical form for actually implementing the interface. After all, if you must live with only one kind of documentation, it is interface documentation that you keep. But try to live better than that!

For me, it is always structure and organization first towards a correct and maintainable system, and then optimization within that space. You break the rules only when you prove to yourself that you must.