Friday, 2 September 2011

Modelling Is Everything

I’m often asked, “What is the best way to learn about building high-performance systems”? There are many perfectly valid answers to this question but there is one thing that stands out for me above everything else, and that is modelling. Modelling what you need to implement is the most important and effective step in the process. I’d go further and say this principle applies to any development and the rest is just typing :-)

Domain Driven Design (DDD) advocates modelling the domain and expressing this model in code as fundamental to the successful delivery and ongoing maintenance of software. I wholeheartedly agree with this. How often do we see code that is an approximation of the problem domain? Code that exhibits behaviour which approximates to what is required via inappropriate abstractions and mappings which just about cope. Those mappings between what is in the code and the real domain are only contained in the developers’ heads and this is just not good enough.

When requiring high-performance, code for parts of the system often have to model what is happening with the CPU, memory, storage sub-systems, or network sub-systems. When we have imperfect abstractions on top of these domains, performance can be very adversely affected. The goal of my “Mechanical Sympathy” blog is to peek at what is under the hood so we can improve our abstractions.

What is a Model?

A model does not need to be the result of a 3-year exercise producing UML. It can be, and often is best as, people communicating via various means including speech, drawings, illustrations, metaphors, analogies, etc, to build a mental model for shared understanding. If an accurate and distilled understanding can be reached then this model can be turned into code with great results.

Infrastructure Domain Models

If developers writing a concurrent framework do not have a good model of how a typical cache sub-system works, i.e. it uses message passing to exchange cache lines, then the framework is unlikely to perform well or be correct. If their code drives the cache sub-system with mechanical sympathy and understanding, it is less likely to have bugs and more likely to perform well.

It is much easier to predict performance from a sound model when coming from an understanding of the infrastructure for the underlying platform and its published abilities. For example, if you know how many packets per second a network sub-system can handle, and the size of its transfer unit, then it is easy to extrapolate expected bandwidth. With this model based understanding we can test our code for expectations with confidence.

I’ve fixed many performance issues whereby a framework treated a storage sub-system as stream-based when it is really a block-based model. If you update part of a file on disk, the block to be updated must be read, the changes applied, and the results written back. Now if you know the system is block based and the boundaries of the blocks, you can write whole blocks back without incurring the read, modify, write back cycle replacing these actions with a single write. This applies even when appending to a file as the last block is likely to have been partially written previously.

Business Domain Models

The same thinking should be applied to the models we construct for the business domain. If a business process is modelled accurately, then the software will not surprise its end users. When we draw up a model it is important to describe the relationships for cardinality and the characteristics by which they will be traversed. This understanding will guide the selection of data structures to those best suited for implementing the relationships. I often see people use a list for a relationship which is mostly searched by key, for this case a map could be more appropriate. Are the entities at the other end of a relationship ordered? A tree or skiplist implementation may be a better option.

Identity

Identity of entities in a model is so important. All models have to be entered in some way, and this normally starts with an entity from which to walk. That entity could be “Customer” by customer ID but could equally be “DiskBlock” by filename and offset in an infrastructure domain. The identity of each entity in the system needs to be clear so the model can be accessed efficiently. If for each interaction with a model we waste precious cycles trying to find our entity as a starting point, then other optimisations can become almost irrelevant. Make identity explicit in your model and, if necessary, index entities by their identity so you can efficiently enter the model for each interaction.

Refine as we learn

It is also important to keep refining a model as we learn. If the model grows as a series of extensions without refining and distilling, then we end up with a spaghetti mess that is very difficult to manage when trying to achieve predictable performance. Never mind how difficult it is to maintain and support. Everyday we learn new things. Reflect this in the model and keep it up to date.

Implement no more, but also no less, than what is needed!

The fastest code is code that just does what is needed and no more. Perform the instructions to complete the task and no more. Really fast code is normally not a weird mess of bit-shifting and complier tricks. It is best to start with something clean and elegant. Then measure to see if you are within performance targets. So often this will be sufficient. Sometimes performance will be a surprise. You then need to apply science to test and measure before jumping to conclusions. A profiler will often tell you where the time is being taken. Once the basic modelling mistakes and assumptions have been corrected, it usually takes just a little mechanical sympathy to reach the performance goal. Unused code is waste. Try not to create it. If you happen to create some, then remove it from your codebase as soon as you notice it.

Conclusion

When cross-functional requirements, such as performance and availability, are critical to success, I’ve found the most important thing is to get the model correct for the domain at all levels. That is, take the principles of DDD and make sure your code is an appropriate reflection of each domain. Be that the domain of business applications, or the domain of interactions with infrastructure, I’ve found modelling is everything.

5 comments:

  1. And the key to producing good models is to learn the techniques that [Western] Philosophy has developed over 2500 years that computer science leaves up to intuition. See ExistentialProgramming.com for techniques that programmers need in day to day software development that Philosophers have known for ages.

    and in particular the thread about Identity which is much more subtle and complicated than you have probably ever imagined (which is why Banks still can't tell you how many distinct customers they have!).

    ReplyDelete
  2. So programmers should model in the code both their domain and the underlying hardware. Can you suggest any readings (additional to your blog) on the latter?

    ReplyDelete
    Replies
    1. When interacting with the hardware you are in that domain. Separation of concerns should lead one to keep those interactions apart from the business concepts. If appropriately layered then things stay clean.

      I tend to read the specs for hardware that I use to a significant extent. Even better seek out those who created the hardware, and/or associated systems software, and discuss it with them. Many hardware folk write great specifications. For example, the Intel specs are dense but useful.

      I feel it is a subject that could do with more attention.

      Delete
  3. This is a great topic Martin. The Id example was a thought provoking one. I find it difficult to think how to build abstractions which prevent such details from tainting business logic. For example how do you model customer id in a distributed application where the customer data might be replicated on multiple machines that fail? What happens if the storage details change later? Would love to hear more about this topic from you.

    ReplyDelete
    Replies
    1. I normally have a repository/service that assigns identity to significant entities. These services are highly available. Keys can be as simple as a node/context scope plus a monotonic counter.

      I don't understand your question regarding storage changing?

      Delete