After the first post on the general importance of programming in the earth sciences, I will use this post to go into more detail on techniques. In general, there are two simple paradigms that help me through most problems in programming and in work in general. The first is KISS – Keep It Simple, Stupid (one of the many common expansions of this abbreviation). The principle has its roots in design, but it can be applied in many different fields. Keeping things simple helps a lot in programming, since at some point everybody has to revisit their own code after a break of some sort. Understanding it quickly is essential to extend its functionality, reuse debugged snippets or find errors. Simple solutions also make it easier to interact with co-workers and others. Especially when scientists expect to collaborate with other disciplines, simple approaches are often the only way forward.
The second principle is IPO – Input, Processing, Output. Every program can be divided into these three steps, and being aware of them while writing code helps considerably in following the KISS principle. A basic aim is to separate these three steps from each other, both in the code and in the documentation. That way it becomes possible to change any one of these steps without touching the others.
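As a minimal sketch of what this separation can look like in practice (all names and the toy computation are illustrative, not from any particular model):

```python
# A minimal IPO sketch: input, processing and output each live in their
# own function, so any one of them can be swapped without touching the rest.

def read_input():
    """Input: gather the raw data (a hard-coded list stands in for a file read)."""
    return [2.0, 4.0, 6.0, 8.0]

def process(values):
    """Processing: the actual computation, here a simple mean."""
    return sum(values) / len(values)

def write_output(result):
    """Output: report the result (here to the screen, could be a file)."""
    print(f"mean = {result}")

if __name__ == "__main__":
    data = read_input()
    mean = process(data)
    write_output(mean)  # prints "mean = 5.0"
```

Replacing `read_input` with a real file reader, or `write_output` with a plotting routine, leaves the processing step untouched.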
A basic application of both principles is modularisation. It is based on dividing a large problem into many smaller ones. Each of these modules should have a specific role, which ideally is stated in the documentation of the code. Furthermore, each should have an input, a processing and an output part, and all of this should be clear before the actual coding takes place. The reason for the latter is the interface problem: each module interacts somehow with other modules, and every module should be changeable without changing the others. To achieve this, the interfaces between the modules have to be defined before the modules themselves are written. This makes it possible to think first about the general picture, then about how the problem can be divided, afterwards about how these pieces interact, and in the end about how information gets in and out of each piece. Only then does the actual problem solving start within these pieces.
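The idea of agreeing on interfaces first can be sketched as follows; the function names, fields and the trivial "physics" are hypothetical placeholders:

```python
# Interfaces-first sketch: each module is a function with an agreed
# signature, so implementations can be swapped without touching callers.

def advance_temperature(temperature, forcing, dt):
    """Agreed interface: current field, forcing term, time step in;
    updated field out. The simple Euler step here is only a placeholder."""
    return [t + dt * f for t, f in zip(temperature, forcing)]

def diagnose_mean(temperature):
    """Agreed interface: a field in, a single diagnostic out."""
    return sum(temperature) / len(temperature)

# The caller relies only on the interfaces, not on the implementations.
field = [270.0, 280.0, 290.0]
field = advance_temperature(field, forcing=[1.0, 0.0, -1.0], dt=0.5)
print(diagnose_mean(field))  # prints 280.0
```

As long as the signatures stay fixed, one participant can rewrite `advance_temperature` internally while another builds on `diagnose_mean`.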
Such a way of software design is called a “top-down” approach and helps a lot when you already know what you want. Up to that point you usually work “bottom-up”: you test things out and learn to understand the nature of the problem. The critical question is when you are able to change the viewpoint.
Modules are intrinsically supported in most programming languages. Either they are named as such, or they can easily be built from functions or procedures. Of course, there are programming paradigms that take this to the extreme and make the most of modularisation (e.g. object orientation). In my general work I stop short of these last steps and work on a level that can be described as functional programming. There are many reasons for this, for example the problematic implementation of OO in Fortran and other programming languages, but also that the effort rises enormously when OO is applied to simple problems. Nevertheless, I think scientists for whom programming is a large part of their work should familiarise themselves with the general principles of OO. Even when these are not fully applied in daily work, they help to organise it.
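To make the contrast concrete, here is the same toy task (a running mean) once in the plain functional style I usually use and once in a minimal OO style; both the task and the names are hypothetical illustrations:

```python
# Functional style: the state is passed around explicitly.
def update_mean(mean, count, value):
    """Fold one new value into a running mean."""
    count += 1
    mean += (value - mean) / count
    return mean, count

mean, count = 0.0, 0
for v in [1.0, 2.0, 3.0]:
    mean, count = update_mean(mean, count, v)
print(mean)  # prints 2.0

# OO style: the state lives inside the object.
class RunningMean:
    def __init__(self):
        self.mean, self.count = 0.0, 0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count

rm = RunningMean()
for v in [1.0, 2.0, 3.0]:
    rm.update(v)
print(rm.mean)  # prints 2.0
```

For a problem this small the functional version is clearly enough; the OO version starts to pay off only when many such pieces of state have to be managed together.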
Advantages of modularisation were already named in the explanation of the principles, but they become really noticeable in larger projects. When the work is distributed among several participants, it is helpful that they just have to agree on the interfaces; the actual problem solving is then an individual task. Other advantages are simpler documentation and an easier way to debug. I will certainly come back to such topics in the future, so I keep them short in this post.
To give an example of a typical modularisation, I would like to show a figure from my diploma thesis. The problem therein was to create an efficient two-dimensional energy balance climate model. It was designed to make simple climate calculations on larger time scales, and the module plan can be seen below.
The main component is a loop, which starts after the initialisation and input procedures and is controlled by a moderator function. This function calls the other modules within the loop and decides when the analysis is finished. After each run information is written out, and at the end of the analysis the results are collected and written to storage. One module at the top left, named core, is a typical variable container; what is behind it will be explained in a later post. All in all, this scheme worked rather well and made it possible, by reorganising (or simply rewriting) the moderator function, to apply the model to different sets of problems.
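The moderator pattern described above can be sketched roughly as follows; this is not the code of the thesis model, and all module names, the stopping rule and the placeholder physics are illustrative assumptions:

```python
# A hypothetical sketch of the moderator pattern: one function drives
# the main loop, calls the other modules, and decides when to stop.

def initialise():
    """Initialisation/input: set up the variable container (the "core")."""
    return {"step": 0, "temperature": 288.0}

def physics_step(state):
    """Placeholder for the actual model physics called each iteration."""
    state["temperature"] += 0.1

def write_diagnostics(state):
    """Information written out after each run through the loop."""
    print(f"step {state['step']}: T = {state['temperature']:.1f}")

def finished(state):
    """The moderator's stopping criterion (here: a fixed step count)."""
    return state["step"] >= 3

def moderator():
    """Controls the main loop; swapping this function retargets the model."""
    state = initialise()
    while not finished(state):
        physics_step(state)
        state["step"] += 1
        write_diagnostics(state)
    return state  # final results, ready to be collected and stored

final = moderator()
```

Because only the moderator knows the order of calls and the stopping rule, rewriting that one function is enough to point the same modules at a different problem.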
As I already wrote at the top of this post, modularisation is also helpful in other parts of daily work. Small problems are easier to solve than large ones, and a structured work plan leads to more efficient results. Some of the techniques I use for this I will discuss here in the future, and it will become obvious that programming is more to my work than just a means to get things done.