As a scientist in earth science who works more on the theoretical side, my daily work consists in large part of programming. Nevertheless, even with the importance programming has in this field nowadays, I hear again and again from people that they never received a systematic education in it during their studies. Of course, I agree that learning by doing plays a very important part in becoming a good programmer, but without further insight into the background of programming it can be quite hard to reap the benefits of a well-planned, structured program.
I thought for a long time about whether I should also write about programming in this blog, since it is usually not really seen as part of science. Modelling yes, but the decisive steps in modelling are separate from the ordinary programming parts. In the end I decided to write some posts on how I personally design my programs. I would not say that I am a really good programmer, since I have neither a computer science degree nor attended any software design courses. And indeed, good programming consists of more than just hacking some source code into the computer. I learned most of what I know on this topic quite a while ago by trial and error. Later I was lucky enough to have a good computer science course as my main subject at school, and so I learnt at least some of the basics of theoretical informatics (but to be honest, there would have been so much more to learn in a proper university course). Afterwards, I moved through several languages and had several larger projects which I designed from scratch. My current main languages are Fortran 95 for the compute-intensive jobs and R for plotting and as a general tool for the daily work.
So what makes programming in the scientific field special? I can only give my own perspective here, but there are surely some points that have to be considered with priority. First, it is important to distinguish the different fields within the earth science community. Modellers who work with the more established models, with several developers involved, are usually much more used to good programming practices than the ordinary data cruncher. The reason is evident: keeping in mind that the code has to be used by others in the future makes you think twice about whether you comment your code properly and design it in a way that is accessible to others. Unfortunately, this is not necessarily the case for every larger model. On the whole, though, many of my comments are not really applicable to this community.
More problematic is the community of scientists who do not expect to share their code with others, only the results. The quick result counts, and quick and dirty has always been a source of controversy when it comes to programming. The problems start when the scientist moves on to another position and someone else has to reuse the existing infrastructure to finish the job. Jobs in science are usually temporary, and the many competing tasks at the end of a project (code refinement, paper and report writing, job applications…) lead to unclean project finishes. In other words: what has not been done on the spot during the development phases will not be done at all. Nevertheless, it has to be kept in mind that, with the research environment changing as it currently is, it is to be expected that in the future all source code will have to be published. Using the chaos principle to prevent others from understanding your code is certainly not the solution while these changes are happening.
Another point that has to be mentioned (I already touched on it above) is the fact that not everyone in earth science has a proper computer science education or has even had a good programming course at all. Teaching programming is a very complicated matter. With more than 20 students in a course, some of whom struggle to switch the computer on at the beginning (ok, that is exaggerated), it is nearly impossible to do a good job as a teacher. Programming is one of those things that has to be taught in small groups, and furthermore most of the skills will be built up by trial and error. Nevertheless, I personally believe it helps to understand what the computer does internally, how it handles the source code and how storage management works. Even ever simpler IDEs, the programming interfaces that help with every little niggle, do not exempt anyone from understanding what really happens. This, again, is something that does not have to be taught in small groups or with a computer at hand at all. It can be an ordinary theoretical lecture like maths or physics. With ever more challenging computer systems (parallel computers, different processor types, more processing power than storage available), teaching basic theoretical computer science (or informatics, as I prefer to call it) to earth science students will be essential in the upcoming decade.
A further element of simple scientific programming is the lack of standardised interfaces. Many different file types are the rule rather than the exception, and as a consequence there is no jack of all trades to handle all these problems. Many specialised solutions require many programs, different programming languages and different tools, which results either in chaos or in opportunities for errors. Due to the history of each field, these problems will not simply go away. The only way to handle them is to standardise the workarounds: to build, in every standard language, the ability to read as many formats as possible in a standardised way.
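As a minimal sketch of what such a standardised workaround could look like, here is one possible pattern in Python (chosen purely for illustration): every format gets its own small reader, but all of them are reached through a single entry point and return the same structure, a list of records. The names `READERS`, `register` and `read_records` are made up for this example, and the assumption that a JSON file holds an array of objects is only there to keep the sketch short.

```python
import csv
import json
from pathlib import Path

# Hypothetical registry for this sketch: file extension -> reader function.
READERS = {}


def register(*extensions):
    """Decorator that registers a reader for one or more file extensions."""
    def wrap(func):
        for ext in extensions:
            READERS[ext.lower()] = func
        return func
    return wrap


@register(".csv")
def read_csv(path):
    # Comma-separated text with a header line; values stay strings.
    with open(path, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


@register(".tsv", ".tab")
def read_tsv(path):
    # Tab-separated text with a header line; values stay strings.
    with open(path, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle, delimiter="\t")]


@register(".json")
def read_json(path):
    # Assumption: the file holds a JSON array of objects.
    with open(path) as handle:
        return json.load(handle)


def read_records(path):
    """Single entry point: choose the reader from the file extension."""
    ext = Path(path).suffix.lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"no reader registered for {ext!r}") from None
    return reader(path)
```

The rest of an analysis script would then only ever call `read_records("some_file.csv")` or `read_records("some_file.tsv")` and get the same kind of result back. Supporting a further format means adding one more registered function rather than touching every script that reads data.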
Of course there are many other points that make programming for scientists special, but I will stop here. In the upcoming months, I will write several posts on some of the strategies and ways I use to program. In a past post I have already written about the different phases of programming within the scientific working cycle, so most of the strategies are of course used only in the later stages. Like anybody else, I like to take a quick look at new things, and only once I have decided to do more with them do I go into really proper source code management.