## Wednesday, May 11, 2011

### Formula 1 and complexity

"It was a truly fascinating race, but you needed a slide rule and a degree in astrophysics to follow it in detail." Adam Cooper, Malaysian Grand Prix analysis.

A popular slogan employed in complex systems theory is that 'complexity lies between order and randomness'. Coincidentally, this phrase might have been coined with the express purpose of describing Formula 1, 2011-style.

Consider the lapchart of a Grand Prix to be one expression of its complexity. With 24 cars on the grid, a lapchart consists of 24 inter-dependent histories. A Formula 1 lapchart is not so random that it can be described by a collection of random walks, but neither is it completely ordered, in the sense of being a simple function of the starting line-up. It is, rather, a complex data set, which lies between order and randomness.

But let's have a closer look at what the key terms here actually mean. In complex systems theory, order and randomness are generally defined in terms of Kolmogorov complexity. This defines the complexity of a data set to equal the length of the smallest computer program capable of generating that data as output.

Any data which contains some sort of regularity or pattern, can be generated by an algorithm, and the greater the regularity, the more concise the algorithm is capable of being. Ordered data sets, (such as 'Vettel-Webber-Alonso' repeated sixty times), can be generated by very simple algorithms, and therefore have the smallest Kolmogorov complexity.

It might seem strange to define complexity in terms of the size of a computer program, but the basic idea is that ordered things can be succinctly described, whereas complex things require a more lengthy description; defining this lengthiness in terms of the size of a computer program is simply a way of defining an absolute standard.

A data set which cannot be generated by a computer program smaller than the size of the data set itself, is deemed to be random. Such a data set contains no patterns or regularities. The program which generates it has to contain the data set in its explicit form, hence the size of the program cannot be smaller than the size of the data set.

Now, Ladyman, Lambert and Wiesner have argued that Kolmogorov complexity actually provides a poor definition of complexity because it assigns high complexity to random data sets. This, they argue, fails to respect the intuition that complexity should lie between order and randomness.

However, whilst Ladyman et al proceed to consider other alternative definitions of complexity, it may be that for our purposes here, a simple qualification to the definition of Kolmogorov complexity will suffice.

Let us assert that data sets come in two basic types: those which are algorithmically compressible, and those which are not. Let us define those which are incompressible to be random. For those data sets which are compressible, let us define their complexity to be measured by the length of the shortest computer program capable of generating them.

Thus, complexity as defined here is a property of compressible data sets. (Complex systems are another matter, outside the scope of this article). Ordered data sets have low complexity, random data sets have no complexity at all, and complex data sets lie between them.

In this sense we can say that a Formula 1 lapchart from 2011 is a complex data set, which lies between order and randomness, and a 2011 lapchart has greater complexity than a 2010 lapchart. In this context, the greater complexity could be measured by the greater word-count required to provide a complete description of the course of a race, and the greater length of time required to write such an account.