[This is a two parts post, as I realised while writing it that it was growing without control.]

As a roboticist I spend most of my time dealing with robots. I program them, debug them, watch them perform some tasks. I might be surprised and thrilled by things people could judge as boring, or I might not get excited at all by robotic performances that others will claim as giant leaps towards a true AI. And, as I am a scientist too, I have a defect: sooner or later I will want to quantify my data. This includes the wow factor of a robot.

Imagine you are observing a robot performing a dance. At the beginning you will be caught by curiosity and you will not get your eyes off the robot. But after some time, you will notice that the robot is following a pattern. The dance movements will always be the same, no matter how much naturally random the programmer has tried to let them appear. After some time you have spent watching it, a robot will be no more interesting than a washing machine. Can we quantify this effect?

Computational theory comes to our help. In the sixties a quite clever Russian guy published a few papers about a theory which is as simple and elegant as it is powerful. After the name of the author, it is called Kolmogorov Complexity. Without digging into details, according to this theory the description length of a string in a given programming language is the complexity of that string. If the description is longer than the string itself, we’d better give up and use the string as its description.

For example, a huge pile of dishes might be a complex task for one not lucky enough to have a dishwasher (the best robot ever!), but from a computer point of view it is simply described by "2^326 dishes stacked in an uncomfortable equilibrium". On the other side the outcome of the matches of the World Cup is hardly described by an algorithm, and the list of these results is its best description. And this might explain why all of the Bayesian models and accurate simulations failed to predict the outcome of the World Cup, in spite of the efforts of the brightest minds in the world.

Obviously there are bad news: the Kolmogorov Complexity is not computable. That is, there will never be a program that receives as input an arbitrary string and gives as an output the complexity of that string. Before you leave the room shaking your head in despair, you might want to know that there is a trick: the Kolmogorov Complexity is strongly related to the entropy, which in turn is related to the compressibility of a string. I can see everybody rushing back to their seats with a light bulb shining over their heads: the Lempel-Ziv algorithm is the Swiss army knife of complexity.

To cut a long story short, the following magic lines in Python will produce a reasonable approximation to the Kolmogorov Complexity of a string s:

```
import zlib
def kolmogorov(s):
l = float(len(s))
compr = zlib.compress(s)
c = float(len(compr))
return c/l
```

This is called complexity per symbol, and it is between 0 and 1, where 1 means that the string is incompressible, or very complex. Being an approximation means that it works only for very big strings. Let’s see some examples.

The complexity of a string of several random numbers is high:

```
arr = rand(1000)
kolmogorov(arr.tostring())
Out: 0.94303749999999997
```

While it comes with no surprise that the complexity of a string of all zeros is low:

```
arr = zeros(1000)
kolmogorov(arr.tostring())
Out: 0.00125
```

An interesting result comes when we mix the two strings:

```
arr = hstack((rand(1000),zeros(1000)))
kolmogorov(arr.tostring())
Out: 0.47612500000000002
```

What about Gaussian numbers? If we use a high variance the result is not different from uniform numbers:

```
arr = randn(1000)
kolmogorov(arr.tostring())
Out: 0.96350000000000002
```

But we can see the complexity becoming lower when we shrink the Gaussian width:

```
arr = 0.5 + randn(1000)*0.0001
kolmogorov(arr.tostring())
Out: 0.80937499999999996
```

These are still random numbers, but the much smaller range makes the string of them far less complex. Below is a plot of the complexity as a function of the standard deviation. It can be seen that the smaller the width of the Gaussian, the lower the complexity of the string.

In the next post I will describe how to use the Kolmogorov Complexity to measure the wow effect of a robot behaviour.

=-=-=-=-=*Powered by **Blogilo*

Chris

July 10, 2010 at 5:57 pm

Nice explanation.

I am looking forward to the next installment. When “measuring” the complexity of a robot behaviour, I presume that you have to stack for example the (x,y) pose into a string and then carry out the above? So repetitiveness is the same as complexity in this case? To what extent does it matter what compression algorithm you use in the approximation?

Lorenzo Riano

July 10, 2010 at 8:49 pm

You can calculate the complexity of several aspects of a robot behaviour. (x,y) is a possibility, but another is to calculate the complexity of the string formed by all the inputs the robot receives and the outputs it produces. This holds for a reactive behaviour only, as you could see a robot staying still but carrying on several “mental” processes. So it depends on what you want to calculate the complexity of.

Regarding the second question, it depend on the compression algorithm you use. But according to Kolmogorov theory, all the complexities you could calculate with all the machines are equivalent up to a constant. So, the complexity of a single string is not really meaningful, but comparing the complexities of several strings is. For example you can shift the plot above right or left, but the shape of the plot matters.

Lance James

April 13, 2012 at 12:18 am

I checked out your kolmogorov’s complexity – the only issue depending, if one uses for a measurement of a short string, zlib has header that will add to the overhead and it will turn out inaccurate. For use I believe you’re fine, but just a warning to not just cut and paste this code for general measurement of all strings.

Lorenzo Riano

April 13, 2012 at 9:07 am

You’re right! Kolmogorov’s complexity is an asymptotic measure anyway, so you really need long strings to use it.