After months of waiting, I welcome you to the final part of my humble introductory mini-series about evolutionary computation. Today's topic is Genetic Programming.
Discovering functions
Remember the example of finding the minimum of a function from the first part of this mini-series? We had a function we didn't know anything about (actually, we did, but that was just for the sake of showing what we can do when we know) and we wanted to find its minimum. Now, let's look at a different problem: we don't have the function at all, only a handful of \((x, y)\) pairs, and we want to find out what the function actually looks like. How do we do it?
If we restrict ourselves to a specific set of functions, e.g. linear functions, we can use specialized methods like (multiple) linear regression that will just throw the function at us given the data. Period. However, this restricts us to that specific set of functions. If the "secret" function actually was linear, this would be the best we could do, but if it wasn't, we would get an erroneous function.
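To make the limitation concrete, here is a minimal sketch of ordinary least squares for a single variable. The helper name `fit_line` and the two sample sets are made up for illustration; they are not from the earlier parts of the series.

```python
def fit_line(samples):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: one truly linear "secret" function, one quadratic.
linear = [(x, 2 * x + 1) for x in range(5)]
quadratic = [(x, x * x) for x in range(-2, 3)]

print(fit_line(linear))     # recovers slope 2 and intercept 1
print(fit_line(quadratic))  # a flat line y = 2, which misses the parabola
```

The regression does its job perfectly on the linear data, but on the quadratic data the best line it can offer has little to do with the real function.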
We could take a different approach and use more complicated machine learning methods like artificial neural networks (ANNs), which have the universal approximation property (simplified: given enough neurons they are, in principle, capable of representing almost any function with arbitrarily low error). However, their problem is that they can be immensely complex, and if one wanted to use one as a mathematical expression, it would be a mess.
As you might already have guessed, we are going to use... an evolutionary algorithm! Well, we know how to evolve binary strings and real numbers... how do we use that to find a function? Well, it is very complicated. You could design some algorithm that would translate a binary string to a function, but it would probably be very messy and inefficient, not to mention the fixed length of the binary strings, which would be a major problem. However, there is a representation that can be used... trees!
Tree representation
A mathematical expression can be described as a tree. Let's take some made-up expression like $$ (x + \sin{y})^2 - \cos{3x} $$ Where is the tree in there? It's easy - the operators (e.g. addition, multiplication...) and functions (e.g. sin, cos...) are inner nodes and the variables (e.g. x, y...) and constants (i.e. numbers) are leaves of the tree. The corresponding tree for the expression above looks like this:
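One way to write such a tree down in code is as nested tuples, where the first element names the operator or function and the rest are its children. This is only a sketch: the node names (`add`, `sub`, `mul`, `pow`, ...) and the `evaluate` helper are my own choices for illustration, not a fixed convention.

```python
import math

# Inner nodes are tuples: (name, child, child, ...).
# Leaves are variable names (strings) or numeric constants.
EXPR = ("sub",
        ("pow", ("add", "x", ("sin", "y")), 2),
        ("cos", ("mul", 3, "x")))

FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "pow": lambda a, b: a ** b,
    "sin": math.sin,
    "cos": math.cos,
}

def evaluate(node, env):
    """Recursively evaluate a tree for the variable values given in env."""
    if isinstance(node, tuple):
        name, *children = node
        args = [evaluate(child, env) for child in children]
        return FUNCTIONS[name](*args)
    if isinstance(node, str):
        return env[node]   # variable leaf
    return node            # constant leaf

print(evaluate(EXPR, {"x": 1.0, "y": 0.0}))
```

Evaluating the tree is a plain recursion: compute the children first, then apply the node's own operation to the results.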
Evolution of trees
So now that we have a representation for math expressions, how do we actually evolve it? Well, pretty much in the same way as in a classical GA - by selection, recombination and mutation. Selection is identical - the better solutions (in this case those that have lower error) are preferred over the worse ones. Recombination, i.e. crossover, and mutation are the more interesting ones.
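To illustrate what "lower error is preferred" can mean in practice, here is a small sketch. It assumes mean squared error as the error measure and tournament selection as the selection scheme; both are common choices but only assumptions here. To keep it short, candidate solutions are stand-in Python functions rather than trees, and the sample data is made up.

```python
import random

def mean_squared_error(candidate, samples):
    """Average squared difference between the candidate and the samples."""
    return sum((candidate(x) - y) ** 2 for x, y in samples) / len(samples)

def tournament_select(population, samples, k=3, rng=random):
    """Pick k random candidates and return the one with the lowest error."""
    contestants = rng.sample(population, k)
    return min(contestants, key=lambda c: mean_squared_error(c, samples))

# Hypothetical samples of a "secret" function x^2 + 1.
samples = [(x / 2, (x / 2) ** 2 + 1) for x in range(-4, 5)]

# Stand-in candidates; in GP these would be expression trees.
population = [
    lambda x: x,
    lambda x: x * x,
    lambda x: x * x + 1,
    lambda x: 2 * x + 1,
]

# With k equal to the population size, the best candidate always wins.
winner = tournament_select(population, samples, k=len(population))
print(mean_squared_error(winner, samples))
```

A smaller `k` makes the selection less greedy, so worse candidates still get an occasional chance to reproduce.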
Crossover
There are several types of crossover, but I'm going to talk about the most basic one - the subtree crossover. It works in the following way:
- Pick a random node \( N_1 \) in the first parent
- Pick a random node \( N_2 \) in the second parent
- Exchange the subtrees rooted at \( N_1 \) and \( N_2 \) between the two parents
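The three steps above can be sketched as follows. The sketch assumes trees are nested tuples of the form `(name, child, ...)` with strings and numbers as leaves, and that every node is equally likely to be picked; the helper names are made up for illustration.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))

def get_subtree(tree, path):
    """Follow the path from the root and return the node it leads to."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(parent1, parent2, rng=random):
    """Swap one randomly chosen subtree between the two parents."""
    path1 = rng.choice(list(paths(parent1)))   # node N1
    path2 = rng.choice(list(paths(parent2)))   # node N2
    sub1 = get_subtree(parent1, path1)
    sub2 = get_subtree(parent2, path2)
    return (replace_subtree(parent1, path1, sub2),
            replace_subtree(parent2, path2, sub1))

p1 = ("add", "x", ("sin", "y"))
p2 = ("mul", 3, "x")
print(subtree_crossover(p1, p2))
```

Because the subtrees are only exchanged, the two children together always contain exactly as many nodes as the two parents did.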
Mutation
As there is subtree crossover, there is subtree mutation too. Again, it is based on mangling subtrees. However, unlike crossover, mutation doesn't exchange subtrees between multiple solutions; it modifies only one. The most basic subtree mutation looks like this:
- Randomly pick a node
- Throw away that node (and its subtree) and replace it with a new randomly generated subtree
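Here is a minimal sketch of these two steps, again assuming trees stored as nested tuples. The function and terminal sets, the 30% chance of stopping early and the depth limit for new subtrees are arbitrary values picked for illustration.

```python
import random

ARITY = {"add": 2, "sub": 2, "mul": 2, "sin": 1, "cos": 1}  # assumed non-terminals
TERMINALS = ["x", 1.0, 2.0]                                  # assumed terminals

def random_tree(max_depth, rng=random):
    """Grow a random tree that is at most max_depth levels deep."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(ARITY))
    children = tuple(random_tree(max_depth - 1, rng) for _ in range(ARITY[name]))
    return (name,) + children

def depth(tree):
    """Depth of a tree; a single leaf has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_mutation(tree, max_depth=3, rng=random):
    """Replace one randomly picked node (and its subtree) with a fresh subtree."""
    path = rng.choice(list(paths(tree)))
    return replace_subtree(tree, path, random_tree(max_depth, rng))

print(subtree_mutation(("add", "x", ("sin", "x"))))
```

Limiting the depth of the newly generated subtree is one simple way to keep a single mutation from blowing the tree up too much.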
Demo
This is a little demo of GP in action. The task is to find the function based on a bunch of samples (the red points; the sought function is the dashed curve). The actual function that we want the GP to find is $$ f(x) = 5 \sin{x^2} $$ ATTENTION: you might find that the algorithm "runs away" with bigger and bigger and messier and messier functions. I suggest you stop the algorithm and start again. This algorithm is really not tuned very well, it's just a demo.
Not just functions
We can evolve functions. But one could say “Fine, but where's the programming in that?” and they would be totally right. So far we can only make mathematical expressions. But the tree representation goes beyond that. All you need to do is to represent a program as a tree, and that is quite simple. Let's imagine that we control an ant in a 2D grid world. The ant can turn left, turn right and decide what to do based on whether there is a piece of food just ahead of it. And here is what a program in this domain might look like:
As you can see, the only difference is that the branching point (or non-terminal) is not an operator but a decision-making point and the leaves are not variables and constants but actions. In fact, there could be many more non-terminals, e.g. loops, function calls etc.
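To show that such a program tree can really be executed, here is a small interpreter sketch. Several things in it are my own assumptions for illustration: the `move` action, the `seq` non-terminal that runs its children one after another, the wrap-around grid and the food layout.

```python
# Headings in clockwise order (north, east, south, west); y grows downwards.
DIRS = [(0, -1), (1, 0), (0, 1), (-1, 0)]

class Ant:
    def __init__(self, food, width, height):
        self.x, self.y, self.heading = 0, 0, 1   # top-left corner, facing east
        self.food = set(food)
        self.width, self.height = width, height
        self.eaten = 0

    def ahead(self):
        """Coordinates of the cell in front of the ant (the grid wraps around)."""
        dx, dy = DIRS[self.heading]
        return ((self.x + dx) % self.width, (self.y + dy) % self.height)

    def food_ahead(self):
        return self.ahead() in self.food

    def move(self):
        self.x, self.y = self.ahead()
        if (self.x, self.y) in self.food:
            self.food.remove((self.x, self.y))
            self.eaten += 1

    def turn_left(self):
        self.heading = (self.heading - 1) % 4

    def turn_right(self):
        self.heading = (self.heading + 1) % 4

def run(node, ant):
    """Execute a program tree once: leaves are actions, tuples are non-terminals."""
    if isinstance(node, str):
        getattr(ant, node)()                      # leaf: perform the action
    elif node[0] == "if_food_ahead":
        run(node[1] if ant.food_ahead() else node[2], ant)
    elif node[0] == "seq":
        for child in node[1:]:
            run(child, ant)

# "If there is food ahead, step onto it, otherwise turn right."
PROGRAM = ("if_food_ahead", "move", "turn_right")

ant = Ant(food=[(1, 0), (2, 0), (2, 1)], width=4, height=4)
for _ in range(6):
    run(PROGRAM, ant)
print(ant.eaten)  # prints 3
```

The fitness of such a program could then simply be the amount of food eaten within a fixed number of steps.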
Closure property
All the Genetic Programming I introduced above is called Koza-style GP because it is the idea of John R. Koza, which he published in his book [1]. For this type of GP to work, the so-called closure property must hold. The closure property holds if the output of every terminal and non-terminal is acceptable as an input to every non-terminal. That means that whatever our nodes output, it must be possible to chain it with an arbitrary node.
If our non-terminal set is made of plus, minus and multiplication and our terminal set of variables and numbers, the closure property holds - however we add, subtract or multiply numbers, we always get a number that can be further added to, subtracted from or multiplied. However, if we extended our non-terminal set with a non-terminal such as if, which would take three children - a condition, a result when the condition is true and a result when the condition is false - then the closure property would not hold, because the condition requires a boolean value, which is different from a number. We would have to employ some rule that would define true and false in terms of numbers.
In fact, it would be enough to just add division to the non-terminal set to break the closure property. Yes, it would still produce only numbers, but what if the second argument of a division evaluated to zero? We can't divide by zero, so division cannot accept zero as its second argument. You see, it is enough that a non-terminal doesn't accept some values, even though they are of the right type.
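Both problems are usually patched by redefining the offending non-terminals so that they accept any number. The sketch below shows one possible way: a "protected" division that returns 1 when the divisor is zero (a commonly used convention) and a numeric if that treats positive values as true. The names and the "positive means true" rule are just assumptions for illustration.

```python
def protected_div(a, b):
    """Division that accepts any number, including zero, as its second argument."""
    return a / b if b != 0 else 1.0

def if_positive(condition, then_value, else_value):
    """Numeric 'if': a condition greater than zero counts as true (assumed rule)."""
    return then_value if condition > 0 else else_value

print(protected_div(6, 3))       # ordinary division
print(protected_div(1, 0))       # no exception, falls back to 1.0
print(if_positive(-2, 10, 20))   # -2 counts as false, so the result is 20
```

With these definitions every node again takes numbers and returns a number, so any subtree can be plugged in anywhere.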
Summary
In this long-awaited (I hope :)) post I described a little bit of what is known as Genetic Programming. It is basically a way of evolving mathematical expressions, computer programs or any other tree-like structure using the principles of genetic algorithms. It might (or might not) seem like a brilliant technique able to solve anything... but don't let yourself be fooled. Finding functions and programs is really hard and most of the time this vanilla GP doesn't work very well. But it's a start and it can solve at least some easy problems.
Regarding further posts I'm going to publish here, I don't have any clear plan. I suppose I might dive into specific algorithms or things I do or something totally different. We'll see :).
^[1] Koza, John R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Vol. 1. MIT Press, 1992.