After a long wait, it's here! To continue seamlessly from the previous part, I have prepared a little demo of a genetic algorithm. What we have here is a so-called facility location problem, or a variation of it.
There is a bunch of cities (50 in our case) and we want to supply them with goods, so we must open some factories. We must decide how many factories we are going to open and in which cities. We won't explicitly assign which cities are to be supplied by which factories; instead, each city will be served by its nearest factory (imagine the factories have unlimited capacity). The catch: we want to minimize the total cost:
- every opened factory costs us some fixed amount of money
- delivering goods from a factory to a city costs us an amount of money equal to the (Euclidean) distance from the factory to the city
So how do we represent a solution to our problem? Well, we'll use a binary string, as in a proper GA, with the number of bits equal to the number of cities. Then \(1\) at position \(n\) means that the \(n\)-th city has a factory, and \(0\) means it doesn't, e.g. a string like 001000110
would mean that the third, seventh and eighth city would have a factory while the first, second, fourth, fifth, sixth and ninth wouldn't. Now, give it a try!
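To make the cost function concrete, here is a minimal Python sketch of how a single candidate solution could be scored. The function names, the coordinate format and the fixed factory cost are illustrative assumptions, not the actual code behind the demo.

```python
import math
import random

FACTORY_COST = 10.0  # assumed fixed cost of opening one factory

def solution_cost(genome, cities, factory_cost=FACTORY_COST):
    """genome: list of 0/1 flags, cities: list of (x, y) coordinates."""
    open_factories = [cities[i] for i, bit in enumerate(genome) if bit == 1]
    if not open_factories:
        return float("inf")  # with no factory at all, nothing can be supplied
    # fixed cost: every opened factory costs us some money
    fixed = factory_cost * len(open_factories)
    # delivery cost: each city is served by its nearest open factory
    delivery = sum(min(math.dist(city, f) for f in open_factories)
                   for city in cities)
    return fixed + delivery

# Example: 50 random cities, factories opened in the 3rd, 7th and 8th city
cities = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50)]
genome = [0] * 50
genome[2] = genome[6] = genome[7] = 1
print(solution_cost(genome, cities))
```

The genetic algorithm from the previous part then simply minimizes this cost over binary strings using selection, crossover and mutation.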
Real-valued Evolution
So far we have spoken about a genetic algorithm that uses binary strings. This means that our solutions are inherently discrete. But what if we wanted to really find the minimum of the function from part one?
The „standard“ way
The first way of dealing with real numbers is to just express them using a binary standard, e.g. the IEEE 754 single-precision floating-point format. But this has problems. Imagine the numbers 1.763 and -2.611. These would look like this (e.g. using this tool):
1.763 = 00111111111000011010100111111100
-2.611 = 11000000001001110001101010100000
Now take a one-point crossover after the fifth bit:

00111 | 111111000011010100111111100
11000 | 000001001110001101010100000

turns into

00111 | 000001001110001101010100000 = 3.9840699e-05
11000 | 111111000011010100111111100 = -115539.96875
Crossing over at other points gives us, for example, -3.252161001305617e+19, 3.591135600494495e-39 and NaN. Bit-flip mutation is going to have similar problems.
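If you want to see this for yourself, here is a small Python sketch (not part of the original demo) that performs the same bit-level one-point crossover using the standard struct module:

```python
import struct

def float_to_bits(x):
    """IEEE 754 single-precision representation of x as a string of 32 bits."""
    (i,) = struct.unpack(">I", struct.pack(">f", x))
    return format(i, "032b")

def bits_to_float(bits):
    """Inverse of float_to_bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", int(bits, 2)))
    return x

p1 = float_to_bits(1.763)   # 00111111111000011010100111111100
p2 = float_to_bits(-2.611)  # 11000000001001110001101010100000

cut = 5  # one-point crossover after the fifth bit
c1 = p1[:cut] + p2[cut:]
c2 = p2[:cut] + p1[cut:]
print(bits_to_float(c1), bits_to_float(c2))  # ~3.98e-05 and ~-115539.97
```

Try other values of cut and you will quickly run into the extreme values (and NaNs) mentioned above.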
The good way
A much better way than sticking to the binary representation is to just use real numbers directly. Then, if we have a one-dimensional problem (like searching for the minimum of that function), the genotype is just one real number. This, however, brings a lot of changes to the crossover and mutation operators.
As we have seen, just crossing over the binary representation is bad. Therefore we need to find a better way of combining two individuals. We would like the offspring(s) to be „close“ to the parents, but still different from them. Since we are in the real numbers, we can use the power of probability and random sampling! One of the simplest real-valued crossovers is the so-called arithmetic crossover, which is basically a weighted mean of two (or more!) parents with randomly chosen weights. Or more formally: for parents \(x_1, \dots, x_k\) the offspring is
\[ x' = \sum_{i=1}^{k} w_i x_i, \qquad w_i \geq 0, \quad \sum_{i=1}^{k} w_i = 1, \]
with the weights \(w_i\) chosen at random.
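As a sketch of what this could look like in code (the function name and the way the random weights are drawn are just one possible choice):

```python
import random

def arithmetic_crossover(parents):
    """Weighted mean of two or more parents with random weights summing to 1.

    Each parent is a list of real numbers (the genotype).
    """
    weights = [random.random() for _ in parents]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so the weights sum to 1
    length = len(parents[0])
    return [sum(w * p[i] for w, p in zip(weights, parents)) for i in range(length)]

# One-dimensional example: the child always lies between the two parents
print(arithmetic_crossover([[1.763], [-2.611]]))
```

Because the weights are non-negative and sum to one, the offspring is always a convex combination of its parents, which is exactly the „close but different“ behaviour we wanted.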
Mutation is easier. One of the most used ways is to just add a randomly generated number (or a vector, if our genotype consists of more numbers) drawn from a distribution of our choice. The most used distributions are the uniform distribution (where we need to specify the bounds) and the normal distribution (where we need to specify the variance).
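A corresponding sketch of both mutation variants; the sigma and bound values are assumed parameters that one has to tune for the problem at hand:

```python
import random

def gaussian_mutation(genotype, sigma=0.1):
    """Add noise from a normal distribution; sigma is the standard deviation
    (the square root of the variance we need to specify)."""
    return [x + random.gauss(0.0, sigma) for x in genotype]

def uniform_mutation(genotype, bound=0.5):
    """Add noise drawn uniformly from [-bound, bound]."""
    return [x + random.uniform(-bound, bound) for x in genotype]

print(gaussian_mutation([0.42]))
print(uniform_mutation([0.42]))
```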
Next...
The final part of this introductory mini-series, which will finally get us to what I'm actually doing, is going to be big. It is going to be about Genetic Programming. So stay tuned!