Monday, July 20, 2015

Evolution in a Computer, Pt. 2

After a long wait, it's here! To continue seamlessly from the previous part, I have prepared a little demo of a genetic algorithm. What we have here is the so-called Facility location problem, or a variation of it.

There is a bunch of cities (50 in our case) and we want to supply them with goods, so we must open some factories. We must decide how many factories to open and in which cities. We won't explicitly assign which cities are supplied by which factories; instead, each city is assigned to its nearest factory (imagine the factories have unlimited capacity). The catch: we want to minimize the total cost:

  • every opened factory costs us some fixed amount of money
  • delivering goods from a factory to a city costs us an amount of money equal to the (euclidean) distance from the factory to the city
Clearly, we cannot try every combination of cities with a factory, as there are \(2^{50} \approx 1.13 \cdot 10^{15}\) combinations. That would take almost 36 years if checking one solution took one microsecond. Yes, this problem is NP-hard. Instead, we are going to use a... Genetic Algorithm!

So how do we represent a solution to our problem? Well, we'll use a binary string, as in a proper GA, with the number of bits equal to the number of cities. A \(1\) in position \(n\) means that the \(n\)-th city has a factory, and a \(0\) means it doesn't. For example, a string like 001000110 would mean that the third, seventh and eighth cities have a factory while the first, second, fourth, fifth, sixth and ninth don't. Now, give it a try!
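To make the encoding concrete, here is a sketch of the cost function in Python (the demo itself is in Dart, and all the names here are mine, not the demo's): the total cost is the number of open factories times the facility cost, plus each city's Euclidean distance to its nearest factory.

```python
import math

def total_cost(genome, cities, facility_cost):
    """Cost of one candidate solution.

    genome        -- list of 0/1 flags, one per city (1 = open a factory there)
    cities        -- list of (x, y) coordinates
    facility_cost -- fixed cost of opening a single factory
    """
    factories = [cities[i] for i, bit in enumerate(genome) if bit == 1]
    if not factories:                      # no factory at all -> nothing can be supplied
        return float('inf')
    cost = facility_cost * len(factories)  # fixed opening costs
    for cx, cy in cities:                  # every city buys from its nearest factory
        cost += min(math.hypot(cx - fx, cy - fy) for fx, fy in factories)
    return cost
```

The GA then simply minimizes `total_cost` over the bitstrings.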

Clicking Start starts the genetic algorithm from the beginning with the current map. Clicking New map generates a new (random) map. Clicking Stop during the optimization stops it. You can edit the cost of opening a facility, the size of the population (must be a positive even number), the probabilities of crossover and mutation (must be between 0 and 1) and the size of the tournament (must be between 2 and the size of the population). The evolution runs indefinitely (i.e. until you stop it). The world is 100 by 50 units. This thing is written in Dart and you can find the source here.
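The tournament selection mentioned above can be sketched like this (again a Python illustration with names of my own choosing, not the demo's code): sample a few individuals at random and let the cheapest one win.

```python
import random

def tournament_select(population, costs, tournament_size, rng=random):
    """Tournament selection for a minimization problem: sample
    `tournament_size` distinct individuals at random and return
    the one with the lowest cost."""
    contestants = rng.sample(range(len(population)), tournament_size)
    winner = min(contestants, key=lambda i: costs[i])
    return population[winner]
```

A larger tournament means stronger selection pressure; with the tournament as big as the population, the best individual always wins.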

Real-valued Evolution

So far we have spoken about a genetic algorithm that uses binary strings. This means that our solutions are inherently discrete. But what if we wanted to really find the minimum of the function from part one?

The „standard“ way

The first way of dealing with real numbers is to express them using a binary standard, e.g. the IEEE 754 single-precision floating-point format. But this has problems. Imagine the numbers 1.763 and -2.611. These would look like this (e.g. using this tool):

 1.763 = 00111111111000011010100111111100
-2.611 = 11000000001001110001101010100000
And now imagine what would happen when we crossed them over using single-point crossover after the 5th bit:
00111111111000011010100111111100
11000000001001110001101010100000
turns into
00111000001001110001101010100000 = 3.9840699e-05
11000111111000011010100111111100 = -115539.96875
One could say "Well, that escalated quickly!" and indeed, these problems arise all the time. If we explore all the possible crossovers (there are 32 of them, resulting in 64 numbers), the offspring include gems like -3.252161001305617e+19, 3.591135600494495e-39 and NaN. Bit-flip mutation has similar problems.
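You can reproduce the experiment yourself. The following Python snippet converts floats to their IEEE 754 single-precision bit patterns with the standard struct module and performs the single-point crossover after the 5th bit:

```python
import struct

def float_to_bits(x):
    """IEEE 754 single-precision bit pattern of x, as a 32-character string."""
    return format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')

def bits_to_float(bits):
    """Inverse of float_to_bits."""
    return struct.unpack('>f', struct.pack('>I', int(bits, 2)))[0]

def single_point_crossover(a, b, point):
    """Swap the tails of two bit strings after the first `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]

p1, p2 = float_to_bits(1.763), float_to_bits(-2.611)
c1, c2 = single_point_crossover(p1, p2, 5)
# the offspring land many orders of magnitude away from both parents
print(bits_to_float(c1), bits_to_float(c2))
```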

The good way

A much better way than sticking to the binary representation is to just use the real numbers themselves. Then, if we have a one-dimensional problem (like searching for the minimum of that function), the genotype is just one real number. This brings a lot of changes to the crossover and mutation operators.

As we have seen, just crossing over the binary representation is bad. Therefore we need to find a better way of combining two individuals. We would like the offspring to be „close“ to the parents, yet different from them. Since we are in the real numbers, we can use the power of probability and random sampling! One of the simplest real-valued crossovers is the so-called arithmetic crossover, which is basically a weighted mean of two (or more!) parents with randomly chosen weights. Or more formally:

$$ \mathbf{x}^{Offs} = \sum_{i=1}^{N_{Pa}} \alpha_i \mathbf{x}_i $$ where \(\sum_{i=1}^{N_{Pa}} \alpha_i = 1, \alpha_i \in (0, 1)\); \(\alpha_i\) are picked randomly for each offspring.
And there are many other crossover operators, e.g. BLX-α, SBX, PCX...
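The arithmetic crossover from the formula above can be sketched as follows (normalizing uniform random numbers is just one simple way to get weights that sum to 1):

```python
import random

def arithmetic_crossover(parents, rng=random):
    """Arithmetic crossover: one offspring as a weighted mean of the parents.

    Each parent is a list of real numbers. The weights alpha_i are random,
    positive and normalized so that they sum to 1.
    """
    raw = [rng.random() for _ in parents]
    total = sum(raw)
    alphas = [r / total for r in raw]
    length = len(parents[0])
    return [sum(alphas[i] * parents[i][d] for i in range(len(parents)))
            for d in range(length)]
```

Because the weights sum to 1, every offspring gene lies between the smallest and the largest corresponding parent gene, so the offspring is always „close“ to its parents.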

The mutation is easier. One of the most used ways is to just add a randomly generated number (or vector, if our genotype consists of more numbers) drawn from a distribution of our choice. The most used distributions are the uniform distribution (where we need to specify the bounds) and the normal distribution (where we need to specify the variance).
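A minimal sketch of the normal-distribution variant, assuming the genotype is a list of real numbers:

```python
import random

def gaussian_mutation(genotype, sigma, rng=random):
    """Mutate a real-valued genotype by adding zero-mean normal
    noise with standard deviation `sigma` to every gene."""
    return [x + rng.gauss(0.0, sigma) for x in genotype]
```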

Next...

The final part of this introductory mini-series, which will finally get us to what I'm actually doing, is going to be big. It is going to be about Genetic Programming. So stay tuned!

Saturday, July 18, 2015

GECCO - days 3, 4 and 5

I'm very sorry I didn't post updates on GECCO as it went. I was quite busy at the conference and tired at the end of each day. So I'll sum up the three days of the main conference in this one post.

Day 3

It was the first day of the main conference, i.e. the part where the full papers were presented in 25-minute talks (20 + 5 for questions and discussion). It started with a keynote by Ricard Solé from Universitat Pompeu Fabra entitled „Re-designing Nature with Synthetic Biology: from Artificial Ants and Tissues to a New Biosphere“. He talked about designing new organisms to do useful stuff like fixing cracks in concrete or repairing damaged land. To be honest, I didn't understand much of it, but it was interesting.

Here are two interesting talks I attended. They are not the only ones, but these are the ones I find the most interesting and the most relevant to my own research.

Building Predictive Models via Feature Synthesis

A great talk given by Una-May O'Reilly from MIT introduced a new approach to building predictive models. It is based on Genetic Programming and LASSO, a linear regression method that penalizes coefficient sizes. I still have to read the paper, but from what was said during the talk, this is a real step forward in evolutionary machine learning, as it is supposed to be fast and produce good models.

Examining the „Best of Both Worlds“ of GE

This talk I would call „a bad day for GE“. GE, or Grammatical Evolution, is an algorithm for Genetic Programming which happens to be the one I use very extensively (I promise I will write about this! Really! Sometime...). In his talk, Peter A. Whigham showed that GE is not as good an algorithm as it is thought to be. In fact, he showed that it is only a little better than random search. If one wants to use a grammar-based GP approach, he recommends Context-free Grammar Genetic Programming (CFGGP), an approach older than GE.

Day 4

The fourth day, or the second day of the main conference, started with a keynote by Kate Smith-Miles from Monash University, entitled „Visualising the Diversity of Benchmark Instances and Generating New Test Instances to Elicit Insights into Algorithm Performance“. The title sounds scary but the talk really wasn't. Kate described the problem with the current algorithm research procedure:

  1. Propose a new algorithm which is (usually) a minor variation on the existing ones.
  2. Measure the performance on this, this ... and this benchmark.
  3. Report that my new algorithm is overall a little bit better than the state of the art.
  4. Publish a paper.
She argued that this approach can only show the strengths of an algorithm and not its weaknesses, and she raised the question of whether the benchmarks that are commonly used are really that diverse. She went on to demonstrate a new approach to measuring algorithm performance: compute features of the problems, project this feature space into 2D, and there visualise the areas where particular algorithms are the most effective and where their weaknesses lie.

Interactively Evolving Compositional Sound Synthesis Networks

An interesting talk at the Digital Entertainment Technologies and Arts track given by Amy Hoover. The research she presented was about using CPPNs evolved by NEAT to produce sound synthesisers. All this happens through interactive evolution, i.e. the user manually selects which „solutions“ are good. You can try the very product of the research here: http://bthj.is/breedesizer/

An Efficient Structural Diversity Technique for Genetic Programming

The first of the three best paper nominees in the Genetic Programming track. The talk itself was not that great but the presented research, or the idea behind it, is brilliant. It was about enhancing diversity in GP with a very simple yet powerful mechanism: you record the first (few) levels of each tree, called a genetic marker, along with its „density“, i.e. how many individuals share the same genetic marker, and you then use some multi-objective paradigm to optimize both the fitness and the density.
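As I understood it, the marker-and-density part could be sketched roughly like this (a toy Python illustration of the idea, not the authors' code; trees are nested tuples):

```python
from collections import Counter

def marker(tree, depth):
    """Genetic marker: the tree cut off after its first `depth` levels.
    A tree is a nested tuple such as ('+', ('x',), ('*', ('x',), ('1',)))."""
    label, children = tree[0], tree[1:]
    if depth <= 1 or not children:
        return (label,)
    return (label,) + tuple(marker(child, depth - 1) for child in children)

def densities(population, depth):
    """For every individual, the number of individuals (itself included)
    that share its genetic marker."""
    counts = Counter(marker(tree, depth) for tree in population)
    return [counts[marker(tree, depth)] for tree in population]
```

A multi-objective selection scheme can then prefer individuals with good fitness and low density.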

Genetic Programming with Epigenetic Local Search

The second of the best paper nominees. The talk was about an epigenetic search in GP. In biology, epigenetics is a subfield of genetics which looks at changes in gene expression other than changes in the DNA (very simplified). Here it meant the ability of parts of the genotype to be simply switched off. Since this would be very difficult to do with classical tree GP, Lee Spector's Push language was used. Each genotype simply carries one bit per element which says whether that element is to be used or not. The epigenetic local search then plays with these bits.
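The on/off idea itself is simple; here is a toy sketch (my own illustration, not the paper's code) of expressing a linear Push-style genome with activation bits:

```python
def express(genome, activation):
    """Phenotype of an epigenetic individual: only the instructions
    whose activation bit is 1 take part in execution."""
    return [gene for gene, on in zip(genome, activation) if on]

def flip_bit(activation, i):
    """One local-search move: toggle the i-th activation bit
    without touching the genome itself."""
    return activation[:i] + [1 - activation[i]] + activation[i + 1:]
```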

Memetic Semantic Genetic Programming

The last of the best paper nominees and actually the one that won (though I voted for the first one). The research was focused on GP for boolean algebra (and was extended to finite algebras). The „trick“ was to analyze where an individual is wrong and repair the subtrees that cause it to be wrong, using a static library of trees generated once at the start.

Day 5

The last day of the conference. It started with a „ceremony“ where best paper winners were presented, as well as the winners of competitions and other awards. Then the next GECCO place was presented, which is going to be Denver, Colorado, USA.

After the ceremony there was the last keynote, given by Manuel Martín-Loeches from the Complutense University of Madrid, entitled „Origins and Evolution of Human Language: Saltation or Gradualism?“. And as the title says, it was about the evolution of human language. It was very interesting, and the message is that Manuel Martín-Loeches stands by the theory that language was the result of gradual evolution by small mutations, rather than one (or a few) big macro-mutation(s).

The last session of GECCO did not have any talks that were very relevant for my research so I looked mostly for things that were just interesting...

Novelty Search for Soft Robotic Space Exploration

An interesting talk about using Novelty Search to evolve soft robots for space exploration. The research was focused on evolving robots made of soft materials of varying properties for the exploration of other bodies, like the Moon or Mars. It used an indirect encoding: instead of evolving the robots directly, it evolved CPPNs that produce them. The CPPN tells whether a voxel is to be filled with material and, if so, with which material. The presentation was quite entertaining because the presenter showed us videos of the evolved robots moving, some of which were very funny.

Final summary

This was my first GECCO and also my first conference of such magnitude. In my opinion the conference ran very smoothly; there were no problems. I successfully presented both my papers, one at the Student Workshop and the second one in the regular poster session. I had a chance to talk with other researchers and students, which was quite nice. To sum up, I consider GECCO 2015 to have been great and I hope it was not the last one for me.

Monday, July 13, 2015

GECCO - day 2

„Hola!“ as they say here in Madrid. Day two of GECCO 2015 is over. Well, still not GECCO itself but the workshops and tutorials before it. I presented my student poster today, and I would like to mention two of the tutorials I attended.

Evolutionary Computation: A Unified Approach

Given by Kenneth A. De Jong, it was a nice and simple tutorial about the general aspects of evolutionary computation. The main goal of the tutorial was to give a broad overview of the various characteristics and aspects of evolutionary algorithms, evolution strategies, genetic algorithms etc. and to abstract all of these into a „unifying“ field called Evolutionary Computation. The tutorial was very basic and I can't say I learned anything new, but Kenneth's presentation put everything into a single perspective, which was excellent for sorting one's thoughts out.

Expressive Genetic Programming

Given by Lee Spector (GitHub), this was also a nice tutorial, heavily targeted at the Push programming language designed for genetic programming. The language is, one could say, weird: everything is on stacks, including the program itself; everything reads from stacks and pushes to them or somehow mangles them. It is designed with minimal syntax constraints, and for a reason - so that random programs can exist and be valid. It's worth a look. There are many implementations of Push, like Clojush, which is written in, you guessed it, Clojure! I decided I'll try to implement it in Haskell and name it Hush :). I already created a GitHub repo, but there is nothing there yet.

Stay tuned for the report on today, which is the third day and also the first day of the main conference. It is (almost) filled with interesting talks; unfortunately, I will have to miss some of them (because others take place in the same time slot).

Saturday, July 11, 2015

Hello from Madrid! GECCO - day 1

Hello everyone from Madrid, Spain! First of all, the weather here is quite hellish - no clouds whatsoever and close to 40°C (I won't tell you Fahrenheit because I hate it!) in the shade. At least there is some wind blowing... Anyway, today was the first day of GECCO, or more precisely of the workshops and tutorials before the main conference. My main focus was the Student Workshop, where I presented my paper, which I won't write about yet because Evolution in a Computer, pt. 3 is not out yet. As a matter of fact, pt. 2 is not out yet either, for which I'm very sorry (but it's almost ready to be released; maybe I'll post it while still in Madrid, stay tuned).

Low or No Cost Distributed Evolutionary Computation

One of the very few tutorials I had time to attend was this one given by JJ Merelo of Universidad de Granada.

It was about how to use (almost) free (to you) resources to do your evolutionary computation job. He advocated JavaScript because it is a standard, it is everywhere, and you can write both the client and the server app (through node.js), so in the end why not do both in the same language. Good point, but he probably does not know about Dart. That one has all the attributes of JavaScript he presented and it is not **** :). By the way, if you want to know more about Dart and things around it, look at the blog of a friend of mine, Jana Moudrá, who is also a Google Developer Expert on Dart.

Back to the tutorial. The main point was to distribute the evolutionary computation using a pool EA. Unfortunately, he didn't talk much about that; instead he talked about the technical realization of such a distributed computation. The idea is to use the web browsers, which are the new operating systems, of ordinary people to run parts of your computation. In other words, turn people into computing power by „convincing“ them to visit a page that contains the client code that makes them take part in the distributed computation. The second main idea is to use (almost) free services to run the server part (he mentioned Heroku and OpenShift). The third idea is to tell people about it on social networks (he mentioned Twitter) to get as many „clicks“ as possible. But don't smuggle the dirty computation in hidden somewhere in an otherwise innocent page; be open, honest and give „something“ in return. That something should be knowledge, e.g. code on GitHub.

If I were to rate the tutorial, I would say I'm a little disappointed. I expected more about the algorithmic part, but on the other hand the idea of using the web browsers of ordinary people is brilliant.

That concludes the day one at the Genetic and Evolutionary Computation Conference or GECCO in Madrid, Spain. Stay tuned for the next day.