Genetic Programming in C#, LambdaGp

I’ve been a fan of Artificial Intelligence for a long time and I’ve been worked both with Neural Networks and Genetic Programming. Since I’ve migrated to C# I’ve lost access to all my old tools, so I decided to start work on a C# based Genetic Programming engine. I call it LambdaGp, because it makes extensive use of Lambda expressions.

After about four hours of coding (this is my fourth GP engine, so I do have some ideas about the design from the start) I have an engine that can solve my first demo problems.

I hope to publish my LambdaGp engine on CodePlex sometime in the future, but here are my two first test problems.

Symbolic regression of a constant

Applying symbolic regression to find a function that evolves to a constant may seem like overkill, but it’s a time-honored “Hello World” test-application in GP circles. Well, I always use it anyway. I’ve stated the problem as; find an expression that evaluates to a specific value, using the functions +,-,/,- and the constant terminals 1, 3 and 5

        private static Population SetupFindConstantTest(double targetValue)
        {
            Population population = new Population();
            population.GenomeDefinition.AddFunctions(
                new Add(),
                new Mul(),
                new Div(),
                new Sub(),
                new Constant(1),
                new Constant(3),
                new Constant(5));
            population.GenomeDefinition.MaxDepth = 3;
            population.PopulationSize = 750;

            population.FitnessFunction =
                ind =>
                Math.Abs(targetValue - ind.Evaluate());
            return population;
        }

Note how the fitness function simply returns the absolute distance between what the expression evaluates to, and the sought value. A perfect hit would return zero.

Below I’ll show how I use the population generated above, but when I ask it to find an expression that evaluates to 13,5 it goes through these iterations;

Generation=1,
New champion:
(-
  (/
    (* 5 3)
    (/ 3 3))
  (+
    (/ 5 5)
    (/ 3 5))),
Fitness=0,25
.
.
.
Generation=43,
New champion:
(-
  (* 3 5)
  (/ 3
    (- 5 3))),
Fitness=0,09
Reached target fitness!
Done

In other words, 3*5-3/(5-3). The reason that the fitness isn’t zero (lower fitness is better) is because of “parsimony pressure”; larger expressions have lower fitness. Fitness is punished depending on the size of the found expression.

Symbolic regression of an actual function

My second demo problem is finding an expression that matches a lambda function, for instance (5 * x^2) + (x * 3) + 5.

You’ll notice how I’ve used lambda for the fitness function instead of using a delegate. I think it’s readable enough, but not everyone would agree. But I could easily use a method delegate instead.

The fitness function iterates from –2 * pi to + 2 * pi in steps of 0.3, and calculates the expected value and the value that the expression suggests. If then keeps a running sum of squared errors. Again, a perfect solution would return zero. Anyway, this method struggled like crazy – to find my equation. Turns out I forgot to add multiplication, a fairly useful function…

The Ephemeral function is like a constant, only it randomizes new constants every time it’s used, in the range –5 to +5.

With multiplication, the new C# code looks like this;

private static Population SetupFindFunctionTest(Func<double, double> func)
        {
            Population population = new Population();
            population.GenomeDefinition.MaxSize = 30;
            population.GenomeDefinition.MaxDepth = 5;
            population.GenomeDefinition.AddFunctions(
                new Add(),
                new Div(),
                new Sub(),
                new Mul(),
                new Pow(),
                new Ephemeral(-5, +5));

            Variable x = population.GenomeDefinition.AddFunction(new Variable("x", 0.1));
            population.GenomeDefinition.MaxDepth = 5;
            population.PopulationSize = 2000;
            population.MaxGenerations = 500;

            population.FitnessFunction =
                ind =>
                {
                    x.Value = -2 * Math.PI;
                    double errorSum = 0;

                    while (x.Value < 2 * Math.PI)
                    {
                        double targetValue = func(x.Value);
                        double error = targetValue - ind.Evaluate();
                        errorSum += error * error;

                        x.Value += 0.3;
                    }

                    return errorSum;
                };

            return population;
        }

And it comes up with this code;

Generation=309,
New champion:
(+ x
  (+ 0,019
    (+
      (+
        (+ x x)
        (+ 4,977
          (*
            (+ x x)
            (+ x x))))
      (* x x)))),
Fitness=0,210671999999998

Again, the solution is actually “perfect”, it’s only parsimony pressure that gives it a non-zero fitness.

Symbolic regression of Sinus

To give it a trickier problem, I’m going have it try find sinus. This a complicated function to solve by symbolic regression – the Taylor expansion is infinite and the first iterations are quite significant.

sin(x) = X - X3/ 3! + X5/ 5! - ... + (-1)(n+1) * X(2*n-1)/ (2n-1)!

This proves to be much harder, as expected. I re-wrote the fitness function to report average squared error instead of the summed squared error, this gives a better feel for the actual result. I found the Taylor expansion for sinus here, and on that page you can also find the information that the fifth power expansion is accurate between –pi/2 to +pi/2 – and since we’re testing from –pi*2 to +pi*2, we need a solution of a higher power than five!

After a while, it closes in and finally finds this solution

Generation=516,
New champion:
(/
  (+ -10,02609
    (* x x))
  (-
    (-
      (/ x -4,28915)
      (*
        (/ -4,64585 x)
        (+ 1,84747 -4,15677)))
    (/
      (/
        (*
          (* x x)
          (/ x -6,95817))
        (/ -4,73401 x))
      (+
        (/ x -4,54048)
        (*
          (/ -3,55989 x)
          (* 1,65336 -1,95395)))))),
Fitness=0,00918493349840388 (before parsimony 0,00918493349840388)

This can of course be further reduced, but I’ll leave that as an exercise for the reader…

For completeness, the code to execute population run looks like this;

public static void Main(string[] args)
        {
            Population pop =
                //SetupFindConstantTest(13.5);
                //SetupFindFunctionTest(x => (x * x * 5) + (x * 3) + 5);
                SetupFindFunctionTest(Math.Sin);

            pop.AfterGeneration +=
                (sender, e) =>
                {
                    if (e.DifferentFromPreviousGeneration)
                    {
                        Console.WriteLine(
                            "Generation={0}, \n\rNew champion: \n\r{1}, \n\rFitness={2} (before parsimony {3})",
                            e.GenerationCount,
                            e.Champion.ToPrettyString(),
                            e.Champion.Fitness,
                            e.Champion.FitnessBeforeParsimony);
                    }
                    if (e.TargetFitnessReached)
                    {
                        Console.WriteLine("Reached target fitness!");
                    }
                };

            while (pop.GenerationCount < pop.MaxGenerations && pop.TargetFitnessReached == false)
            {
                pop.RunGeneration();
                if (Console.KeyAvailable)
                {
                    ConsoleKeyInfo key = Console.ReadKey();
                    if (key.KeyChar == 'R' || key.KeyChar == 'r')
                    {
                        pop.ResetPopulation();
                    }
                }
            }

            Console.WriteLine("Done");
            Console.ReadKey();
        }