15,041,025 members
Articles / Artificial Intelligence / Machine Learning
Article
Posted 5 Jun 2016

74.5K views
12 bookmarked

# Introduction to Genetic Algorithms with Python - Hello World!

Rate me:
A hands-on, step-by-step introduction to machine learning with genetic algorithms using Python.

## Hello World!

### Guess my number

Let’s begin by learning a little bit about genetic algorithms. Reach way back in your memories to a game we played as kids. It is a simple game for two people where one picks a secret number between 1 and 10 and the other has to guess that number.

```Is it 2?  No

Is it 3?  No

Is it 7?  No

Is it 1?  Yes```

That works reasonably well for 1..10 but quickly becomes frustrating or boring as we increase the range to 1..100 or 1..1000. Why? Because we have no way to improve our guesses. There’s no challenge. The guess is either right or wrong, so it quickly becomes a mechanical process.

```Is it 1?  No

Is it 2?  No

Is it 3?  No

Is it 4?  No

Is it 5?  No

...```

So, to make it more interesting, instead of no let’s say higher or lower.

```1?  Higher

7?  Lower

6?  Lower

5?  Lower

4?  Correct```

That might be reasonably interesting for a while for 1..10 but soon you’ll increase the range to 1..100. Because people are competitive, the next revision is to see who is a better guesser by trying to find the number in the fewest guesses. At this point the person who evolves the most efficient guessing strategy wins.

However, one thing we automatically do when playing the game is make use of domain knowledge. For example, after this sequence:

```1?  Higher

7?  Lower```

Why wouldn’t we guess 8, 9, or 10? The reason is, of course, because we know that those numbers are not lower than 7. Why wouldn’t we guess 1? Because we already tried it. We use our memory of what we’ve tried, our successes and failures, and our knowledge of the domain, number relationships, to improve our guesses.

A genetic algorithm does not know what lower means. It has no intelligence. It does not learn. It will make the same mistakes every time. It will only be as good at solving a problem as the person who writes the code. And yet, it can be used to find solutions to problems that humans would struggle to solve or could not solve at all. How is that possible?When playing a card game inexperienced players build a mental map using the cards in their hand and those on the table. More experienced players also take advantage of their knowledge of the problem space, the entire set of cards in the deck. This means they may also keep track of cards that have not yet been played, and may know they can win the rest of the rounds without having to play them out. Highly experienced card players also know the probabilities of various winning combinations. Professionals, who earn their living playing the game, also pay attention to the way their competitors play…​ whether they bluff in certain situations, play with their chips when they think they have a good hand, etc.

Genetic algorithms use random exploration of the problem space combined with evolutionary processes like mutation and crossover (exchange of genetic information) to improve guesses. But also, because they have no experience in the problem domain, they try things a human would never think to try. Thus, a person using a genetic algorithm may learn more about the problem space and potential solutions. This gives them the ability to make improvements to the algorithm, in a virtuous cycle.

What can we learn from this?

Technique: The genetic algorithm should make informed guesses.

Now let’s see how this applies to guessing a password. We’ll start by randomly generating an initial sequence of letters and then mutate/change one random letter at a time until the sequence of letters is "Hello World!". Conceptually:

pseudo code

```_letters = [a..zA..Z !]
target = "Hello World!"
guess = get 12 random letters from _letters
while guess != target:
index = get random value from [0..length of target]
guess[index] = get 1 random value from _letters```

If you try this in your favorite programming language you’ll find that it performs worse than playing the number guessing game with only yes and no answers because it cannot tell when one guess is better than another.

So, let’s help it make an informed guess by telling it how many of the letters from the guess are in the correct locations. For example "World!Hello?" would get 2 because only the 4th letter of each word is correct. The 2 indicates how close the answer is to correct. This is called the fitness value. "hello world?" would get a fitness value of 9 because 9 letters are correct. Only the h, w, and question mark are wrong.

### First Program

Now we’re ready to write some Python.

#### Genes

We start off with a generic set of letters for genes and a target password:

Python
```geneSet = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
target = "Hello World!"```

#### Generate a guess

Next we need a way to generate a random string of letters from the gene set.

Python
```import random

...

def generate_parent(length):
genes = []
while len(genes) < length:
sampleSize = min(length - len(genes), len(geneSet))
genes.extend(random.sample(geneSet, sampleSize))
return ''.join(genes)```

`random.sample` takes `sampleSize` values from the input without replacement. This means there will be no duplicates in the generated parent unless `geneSet` contains duplicates, or `length` is greater than `len(geneSet)`. The implementation above allows us to generate a long string with a small set of genes while using as many unique genes as possible.

#### Fitness

The `fitness` value the genetic algorithm provides is the only feedback the engine gets to guide it toward a solution. In this problem our `fitness` value is the total number of letters in the guess that match the letter in the same position of the password.

Python
```def get_fitness(guess):
return sum(1 for expected, actual in zip(target, guess)
if expected == actual)```

#### Mutate

We also need a way to produce a new guess by mutating the current one. The following implementation converts the parent string to an array with `list(parent)` then replaces 1 letter in the array with a randomly selected one from `geneSet`, and then recombines the result into a string with `''.join(genes)`.

Python
```def mutate(parent):
index = random.randrange(0, len(parent))
childGenes = list(parent)
newGene, alternate = random.sample(geneSet, 2)
childGenes[index] = alternate \
if newGene == childGenes[index] \
else newGene
return ''.join(childGenes)```

This implementation uses an alternate replacement if the randomly selected `newGene` is the same as the one it is supposed to replace, which can save a significant amount of overhead.

#### Display

Next, it is important to monitor what is happening, so that we can stop the engine if it gets stuck. It also allows us to learn what works and what does not so we can improve the algorithm.

We’ll display a visual representation of the gene sequence, which may not be the literal gene sequence, its fitness value and how much time has elapsed.

Python
```import datetime

...

def display(guess):
timeDiff = datetime.datetime.now() - startTime
fitness = get_fitness(guess)
print("{0}\t{1}\t{2}".format(guess, fitness, str(timeDiff)))```

#### Main

Now we’re ready to write the main program. We start by initializing `bestParent` to a random sequence of letters.

Python
```random.seed()
startTime = datetime.datetime.now()
bestParent = generate_parent(len(target))
bestFitness = get_fitness(bestParent)
display(bestParent)```

Then we add the heart of the genetic engine. It is a loop that generates a guess, requests the `fitness` for that guess, then compares it to that of the previous best guess, and keeps the better of the two. This cycle repeats until all the letters match those in the target.

Python
```while True:
child = mutate(bestParent)
childFitness = get_fitness(child)

if bestFitness >= childFitness:
continue
display(child)
if childFitness >= len(bestParent):
break
bestFitness = childFitness
bestParent = child```

#### Run

Now run it.

sample output

```ftljCDPvhasn 1 0:00:00
ftljC Pvhasn 2 0:00:00
ftljC Pohasn 3 0:00:00.001000
HtljC Pohasn 4 0:00:00.002000
HtljC Wohasn 5 0:00:00.004000
Htljo Wohasn 6 0:00:00.005000
Htljo Wohas! 7 0:00:00.008000
Htljo Wohls! 8 0:00:00.010000
Heljo Wohls! 9 0:00:00.013000
Hello Wohls! 10 0:00:00.013000
Hello Wohld! 11 0:00:00.013000
Hello World! 12 0:00:00.015000```

Success! You’ve written a genetic algorithm in Python!

### Extract a reusable engine

Now that we have a working solution to this problem we will extract the genetic engine code from that specific to the password problem so we can reuse it to solve other problems. We’ll start by creating a new file named `genetic.py`.

Next we’ll move the `mutate` and `generate_parent` functions to the new file and rename them to `_mutate` and `_generate_parent`. This is how protected functions are named in Python. They will not be visible to users of the `genetic` library.

#### Generate and Mutate

Since we want to be able to customize the gene set used in future problems we need to pass it as a parameter to `_generate_parent`

Python
```import random

def _generate_parent(length, geneSet):
genes = []
while len(genes) < length:
sampleSize = min(length - len(genes), len(geneSet))
genes.extend(random.sample(geneSet, sampleSize))
return ''.join(genes)```

and `_mutate`.

Python
```def _mutate(parent, geneSet):
index = random.randrange(0, len(parent))
childGenes = list(parent)
newGene, alternate = random.sample(geneSet, 2)
childGenes[index] = alternate \
if newGene == childGenes[index] \
else newGene
return ''.join(childGenes)```

#### `get_best`

Next we’ll move the main loop into a new function named `get_best` in the `genetic` library file. Its parameters will include the functions it should use to request the fitness for a guess and to display (or report) each new best guess as it is found, the number of genes to use when creating a new sequence, the optimal fitness, and the set of genes to use for creating and mutating gene sequences.

Python
```def get_best(get_fitness, targetLen, optimalFitness, geneSet, display):
random.seed()
bestParent = _generate_parent(targetLen, geneSet)
bestFitness = get_fitness(bestParent)
display(bestParent)
if bestFitness >= optimalFitness:
return bestParent

while True:
child = _mutate(bestParent, geneSet)
childFitness = get_fitness(child)

if bestFitness >= childFitness:
continue
display(child)
if childFitness >= optimalFitness:
return child
bestFitness = childFitness
bestParent = child```

Notice that we call `display` and `get_fitness` with only one parameter - the child gene sequence. This is because we do not want the engine to have access to the target value, and it doesn’t care whether we are timing the run or not, so those are not passed to the function.

We now have a reusable library named `genetic` that we can access in other programs via `import genetic`.

#### Use the `genetic` library

Back in `guessPassword.py` we’ll define functions that allow us to take the candidate gene sequence passed by `genetic` as a parameter, and call the local functions with additional required parameters as necessary.

Python
```def test_Hello_World():
target = "Hello World!"

geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
startTime = datetime.datetime.now()

def fnGetFitness(genes):
return get_fitness(genes, target)

def fnDisplay(genes):
display(genes, target, startTime)

optimalFitness = len(target)
genetic.get_best(fnGetFitness, len(target), optimalFitness, geneset, fnDisplay)```

#### Display

Notice how `display` now takes the target password as a parameter. We could leave it as a global in the algorithm file but this allows us to try different passwords if we want.

Python
```def display(genes, target, startTime):
timeDiff = datetime.datetime.now() - startTime
fitness = get_fitness(genes, target)
print("{0}\t{1}\t{2}".format(genes, fitness, str(timeDiff)))```

#### Fitness

We just need to add `target` as a parameter.

Python
```def get_fitness(genes, target):
return sum(1 for expected, actual in zip(target, genes)
if expected == actual)```

#### Main

Speaking of tests, let’s rename `guessPassword.py` to `guessPasswordTests.py`. We also need to import the `genetic `library.

Python
```import datetime
import genetic```

Lastly, we’ll make sure that executing `guessPasswordTests `from the command line runs the test function by adding:

Python
```if __name__ == '__main__':
test_Hello_World()```

If you are following along in an editor like repl.it be sure to run the test to verify your code still works.

### Use Python’s `unittest` framework

Next, we’ll make `guessPasswordTests.py` compatible with Python’s built in test framework.

Python
`import unittest`

To do that we have to move at least the main test function to a class that inherits from `unittest.TestCase`. We also need to add `self` as the first parameter of any function we want to access as an instance method because it now belongs to the test class.

Python
```class GuessPasswordTests(unittest.TestCase):
geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!.,"

def test_Hello_World(self):
target = "Hello World!"

...
optimalFitness = len(target)
best = genetic.get_best(fnGetFitness, len(target),
optimalFitness, self.geneset,
fnDisplay)
self.assertEqual(best.Genes, target)```

The `unittest` library automatically executes each function whose name starts with "test", as long as we call its `main` function.

Python
```if __name__ == '__main__':
unittest.main()```

This allows us to run the tests from the command line and without displaying the output.

```python -m unittest -b guessPasswordTests
.
----------------------------------------
Ran 1 test in 0.020s```

OK

If you get an error like `'module' object has no attribute 'py'` you used the filename `guessPasswordTests.py` instead of the module name `guessPasswordTests`.

"Hello World!" doesn’t sufficiently demonstrate the power of our engine. Let’s try a longer password:

Python
```def test_For_I_am_fearfully_and_wonderfully_made(self):
target = "For I am fearfully and wonderfully made."
```

#### Run

```...
ForMI am feabaully and wWndNyfulll made. 33 0:00:00.047094
For I am feabaully and wWndNyfulll made. 34 0:00:00.047094
For I am feabfully and wWndNyfulll made. 35 0:00:00.053111
For I am feabfully and wondNyfulll made. 36 0:00:00.064140
For I am feabfully and wondNyfully made. 37 0:00:00.067148
For I am feabfully and wondeyfully made. 38 0:00:00.095228
For I am feabfully and wonderfully made. 39 0:00:00.100236
For I am fearfully and wonderfully made. 40 0:00:00.195524```

Outstanding.

### Introduce a `Chromosome` object

Next we’ll introduce a `Chromosome` object that has `Genes` and `Fitness` attributes.

genetic.py

Python
```class Chromosome:
Genes = None
Fitness = None

def __init__(self, genes, fitness):
self.Genes = genes
self.Fitness = fitness```

This makes it possible to pass those values around as a unit.

Python
```def _mutate(parent, geneSet, get_fitness):
index = random.randrange(0, len(parent.Genes))
childGenes = list(parent.Genes)

...

genes = ''.join(childGenes)
fitness = get_fitness(genes)
return Chromosome(genes, fitness)```

Python
```def _generate_parent(length, geneSet, get_fitness):

...

genes = ''.join(genes)
fitness = get_fitness(genes)
return Chromosome(genes, fitness)```

Python
```def get_best(get_fitness, targetLen, optimalFitness, geneSet, display):
random.seed()
bestParent = _generate_parent(targetLen, geneSet, get_fitness)
display(bestParent)
if bestParent.Fitness >= optimalFitness:
return bestParent
while True:
child = _mutate(bestParent, geneSet, get_fitness)
if bestParent.Fitness >= child.Fitness:
continue
display(child)
if child.Fitness >= optimalFitness:
return child
bestParent = child```

We also make compensating changes to the algorithm file functions.

Python
```def display(candidate, startTime):
timeDiff = datetime.datetime.now() - startTime
print("{0}\t{1}\t{2}".format(
candidate.Genes, candidate.Fitness, str(timeDiff)))```

This reduces some double work in the display function.

Python
```class GuessPasswordTests(unittest.TestCase):

...

...

def fnDisplay(candidate):
display(candidate, startTime)

optimalFitness = len(target)
best = genetic.get_best(fnGetFitness, len(target),
optimalFitness, self.geneset,
fnDisplay)
self.assertEqual(best.Genes, target)```

### Benchmarking

Next we need to add support for benchmarking to `genetic `because it is useful to know how long the engine takes to find a solution on average and the standard deviation. We can do that with another class as follows.

genetic.py

Python
```class Benchmark:
@staticmethod
def run(function):
timings = []
for i in range(100):
startTime = time.time()
function()
seconds = time.time() - startTime
timings.append(seconds)
mean = statistics.mean(timings)
print("{0} {1:3.2f} {2:3.2f}".format(
1 + i, mean,
statistics.stdev(timings, mean)
if i > 1 else 0))```

That requires the following imports:

genetic.py

Python
```import statistics
import time```

You may need to install the `statistics` module on your system via:

`<code>python -m pip install statistics</code>`

Now we need to add a test to pass the function we want to be benchmarked.

Python
```def test_benchmark(self):

This benchmark test works great but is a bit chatty because it also shows the `display` output for all 100 runs. We can fix that by redirecting output to nowhere in the benchmark function.

genetic.py

Python
```import sys

...

class Benchmark:
@staticmethod
def run(function):

...

timings = []
stdout = sys.stdout
for i in range(100):
sys.stdout = None
startTime = time.time()
function()
seconds = time.time() - startTime
sys.stdout = stdout
timings.append(seconds)

...```

Also, we don’t need it to output every run, so how about outputting just the first ten and then every 10th one after that.

genetic.py

Python
```...

timings.append(seconds)
mean = statistics.mean(timings)
if i < 10 or i % 10 == 9:
print("{0} {1:3.2f} {2:3.2f}".format(
1 + i, mean,
statistics.stdev(timings, mean)
if i > 1 else 0))```

Now when we run the benchmark test we get output like the following.

sample output

```1 0.19 0.00
2 0.17 0.00
3 0.18 0.02

...

9 0.17 0.03
10 0.17 0.03
20 0.18 0.04

...

90 0.16 0.05
100 0.16 0.05```

Meaning that, averaging 100 runs, it takes .16 seconds to guess the password, and 68 percent of the time (one standard deviation) it takes between .11 (.16 - .05) and .21 (.16 + .05) seconds. Unfortunately that is probably too fast for us to tell if a change is due to a code improvement or due to something else running on the computer. So we’re going to change it to a random sequence that takes 1-2 seconds. Your processor likely is different from mine so adjust the length as necessary.

Python
```import random

...

def test_Random(self):
length = 150
target = ''.join(random.choice(self.geneset) for _ in range(length))

def test_benchmark(self):
genetic.Benchmark.run(lambda: self.test_Random())```

On my system that results in:

 average (seconds) standard deviation 1.46 0.35

### Summary

We built a simple genetic engine that makes use of random mutation to produce better results. This engine was able to guess a secret password given only its length, a set of characters that might be in the password, and a fitness function that returns a count of the number characters in the guess that match the secret. This is a good benchmark problem for the engine because as the target string gets longer the engine wastes more and more guesses trying to change positions that are already correct.

### Final Code

The final code is available in the zip file.

## Share

 Software Developer (Senior) United States
I am a polyglot programmer with more than 15 years of professional programming experience and author of Genetic Algorithms with Python. When learning a new programming language, I start with a familiar problem and try to learn enough of the new language to solve it. For me, writing a genetic engine is that familiar problem. Why a genetic engine? For one thing, it is a project where I can explore interesting puzzles, and where even a child's game like Tic-Tac-Toe can be viewed on a whole new level. Also, I can select increasingly complex puzzles to drive evolution in the capabilities of the engine. This allows me to discover the expressiveness of the language, the power of its tool chain, and the size of its development community as I work through the idiosyncrasies of the language.

 First Prev Next
 Another python newbie Member 1483116113-May-20 6:48 Member 14831161 13-May-20 6:48
 Hi Clinton, got your a few days ago and I'm hooked. It's great the way it goes straight into the code with examples, building on nice logical steps. Straight in to mutations in the first few pages! Also, I seem to be picking up Python much better with this book than other learning resources I've tried - even though as you said in the intro, the aim is to teach GA coding principles rather than specifically GA *Python* coding. Good stuff.
 Newbie to commenting Herbert H Dupree II5-Jun-16 19:24 Herbert H Dupree II 5-Jun-16 19:24
 Re: Newbie to commenting Clinton Sheppard6-Jun-16 3:18 Clinton Sheppard 6-Jun-16 3:18
 Re: Newbie to commenting Herbert H Dupree II6-Jun-16 17:47 Herbert H Dupree II 6-Jun-16 17:47
 Last Visit: 31-Dec-99 18:00     Last Update: 27-Sep-21 7:07 Refresh 1