In Part 1, I introduced the “weasel” program, gave an overview of its role as a teaching tool, and gave some analysis of random search, the procedure that is contrasted with cumulative selection. This time, I’ll be going into the behavior of “weasel” and some math relevant to “weasel” itself.
The original description of “weasel” by Richard Dawkins in “The Blind Watchmaker” laid out how the program operated. The essential elements of a “weasel” program are as follows:
- Use a set of characters that includes the upper case alphabet and a space.
- Initialize a population of $latex N$ $latex L$-character strings by copying with mutation a parent string formed by random assignments of characters from our character set.
- Identify the string closest to the target string in the population.
- If a string matches the target, terminate.
- Base a new generation population of size n upon copies of the closest matching string or strings, where each position has a chance of randomly mutating, based upon a set mutation rate.
- Go to step 3.
Note that I said “program” above and not “algorithm”. There is no guarantee that the program as described will halt, or terminate. While Dawkins did specify that $latex K = 27$ and $latex L = 28$, he did not share precisely what values he chose to use for $latex N$ and $latex \mu$. Because Dawkins did not see the need to discuss alternative methods for ensuring that the program would come to an end, one can reasonably infer that in his own runs of the program, he was using parameters of $latex N$ and $latex \mu$ that would result in termination of the program. He reported in “The Blind Watchmaker” that in three runs, the target string was matched in 43, 64, and 41 generations. Later, I’ll see about doing some forensics to see if a range of parameters can be estimated for Dawkins’ original runs. It might be the case that I can exclude whole ranges of parameters.
The following code is just 34 lines, three of them output statements, and does nothing particularly fancy, so it should be understandable with small effort on the reader’s part. It’s just straight-up structured programming, so there isn’t even object-oriented abstraction going on here.
import random # Get Pseudo-Random Number Generator (PRNG)
random.SystemRandom() # Seed PRNG
n = 250 # Set population size
t = “METHINKS IT IS LIKE A WEASEL” # Set target
b = ” ABCDEFGHIJKLMNOPQRSTUVWXYZ” # Set base pool
u = 1.0 / len(b) # Set mutation rate
print “PopSize=%d, MutRate=%f, Bases=%s, Target=%s” % (n,u,b,t)
p = “” # Initialize parent randomly
for ii in range(len(t)): # Make parent the same length as target
p += random.choice(b) # Add a randomly selected base to parent
print ” Parent=%s” % (p)
done = False # Assume we haven’t matched the target; we’ll be wrong once in 1e40 times
g = 0 # Initialize the generation count
while (done == False): # Keep going until a match is found or forever
pop =  # Previous population is cleared out
bmcnt = 0 # Initialize best match count
bc = “” # Initialize best candidate holder
for ii in range(n): # Over the population size, do this:
pop.append(“”) # Append a new blank candidate
mcnt = 0 # Initialize the match count
for jj in range(len(t)): # Over the candidate length, do this:
if (u >= random.random()): # Test for whether mutation happens here
pop[ii] = pop[ii][0:jj] + random.choice(b) # Add a mutant base
pop[ii] = pop[ii][0:jj] + p[jj] # Copy base from parent
if (pop[ii][jj] == t[jj]): mcnt += 1 # If matched to target, increment
if (mcnt > bmcnt): # If candidate matches more bases than best so far
bmcnt = mcnt # Set the best match count to current match count
bc = pop[ii] # Set the best candidate to the current candidate
if (mcnt == len(t)): # Check to see whether all bases match the target
done = True # When all match up, we are done
g += 1 # Increment the generation count
print “Gen=%05d, %02d/%d matched, Best=%s, Total=%06d” % (g,bmcnt,len(t),bc,g*n)
p = bc # Parent for next gen. is best candidate from this gen.
Here’s a sample output from a “weasel” run, generated by the Python code above:
PopSize=250, MutRate=0.037037, Bases= ABCDEFGHIJKLMNOPQRSTUVWXYZ,
Target=METHINKS IT IS LIKE A WEASEL
Parent=WAPXSPETTBEOUNRCUE AEDT BJPH
Gen=00001, 01/28 matched, Best=WAPXSPETTBEOUSRCUE AEDT BJPH, Total=000250
Gen=00002, 02/28 matched, Best=WAPXSPETTBE USRCUE AEDT BJPH, Total=000500
Gen=00003, 03/28 matched, Best=WATXSPETTBE USRCGE AEDT BJYH, Total=000750
Gen=00004, 04/28 matched, Best=WATXSPKTTBE USRCGE AEDT BJYH, Total=001000
Gen=00005, 05/28 matched, Best=DATXSPKTTBE USRCGE AADT BUYH, Total=001250
Gen=00006, 06/28 matched, Best=DATXSPKTTBE USRCGE AALT BSYH, Total=001500
Gen=00007, 07/28 matched, Best=GATXSPKTTBE USRCGE AALTEBSYH, Total=001750
Gen=00008, 08/28 matched, Best=WATXSPKTTBE USRCGE AALWEBSYX, Total=002000
Gen=00009, 09/28 matched, Best=WATXSPKTTBE USRCGEEAALWEBSYX, Total=002250
Gen=00010, 10/28 matched, Best=WATXSPKT BE USRCGEEAALWEBSYX, Total=002500
Gen=00011, 13/28 matched, Best=WATHSPKT BE US CGEE ALWEBSYX, Total=002750
Gen=00012, 14/28 matched, Best=WATHSPKT BE IS CGEE ALWEBSNX, Total=003000
Gen=00013, 15/28 matched, Best=WATHSPKT BE IS CGEE A WEBSNX, Total=003250
Gen=00014, 16/28 matched, Best=WATHIPKT BE IS CDEE A WEWSNX, Total=003500
Gen=00015, 17/28 matched, Best=WATHIOKT BT IS CDEE A WEWSVW, Total=003750
Gen=00016, 18/28 matched, Best=WATHIOKT BT IS ZDEE A WEWSVL, Total=004000
Gen=00017, 19/28 matched, Best=WATHIOKT BT IS LDEE A WEWSVL, Total=004250
Gen=00018, 20/28 matched, Best=WATHIOKT BT IS LIEE A WEWSVL, Total=004500
Gen=00019, 20/28 matched, Best=WATHIOKT BT IS LIEE A WEWSVL, Total=004750
Gen=00020, 20/28 matched, Best=WATHIOKT BT IS LIEE A WEWSVL, Total=005000
Gen=00021, 21/28 matched, Best=WATHINKT BT IS LIEE A WEWSVL, Total=005250
Gen=00022, 21/28 matched, Best=WATHINKT BT IS LIEE A WEWSVL, Total=005500
Gen=00023, 22/28 matched, Best=WATHINKT BT IS LIKE A WEWSVL, Total=005750
Gen=00024, 23/28 matched, Best=WETHINKT BT IS LIKE A WEWSVL, Total=006000
Gen=00025, 24/28 matched, Best=WETHINKT IT IS LIKE A WEWSVL, Total=006250
Gen=00026, 24/28 matched, Best=WETHINKT IT IS LIKE A WEWSVL, Total=006500
Gen=00027, 25/28 matched, Best=WETHINKS IT IS LIKE A WEWSVL, Total=006750
Gen=00028, 25/28 matched, Best=WETHINKS IT IS LIKE A WEWSVL, Total=007000
Gen=00029, 25/28 matched, Best=WETHINKS IT IS LIKE A WEWSVL, Total=007250
Gen=00030, 25/28 matched, Best=WETHINKS IT IS LIKE A WEWSVL, Total=007500
Gen=00031, 25/28 matched, Best=JETHINKS IT IS LIKE A WEWSVL, Total=007750
Gen=00032, 25/28 matched, Best=JETHINKS IT IS LIKE A WEWSVL, Total=008000
Gen=00033, 25/28 matched, Best=JETHINKS IT IS LIKE A WEWSVL, Total=008250
Gen=00034, 26/28 matched, Best=METHINKS IT IS LIKE A WEWSDL, Total=008500
Gen=00035, 26/28 matched, Best=METHINKS IT IS LIKE A WEWSDL, Total=008750
Gen=00036, 26/28 matched, Best=METHINKS IT IS LIKE A WEHSDL, Total=009000
Gen=00037, 26/28 matched, Best=METHINKS IT IS LIKE A WEHSDL, Total=009250
Gen=00038, 26/28 matched, Best=METHINKS IT IS LIKE A WEHSDL, Total=009500
Gen=00039, 27/28 matched, Best=METHINKS IT IS LIKE A WEHSEL, Total=009750
Gen=00040, 27/28 matched, Best=METHINKS IT IS LIKE A WEHSEL, Total=010000
Gen=00041, 28/28 matched, Best=METHINKS IT IS LIKE A WEASEL, Total=010250
The “Total” reported is the total number of candidate strings evaluated in the generations leading up to some candidate string matching the target at all bases. The thing to note is that this didn’t take the stupendous numbers of “tries” that we would expect for the random search case; it shows a relative improvement over random search of over thirty-six orders of magnitude in efficiency. The program runs in just a couple of seconds on my computer; I did not have to wait for the lifetimes of a great many universes to go by. The question of interest is just how “weasel” manages to improve things over random search.