In his 1986 book, “The Blind Watchmaker”, Richard Dawkins described a program meant to illustrate the difference between random chance and cumulative selection. Dawkins took a string from Shakespeare, “METHINKS IT IS LIKE A WEASEL”, and used it as a target. Simply stringing capital letters and spaces together at random, Dawkins noted, could not be expected to generate the target without a stupendous number of tries, so stupendous that it exceeds what could be attempted not just in a human lifetime, but even in the age of the universe. That situation can be contrasted with one in which some principles taken from natural selection are applied to the program. When Dawkins did that, the program converged on his target string in anywhere from seconds to about an hour, depending on the language he wrote it in. The resulting program came to be called “weasel” after the string Dawkins used as a target.
As programs go, “weasel” is not complex. Dawkins meant it as a simple demonstration of a basic concept. Many people have been intrigued by it and have implemented versions for themselves; Ian Musgrave maintains a page that links to many of those. In the interest of checking just what size a dashed-off “weasel” program might be, I dashed one off in Perl and came up with 61 lines of code.
Nor is “weasel” meant to be taken as something fraught with significance. Dawkins himself had treated “weasel” as a throwaway program, and did not have the code anymore within a few years after “The Blind Watchmaker” was published.
Why, then, am I treating “weasel” to more attention and analysis? Primarily because the religious antievolutionist capacity to discomprehend, misunderstand, misrepresent, and just plain tell falsehoods about “weasel” is all out of proportion to its role in either evolutionary science pedagogy or evolutionary computation. A second reason is that understanding “weasel” puts one in a better position to understand instances of evolutionary computation that are not aimed solely at pedagogy, as “weasel” was. Conversely, it doesn’t make much sense to try to tackle non-pedagogical instances of evolutionary computation until one comprehends “weasel” well.
A Bit About Biology
Living organisms use chemistry to pass on heritable information from parent to daughter organisms. Most organisms, including all the ones visible to the unaided eye, use a chemical called “DNA” to do this. The important property for us in this particular essay is that the information in DNA is stored as sequences of four bases (A, T, C, or G). A parent organism has many DNA bases; the human genome has some 3 billion such bases. We speak of 3 billion bases, though chemically there are about 6 billion, since each base is matched by its complement, and having that complement makes it possible for DNA to do something extraordinary: it can be copied. The DNA comes apart into two strands, each of which serves as a template for pairing complementary bases onto a new strand, and at the end of the process one has a nearly-perfect copy of the original. It is that “nearly” qualifier that makes things interesting, because “nearly” means that evolution happens as a result.
The copy process can introduce errors. The various ways in which the daughter DNA doesn’t exactly match the parent DNA because of copying error are called mutations. (There is another class of changes in daughter DNA compared to the parent DNA called recombination that I won’t go into here.) Mutations can be small, changing the value of a single base to a different base. Bases can be inserted or deleted, or segments of them flipped around in a process called inversion. Larger copying errors involve vast assemblages of DNA, and can even result in the entire genome of an organism being copied over twice or more. What biologists have determined through repeated experiment and confirmation is that each mutational process has a characteristic rate in a particular species or lineage, and that mutations occur randomly. Mutations do not happen just where needed; they happen anywhere, and mostly either cause no noticeable difference or change something that worked well into something that doesn’t work as well. Mutations that produce a beneficial effect are relatively rare by comparison. Just because some part of the DNA is doing something that helps an organism carrying it function better does not mean that mutation cannot occur in that part of the DNA. These things have been known for many decades, and are a part of the core training any serious biologist receives in college.
So when Richard Dawkins was writing about the “weasel” program, something that he would have in mind would be this common knowledge of how biology operated when it came to heritable information. In order to make a point about cumulative selection versus random search, he should be credited with forming the description of the program to be maximally consistent with the basic facts of biology. We will be coming back to this point, because various of the misunderstandings religious antievolutionists have asserted require that Dawkins managed to get the biology horribly wrong. As we will see, though, those assertions have no basis in anything Dawkins wrote and themselves show the religious antievolutionists to have no grasp of the principles at work.
Stepping Through Weasel
Before getting into the many and various ways the religious antievolutionists manage to mangle “weasel”, the first thing to do is to take a tour of “weasel” and find out what it actually is.
The first thing to clarify is that it is a pedagogical tool, a computer science version of a napkin doodle. Its function is to give people an appreciation for how one aspect of natural selection differs from chance. (It is that distinction itself that causes so much trouble with religious antievolutionists, who mostly simply cannot accept that such a distinction can be made.) The aspect of interest for illustration via the “weasel” program is cumulative selection, the ability of a system with some properties like those found in biology to accumulate adaptive information. In the case of “weasel”, those properties are a population of candidates with particulate heritable information such that successive generations copy previous information with a certain error rate, and the information selected for copying between generations is picked due to a relative fitness difference as compared to other candidates in the population.
The second thing to note is that the “weasel” program has been effective at showing that the expectation one has of the efficacy of random search is nothing at all like what “weasel” delivers by applying parts of natural selection. The “weasel” program does so with an immediacy that any number of pages of prose may fail to impart to a reader. It is, literally, an illustration of evolution that takes place before one’s eyes, or at least a napkin-doodle type illustration that happens before your eyes. It doesn’t, and wasn’t intended to, faithfully reproduce all aspects of biological evolution. It was meant to show the difference between cumulative selection, a process based on selection of candidates with mutable inherited information, and random search. That limited aim it handles readily.
So, why is random search so ineffective? And just how ineffective is it? In order to evaluate how well “weasel” works, we need an understanding of just how badly other approaches work, and the canonical bad method for comparison is random search. The Jargon File has this down:
bogo-sort: /boh`goh sort/, n.
(var.: stupid-sort) The archetypical perversely awful algorithm (as opposed to bubble sort, which is merely the generic bad algorithm). Bogo-sort is equivalent to repeatedly throwing a deck of cards in the air, picking them up at random, and then testing whether they are in order. It serves as a sort of canonical example of awfulness. Looking at a program and seeing a dumb algorithm, one might say “Oh, I see, this program uses bogo-sort.” Esp. appropriate for algorithms with factorial or super-exponential running time in the average case and probabilistically infinite worst-case running time. Compare bogus, brute force.
A spectacular variant of bogo-sort has been proposed which has the interesting property that, if the Many Worlds interpretation of quantum mechanics is true, it can sort an arbitrarily large array in linear time. (In the Many-Worlds model, the result of any quantum action is to split the universe-before into a sheaf of universes-after, one for each possible way the state vector can collapse; in any one of the universes-after the result appears random.) The steps are: 1. Permute the array randomly using a quantum process, 2. If the array is not sorted, destroy the universe (checking that the list is sorted requires O(n) time). Implementation of step 2 is left as an exercise for the reader.
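To make the awfulness concrete, here is a minimal bogo-sort in Python (the function names are my own); it does exactly what the entry describes, shuffling at random until the sequence happens to be in order:

```python
import random

def is_sorted(seq):
    """True if each element is less than or equal to the next."""
    return all(a <= b for a, b in zip(seq, seq[1:]))

def bogo_sort(seq, max_tries=1_000_000):
    """Shuffle at random until the sequence happens to be in order.

    Returns (sorted list, number of shuffles taken).  Gives up after
    max_tries shuffles, since for a full deck of cards the expected
    wait dwarfs the age of the universe.
    """
    items = list(seq)
    tries = 0
    while not is_sorted(items):
        if tries >= max_tries:
            raise RuntimeError("gave up, exactly as bogo-sort deserves")
        random.shuffle(items)
        tries += 1
    return items, tries

# A five-card hand is about the largest input worth waiting on:
hand, shuffles = bogo_sort([9, 2, 7, 4, 1])
print(hand, "after", shuffles, "shuffles")
```

Five cards have only 120 orderings, so this returns almost instantly; the point of the exercise is that the expected number of shuffles grows factorially with the input size.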
Let’s consider the probability that choosing a letter randomly matches with a specific letter. First, we’ll need to define several parameters.
The number of alternative forms that any position, or base, can take is K.

For the case where the bases are capital letters or a space, K = 27.

The length of the target string is L, and in the case where the target is “METHINKS IT IS LIKE A WEASEL”, L = 28.

The per-base mutation rate is how often a base is changed to a random alternative during copying from a parent string to a daughter string, and is u.

The number of candidate strings that make up the population at any one time is N.
To start “weasel”, a candidate string of L random bases is generated. It is copied N times with mutation according to the mutation rate u to make the next generation. The string that matches the target string best, however slightly, is chosen as the best candidate string of this generation and becomes the parent of the following generation. The process of selecting the best candidate from a generation and basing the next generation on possibly mutated copies of it continues until either all the bases match or one gets fed up and stops the program.
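That loop is short enough to sketch in full. Here is a dashed-off Python version of the procedure just described; the mutation rate and population size are my own choices, since Dawkins didn’t publish his parameters:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
BASES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 27 alternatives
MUTATION_RATE = 0.05                   # per-base copying error rate (my choice)
POP_SIZE = 100                         # candidates per generation (my choice)

def score(candidate):
    """Count of positions at which the candidate matches the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(parent):
    """Copy the parent; with probability MUTATION_RATE, each base is
    replaced by a randomly drawn base (possibly the same one)."""
    return "".join(
        random.choice(BASES) if random.random() < MUTATION_RATE else base
        for base in parent
    )

def weasel(seed=None):
    """Run cumulative selection until the target is matched;
    return the number of generations that took."""
    random.seed(seed)
    parent = "".join(random.choice(BASES) for _ in TARGET)
    generation = 0
    while score(parent) < len(TARGET):
        generation += 1
        offspring = [mutate(parent) for _ in range(POP_SIZE)]
        # The best-matching copy, however slight its edge, parents
        # the next generation.
        parent = max(offspring, key=score)
    return generation

print("matched the target in", weasel(seed=1), "generations")
```

With these parameters the program typically converges in a modest number of generations, far from the astronomical try-counts random search would demand.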
We will also be interested in the number of correct and incorrect bases seen in the best candidate from a generation, and these will be n_c and n_i, respectively.
So let’s start with the random elements.
The probability that a randomly selected base will match the target string at that position is just P_rc = 1/K, for “probability of a random correct base”. This probability falls as the number of alternatives increases. If we added all the lower-case letters, for example, P_rc would be 1/53, just about half the 1/27 it usually is for “weasel”.
On the other hand, the probability that a randomly selected base will fail to match the target string at that position is P_ri = (K − 1)/K, for “probability of a random incorrect base”. This probability rises as the number of alternatives increases. Also, the probabilities just defined are complementary, so P_rc + P_ri = 1. There will be a lot of use of complementary probabilities as we continue the exploration of “weasel” math.
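Writing P_rc = 1/K for the chance of a random correct base and P_ri = (K − 1)/K for its complement, a few lines of Python (names mine) confirm both the complementarity and the roughly-halving effect of adding the lower-case letters:

```python
from fractions import Fraction

def p_random_correct(K):
    """Chance that a uniformly chosen base matches a given target base."""
    return Fraction(1, K)

def p_random_incorrect(K):
    """Complementary chance that a uniformly chosen base misses."""
    return 1 - p_random_correct(K)

# The two probabilities are complementary for any alphabet size:
assert all(p_random_correct(K) + p_random_incorrect(K) == 1
           for K in range(1, 101))

# Adding the 26 lower-case letters (K = 27 -> 53) roughly halves P_rc:
print(p_random_correct(27), p_random_correct(53))  # 1/27 1/53
```

Exact rational arithmetic via `Fraction` keeps the complementarity check free of floating-point noise.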
It will be useful to compare the performance of “weasel” with a random process. Instead of inheriting information from a parent string that already outperformed other strings, for the random process we will simply randomly assign an alternative to each base. Something important to note is that random search is stateless; there is no dependency on a prior candidate to be concerned about or include in the math. This makes the math easier for random search and other stateless approaches. But “weasel” is not stateless, making the math less tractable for generating simple figures of merit for its efficiency.
The probability that such a randomly composed candidate string will match the target, accomplishing that match in one step (and thus the term used by Dawkins, “single-step selection”), is

P_ss = (1/K)^L = K^(−L)

When K = 27 and L = 28, this works out to one chance in about 1.197E+40 tries. That’s tough to deal with graphically. Here’s a graph that takes the natural logarithm of P_ss for values of K on the X axis and values of L on the Y axis. Zero on that scale is a value of one, or unity probability. Since however many bases taken from a pool with only one base in it will always match, the entire left-hand side is at ln(P_ss) = 0.
Let’s spend a little time understanding the graph above. At the far left, all the values are zeroes. That’s because the left-hand side is where the number of alternative bases is 1, and thus the probability of getting that base as many times as you want is 1, and the log of 1 is 0. It is like the case where you flip a double-headed coin: no matter how many times you flip it, it is going to come up heads. It’s not a very interesting situation, so let’s consider the next-door neighbor, having two alternative bases. This is the binary case, and corresponds to flipping a fair coin. As “Length of Candidates” increases, it is like flipping a coin that many times and having it come up the way that you call it every time. So the value of the surface plotted at (2,28) is 2^(−28), a very small number (which is log-transformed in the graph). You’d expect to be flipping that coin 28 times in a row for several hundred million tries before you managed to call every flip correctly. Similarly, having six alternative bases would be like rolling a six-sided die some number of times and calling each roll correctly in that sequence. The value of the graph at (6,28) is 6^(−28), so you’d have to roll a die 28 times calling every roll for about a million million million tries just to get into the vicinity of having a 10,000 to 1 shot at having correctly called all the rolls in a try.
You can see that the color, and thus the value, at either (2,28) or (6,28) is relatively high, even though either event is improbable. That’s because we are plotting in a range that includes the value at (27,28) in the upper right, which is 27^(−28). Even the log-transformed value shows that however improbable the flipping of a coin 28 times or the rolling of a die 28 times with a pre-called outcome might be, getting 28 bases drawn from 27 alternatives in just the right order is far, far less probable when all that happens is a random try at it. Randomly getting to a state via coin flips, die rolls, or choosing among 26 letters and a space obviously can’t be expected to produce results efficiently. (By the way, we can now generate a probability for a try of bogo-sort actually working, assuming that both card value and suit value contribute to the ordering: 1 in 52!, or about one chance in 8.07E+67. Just thought that you’d like to know.)
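Those spot values on the log-probability surface are easy to verify; this short script (mine, not part of the original post) evaluates ln(K^(−L)) at the points discussed:

```python
import math

def log_p_single_step(K, L):
    """Natural log of the probability that L bases drawn uniformly
    from K alternatives all match a fixed target: ln(K**-L)."""
    return -L * math.log(K)

# The degenerate one-base alphabet always matches: ln(1) = 0.
assert log_p_single_step(1, 28) == 0.0

# The coin-flip, die-roll, and full "weasel" alphabet cases:
for K in (2, 6, 27):
    print(K, 28, log_p_single_step(K, 28))

# 27 alternatives, 28 positions: about one chance in 1.2e40.
print(math.exp(-log_p_single_step(27, 28)))
```

Working in logarithms is exactly what the graph does: the raw probabilities span so many orders of magnitude that nothing else plots usefully.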
So, just how inefficient is random search? For the case where K = 27 and L = 28, if a computer could generate a billion candidates a second and evaluate them, we’d expect it to find one candidate with all correct bases once every 3.7E+23 years. The universe, by comparison, is about 1.3E+10 years of age. We can express the chance that we will find the target string in so many tries by random search as a probability, too. This one takes T to be the number of tries in total, and says how likely it is to be able to guess the target using just random assignment:

P_hit(T) = 1 − (1 − K^(−L))^T
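One computational aside: with K = 27 and L = 28 the single-try probability is so small that evaluating 1 − (1 − K^(−L))^T naively in double precision rounds 1 − K^(−L) to exactly 1 and returns zero for any T. A sketch that sidesteps the rounding (my code):

```python
import math

def p_hit_in_tries(K, L, T):
    """Probability that at least one of T independent random candidates
    matches an L-base target drawn from K alternatives.

    Uses log1p/expm1 so the tiny single-try probability isn't lost
    to floating-point rounding."""
    p_single = math.exp(-L * math.log(K))   # K**-L
    return -math.expm1(T * math.log1p(-p_single))

# A billion candidates a second for the age of the universe:
tries = int(1e9) * int(3.156e7) * int(1.3e10)
print(p_hit_in_tries(27, 28, tries))   # still only ~3e-14
```

Even with that absurd hardware budget, the chance of ever seeing the target by random assignment stays indistinguishable from zero for practical purposes.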
When we make a candidate string from scratch randomly, perhaps some of the letters match the target string. We can calculate how many matches we expect a randomly-generated string to have: it is the length times the probability of a random correct base, or

E[n_c] = L × P_rc = L / K

the expected number of randomly generated bases matching the target. For K = 27 and L = 28, that works out to 28/27, or just over one matching base per randomly composed candidate.
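That expectation, L/K = 28/27, or about 1.04 matching bases per random candidate, is easy to bear out by simulation (script mine):

```python
import random

BASES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # K = 27 alternatives
TARGET = "METHINKS IT IS LIKE A WEASEL"  # L = 28 bases

def matches_in_random_string(rng):
    """Count positions where a fresh random candidate equals the target."""
    return sum(rng.choice(BASES) == t for t in TARGET)

rng = random.Random(1)
trials = 20_000
mean = sum(matches_in_random_string(rng) for _ in range(trials)) / trials
print(mean)   # hovers near 28/27, i.e. about 1.04
```

So a random candidate usually starts with about one correct base; the interesting question, taken up next, is how cumulative selection builds on that.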
That pretty much takes care of the random search case. It’s hopeless as a means of getting to the target string, at least on a time scale that makes any sense to us. Next we’ll move on to how the “weasel” program does better.