Unacknowledged Errors in “Unacknowledged Costs” Essay
Back over the summer, William Dembski was talking up “Baylor’s Evolutionary Informatics Laboratory”, and one of the features there was a PDF of an essay critiquing the “ev” evolutionary computation program by Tom Schneider. Titled “Unacknowledged Information Costs in Evolutionary Computing”, the essay by Robert J. Marks and William A. Dembski made some pretty stunning claims about the “ev” program. Among them were the claims that blind search is a more effective strategy than evolutionary computation for the problem at hand, and that the search structure in place was responsible for most of the information resulting from the program. The essay was pitched as being “in review”, publication unspecified. Dembski also made much of the fact that Tom Schneider had not, at that point, posted a response to the essay.
There are some things that Marks and Dembski did right, and others that were botched. Where they got it right was in posting the scripts that they used to come up with data for their conclusions, and in removing the paper from the “evolutionaryinformatics.org” site on notification of the errors. The posting of scripts allowed others to figure out where they got it wrong. What is surprising is just how trivial the error was, and how poor the scrutiny must have been to let things get to this point.
Now what remains to be seen is whether, in any future iteration of their paper, they bother to do the scholarly thing and acknowledge both the errors and those who brought the errors to their attention. Dembski at least has an exceedingly poor track record on this score, having written that input from critics can be used to improve materials released online. While Dembski has occasionally taken a clue from a critic, it is far rarer to see Dembski acknowledge his debt to a critic.
In the current case, Marks and Dembski owe a debt to Tom Schneider, “After the Bar Closes” regular “2ndclass”, and “Good Math, Bad Math” commenter David vun Kannon. Schneider worked from properties of the “ev” simulation itself to demonstrate that the numbers in the Marks and Dembski critique cannot possibly be correct. “2ndclass” made a project out of examining the Matlab script provided with the Marks and Dembski paper to find the source of the bogus data used to form the conclusions of Marks and Dembski. vun Kannon suggested an easy way to use the Java version of “ev” to quickly check the claims by Marks and Dembski.
The “Unacknowledged Costs” paper was originally hosted on a web server at Baylor University as part of what Dembski called “Baylor’s Evolutionary Informatics Laboratory”. Since Baylor had no official link to the “Evolutionary Informatics Laboratory”, Baylor pulled the plug on the site. There seem to be many different off-campus servers that now host the content formerly on the Baylor server, but the confusion over web-hosting had at least one effect: there was a period when the PDF of the essay was readily accessible, but the associated Matlab script links were broken. I recall looking at the essay some time back, thinking that the conclusions seemed obviously bogus, but being stymied by the lack of access to the Matlab scripts.
“After the Bar Closes” commenter “2ndclass”, though, got the Matlab scripts and proceeded to do some analysis. Here at Panda’s Thumb and at “Good Math, Bad Math”, he hinted that there was a basic error in the code resulting in the odd numbers that Marks and Dembski had come up with. He also sent me email around that time, and we corresponded. “2ndclass” did not have Matlab to work with (a license costs several hundred dollars), so he had translated the program into C# to do some work on it. I pointed him to the GNU Octave software, which is largely Matlab-compatible, and “2ndclass” was then able to work with both his translation and the original scripts. I made some comments on obtaining a better figure for the “p_S” value that lies at the basis of the critique in the Marks and Dembski essay, and “2ndclass” took that to the next level.
There are, it turns out, two major errors identified by “2ndclass” in the original Matlab script from Marks and Dembski, both associated with forming a vector “t” representing whether an offset within an “ev” genome should be “recognized” or not. Errors can be either false positives (“recognized” when it should not be) or false negatives (unrecognized when it should be). In the code, Marks and Dembski were obviously trying to set up a situation where 16 targets were to be recognized, and the remainder of the 131 offsets in the genome were to remain unrecognized. The first error has to do with initializing the “t” variable. Matlab has functions that will fill a vector or matrix with zeros or ones, unsurprisingly called “zeros” and “ones”. “t” was set to be all ones, and then a for-loop set each of the 16 target offsets to one, leaving “t” exactly as it started. For that loop to work, “t” should have been initialized to zeros first, or the for-loop should have set the target sites to zero (which would have entailed an inversion of the logic elsewhere, so that appears not to have been the intent).

The second major error was that the “t” and “targ” vectors were specified with mismatched orientations, a row-vs-column conflict, such that only the first index in “t” could possibly be altered by values in “targ”. The upshot was that the Marks and Dembski script happily churned out copious quantities of numbers that had nothing at all to do with the actual situation in the “ev” program: a classic case of “garbage in, garbage out”. Correcting these defects produced estimates of “p_S” many, many orders of magnitude smaller than those from the uncorrected script, obviating the claims that blind search is more effective than evolutionary computation and that “ev”’s evolutionary computation contributed only a small fraction of the bits seen in the results.
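To make the first defect concrete, here is a minimal sketch in Matlab/Octave of the pattern described above and its correction. This is not the verbatim script from Marks and Dembski; the variable names follow the discussion here, and the particular target offsets are placeholders:

    L = 131;             % number of offsets in the "ev" genome
    targ = 5:8:125;      % hypothetical row vector of 16 target offsets

    % The defect: t starts as all ones, so setting targets to one is a no-op.
    t = ones(L, 1);      % should have been zeros(L, 1)
    for k = 1:numel(targ)
      t(targ(k)) = 1;    % changes nothing; every offset stays "recognized"
    end

    % The correction: start from zeros, then mark only the 16 target offsets.
    t = zeros(L, 1);                % column vector of zeros
    t(targ) = 1;                    % vectorized; sidesteps row-vs-column trouble
    assert(sum(t) == numel(targ));  % the kind of cheap check discussed below

The assert at the end is exactly the sort of one-line sanity check that would have flagged the problem immediately.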
One of the advantages of Matlab for developing scripts such as the one used by Marks and Dembski is that it is an interpreted language: the variables used in the script can be examined in the environment after the script runs, or via the debugger. Matlab makes it easy to check intermediate results. However easy the tools make checking, though, if a programmer isn’t interested in accuracy, then even trivial checks (“Hmmm, maybe I should see what t loaded with values from targ actually looks like…”) will be skipped. And that is a large issue here. Why would Marks and Dembski, given a preliminary set of numbers at odds with much existing research, fail to scrutinize how they obtained those numbers? The question is especially pointed for Robert Marks, who does have a serious record of scholarship in computer science. Given the way they close their paper, there is more than a slight amount of irony:
There is a wider lesson to be learned from this case study. To maintain its integrity, the field of evolutionary computing needs to institute the following control: all computer simulations of evolutionary search must make explicit (1) a numerical measure of the inherent difficulty of the problem to be solved (i.e. the endogenous information) and (2) a numerical measure of the information contributed by the search structure toward solving the problem (i.e., the active information). This control acts as a safeguard against inflated claims for the information-generating power of evolutionary algorithms.
What controls might have kept Marks and Dembski from making inflated claims of their own remain unspecified.
“2ndclass” went first class, though, in giving Robert Marks the first chance to deal with the obviously erroneous claims made in the essay. Here is his email to Marks, released by permission:
Date: Wed, 26 Sep 2007 10:08:19 -0600
To: “Marks, Robert J.”
Subject: Serious error in the paper, “Unacknowledged Information Costs”

Dr. Marks,
A few months ago I asked you some questions about your work in the Evolutionary Informatics Lab, and you graciously took the time to respond, which I appreciate.
The point of this email is to give you a heads up on a major problem in your response to Tom Schneider’s ev paper. The upshot is that your empirically-determined value for p_S is off by many orders of magnitude, and the cause for the discrepancy is a bug in Histogram.m. I’m telling you this in advance of the problem being reported by ID opponents, in case you would like to report the problem yourself first.
I have sent the following summary to others for verification, and I expect the problem to be reported shortly. Furthermore, there are easy ways to test your reported value for p_S that demonstrate it to be wrong, so I expect that others will eventually take notice. For instance, Tom Schneider’s site has a java GUI version of ev that allows you to change parameters such as population size. Changing the population size to 4390 should result in 10 perfect initial organisms if p_S = 1/439, but it doesn’t, even if you tweak the code so the target is the same as Schneider’s.
Obviously, this problem doesn’t invalidate the work of the EIL. The objective in reporting this problem will be to show that nobody has carefully read the paper in question, not even the ID proponents at Uncommon Descent who are making grandiose claims about the EIL.
Regards,
*** START OF SUMMARY ***
This is a quick response to a paper by William Dembski and Robert Marks, “Unacknowledged Information Costs in Evolutionary Computing: A Case Study on the Evolution of Nucleotide Binding Sites,” which is a response to Tom Schneider’s paper, “Evolution of Biological Information.” I explain why their empirically-determined value for p_S cannot possibly be correct, I point out the bug that led to their erroneous value, and I show how to come up with a better estimate.

D&M’s reported value of .00228 for p_S should raise questions in the minds of readers. Why would a targeted evolutionary algorithm require 100 times as many queries as random sampling? (See the last paragraph in section 1 of D&M’s paper.) Why would random sampling be able to find the target in 439 queries while parallel random walks, starting at the target, are unable to find the target again in 64000 queries? (See the random walk w/o selection starting at generation 1000 in Figure 2a in Schneider’s paper.)
But the surest evidence of a problem is Figure 3 in D&M’s paper, which shows that the perceptron heavily favors vectors with very few binding sites or very few non-binding sites. D&M legitimately use this fact to explain why p_S is higher than 2^-131, but the same fact also puts a limit on how high p_S can be. For example, if every vector with 15 binding sites is more probable than any vector with 16 binding sites, then p_S for a vector with 16 binding sites can’t possibly be 1/439. There are {131 \choose 15} = 2e19 unique vectors with 15 binding sites, and it’s impossible for each of those vectors to have a probability of more than 1/439. p_S has to be lower than 2e-19 if every vector with 15 binding sites has a probability higher than p_S.
To find the actual value of p_S, we first look at the bug that resulted in the erroneous value. If we look at the vector t after running Histogram.m, we see that it consists of all 1’s, while the obvious intent of lines 25-27 is that it contain 1’s only at the targeted binding sites and zeros elsewhere. The problem is obviously in line 24. (In Octave, there is also a problem in constructing the vertical t vector with a horizontal targ vector. I don’t know if this is also the case for MATLAB.) Once these problems are fixed and we insure that t is constructed correctly, the resulting histogram shows a p_S of zero even if we run a billion generations, so it would seem that p_S is too small to be determined empirically.
But it can be roughly determined indirectly. The histogram of errors in Figure 2 of D&M’s paper is useful in that it accurately represents the case in which every site is a targeted binding site. By symmetry, this is equivalent to the histogram for the case in which there are NO targeted binding sites. In the latter case, every positive is a false positive, so the number of errors is the number of binding sites recognized by the perceptron. From the histogram, it appears that about .6% of the queries yield vectors with 16 binding sites, which can be verified by looking at histt(17) after running the script. There are {131 \choose 16} = 1.4e20 unique vectors with 16 binding sites, so the average p_S for these vectors is .006/1.4e20 = 4e-23.
IF p_S for Schneider’s target vector is about average for all vectors with 16 binding sites, then it should be close to 4e-23. That’s a big IF, but it’s a premise that can be tested in a scaled-down problem. I scaled the problem down to 32 potential binding sites, ran a billion random queries, and looked at the counts for only the outcomes that had exactly 4 binding sites. Upon looking at these vectors, sorted in order of frequency, a clear pattern emerged: The vectors in which the binding sites overlap significantly were found at both extremes (which differed by less than an order of magnitude), while the vectors with little or no overlap were found in the middle, very close to the average frequency. Assuming that this pattern holds when the problem isn’t scaled down, p_S for Schneider’s non-overlapping target vector should be pretty close to the average p_S of all vectors with 16 binding sites, which is 4e-23, or 2^-74.
Even if this estimate is off by a large margin, the true p_S is obviously much smaller than the value reported by D&M, which invalidates some of D&M’s conclusions and leaves others intact. The conclusion that Schneider’s evolutionary algorithm is less efficient than blind search is false. The fact remains that the perceptron introduces “active information”, which I find insightful, although the active information is much less than D&M claim (if my estimate for p_S is in the right ballpark, 57 bits as opposed to 122 bits).
*** END OF SUMMARY ***
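The arithmetic in that summary is easy to check. Here is a minimal verification in Octave (largely Matlab-compatible), using only the figures quoted above:

    nWays  = nchoosek(131, 16)   % ~1.4e20 vectors with exactly 16 binding sites
    p_S    = 0.006 / nWays       % ~4e-23, the average probability per vector
    bits   = -log2(p_S)          % ~74, so p_S is roughly 2^-74
    active = 131 - bits          % ~57 bits of active information, versus the
                                 % ~122 bits implied by p_S = 0.00228

Every line reproduces the corresponding figure in the summary.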
The PDF of the essay was taken down from the “evolutionaryinformatics.org” site shortly thereafter. Beyond that reaction to the news that the essay’s conclusions have a basis somewhat less stable than Birnam Wood, however, Marks has not taken advantage of the opportunity provided by “2ndclass” to publicly retract the findings.
Tom Schneider’s critique of the criticism, though, appeared to have less impact on Marks and Dembski. Schneider responded to various criticisms on August 3rd and 4th, and then on August 12, 2007, posted this response on his “ev” blog:
2007 Aug 12. In “Unacknowledged Information Costs in Evolutionary Computing: A Case Study on the Evolution of Nucleotide Binding Sites”, Dembski claims that “Using repeated random sampling of the perceptron, inversion is therefore expected in 1/pS = 439 queries.” That is, according to Dembski, random generation of sequences should give a solution in which there exists a creature with zero mistakes more quickly than the Ev program running with selection on. If this were true, then random generation of genomes would be more efficient than natural selection, which takes 675 generations for Rs to exceed Rf in the standard run. While anyone familiar with natural selection will immediately sense something wrong with Dembski’s numbers, it is reasonable to test his ‘hypothesis’. For the discussion of 2007 Aug 03 I used the running Ev program and counted each generation. This was a little unfair because there was only one mutation per generation. This might make a tighter Rs distribution around zero bits because each generation is similar to the last one. Thus the estimate may be too high. A more fair test is to generate a completely new random genome each time. What is the distribution? For 100 independent generations, the mean was -0.02114 bits and the standard deviation was 0.26227 bits. This is essentially the same standard deviation as before! So the probability of getting 4 bits (4/0.26227 = 15.25 standard deviations out) is, again, about 1.11×10^-16. Considering 439 queries as 3 orders of magnitude, Dembski’s estimate is off by about 13 orders of magnitude.
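For what it is worth, the orders-of-magnitude bookkeeping in that closing comparison can be reproduced directly from the numbers Schneider quotes:

    log10(1 / 1.11e-16)   % ~16: orders of magnitude for Schneider's probability
    log10(439)            % ~2.6: Dembski's 439 queries, about 3 orders
    % The difference is the roughly 13 orders of magnitude cited above.

Nothing beyond the quoted figures is needed to reach the same conclusion.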
While it is good that Marks and Dembski have begun the process of removing false claims about “ev” from their websites, that process is still incomplete, and leaves other commentary standing. For example, another paper still available on their website relies upon the bogus “ev” critique in making claims:
Using the information rich structure of ev, a random query will thus generate a successful result with the relatively large probability of pS = 0.00228. Interestingly, ev uses an evolutionary program to find the target solution with a smaller probability than is achievable with random search. Ev therefore takes poor advantage of the underlying search structure, inducing negative active information with respect to it. With the adoption of the method sketched in this paper for assessing the information costs incurred by search algorithms, such improper claims for the power of evolutionary search would find their way into the literature less often.
As Schneider notes, Dembski has been criticizing Schneider’s work and person publicly based on the bogus data from the busted Matlab script. The process will not be complete until an effective retraction of the claims is made such that the people who heard the false information from Marks and Dembski are likely to also have heard of its demise, and any resulting publication notes the problem in the first attempt and credits those who put in the effort that Marks and Dembski did not.
Of course, looking at another of the papers that is listed on the EIL site, I noticed that it includes a clear error that I informed Dembski of long ago.
In fact, today is the seventh anniversary of the unregarded notification of that error. This is the standard for unacknowledged errors that Dembski has set. Time will tell whether Robert Marks will be an apt pupil…
Link above needs to be fixed by removing the html tag at the end.
Why indeed? This is symptomatic of people running a “research” program that has a pre-defined desired outcome. When a counter-intuitive (to the rest of the world) result is observed that’s consistent with the preconception, that’s a clear signal to look very, very closely at the model. In 15 years of modeling market systems with evolutionary algorithms I’ve run into a number of such outcomes — results consistent with what I wanted that were too good to be true. In every instance so far they were due to a bug in the model. It’s when data most closely agree with one’s preconceptions that they require the closest scrutiny.
It’s also often symptomatic of ignorance on the part of the people doing the “research.” Not knowing the substantive domain of inquiry (in this case, evolutionary models), they can’t detect when a result is at least superficially weird and therefore requires close examination.
RBH
In this case, though, both Marks and Dembski did know that the result was “superficially weird”. They took this, apparently, as due to their being the first people to approach evolutionary computation in a critical fashion, though any such pretension is ridiculous in the extreme. The way that they arrogantly cast themselves as stern schoolmarms hectoring badly behaving children supports this view.
Link fixed. Thanks for the note, guppy.