Jones, Luskin, and Text

2007/01/31 Wesley R. Elsberry

Casey Luskin has responded to my earlier bit about his equivocation, saying he is closing the debate. Not hardly, Casey. Let’s look at his latest reply…

Response to Wesley Elsberry

Wesley Elsberry attacks me as if I implied the study applies to the entire Kitzmiller ruling.

Casey’s reasoning before was based on citing a ruling that was about a case where the entire decision was provided by the lawyers for one of the parties and signed by the judge, while the DI “study” only took into account one section. It was precisely because the DI study did *not* consider the whole decision that I found Luskin’s citation of Anderson v. Bessemer City to be inappropriate.

(And Wesley asserts that only 38% of the whole ruling was taken from the plaintiffs’ findings of fact.)

Not so much “asserted” as “demonstrated”. Casey is welcome to produce a counter-demonstration to show that the results at this page are substantially inaccurate. Go ahead, Casey; the source files I used are all linked from this page.

But in fact, I stated upfront in my first post on this topic that “[t]he report covers only the section of the Kitzmiller opinion which purported to address the question of whether ID is science.”7 I have always been clear that our report did not apply to the entire Kitzmiller decision.

As explained above, this whinging is a red herring. The only purpose is to distract the reader from the plain fact that Casey’s citation before applied to whole decisions, and the KvD case wasn’t at all like that.

Wesley accuses me of “equivocating,” but there is a very good reason why special scrutiny should be given to Judge Jones’ section on whether ID is science:

Casey has a shell game in progress… Casey apparently hopes to confuse the reader about what was behind the charge of equivocation. Let me reiterate that…

What we have here is a clear case of equivocation on Luskin’s part. The term being used in two ways is “judicial copying”. Even the citation given by Luskin shows that the Third Circuit thinks of “judicial copying” as something different than what Luskin then offers.

Third Circuit version of “judicial copying”: “verbatim adoption of a party’s proposed findings of fact and conclusions of law”

Luskin’s version of “judicial copying”, though, is broad enough to cover the current point of discussion, Judge Jones’s decision in Kitzmiller v. Dover Area School District. That means that Luskin is talking about a situation where the judge’s decision had about 38% of its text taken from proposed findings of fact.

These are clearly two very different uses of “judicial copying”.

Got that? The equivocation was in using “judicial copying” in two very different senses as if they were the same thing.

Here’s Casey’s next bit:

As I explained in the first sentence of my media backgrounder, “The section on whether ID is science is the most celebrated and expansive portion of the Kitzmiller opinion, which Judge Jones hoped would have an impact on future courts. As constitutional law scholar Stephen Gey said, ‘the critique of ID and science is the most important part of the Kitzmiller opinion . . .’”8 Our report is interested in the important section on whether ID is science, not the other sections. Wesley’s accusation falls flat.

Not hardly; far better weaseling would be needed to escape that charge. So far, nothing has been done to ameliorate or explain any conceivable means by which Casey’s previous text as it was given was *not* equivocation. As I noted above, it is precisely because the DI study did not consider the whole text that we have the situation as it stands. Repetitively saying that they did not consider the whole text does not magically make anything better about it.

Moreover, I never denied that the case law I cite deals with entire rulings, but as I will argue, the policies underlying judicial disapproval of large-scale copying of entire rulings can be extracted and applied here.

I never asserted that Casey “denied” some property of his citation. Pseudo-aggrieved put-uponness noted; it isn’t very becoming, though.

A new argument could certainly be deployed, but whether the new argument is valid or flawed makes no difference to the fact that my criticism of Casey’s previous outing as given was spot-on.

This seems appropriate since the section on whether ID is science is the most important section of the ruling, which would presumably be considered for citation by future courts. As the study showed, 90.9% of the section on whether ID is science was taken in a verbatim or near-verbatim fashion from the ACLU. As will be discussed below, analogical legal reasoning and application of the policies underlying disapproval of judicial copying should make that statistic a cause for concern.

There is a basic problem here: the premise is false. The DI “study” is a sloppy, subjective hack job whose accuracy is nowhere near good enough to deliver three significant digits. My algorithm is much, much better and has no subjective component, and I only claim it as good to two significant digits. The share of the section on whether ID is science that came from the plaintiffs’ proposed findings of fact is not “90.9%”. The actual figure as I calculated it is 66%, using the same parameters of analysis that I used before in examining versions of an article by Stephen C. Meyer and in examining drafts of Of Pandas and People. (If Casey wants to assert a generic false conservatism to my approach, that would imply that the actual proportion of copying was *higher* in those other cases than I reported as well.) Even when I used more liberal parameters of 5 words in a run and up to 2 skipped words, the match level only rose to 70%. Casey, again, is welcome to demonstrate any significant departure of my results from actual results; I have provided the complete set of matches found and the source files used to derive those matches.

Further, the DI “study” failed to consider the other direction: how much of the “Is ID science?” section of the plaintiffs’ proposed findings of fact did Judge Jones use, and how much did he discard? I did that analysis and found that only 48% of the plaintiffs’ proposed findings of fact in that section made it into Judge Jones’s decision. Again, I have shown my work and provided all the information necessary to check my results.

The fact that Judge Jones did *not* adopt verbatim the section that Casey now claims is his sole interest, the one concerning “is ID science?”, and even rejected slightly over half of it, argues strongly against any assertion that Jones’s reliance upon the plaintiffs’ proposed findings of fact in that section was excessive or otherwise worthy of the disapproval of higher courts. That is, in fact, the figure that is directly relevant to Casey’s argument:

The policy arguments are clear: large-scale judicial copying is disapproved because it can lead to errors, promulgated by overzealous lawyers, becoming incorporated directly into a ruling because the judge did not adequately scrutinize the lawyers’ claims.

The proportion of what Judge Jones adopted (48%) and what he rejected (52%) from the plaintiffs goes directly to the claim of “large-scale judicial copying”. Consideration of that figure is conspicuous by its absence in Luskin’s argumentation. It is, in fact, Luskin who fills the role of “overzealous lawyer” in this instance, diverting attention from the actual state of affairs and continually making reference to a figure, “90.9%”, that is both erroneous in magnitude and which gives a false impression of precision. When real scholars engage in “analogical reasoning”, they take note not only of points of analogy, but also points of disanalogy. An example of real scholarship using analogy would be Charles R. Darwin’s “Origin of Species”, where Darwin painstakingly noted not only the obvious points of analogy, but also points of disanalogy.

In order to make things crystal clear, I’ve also generated output showing the complete text of the sections on whether ID is science, where the first two columns show the subject text’s unmatched and matched portions. This allows one to get a feeling at a glance for just where and how much has been copied and how much is different material in a document; it is superior to the DI study’s method of showing only what they consider to be matches in that way. First, there is the view of the “is ID science?” section of Jones’s decision. Second, there is the view of the “is ID science?” section of the plaintiffs’ proposed findings of fact. Remember, neither the DI nor Luskin has wanted to talk about the view you get by looking at that second comparison, the one that shows clearly that Judge Jones did not simply accept the section from the proposed findings of fact as his own, and that Jones signed only after review and consideration of both the proposed findings and the evidentiary record.

Now, let’s consider what Casey urges us to consider as “errors” that Judge Jones passed upon in using the plaintiffs’ proposed findings of fact.

Pandas indicates that there are two kinds of causes, natural and intelligent, which demonstrate that intelligent causes are beyond nature.

So what does OPAP have to say?

In the world around us we observe two classes of things: natural objects, like stars and mountains, and man-made creations, such as houses and computers. To put this into the context of origins, of how things arose, we see things resulting from two fundamentally different causes: natural and intelligent.

[…]

How do we decide whether something is the result of natural processes or intelligent causes? Most of us do it without even thinking. We see clouds and we know, based on our experience, they are the result of natural causes. No matter how intricate the shapes may be, we know that a cloud is simply water vapor shaped by the wind and the temperature. On the other hand, we may see something looking very much like a cloud that spells out the words “Vote for Smedley.” We know that, even though they are white and fluffy like clouds, the words cannot be the result of natural causes. Why not? Because our experience — and that of everybody else — tells us that natural causes do not give rise to complex structures such as a linguistic message.

[…]

What kind of intelligent agent was it? On its own, science cannot answer this question; it must leave it to religion and philosophy. But that should not prevent science from acknowledging evidences for an intelligent cause origin wherever they may exist. This is no different, really, than if we discovered life did result from natural causes. We still would not know, from science, if the natural cause was all that was involved, or if the ultimate explanation was beyond nature, and using the natural cause.

and

Darwinists object to the view of intelligent design because it does not give a natural cause explanation of how the various forms of life started in the first place. Intelligent design means that various forms of life began abruptly, through an intelligent agency, with their distinctive features already intact – fish with fins and scales, birds with feathers, beaks, and wings, etc.

All that the DI offers to assert “error” here is to pull in some quotes from the other end of the OPAP book:

Contrary to the claim made by Judge Jones (and the ACLU), Of Pandas and People insists that science cannot detect the “supernatural.” It can merely determine whether a cause is intelligent. Whether that intelligent cause is inside or outside of nature is a question that cannot be addressed by science according to the book. These points are made clear in the following passages from the text ignored by Judge Jones:

…scientists from within Western culture failed to distinguish between intelligence, which can be recognized by uniform sensory experience, and the supernatural, which cannot. Today, we recognize that appeals to intelligent design may be considered in science, as illustrated by the current NASA search for extraterrestrial intelligence (SETI)… Archaeology has pioneered the development of methods for distinguishing the effects of natural and intelligent causes. We should recognize, however, that if we go further, and conclude that the intelligence responsible for biological origins is outside the universe (supernatural) or within it, we do so without the help of science.8 (emphasis added)

The idea that life had an intelligent source is hardly unique to Christian fundamentalism. Advocates of design have included not only Christians and other religious theists, but pantheists, Greek and Enlightenment philosophers and now include many modern scientists who describe themselves as religiously agnostic. Moreover, the concept of design implies absolutely nothing about beliefs and normally associated with Christian fundamentalism, such as a young earth, a global flood, or even the existence of the Christian God. All it implies is that life had an intelligent source.9 (emphasis added)

It appears that the DI utilizes the same technique of “correcting” errors that were never made as Casey uses above. Maybe that is no coincidence. Jones’s entire decision has no instance of “detect” being used within it, so how Jones is supposed to be in error about detecting the supernatural is a complete mystery. All that the DI’s favored quotes show is that so far as OPAP goes it delivers an inconsistent philosophy concerning what is and is not natural, not that Jones or the plaintiffs were in error to say that intelligent cause is distinguished and asserted to be different from natural cause: OPAP really and truly does say that. The same confusion the DI has about OPAP’s inconsistency means that the second claimed “error” is no error on the part of either the plaintiffs or the judge. What the DI has discovered, apparently, is that the Thomas More Law Center had refrained from the irrelevancies that the DI would have engaged in, for while the statements from OPAP showing the existence of a contradistinction between natural and intelligent cause were part of the trial testimony, TMLC failed to bring up the inconsistent parts of OPAP from far, far later in OPAP during cross-examination. Perhaps the TMLC thought it better to accept what had been revealed than to explicitly bring to the court’s attention the fact that their supposedly scientific textbook couldn’t even manage to keep how it treated classes of causes straight.

The third asserted “error” is as follows:

Professor Behe has written that by ID he means “not designed by the laws of nature”

(page 29-30 of online version)

Yet the trial testimony by Behe shows that there was no error.

[253]Q Could you open Darwin’s Black Box, which is plaintiff’s exhibit 647.

[254]A What page?

[255]Q I’m sorry. Page 193.

[256]A 193, thank you.

[257]MR. ROTHSCHILD: Matt, could you highlight on page 193, the first paragraph.

[258]BY MR. ROTHSCHILD:

[259]Q Could you read that paragraph, Professor Behe?

[260]A Can I read from the book here?

[261]Q Yes, please.

[262]A Okay. “There is an elephant in the roomful of scientists who are trying to explain the development of life. The elephant is labeled intelligent design. To a person who does not feel obliged to restrict his search to unintelligent causes, the straightforward conclusion is that many biochemical systems were designed. They were designed not by the laws of nature, not by chance and necessity, rather, they were planned. The designer knew what the systems would look like when they were completed, then took steps to bring the systems about. Life on earth at it’s most fundamental level, in it’s most critical components, is the product of intelligent activity.”

[263]Q They were designed not by the laws of nature, correct, Professor Behe?

[264]A That is correct.

Not only does Behe’s book have the indicated meaning, Behe even specifically confirmed the meaning used in the decision. I doubt that Casey could convince an appeals court of an error there when the trial record plainly shows that what appears in the decision was attested to during the trial and was never put into doubt by other testimony.

The most contentious of the asserted “errors” claimed by the DI is as follows:

In fact, on cross-examination, Professor Behe was questioned concerning his 1996 claim that science would never find an evolutionary explanation for the immune system. He was presented with fifty-eight peer-reviewed publications, nine books, and several immunology textbook chapters about the evolution of the immune system; however, he simply insisted that this was still not sufficient evidence of evolution, and that it was not “good enough.” (23:19 (Behe)).

(page 78 of online version)

What the DI objects to here is not the meaning of what appears in the decision, since even they would have little argument with everything up to the final comma. (Behe quibbles about distinguishing “Darwinism” from evolution; see below.) If the statement ended there, they would have almost nothing to kvetch about. It is the appearance of “not good enough” as though it was Behe’s own wording of his rejection of the material, when Behe did not assent to the use of the phrase in the questioning. There is no question that Behe did insistently reject the idea that the accumulated articles, chapters, and books could contain explanations of the evolution of the immune system that would meet his idea of what such research should look like. He did so despite testifying that he had not read and was not familiar with many of the resources.

[134]Q. Is that your position today that these articles aren’t good enough, you need to see a step-by-step description?

[135]A. These articles are excellent articles I assume. However, they do not address the question that I am posing. So it’s not that they aren’t good enough. It’s simply that they are addressed to a different subject.

[136]Q. And I’m correct when I asked you, you would need to see a step-by-step description of how the immune system, vertebrate immune system developed?

[137]A. Not only would I need a step-by-step, mutation by mutation analysis, I would also want to see relevant information such as what is the population size of the organism in which these mutations are occurring, what is the selective value for the mutation, are there any detrimental effects of the mutation, and many other such questions.

It is clear that “not good enough” was to Eric Rothschild the same thing as requiring a step-by-step description, something that Behe does confirm thereafter that he will require, among other things. This is an important point, because of what follows the asserted “error” in the decision:

We find that such evidence demonstrates that the ID argument is dependent upon setting a scientifically unreasonable burden of proof for the theory of evolution.

Michael Behe’s response?

Again, as I made abundantly clear at trial, it isn’t “evolution” but Darwinism – random mutation and natural selection – that ID challenges. Darwinism makes the large, crucial claim that random processes and natural selection can account for the functional complexity of life. Thus the “burden of proof” for Darwinism necessarily is to support its special claim – not simply to show that common descent looks to be true. How can a demand for Darwinism to convincingly support its express claim be “unreasonable”?
The 19th century ether theory of the propagation of light could not be tested simply by showing that light was a wave; it had to test directly for the ether. Darwinism is not tested by studies showing simply that organisms are related; it has to show evidence for the sufficiency of random mutation and natural selection to make complex, functional systems.

That’s a long-winded version of, “Is not!”, nothing more. Behe dismisses research sight unseen as not rising to the standard that Behe will accept, and falsely characterizes the scientific literature as having no relevant articles. The plaintiffs and Judge Jones were right in their conclusion.

Casey Luskin and the DI are desperately grasping at straws to impugn the Kitzmiller decision. No argument appears to be too fallacious, no cherry-picking will be eschewed, and no consideration given that the TMLC simply did not make the case that the DI would have to have in the trial record to make its arguments work. In the commentaries by Behe and the DI staff in which they try to gainsay the decision, they often rely upon assertions not in evidence in the trial record, as when Luskin tried to take Jones to task over the issue of whether ID has peer-reviewed research making a case for the intelligent design of any biological system. Luskin listed off several publications, none of which met the criteria Jones discussed. Further, the Behe and Snoke paper was a topic of discussion in trial testimony, and even its lead author demurred from counting it as a paper that gave evidence concerning the intelligent design of a biological system. The list also either overlaps or is a subset of the list included in the amicus curiae brief rejected by Judge Jones on the grounds that it was improper for the DI to append the text of an expert report of a witness who was withdrawn from the case. Judge Jones had to decide the case based upon what was actually admitted as evidence and what was testified to by the experts in the case. This limitation on jurisprudence seems to cause a real headache for Luskin and the DI, at least going on how often in their arguments they would have preferred Jones to have ignored that stricture.

The fact of the matter is that the plaintiffs’ proposed findings of fact do not generally suffer from major defects, and Judge Jones was right to rely heavily upon them in his decision concerning whether ID is science and elsewhere. We will not see a higher court disapprove of the decision since the issue will not be taken to a higher court. The DI speculations about whether such disapproval would have ensued are simply another manifestation of their long-term case of “sour grapes” concerning the outcome. And I will assert that, given another court case involving “intelligent design”, if those critical of “intelligent design” use the groundwork provided by the Kitzmiller decision, they are likely to prevail and be upheld in review by higher courts as well. Consider it a prediction.

Update (2019-02-22): Changed out-of-date link references.

It has come to my attention that some people haven’t kept track of the numbers for similarity for text comparisons.

There are a few points to make.

First, in order to make a precise statement of the number of words copied between two texts, one would need to specify at a minimum the smallest run of words that will count toward that measure. In the trivial case, with a minimum run length of one, every text is copied 100% verbatim from an unabridged dictionary (assuming correct spelling and no neologisms). We are not interested in the trivial case, though, and that makes things more interesting. If we search for a run length of, say, ten words (which is my default), we can be pretty confident that when any two texts share one or more such runs of words, it is because of a shared source, either one from the other or both from a third text. When one looks for such exact matches, one could generate a number with precision and only have one parameter to qualify that with. Any two long but unrelated texts in the same language are likely to share a preponderance of runs of 1 word (it is like the dictionary comparison mentioned earlier), a substantial number of 2-word runs, and decreasing numbers of matches at 3, 4, 5, etc. word runs. By the time one gets to considering 5 words or more in a row, the absolute number of matches between unrelated texts should be close to zero. So even when considering completely verbatim copying, one can have analyses that report differing percentages of matches in a subject text simply because of the length of the run being considered to constitute a match.
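To make the run-length point concrete, here is a minimal sketch in Python that counts the distinct exact word runs two texts share at several run lengths. It is an illustration only, not the program used for the analyses in this post; the function names and the two toy sentences are made up for the example.

```python
import re

def words(text):
    """Lowercase word tokens; punctuation is ignored (an assumption of this sketch)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shared_runs(a, b, n):
    """Return the set of distinct n-word runs that occur in both texts."""
    wa, wb = words(a), words(b)
    runs_a = {tuple(wa[i:i + n]) for i in range(len(wa) - n + 1)}
    runs_b = {tuple(wb[i:i + n]) for i in range(len(wb) - n + 1)}
    return runs_a & runs_b

# Two unrelated toy sentences (hypothetical, not taken from the documents discussed).
text_a = "the judge read the record and then the judge wrote the opinion"
text_b = "the lawyer read the brief and the clerk filed the opinion late"

for n in (1, 2, 3, 4):
    print(n, len(shared_runs(text_a, text_b, n)))
```

For these two sentences the shared-run count is 4 at one word, 2 at two words, and 0 from three words up, which is the falling-off pattern described above in miniature.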

Second, when one wants to find more than just simple verbatim copying, one will have to make more choices, and that means more potential variation in results. Words may be changed within a run, words may be inserted, or words may be deleted. “Fourscore and seven long years ago” should be recognized as having been derived from the Gettysburg Address despite having an inserted word not present in the original. One can choose how many words at a time might have been changed, inserted, or deleted and still cause a determination that a match has occurred. This number has to be strictly less than the minimum run length being considered. When I analyse for 10 word runs, I have settled upon up to 4 words as potentially being changed, skipped, or deleted.
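One way to picture matching with tolerated gaps is the sketch below. It is a simplified stand-in written for this explanation, not the algorithm behind the reported figures: it extends a run word by word and, on a mismatch, tries to resynchronise by skipping at most `max_skip` words on either side. The names `gapped_run_length` and `is_match` are hypothetical.

```python
def gapped_run_length(subj, ref, i, j, max_skip):
    """Length, in subject words, of the run starting where subj[i] == ref[j],
    tolerating up to max_skip changed, inserted, or deleted words at each gap."""
    if subj[i] != ref[j]:
        return 0
    start = end = i                      # end = index of last matched subject word
    i, j = i + 1, j + 1
    while i < len(subj) and j < len(ref):
        if subj[i] == ref[j]:
            end = i
            i, j = i + 1, j + 1
            continue
        resync = None
        # Prefer the smallest total number of skipped words.
        for total in range(1, 2 * max_skip + 1):
            for di in range(0, max_skip + 1):
                dj = total - di
                if dj < 0 or dj > max_skip:
                    continue
                if (i + di < len(subj) and j + dj < len(ref)
                        and subj[i + di] == ref[j + dj]):
                    resync = (di, dj)
                    break
            if resync is not None:
                break
        if resync is None:
            break                        # gap too large: the run ends here
        i, j = i + resync[0], j + resync[1]
    return end - start + 1

def is_match(subj, ref, i, j, min_run=10, max_skip=4):
    """True if the gapped run starting at (i, j) spans at least min_run subject words."""
    if max_skip >= min_run:
        raise ValueError("max_skip must be strictly less than min_run")
    return gapped_run_length(subj, ref, i, j, max_skip) >= min_run

ref = "fourscore and seven years ago our fathers brought forth on this continent a new nation".split()
subj = "fourscore and seven long years ago our fathers brought forth".split()

print(gapped_run_length(subj, ref, 0, 0, max_skip=0))  # exact matching stops at "long"
print(gapped_run_length(subj, ref, 0, 0, max_skip=1))  # one inserted word is tolerated
```

With no skips allowed the run ends after three words; allowing a single skipped word carries it across the inserted “long” to all ten words of the subject phrase. A real implementation would need more care, since resynchronising on a single common word can extend a run spuriously.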

Third, making this a matter of algorithm rather than eyeballing removes the subjectivity from the analysis. I can choose my parameters, but once I’ve done that, that’s it. The number that pops out for one text being considered as the source of another is fixed given that choice of parameters. This is a Good Thing.

Fourth, when a percentage is reported, it depends critically upon which way that the analysis was run. The number one obtains to answer the question, “How much of reference text A was copied from subject text B?” is by no means the same number as one gets by asking, “How much of reference text B was copied from subject text A?” At the extreme end, consider a chapter from Moby Dick and the whole book as texts. The chapter is 100% copied from the book, but the book is only, say, 4% copied from the chapter.
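The asymmetry is easy to reproduce with a toy example. The sketch below uses exact runs only and a made-up 20-word “book” whose first five words are the “chapter”; the `coverage` function and the filler words are assumptions for illustration, not the tool used for the numbers in this post.

```python
def coverage(subject, reference, n):
    """Percentage of subject words lying inside an exact n-word run
    that also occurs somewhere in the reference text."""
    s, r = subject.lower().split(), reference.lower().split()
    if not s:
        return 0.0
    ref_runs = {tuple(r[i:i + n]) for i in range(len(r) - n + 1)}
    covered = [False] * len(s)
    for i in range(len(s) - n + 1):
        if tuple(s[i:i + n]) in ref_runs:
            for k in range(i, i + n):
                covered[k] = True
    return 100.0 * sum(covered) / len(s)

# Hypothetical texts: a 5-word "chapter" and a 20-word "book" that begins with it.
chapter = "call me ishmael some years"
book = chapter + " " + " ".join(f"filler{k}" for k in range(15))

print(coverage(chapter, book, 3))   # how much of the chapter is found in the book
print(coverage(book, chapter, 3))   # how much of the book is found in the chapter
```

The first call reports 100.0 and the second 25.0: the same pair of texts and the same parameters, but a different question.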

Fifth, I like seeing a side-by-side comparison of texts as well as getting the numbers. That is why I have always provided such views as supplements to the summary numbers when I’ve done this sort of text comparison. The present instance is no exception, notwithstanding the apparent inability of some naysayers to notice and follow the provided links.

OK, so here is a list of numbers and what they mean for stuff being discussed here.

90.9%: This is the number the Discovery Institute has settled on as representing how much of the KvD decision’s section on whether ID is science (let’s call it “KvD-IDsci” for short) was copied from the plaintiffs’ proposed findings of fact section dealing with the same topic (ppfof-IDsci). How did they get that? Somebody at the DI eyeballed it and said that’s close enough. If they tasked someone else to do it again, there is no guarantee that the number would remain the same.

70%: the percentage of the text of KvD-IDsci (copied words out of total words, times 100) taken from ppfof-IDsci when using parameters of a minimum of 5 words in a run and up to 2 words being changed, skipped, or deleted. This is a liberal matching criterion.

66%: the percentage of the text of KvD-IDsci (copied words out of total words, times 100) taken from ppfof-IDsci when using parameters of a minimum of 10 words in a run and up to 4 words being changed, skipped, or deleted. This is a conservative matching criterion, and the standard one I use for text matching.

48%: the percentage of the text of ppfof-IDsci (copied words out of total words, times 100) that was copied into KvD-IDsci, using parameters of a minimum of 10 words in a run and up to 4 words being changed, skipped, or deleted. This one has confused at least one person, who seems to have thought that this was another number applied to the analysis of KvD-IDsci. Instead, this number indicates how much of ppfof-IDsci was used by Judge Jones, not how much of KvD-IDsci came from there.

38%: the percentage of the text of the whole KvD decision (copied words out of total words, times 100) taken from the plaintiffs’ proposed findings of fact when using parameters of a minimum of 10 words in a run and up to 4 words being changed, skipped, or deleted. The whole ruling has quite a bit of text that did not come from the PPFOF.
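A note on the arithmetic behind these percentages: the figures only add up if each one is read as copied words over total words. The post pairs 48% adopted with 52% rejected, and those sum to 100. The short sketch below contrasts that reading with a literal copied-to-uncopied ratio. The word counts are hypothetical, chosen only so that the share comes out at 48%; they are not the actual counts from the documents.

```python
def share_of_total(copied, total):
    """Copied words as a percentage of all words in the text."""
    return 100.0 * copied / total

def ratio_to_uncopied(copied, total):
    """Copied words as a percentage of the uncopied words (a different quantity)."""
    return 100.0 * copied / (total - copied)

total_words = 1000    # hypothetical count, for illustration only
copied_words = 480    # hypothetical count, for illustration only

print(share_of_total(copied_words, total_words))               # 48.0
print(100.0 - share_of_total(copied_words, total_words))       # 52.0
print(round(ratio_to_uncopied(copied_words, total_words), 1))  # 92.3
```

Only the share-of-total reading leaves a complement that can be described as the portion not used.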

3 thoughts on “Jones, Luskin, and Text”

mark

2007/01/31 at 9:16 pm

The York Dispatch carried a version of Luskin’s rant (19 December 2006). A week earlier, the Dispatch editorial was “‘Design’ folks lose poorly” and a number of other articles in the York and other Pennsylvania papers similarly spoke of the Discovery Institute in less-than-admirable terms. In a letter to the editor of the Dispatch responding to Luskin’s whine (reproduced here), I noted there were several things Luskin failed to mention in his opinion piece that were at least as important as what he did mention. Furthermore, I noted, it is heartening to see that many editors and reporters “do indeed understand the Discovery Institute’s tactics.”
C.E. Petit

2007/01/31 at 9:17 pm

It’s pretty obvious to me that Luskin is not a litigator. The whole point of a well-drafted set of proposed findings of fact and conclusions of law is that they persuasively — but fairly — summarize the evidence as it relates to the pleadings. Some lawyers do go overboard; they’re the ones who have little credibility with judges… and get the favorable results they obtain at trial overturned on appeal.

What Luskin has apparently forgotten is the concept of restricted choice. Civil litigation begins with the language of the complaint. That begins to restrict what a judge can say in response, and how. As the parties put in routine motions, evidence, and posttrial motions, the judge’s expression gets further restricted; after all, if a judge’s opinion does not clearly come from the pleadings and evidence before him/her, the judge’s opinion will be overturned on appeal (and judges don’t like that very much).

Luskin is too used to being able to write in a factual vacuum. Commentators not bound to the advocacy of a client’s position enjoy this freedom, one that is denied to litigators… and judges.
Austringer (post author)

2007/02/02 at 7:24 am

