Biologic Institute Releases “Stylus”

2008/06/05 Wesley R. Elsberry

The Biologic Institute, the Seattle research arm of the Discovery Institute (not to be confused with the older North American Biologic Institute of Boca Raton, FL, lab for blood chemistry analysis and medical supplier), has a paper on PLOSone announcing a software package, Stylus. Stylus is supposed to do something relative to investigating protein chemistry by simulating evolution of, um, Chinese characters.

The authors, Douglas Axe, Brendan Dixon (he of the Microsoft moneybags), and Philip Lu, know that they are basing their whole effort here on analogy. Given the far-fetched proposition that something useful about evolution of protein chemistry can be learned by examining Chinese characters, one should give them props for making about as forceful an argument by analogy as yet has been seen from religious antievolutionists. You see, certain Chinese characters bear a resemblance to certain protein folds, if one orients the protein in question in just the right way and projects it onto a 2D space. (You knew “METHINKS IT IS LIKE A WEASEL” would be relevant in here somewhere, right?) And since Chinese is a language, and various people have talked about DNA and proteins being language-like, surely it can’t be a stretch to think that there is some deep connection between Chinese character space and protein space… OK, I think that’s enough sarcasm.

The primary fault in this paper, IMO, is that their analogy presumes that protein space is as rugged and, perhaps more importantly, as discontinuous as a Chinese character space. This is by no means established, though they try to give that appearance via citation of Axe’s previous studies.

We have no expectation that Chinese character sets should be evolvable from scratch and represent a spannable set. The significant question for the Stylus team isn’t whether they can do something with Chinese characters, but whether this has any bearing on the issue of whether observed proteomics are in fact spannable given genetic operators. Meaning of Chinese characters doesn’t offhand seem to be a good stand-in for protein function, not least because of the subjective nature of the former and objective nature of the latter.

Now, I have another issue… the Stylus software itself. The Biologic Institute has a SourceForge project set up for this, and one can get the software from there via SVN (sorry, no package for Stylus as of yet):

svn co https://biologicstylus.svn.sourceforge.net/svnroot/biologicstylus

The software is specifically stated to run on FreeBSD, and I have a FreeBSD 6.3 desktop. I should be in good shape.

The README gives some requirements for compiling, and a few “pkg_add” instructions take care of that:

pkg_add -r libxml2
pkg_add -r swig
pkg_add -r python

But the “bootstrap” configuration script provided errors out with a complaint about a bad substitution for “OSTYPE”. The help output for the script doesn’t seem to mention OSTYPE at all. I have an email off to the supplied address at the Biologic Institute to see if someone there has further details to offer.
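The “Bad substitution” message is what a plain POSIX shell (FreeBSD’s sh, or dash as /bin/sh on Ubuntu) typically prints when it meets a bash-only parameter expansion such as ${OSTYPE:0:6}. That is only a guess at the cause here, since the failing line of “bootstrap” is not shown; if it does apply, running the script explicitly under bash may get past it. A minimal sketch of a portable alternative, assuming the script only needs to know the platform (the names os_name and platform are made up for illustration and are not taken from Stylus):

```shell
#!/bin/sh
# Hypothetical, portable platform check; not the actual Stylus bootstrap code.
# uname -s is available in any POSIX shell, whereas $OSTYPE is set by bash
# and substring forms like ${OSTYPE:0:6} are a bash extension.
os_name=$(uname -s)

case "$os_name" in
    Linux)   platform=linux ;;
    FreeBSD) platform=freebsd ;;
    Darwin)  platform=macosx ;;
    *)       platform=unknown ;;
esac

echo "platform: $platform"
```

The case statement uses only POSIX syntax, so it behaves the same under sh, dash, and bash.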

(Given that Axe is elsewhere critical of Avida, I’ll note that at least one can obtain and compile Avida if one follows the instructions.)

Update: Reed Cartwright pointed out that they have hard-coded “make” in their scripts, and “make” differs between FreeBSD and Linux. So I downloaded Stylus to my Xubuntu, loaded the specified prerequisites, and still got the same set of errors on running “bootstrap”.
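For context on the “make” point: FreeBSD’s stock make is BSD make, and GNU make is normally installed there under the name gmake. A hedged sketch of how a build script could avoid hard-coding the name, assuming GNU make is what the Stylus scripts expect (the MAKE variable convention is borrowed from common practice, not from the Stylus sources):

```shell
#!/bin/sh
# Hypothetical selection of the make program; not taken from Stylus.
# Honor a caller-supplied $MAKE first, then prefer gmake if it exists,
# and fall back to plain make otherwise.
if [ -z "${MAKE:-}" ]; then
    if command -v gmake >/dev/null 2>&1; then
        MAKE=gmake
    else
        MAKE=make
    fi
fi

echo "using: $MAKE"
```

A script written this way would then call "$MAKE" wherever it currently calls make.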

Update: I got a very nice note from Brendan Dixon in response to my inquiry on difficulty getting the program going. I’ll see if I can make progress based on that. One thing he notes is that despite what it says on SourceForge, he hasn’t installed Stylus on a FreeBSD system. So I’ll concentrate on getting it running on my Xubuntu install first.

24 thoughts on “Biologic Institute Releases “Stylus””

Lou FCD

2008/06/05 at 5:57 pm

I can make a killer shadow puppet dog, and with a little work I could cause it to evolve into a hawk.

There’s got to be a paper in that.
sparc

2008/06/05 at 8:41 pm

It may have escaped your notice but PLOSone offers the opportunity to add comments on their papers. Just click “start a discussion” over there.
Mike O'Risal

2008/06/06 at 3:34 am

I’m not a protein person, but I’ve read the paper and can’t for the life of me figure out what a practical application of any of it would be. It seems to me entirely superficial stuff; argument from analogy, as you say, and moreover argument for an entirely unclear end. It seems more like a party trick than a scientific paper.
Sili

2008/06/06 at 6:45 am

Off-topic-ish:

Hasn’t there been some frequency analysis or something done on DNA to show that it doesn’t behave like a language?
Austringer (post author)

2008/06/06 at 7:19 am

Konopka and Martindale pointed out that (1) conformance to Zipf’s law wouldn’t establish something as language-like and (2) disputed the notion that either coding or non-coding regions of DNA could be said to make a good fit to Zipf’s law. It’s about as brief and understated a takedown as one might expect in the literature.
Torbjörn Larsson, OM

2008/06/06 at 11:49 am

Thanks for posting this.

I don’t want to make much fuss about this paper, as it is probably stillborn, as you imply. It isn’t a creationist paper (but it could lead to one) and it isn’t fruitful biology. But I will quote a comment from ERV’s post:

I’ve read the paper after coming across your post here this morning, and as a protein person I think I can say that there are definitely many problems I see with this paper, many of which are found in the underlying assumptions and statements made by Axe as if they are fact.

While it has been a common assumption (with lots of evidence to back it up) that protein fold space is discrete, at least in the way that we classify it (SCOP, CATH) this is by no means settled and there have been some arguments made for fold space being continuous. […]

There is also the huge problem of whether this comparison by analogy bears any resemblance whatsoever to reality, and I would charge that it doesn’t. […]

For one, proteins are not rigid characters, they move and flex in biological reality and the modeling we do of them as rigid structures (or with limited flexibility) is a simplification borne of computational necessity.
However we do try and account for it, or minimize it as much as possible. In my opinion Stylus does no such thing. It also ignores the reality of multi-functional proteins and multi-functional folds. [Daniel Gaston.]
Torbjörn Larsson, OM

2008/06/06 at 12:02 pm

And since you are analyzing the software, I have some comments of my own. [My linux machine is down (hardware) and I haven’t had time to do more than compare paper and code.]

I’m no biologist, but I have to question a simulation that purports to be between homology, but then assumes that any deviation from the target means decreased fitness. (Which is hidden by a small fuzzification factor purportedly due to ‘change in environment’ to give a few folds with higher fitness. See the paper, and its fig 18.)

It is as if the whole protein is an active site. But don’t other parts of the protein contribute to homology as well?

Also I don’t understand the part of the fitness function that has a “cost” function for codons, presumably to keep proteins from getting longer. (Which sucks for creationists if proteins can be multifunctional, but also presumably sucks for the simulation.)

The “cost” is AFAIU referenced to a growth energy cost, not a primary selection cost. (See the appendix and the implementation in genome.cpp, Function: doScoring.) And it also constrains the functional fold space that they purport to (fully?) explore. But are there “costs” involved in selection, or is it exclusively function that counts as I’ve been reading it?
Torbjörn Larsson, OM

2008/06/06 at 12:12 pm

Errata: I believe the fuzzification to mask the fitness decrease assumption is only specified in the appendix.

Btw, seems to me fitness costs are attributed to the preceding variation when it affects fitness.
So maybe it makes sense to aggregate it with the selection after all.

Still seems rather odd to me to think energy costs of growth would always affect fitness in a proportional manner. Maybe a biologist could put this paper down as “not general” on that account alone. Of course, being an analogy blew that already.
Sili

2008/06/06 at 2:09 pm

Thank you, that was exactly what I vaguely remembered. – I think I saw it as an offhand comment on a paper using DNA to label documents years ago.
Steven Dunlap

2008/06/06 at 8:11 pm

Meaning this only as constructive criticism, the co-founder of modern science, Francis Bacon, spent most of book II of “Advancement of Learning” explaining why analogies do not belong inside of a scientific theory. They can be useful and persuasive when simplifying a theory for a non-scientist, but have no weight of proof in themselves.

I’m reminded of the “Butterfly alphabet” poster I have on my wall. There exist so many species of butterflies that, given the same principle illustrated by the million monkeys analogy, we can see some form of each of the 26 letters of the Roman alphabet on the wings of 26 out of millions of species. Very pretty, but so what?
Scott

2008/06/07 at 12:03 am

“Cost” function assumptions? “Fuzzification”? I haven’t looked at the code yet, but it sounds to me on the face of it like someone is setting up an argument for front-loading “evolutionary” selection. “See? My evolutionary program has selection criteria designed by humans. Therefore all evolutionary programs have designed criteria. Therefore all of life is designed.”

Maybe I’m just too cynical, but I’m a’see-un a foundation for conflating “function” with “design” here.

However, to critique the article…

“However, while these [lattice] models incorporate actual sequences and structures (albeit non-biological ones), they incorporate no actual functions—relying instead on largely arbitrary structural criteria as a proxy for function. In view of the central importance of function to evolution, and the impossibility of incorporating real functional constraints without real function, it is important that protein-like models be developed around real structure–function relationships.”

They complain about “arbitrary” assignments of “function”. Yet what do they choose as an alternative? The arbitrary “meanings” that humans have assigned to certain marks on paper. I’ve only read the abstract, but the entire notion seems to fall apart before they’re half way through. The meanings of characters in human writing are about as arbitrary and symbolic as one can imagine.

Now, while it’s certainly true that human languages and writing systems have themselves evolved (at many levels), it doesn’t look like that’s what they are talking about here. Maybe I’ll learn more as I struggle through this.
ellazimm

2008/06/07 at 12:26 am

It does sound like tea leaves and tarot I’m afraid.
Austringer (post author)

2008/06/07 at 11:30 am

There’s a comment on PLOSone for the paper now, pointing out that the “I Ching” and DNA have been the subject of, if anything, more rigorous analysis for analogies.
Skeptigirl

2008/06/07 at 9:13 pm

I would like to share something the Discovery Institute has recently been excited about. It may have absolutely nothing to do with this research, but then again, I can’t help wondering given that the DI sponsors the research in question. I’ve posted a similar comment on Pharyngula.

I was at the Discovery Institute a couple weeks ago listening to a little talk that the public was invited to. While the discussion was about something else, the speaker mentioned a new discovery that they were all excited about.

You can find the gist of it here:
http://the-spyglass.blogspot.com/2008/05/parable-of-laminin.html
“The Parable of Laminin”

There is a particular protein which is vaguely cross shaped which is involved in “the structural scaffolding in almost every animal tissue”. It would seem that seeing this random shape had great meaning as a secret symbol ‘God’ left for us to discover as evidence of ‘His Creation’. Perhaps the DI folks are hoping to eventually find more secret Bible Code in protein shapes.

Funny thing is, when you saw the actual images of the proteins, they were pretty unimpressive. But when the speaker showed the diagrammatic version on a slide the crowd oohed and aahed.
NP

2008/06/07 at 9:51 pm

It’s just the ID camp trying to lead the evidence where they want it to go. If they can try and pass off the analogy of the DNA code being a language as also being more than just an analogy, they will then claim that the analogy of a designer is grounded in fact.

When I stumbled across the paper in PLoS ONE, my first reaction was “WTF?”
The Ancient Mariner

2008/06/08 at 8:05 pm

Re: #14–that’s not exactly the point. No one’s seriously offering the chemical diagram of laminin as evidence of anything in any scientific sense. It’s something that those who believe in Christ find meaningful, and those who believe in anything else (or in nothing at all) find meaningless. It’s all in your perspective.
Jud

2008/06/09 at 9:38 am

While “we have no expectation that Chinese character sets should be evolvable from scratch,” doubtless the current character sets and the glyphs within those sets evolved. (For an English analogy, find an image of the Declaration of Independence and look at the “s” in “When in the course of human events….”)

It is possible that there are fewer constraints on the evolution of these glyphs (illegibility, confusing similarity to other glyphs in the character set) than there are biophysical constraints on functional protein conformation. Thus any model that considers the glyphs within Chinese character sets to be discrete and unchanging is fundamentally flawed, I would think. How can one tell whether or not the program has come up with a glyph that will be perfectly understandable to a 30th-century Chinese-reader (or whatever the 30th-century equivalent of a Chinese-reader might be)?
Austringer (post author)

2008/06/09 at 10:28 am

While “we have no expectation that Chinese character sets should be evolvable from scratch,” doubtless the current character sets and the glyphs within those sets evolved. (For an English analogy, find an image of the Declaration of Independence and look at the “s” in “When in the course of human events….”)

That’s pretty much just equivocation. Biological evolution is constrained in ways that cultural evolution, such as writing systems undergo, is not.

It is possible that there are fewer constraints on the evolution of these glyphs (illegibility, confusing similarity to other glyphs in the character set) than there are biophysical constraints on functional protein conformation.

And then again, it is possible that the opposite situation applies. The way to resolve that is with research on proteomics, not speculative analogies with writing systems.

Thus any model that considers the glyphs within Chinese character sets to be discrete and unchanging is fundamentally flawed, I would think.

I see that the supply of straw is still plentiful. A model of writing system evolution that postulated discrete and fixed properties for each and every character would indeed be lacking, but nobody has suggested that, so bringing it up seems an irrelevancy.

How can one tell whether or not the program has come up with a glyph that will be perfectly understandable to a 30th-century Chinese-reader (or whatever the 30th-century equivalent of a Chinese-reader might be)?

And why would anyone care about such an assertion? I know I don’t.

This program is likely to figure in future claims that certain Chinese Han character ‘families’ are discrete and unreachable from one another, therefore one should be suspicious that the same result applies to protein families. Such a statement would simply be begging the question, as the desired conclusion is implicit in the selection of the analogy.
Pantrog

2008/06/09 at 1:00 pm

hmmmm,

I’ve been thinking about this paper for a couple days now and seeing this is where the sensible discussion is taking place – here is my 2-pence-worth.

Well, at least the ID folk are making a start. I’m sure we’re all itching to take the simulations they produce for a spin (Austringer – have you managed to install it yet?), correct the erroneous assumptions and then show a working evolutionary simulation (just like the Behe and Snoke paper in 2004). Bring on the next manuscript!

The solution space they’re going to simulate is HUGE… and they’re only going to populate it with ~5000 functional solutions. This is a major problem. I think they show that the decay of ‘proficiency’ around fitness optima (a character) seems reasonably simulated. But if they only put 5000 such ‘islands of function’ in the whole solution space, there’s almost no way you could get from one island to the next easily. I think this is what they’ll try to demonstrate.

We have to argue that the solution space they’re simulating is grossly under-populated with function. Because:

(a) Alleles of the same enzyme (separated by 2 or 3 key amino acids e.g. ABO glycosyltransferase) can have different catalytic abilities – suggesting that solution space has a reasonable clustering of functional solutions.

(b) We can show reasonably short journeys in the solution space when new functions are developed (e.g. nylonase, antibiotic resistance, etc) – suggesting that the entire solution space is well populated.

(c) Simulating a solution space with a static finite distribution of function is, of course, a bit pointless as potentially any protein could be functional given the right setting.

(d) There will be certain concepts that have only one character in Chinese. In contrast while biological organisms may use certain functional solutions for historical reasons (e.g. RuBisCO in plants) this does not prove that there are not other possible solutions that could also do the job as well (or better).

e.g. Humans use haemoglobin to transport oxygen (the ‘one-true-sequence’ of haemoglobin being the subject of previous creationist arguments from improbability), but similar functions are being carried out by hemocyanins in octopuses and hemerythrin in other marine invertebrates.

(e) The authors are being a bit sneaky by penalizing the growth of proteins. I would like to know whether people think that the penalty for growth they are imposing is fair. In principle as a random protein sequence grows it will actually represent a larger proportion of the solution space of smaller proteins.

e.g. a simple linear model –

if we imagine a functional solution needs a precise sequence of 50 amino acids, and we examine random 50 amino acid polypeptides for function – only one potential permutation will work.

However, if we examine random 1000 amino-acid proteins, each has multiple (hundreds of) 50 aa sub-sequences that could be the correct permutation. This applies in 2 and 3 dimensional objects as well – and may be even more significant at these higher dimensions.
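The arithmetic behind that linear example can be checked quickly. This is only a sketch of the counting step, assuming the functional motif must appear as one contiguous stretch of the chain; the function name count_windows is made up for illustration:

```shell
#!/bin/sh
# Count the contiguous windows of a given motif length within a chain.
# $1 = chain length in amino acids, $2 = motif length in amino acids.
count_windows() {
    if [ "$1" -lt "$2" ]; then
        echo 0
    else
        echo $(( $1 - $2 + 1 ))
    fi
}

short=$(count_windows 50 50)     # a chain exactly as long as the motif
long=$(count_windows 1000 50)    # a chain twenty times longer

echo "50-aa chain:   $short window"
echo "1000-aa chain: $long windows"
```

A 1000 aa chain offers 951 candidate positions for a 50 aa motif against a single one for a 50 aa chain, which is where the “hundreds of” comes from.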

There, I feel better.
Austringer (post author)

2008/06/09 at 6:48 pm

No, I have not managed to get Stylus running on anything yet. I have FreeBSD 6.3, Xubuntu Hardy Heron, and Mac OS X Leopard platforms that I’ve tried it out on (accounting for 3 out of 7 “reads” of the SVN repository at SourceForge last time I checked).

Of those, the Mac OS X install got the furthest, displaying a fair amount of the usual autoconfiguration verbosity before erroring out about underquoted aclocal stuff. I think that the list of dependencies is not yet complete in the installation instructions.
Torbjörn Larsson, OM

2008/06/10 at 8:37 am

“Cost” function assumptions? “Fuzzification”? I haven’t looked at the code yet, but it sounds to me on the face of it like someone is setting up an argument for front-loading “evolutionary” selection.

Yes, or at least the old “there are in some cases evolutionary costs, so all mutation is degrading” strawman.

Well, at least Axe et al started with a functional protein just as an organism would have, instead of Dembski et al “it is improbable that organisms would have a functional protein”.

In principle as a random protein sequence grows it will actually represent a larger proportion of the solution space of smaller proteins.

Thanks, I hadn’t figured out all the consequences of the cost function yet. Obviously this is another zinger.

the subject of, if anything, more rigorous analysis

Ca Ching!
Torbjörn Larsson, OM

2008/06/10 at 8:51 am

Funny thing is, when you saw the actual images of the proteins, they were pretty unimpressive. But when the speaker showed the diagrammatic version on a slide the crowd oohed and aahed.

Why didn’t they use the cross in tRNA to include all life? Oh, I guess animals are of higher “baramins”.

Commenter shcrodinger says on The Big View:

While I have no desire to disparage anyone’s beliefs, I would like to say a few things about this issue. The cross diagram is just that, a diagram, not a photograph of the molecule. Diagrams of the bond formation of many different types of molecules consist of many different types of geometrical shapes, including triangles, pyramids, hexagons, tetrahedrons, octahedrons, as well as linear shapes including crosses. In fact, crosses are very common in diagrams showing bond formation. With such a geometrical grab bag to reach into, just about anyone of any belief should be able to find something of interest. A more realistic representation is in the structural diagram which shows the true bonding angles which are usually at multiples of about sixty degrees. These are referred to as “xo” drawings because of the resemblance to those letters. As a connoisseur of good cognac, I take great comfort in that symbolism, particularly since the entire family of alcohol molecules can be diagrammed as crosses! In reality the laminin molecule in question here has a hexagonal shape with six bonding places and is more similar to water molecules than the cross you see in the structural diagram. It also seems to me that the asymmetrical leg in the cross diagram is slightly exaggerated but I may be wrong about that. Anyway, I must cut short my comment as I have a tennis match to attend. Did you ever notice how the strings of the racket form crosses, wherever the strings cross? I wonder what that means? [My bold, not his.]

If laminin is actually hexagonal in structure, as is ice, does that mean it is actually evidence that Wicca precepts work?
David Utidjian

2008/06/17 at 4:05 am

Wesley,

Seems that Stylus was updated today (June 17 2008). Must have been in the early hours since I dled it at 5:24 EST.

It seems to build and run OK on my Fedora 7 linux system. I used the:

./bootstrap
./build

which completed without incident. Then I ran the ‘test’ trial with:

./stylus -e -r -- -g 52DC.gene -p simple.xml -u ./sample/,./sample/

Which produces a bunch of ‘report’ files. Some of these are in html format and can be viewed in a browser.

Pretty uninteresting to me so far since I don’t know how to really test the program and why it might even be interesting. I will have to read up on it a bit more ;-)

I am interested in your results and the results of others.

-DU-
Lars

2008/12/24 at 11:34 am

“Of those, the Mac OS X install got the furthest, displaying a fair amount of the usual autoconfiguration verbosity before erroring out about underquoted aclocal stuff.”

I’m not sure what you’re saying about aclocal, but do you have automake installed? (aclocal comes with automake)

I had trouble with the “xmllib2” requirement mentioned in the README, but eventually inferred that xmllib is a synonym for libxml.

Then when running bootstrap, I got an error saying “macro `AM_PATH_XML2′ not found in library”, which I’m told (http://osdir.com/ml/finance.quickfix.user/2007-01/msg00003.html) means I need libxml2-dev.
After installing libxml2-dev, that error is gone, but I still get “./bootstrap: 119: Bad substitution”.
And there I’m stuck. I’ll try the sourceforge stylus help forum.
