Wednesday, September 27, 2006

Amateur Science

A couple of weeks ago, I was talking to someone about the research I did in grad school. Briefly, my research involved computer-aided design of antibodies. While describing it, I realized that the things I worked on using a relatively powerful computer and high-end software a mere 5-10 years ago could now be accomplished on a cheap home computer, mostly with free software.

I've often thought about going back to grad school, but this realization pointed me to an interesting possibility. Why not try some computational research on my own, self-funded, to see if I could get a research paper published?

The first thing I thought to try was something I had thought of at the end of my time in grad school. One of the big mysteries left in biology is how proteins fold. If proteins simply sampled the possible shapes which they could fold into one at a time, it's said that it would take longer than the age of the universe for a typical protein to fold. But within a cell, proteins fold into their proper shape within milliseconds, or, at longest, hours. And the shape of proteins is important; it's what determines what they do in cells, and what proteins do is what determines what living things do. Another important note on all this is that the sequence of amino acids in the protein determines the final shape of the protein.
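To put rough numbers on that "longer than the age of the universe" claim, here is a back-of-the-envelope sketch in Python. The three constants at the top (3 shapes per amino acid, a 100-amino-acid protein, 10^13 shapes tried per second) are illustrative assumptions of mine, not measured values; the point is only how fast the count blows up.

```python
# Back-of-the-envelope estimate of an exhaustive search over protein shapes.
# All three inputs below are assumptions chosen for illustration only.
CONFORMATIONS_PER_RESIDUE = 3   # assumed number of shapes each amino acid can take
RESIDUES = 100                  # assumed length of a "typical" protein
SAMPLES_PER_SECOND = 1e13       # assumed rate at which shapes could be tried

# Roughly 13.8 billion years, converted to seconds.
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600


def total_conformations(residues=RESIDUES, per_residue=CONFORMATIONS_PER_RESIDUE):
    """Number of distinct shapes if every residue varies independently."""
    return per_residue ** residues


def exhaustive_search_seconds(residues=RESIDUES,
                              per_residue=CONFORMATIONS_PER_RESIDUE,
                              rate=SAMPLES_PER_SECOND):
    """Seconds needed to try every shape once, one at a time."""
    return total_conformations(residues, per_residue) / rate


if __name__ == "__main__":
    seconds = exhaustive_search_seconds()
    print(f"shapes to try: {total_conformations():.2e}")
    print(f"search time:   {seconds:.2e} seconds")
    print(f"universe ages: {seconds / AGE_OF_UNIVERSE_SECONDS:.2e}")
```

Even if you change the assumptions by several orders of magnitude, the search time stays absurdly far beyond the age of the universe, which is why a one-at-a-time search can't be how real proteins fold.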

One thing that's likely to happen during protein folding is that certain parts of the protein fold quickly, and this quick folding brings other parts closer together, which then interact to fold the protein into its final shape. So, I thought it would be interesting to search through the known protein structures and see if any short sequences of amino acids--say, combinations of 3 amino acids--tend to have a standard shape in these known proteins. With 20 possible amino acids and 3 positions, that comes out to 20 * 20 * 20 = 8000 possible 3-amino-acid combinations. It's a lot of things to check, but that's what computers are good at.
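As a minimal sketch of the bookkeeping involved (not the analysis itself), the Python below lists all 8000 combinations and counts how often each one shows up in a set of protein sequences. The function names and the example sequence are made up for illustration, and this only handles the sequence side: comparing the shapes of each combination would need the actual 3D coordinates from the structure files, which I'm not attempting here.

```python
from collections import Counter
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_triplets():
    """Return every possible 3-amino-acid combination (20 * 20 * 20 of them)."""
    return ["".join(combo) for combo in product(AMINO_ACIDS, repeat=3)]


def count_triplets(sequences):
    """Count overlapping 3-amino-acid windows across a list of sequences.

    Windows containing anything other than the 20 standard one-letter
    codes (for example 'X' for an unknown residue) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 2):
            triplet = seq[i:i + 3]
            if all(residue in AMINO_ACIDS for residue in triplet):
                counts[triplet] += 1
    return counts


if __name__ == "__main__":
    print(len(all_triplets()))
    # A made-up toy sequence, not a real protein.
    print(count_triplets(["ACDACD"]).most_common(3))
```

The counts would mostly be useful for deciding which combinations appear often enough in the known structures to say anything about their shape at all.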

So, before starting on this project, I decided to check PubMed to see what had been done since I looked into this in 2000 or so. And, of course, in December of 2002, somebody published a paper about exactly what I was planning to do.

So, that one was out (well, mostly; I still think I might give it a try, but just as an exercise in programming and to test the replicability of their data). While I was at PubMed, I decided to poke around and see if any other ideas fell out.

So then I started thinking about evolution, and about the human genome. The human genome contains about 3 billion DNA base pairs. New DNA gets into the human genome through two routes, as far as we know: duplication of existing DNA (through things like copying errors or transposons), or incorporation of viral DNA into the genome (as retroviruses do; the similar process in bacteria is called lysogeny). By tracing the relationships between the genes in the genome, using the same techniques we use to trace relationships between genes in different organisms, it should be possible to trace the evolutionary history of every gene in the human genome (with the possible exceptions of the virally introduced genes and genes that diverged too long ago for us to recognize their relationship). I thought that would be an interesting thing to try.

So, of course, did several other people. For example, this mob worked together to map human chromosome 18. Others have done similar things on other parts of the human genome. Not only had people beaten me to the punch again, but the job was way harder than my computer is likely to be able to handle.

So... I'm still not sure exactly what I'm going to look into. I still plan to do this, but I've decided the first step is to read a bit more to catch up. If you know of any computational biology work that it might be interesting to look into, feel free to tell me about it in the comments.

Friday, September 01, 2006

Mysteries of the Explained

One Monday, a chemist was researching the properties of a new explosive. He weighed it carefully, ignited it, and then weighed the product. He was astonished to find that the product weighed more than the starting materials.

"I must have missed something," he said. "Certainly this result is not enough to overturn the well established atomic theory of matter."

He soon realized that he had forgotten to account for the mass of the air, and everyone agreed that it was prudent for him to re-examine his work.

The next day, a physicist was studying transmission of light through a new substance. When he completed his experiments, it seemed that the light was coming out of the substance before it had gone in.

"I must have missed something," he said. "Certainly this result is not enough to overturn Maxwell's electromagnetic field theory."

It took him some time, but finally he found an error in an equation. Everyone agreed that it was prudent for him to re-examine his work.

On Wednesday, a biologist was studying the genome of a bacterium. He was amazed to find that the genome had more similarity to a certain species of fungus than it did to other bacteria, even though he had expected it to be a typical bacterium.

"I must have missed something," he said. "Certainly this result is not enough to overturn the well established theory of evolution through natural selection."

"Just-so-stories!" screamed one onlooker. "The bacterium must have been designed!" shouted another.

Why oh why oh why does the study of biology get such special attention? Scientists make mistakes. More importantly, scientists, even smart ones, are not always right. When they predict something based on a well established theory, and that prediction turns out to be false, it is prudent to re-examine their work to see what might be wrong before chucking the well established theory. Of course, if it isn't possible to explain the evidence in the framework of the well established theory, or if the explanations require twists and turns where some other theory explains the evidence more simply, then even a well established theory can be overturned (for example, it was widely accepted in the 19th century that light propagated through a medium called the "luminiferous aether," but then the Michelson-Morley experiment failed to detect any sign of the aether, and a new theory had to replace it).

Evolution is supported by piles and piles and piles of evidence, from molecular biology (both DNA comparisons and protein comparisons) to paleontology (the fossil record isn't complete, thanks to the way fossilization works, but everything in it supports descent with modification) to direct experimental observation (not to mention years of artificial selection, which is just a form of natural selection, in which humanity takes the role of the environment). And, as creationists strangely like to use as an argument, elements of the theory of evolution through natural selection are tautologous, or, as m-w.com puts it, "true by virtue of its logical form alone." Of course the organisms that are more likely to pass on their genes pass on their genes more than the organisms that aren't.

So, when we find a single organism that does something weird, of course biologists attempt to explain it--not explain it away as creationists like to say, but explain it--by fitting it into the framework of natural selection, by figuring out what selective advantage its weirdness gives it. Until someone finds something much, much stranger than anything we've found so far, natural selection is by far the best explanation we have for the diversity-yet-clear-relatedness of life.