Genes, Macromolecules, -&- Computing are related by Strange Loops

And the strange flavour of AI work is that people try to put together long sets of rules in strict formalisms which tell inflexible machines how to be flexible. ---Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid
Computerise god, it's the new religion.
Program the brain, not the heartbeat. ---Black Sabbath, Computer God

A deoxyribose nucleic acid (DNA) molecule is a double-stranded chain of nucleotides, held together by phosphodiester and hydrogen bonds, acting as the brain of a cell. A Central Processing Unit (CPU) is a network of flip-flops, switched on and off by variations in voltage, acting as the brain of a computer. At first glance, it is hard to imagine that the two systems could have anything in common. But appearances can be deceiving.

One can think of coding in a mathematical sense as the process by which a given object is converted into a number so number theoretic operations can be performed on the encoded form of the object. Coding, in a genetic context, is the process by which DNA is converted into protein by transcription and translation; some of these encoded proteins are used to synthesise new strands of DNA.
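The genetic sense of coding can be sketched in a few lines of Python. This is only an illustration, under stated assumptions: the function names are made up for this example, the input is taken to be the coding strand of the DNA (so transcription reduces to swapping T for U), and the codon table is a small subset of the standard genetic code, just large enough for the demonstration sequence.

```python
# Subset of the standard genetic code: only the codons used in the demo below.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE[codon]  # KeyError for codons outside the demo subset
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


dna = "ATGTTTGGCGCTAAATAA"    # made-up sequence for illustration
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGUUUGGCGCUAAAUAA
print(protein)  # MFGAK
```

A real cell does far more than this (promoters, splicing, folding, and so on); the sketch only shows that the mapping from nucleotides to amino acids is a lookup over a code.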

Douglas Hofstadter, in his book Gödel, Escher, Bach: An Eternal Golden Braid, discusses the relationship between the coding of statements about numbers into numbers (so mathematical operations can be performed on them), and the coding of DNA into protein (so the protein can in turn produce more DNA), in terms of a "Strange Loop". In simple computerese, a Strange Loop is just a self-referential or recursive construct.

This opens a proverbial can of worms. If one pauses to think about the intricacies of the genetic encoding scheme, several instances of Strange Loops become apparent. One exquisite example is given in Gödel, Escher, Bach, where Hofstadter presents the recursive nature of a palindromic sequence of DNA. A palindrome is a stretch of double-stranded DNA where one strand reads the same as the opposite strand in the reverse direction, i.e., a sequence of the form:

A C G C G T 
| | | | | |
T G C G C A

One can visualise a stack to represent the above palindrome. All one has to do is just push the nucleotides of one strand on the stack. The other strand is the sequence obtained when the stack is popped.
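The stack picture can be tried out directly. The following is a minimal sketch in Python, using a plain list as the stack; the function names are hypothetical, and the complement table is just the usual A-T and C-G base pairing.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pop_all(strand: str) -> str:
    """Push every nucleotide on a stack, then pop until the stack is empty."""
    stack = []
    for base in strand:
        stack.append(base)          # push
    popped = []
    while stack:
        popped.append(stack.pop())  # pop returns the most recently pushed base
    return "".join(popped)


def complement(strand: str) -> str:
    """Return the base-paired partner of each nucleotide, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand)


def is_palindromic(strand: str) -> bool:
    """A site is palindromic when popping the stack yields the paired strand."""
    return pop_all(strand) == complement(strand)


top = "ACGCGT"
print(pop_all(top))              # TGCGCA, the lower strand of the example
print(is_palindromic(top))       # True
print(is_palindromic("ACGCGA"))  # False
```

Popping only reverses the strand; it is the palindromic property that makes the reversed strand coincide with the base-paired one, which is why the check compares the two.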

This might not seem like a big deal, but it shows the underlying mathematical beauty behind a strand of DNA that most geneticists take for granted. What is really interesting is that these palindromes are the sites where many restriction enzymes (enzymes that restrict or chop up DNA) act and slice nucleic acid. In nature, these enzymes destroy foreign DNA that invades the cell. In the lab, they are used extensively for genetic engineering (to engineer new strands of DNA by cutting up two different strands of DNA and combining them). These sites are therefore essential for the survival of an organism. It goes to show that life itself is based on a simple recursive concept, not only because of the nature of palindromic sequences, but also because of the self-referential nature of the transcription-translation-replication processes.

There are several other instances of genetic and computing concepts coming together. An organism's genome can be uniquely encoded into a binary number (the individual bits could represent expression or inactivation of genes, presence or absence of amino acids and/or nucleotide base pairs, etc.). Arbitrary operations can then be performed on these binary numbers (calculation of Hamming distances to indicate genetic diversification, for instance). Enzymes and substrates can be encoded as binary functions which represent the activation or inactivation of an enzyme in the presence of a given substrate. The possibilities are endless and mind-boggling. One should realise that the concept of coding pervades the genetic system and in turn the organism itself. This is particularly noticeable when we apply formal language concepts to DNA strands. A "string" of DNA can be formally specified by a grammar, and that string can be parsed according to a set of productions, just as one would parse a natural or programming language string!
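Two of these ideas, the binary encoding with Hamming distances and the grammar for DNA strings, can be made concrete in Python. Everything here is an assumption made for illustration: the two-bits-per-nucleotide encoding is arbitrary (a different bit assignment gives different distances), the grammar S -> A S T | T S A | C S G | G S C | (empty) is one simple choice that generates exactly the palindromic sites described earlier, and the function names are invented.

```python
# Arbitrary two-bit code per nucleotide (an assumption, not a standard).
BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}

# Base pairing, used by the grammar productions below.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def encode(strand: str) -> int:
    """Encode a strand as a binary number, two bits per nucleotide."""
    return int("".join(BITS[base] for base in strand), 2)


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two encoded strands differ."""
    return bin(a ^ b).count("1")


def derives(s: str) -> bool:
    """Recognise strings generated by
    S -> A S T | T S A | C S G | G S C | (empty)."""
    if s == "":
        return True                 # the empty production
    if len(s) < 2 or PAIRS.get(s[0]) != s[-1]:
        return False                # no production matches the outer bases
    return derives(s[1:-1])         # recurse on the inner S


x = encode("ACGCGT")
y = encode("ACGCGA")
print(x, y)               # 411 408
print(hamming(x, y))      # 2
print(derives("ACGCGT"))  # True
print(derives("ACGCGA"))  # False
```

The distance is only meaningful between strands of equal length, since leading zero bits vanish once the encoding is an integer. The recogniser is recursive in the same way the grammar is: each production wraps a smaller S in a base pair.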

I claim that recognising Strange Loops in biological systems, and reproducing them within a computer, will result in artificial intelligence (AI) and artificial life. I firmly believe that true AI can be achieved by simulating the cell at the molecular level. Current AI methods use a deterministic system to simulate the non-determinism seen in humans. Instead, we should come up with an encoded form of DNA that can be specified in a computer and has exactly the same properties that the DNA in our cells has. We then provide a means of transcription and translation (replication should occur automatically due to its recursive nature). We now have what we know as the earliest beginnings of life (the primordial germ cell) in a closed system (the computer). The rest is up to the information contained in the strand of DNA that we have encoded. This process, if made to happen, will result in the production of proteins and the replication of DNA and of the individual cell, giving rise to more cells. Eventually, if the information contained in the original template strand of DNA is complex enough (what better template exists than your own?), then I believe this will give rise to sentient thought.

Reading this, people will think I have delusions of grandeur and that I live in a world of science fiction (considering that what I've suggested is a computing clone of a human). People may also consider this unrealistic, since I have neglected to mention several difficulties that have to be overcome before what I suggest becomes possible. Those difficulties are for future students of computational genomics to surmount. It should be kept in mind that this is mostly a hypothetical issue at this point, one that leads to a great deal of abstract thought in both computing science and genomics.

A hundred years ago, people would have scoffed at the idea of a mere automaton beating a human at a game like checkers. That is not the case now. As society becomes more technologically advanced, the interface between human and computer will grow ever more intimate, until further interfacing has to occur at the molecular biological level.


Genes, Macromolecules, -&- Computing || Pseudointellectual ramblings || Ram Samudrala || me@ram.org