The Protein Folding Problem

The gene is the basic unit of heredity. Composed of DNA, genes carry the genetic code or imprint (genotype) that specifies the appearance and behaviour of an organism (phenotype). How is the specification contained in a linear strand of DNA converted into characteristics such as the colour of skin, eyes, and hair?

The DNA in a gene is expressed by first being transcribed into messenger RNA (mRNA); this message is then translated into a sequence of amino acids, the building blocks of proteins. Proteins are thus the carriers of the message contained in the DNA. For example, skin colour depends on proteins such as the enzymes that produce the pigment melanin, and other proteins influence eye colour. Hemoglobin, which gives red blood cells their colour and functions as an oxygen carrier, is also a protein, as are many enzymes, hormones, and other molecules. The gene is indirectly responsible for the phenotype, but it is the corresponding protein that actually produces the physical characteristics. Proteins are therefore the basic unit of life, and an understanding of their structure and function is necessary to understand how life works. In fact, proteins are needed to replicate the very DNA that encodes them! This observation gives rise to some very beautiful mathematical concepts.
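Because the genetic code maps each three-nucleotide mRNA codon to one amino acid, the transcription and translation steps just described are simple to model. The following Python sketch is a simplification under stated assumptions: the input is taken to be the coding (sense) strand of DNA, so transcription is modelled as replacing T with U; translation starts at the first AUG and stops at the first stop codon; and the codon table is a hypothetical partial subset of the standard 64-codon table, included only for illustration. The names `transcribe`, `translate`, and `CODON_TABLE` are illustrative, not from any library.

```python
# Hypothetical PARTIAL subset of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {
    "AUG": "M",              # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G", "GGA": "G",  # glycine
    "AAA": "K",              # lysine
    "GAA": "E",              # glutamate
    "UGG": "W",              # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Assumes the input is the coding (sense) strand: mRNA matches it with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; return one-letter amino acid codes."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein in this simplified model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # raises KeyError for codons missing from this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCAAAGAATGGTAA")
    print(mrna, translate(mrna))
```

For example, the made-up sequence ATGTTTGGCAAAGAATGGTAA transcribes to AUGUUUGGCAAAGAAUGGUAA and translates to MFGKEW. The output is only the linear chain of amino acids; nothing in this sketch predicts how that chain folds into a 3D structure.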

We understand this transcription-translation process rather well, and much of it can be modelled on the computer. However, once the linear amino acid sequence is formed from mRNA, it folds up, within seconds or minutes, into a three-dimensional (3D) structure (there are exceptions to this generalisation). This is the functional protein. The protein then interacts in three dimensions with other proteins (lock-and-key arrangements, etc.), and these interactions mediate the functions of the organism. In fact, the 3D interactions between proteins and substrates are essentially the organism. We cannot fully understand the phenotype of an organism, or make reliable predictions about it, without knowing the 3D structures of the proteins encoded in its genome.

Much effort (about forty years' worth) has been expended trying to understand how proteins fold up in nature. The goal is to fold proteins theoretically, using a computer to carry out the actual folding steps, starting from amino acid sequences, which are easy to obtain (these days, the entire genome sequences of several organisms are available), and ending with their correct 3D structures (which are very few in number compared with the number of amino acid sequences). We are not very close to realising this goal, and so the protein folding problem remains one of the most basic unsolved problems in computational biology.

Solving the folding problem has enormous implications: drugs could be designed precisely on a computer without a great deal of experimentation, genetic engineering experiments to improve the function of particular proteins would become possible, and simulating protein folding would allow us to move forward with modelling the cell. A detailed description of the mathematical and computational nature of life, of the protein folding process, and of its relevance is given in an incomplete book titled Genes, Macromolecules, -&- Computers are Related by Strange Loops.

And the strange flavour of AI work is that people try to put together long sets of rules in strict formalisms which tell inflexible machines how to be flexible. ---Douglas Hofstadter

Samudrala Computational Biology Research Group (CompBio) || Ram Samudrala || me@ram.org