Protein folding

Proteins don't have a folding problem. It's we humans that do.

The protein folding problem has not yet been solved. I will try to collect information relating to it on this page. At this point, it is useful to note that there are really two problems here: one is predicting or describing how nature folds up proteins, and the other is predicting the structure of a sequence in an engineering sense (this includes homology modelling and threading methods). I think they are highly interconnected (you can't have one without the other), but it is clear that these engineering-type techniques have had some limited success in predicting structure from sequence. While it would be nice to understand how nature does it, accurately predicting the structure of an amino acid sequence is all that is needed for this work to move forward.

The other issue is protein structure determination by experimental methods. If we knew the structures of all protein sequences, then our "protein folding" algorithm would simply be a one-to-one mapping. But this isn't the case. Even if it were, we would still need to be able to determine, on the fly, how (directed and undirected) mutations in amino acids affect protein conformations (and subsequently their function). So at the very least, a structure prediction algorithm that works reasonably well is necessary for progress.

My approach to the problem is to consider the set of amino acids as individual interacting agents whose interactions are mediated by some (physical/stereochemical) rules, and the structures produced in the folding pathway are self-consistent sets of interactions that evolve over time (on the millisecond scale). To this end, I've developed a graph-theoretic approach which represents protein structures as cliques in a graph. Each clique represents a self-consistent set of interactions. I find the native structure by taking the clique with the lowest energy.
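
As a rough illustration of the clique idea (a toy sketch under stated assumptions, not the actual implementation used in this work): suppose each node is a hypothetical (residue index, conformation label) pair, and an edge joins two nodes when they belong to different residues and do not clash. A clique containing one node per residue is then a self-consistent structure, and the lowest-energy such clique is selected. The names `lowest_energy_clique`, `compatible`, `node_energy` and `pair_energy`, and all the numbers, are made up for illustration.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Bron-Kerbosch (no pivoting): yield every maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def lowest_energy_clique(nodes, compatible, node_energy, pair_energy, n_residues):
    """Return (clique, energy) for the lowest-energy clique covering all residues."""
    # Build the compatibility graph: no edges within a residue, none between clashing nodes.
    adj = {u: set() for u in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] != v[0] and compatible(u, v):
            adj[u].add(v)
            adj[v].add(u)
    best, best_e = None, float("inf")
    for clique in maximal_cliques(adj):
        if len(clique) != n_residues:
            continue  # not every residue has been assigned a conformation
        e = sum(node_energy[u] for u in clique)
        e += sum(pair_energy(u, v) for u, v in combinations(sorted(clique), 2))
        if e < best_e:
            best, best_e = clique, e
    return best, best_e

# Toy data (all values hypothetical): 3 residues, 2 candidate conformations each.
nodes = [(i, c) for i in range(3) for c in "ab"]
node_energy = {(0, "a"): -2.0, (0, "b"): -1.0,
               (1, "a"): -2.0, (1, "b"): -0.5,
               (2, "a"): -1.0, (2, "b"): -1.5}
clashes = {frozenset({(0, "a"), (1, "a")})}  # these two choices are mutually inconsistent

best, best_e = lowest_energy_clique(
    nodes,
    compatible=lambda u, v: frozenset((u, v)) not in clashes,
    node_energy=node_energy,
    pair_energy=lambda u, v: 0.0,  # pairwise term omitted in this toy example
    n_residues=3,
)
print(sorted(best), best_e)
```

In this toy case the individually best choices for residues 0 and 1 clash, so the selected clique is a compromise: the graph enforces self-consistency before energy is compared.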

At the second meeting on the Critical Assessment of Structure Prediction (CASP2), this method demonstrated some success (especially compared to the other models and CASP1).

At CASP3, we used an on-lattice exhaustive enumeration technique to roughly capture the topology of small sequences (60 residues) in ab initio prediction. Our work in CASP1-3 represents some of the earliest and most successful predictions in the protein folding problem for the most difficult cases.
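
To give a flavour of what exhaustive on-lattice enumeration means, here is a minimal sketch. It makes simplifying assumptions that are not taken from the CASP3 work: a 2D square lattice, a two-letter hydrophobic/polar (HP) alphabet, and an energy of -1 per non-bonded H-H contact. The functions `enumerate_walks` and `hp_energy` are hypothetical names for this illustration only.

```python
def enumerate_walks(n):
    """Yield every self-avoiding walk of n sites on the 2D square lattice.

    Walks start at the origin and the first step is fixed to (1, 0),
    which removes rotational duplicates (mirror images are still included).
    """
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # self-avoidance: one residue per site
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    if n == 1:
        yield ((0, 0),)
        return
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))

def hp_energy(seq, walk):
    """Return -1 for each pair of H residues adjacent on the lattice but not in sequence."""
    pos = {p: i for i, p in enumerate(walk)}
    energy = 0
    for i, (x, y) in enumerate(walk):
        if seq[i] != "H":
            continue
        # Look only in the +x and +y directions so each contact is counted once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = pos.get(nb)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

seq = "HPPH"  # toy sequence
best_walk = min(enumerate_walks(len(seq)), key=lambda w: hp_energy(seq, w))
best_energy = hp_energy(seq, best_walk)
print(best_walk, best_energy)
```

For this four-residue toy sequence the best energy is -1, reached when the chain bends so that the two H ends touch. The number of walks grows exponentially with chain length, so a naive enumeration like this one is only practical for very short chains.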


Genes, Macromolecules, -&- Computing || Pseudointellectual ramblings || Ram Samudrala || me@ram.org