Chou–Fasman method
The Chou–Fasman method is an empirical technique for the prediction of tertiary structures in proteins, originally developed in the 1970s by Peter Y. Chou and Gerald D. Fasman.[1][2][3] The method is based on analyses of the relative frequencies of each amino acid in alpha helices, beta sheets, and turns based on known protein structures solved with X-ray crystallography. From these frequencies a set of probability parameters were derived for the appearance of each amino acid in each secondary structure type, and these parameters are used to predict the probability that a given sequence of amino acids would form a helix, a beta strand, or a turn in a protein. The method is at most about 50–60% accurate in identifying correct secondary structures,[4] which is significantly less accurate than the modern machine learning–based techniques.[5]
Amino acid propensities
The original Chou–Fasman parameters found some strong tendencies among individual amino acids to prefer one type of secondary structure over others. Alanine, glutamate, leucine, and methionine were identified as helix formers, while proline and glycine, due to the unique conformational properties of their peptide bonds, commonly end a helix. The original Chou–Fasman parameters[6] were derived from a very small and non-representative sample of protein structures due to the small number of such structures that were known at the time of their original work. These original parameters have since been shown to be unreliable[7] and have been updated from a current dataset, along with modifications to the initial algorithm.[8]
The Chou–Fasman method takes into account only the probability that each individual amino acid will appear in a helix, strand, or turn. Unlike the more complex GOR method, it does not reflect the conditional probabilities of an amino acid to form a particular secondary structure given that its neighbors already possess that structure. This lack of cooperativity increases its computational efficiency but decreases its accuracy, since the propensities of individual amino acids are often not strong enough to render a definitive prediction.[5]
Algorithm
The Chou–Fasman method predicts helices and strands in a similar fashion, first searching linearly through the sequence for a "nucleation" region of high helix or strand probability and then extending the region until a subsequent four-residue window carries a probability of less than 1. As originally described, four out of any six contiguous amino acids were sufficient to nucleate helix, and three out of any contiguous five were sufficient for a sheet. The probability thresholds for helix and strand nucleations are constant but not necessarily equal; originally 1.03 was set as the helix cutoff and 1.00 for the strand cutoff.
Turns are also evaluated in four-residue windows, but are calculated using a multi-step procedure because many turn regions contain amino acids that could also appear in helix or sheet regions. Four-residue turns also have their own characteristic amino acids; proline and glycine are both common in turns. A turn is predicted only if the turn probability is greater than the helix or sheet probabilities and a probability value based on the positions of particular amino acids in the turn exceeds a predetermined threshold. The turn probability p(t) is determined as:
where j is the position of the amino acid in the four-residue window. If p(t) exceeds an arbitrary cutoff value (originally 7.5e–3), the mean of the p(j)'s exceeds 1, and p(t) exceeds the alpha helix and beta sheet probabilities for that window, then a turn is predicted. If the first two conditions are met but the probability of a beta sheet p(b) exceeds p(t), then a sheet is predicted instead.
References
- Chou PY, Fasman GD (1974). "Prediction of protein conformation". Biochemistry. 13 (2): 222–245. doi:10.1021/bi00699a002. PMID 4358940.
- Chou PY, Fasman GD (1978). "Empirical predictions of protein conformation". Annu Rev Biochem. 47: 251–276. doi:10.1146/annurev.bi.47.070178.001343. PMID 354496.
- Chou PY, Fasman GD (1978). "Prediction of the secondary structure of proteins from their amino acid sequence". Advances in Enzymology and Related Areas of Molecular Biology. Adv Enzymol Relat Areas Mol Biol. Advances in Enzymology - and Related Areas of Molecular Biology. 47. pp. 45–148. doi:10.1002/9780470122921.ch2. ISBN 9780470122921. PMID 364941.
- Kabsch W, Sander C (1983). "How good are predictions of protein secondary structure?". FEBS Lett. 155 (2): 179–82. doi:10.1016/0014-5793(82)80597-8. PMID 6852232. S2CID 41477827.
- Mount DM (2004). Bioinformatics: Sequence and Genome Analysis (2nd ed.). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. ISBN 978-0-87969-712-9.
- Chou PY, Fasman GD (1974). "Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins". Biochemistry. 13 (2): 211–222. doi:10.1021/bi00699a001. PMID 4358939.
- Kyngas J, Valjakka J (1998). "Unreliability of the Chou–Fasman parameters in predicting protein secondary structure". Protein Eng. 11 (5): 345–348. doi:10.1093/protein/11.5.345. PMID 9681866.
- Chen H, Gu F, Huang Z (2006). "Improved Chou–Fasman method for protein secondary structure prediction". BMC Bioinformatics. 7 (Suppl 4): S14. doi:10.1186/1471-2105-7-S4-S14. PMC 1780123. PMID 17217506.