FFFEAR fragment library 1.0


fffear fragment library - Library of representative 9 residue protein fragments


A small library of common 9 residue protein fragments identified by cluster analysis of a large representative subset of the PDB chosen using the FSSP sequence homology database.


  • K. Cowtan (2001), to be published.


The fragment library contains `representative' search models taken from structures in the PDB. Maximum likelihood search targets are also provided for 9-residue helices at various resolutions. The representative fragments were selected by performing cluster analysis of all possible fragments in a representative subset of the PDB, chosen using the FSSP sequence homology database and taking into consideration the structure determination method and resolution. The clustering was performed on the basis of the values of the eigenparameters of the CA distance matrix elements. The all-atom fragments within each cluster were then subjected to cluster analysis to identify the densest subcluster, from which a representative fragment was selected. The empirical fragments are therefore representative rather than average structures.

Maximum Likelihood Targets

The maximum likelihood targets are suitable for the location of fragments in maps at lower resolutions and in poorly phased maps (e.g. SIR/SAD). Maximum likelihood targets are provided for a 9 residue helical fragment at resolutions from 4.0 to 8.0 Angstroms. The files are as follows:


There is also a model, ml-helix-9.pdb, which is an average coordinate model from the same set of fragments from which the likelihood targets were devised. This model may be supplied on XYZIN to provide a file of output fragments for visualisation or use in ffjoin.

Note: These files are standard CCP4 maps with both the mean and the standard deviation of the density packed into a single number according to the following formula:

  map = 0.001*(float(nint(1000.0*mean)) + stddev)

i.e. the mean density is rounded to 3 decimal places (nint rounds to the nearest integer), and the standard deviation, which must be less than 1, is divided by 1000 and added to it. Software for this purpose is available from the author.
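The packing formula can be illustrated with a short Python sketch. This is not the author's software: the function names `nint`, `pack` and `unpack` are hypothetical, and the unpacking step is an assumption inferred from the formula (it also assumes the standard deviation is non-negative). It only shows how the two values share one number.

```python
import math

def nint(x):
    """Nearest integer, halves rounded away from zero (like Fortran NINT)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def pack(mean, stddev):
    """Pack mean and stddev into one value: 0.001*(nint(1000*mean) + stddev)."""
    if not 0.0 <= stddev < 1.0:
        raise ValueError("stddev must be in the range [0, 1)")
    return 0.001 * (float(nint(1000.0 * mean)) + stddev)

def unpack(value):
    """Recover (mean, stddev) from a packed value (hypothetical inverse).

    The integer part of 1000*value carries the rounded mean and the
    fractional part carries the standard deviation. floor() is used so
    that negative means are handled correctly. A stddev extremely close
    to 0 may be misread because of floating-point rounding.
    """
    scaled = 1000.0 * value
    whole = math.floor(scaled)
    return 0.001 * whole, scaled - whole

packed = pack(0.1234, 0.25)
mean, stddev = unpack(packed)
print(packed, mean, stddev)
```

Map values stored as single-precision reals carry only about 7 significant digits, so the recovered standard deviation should be treated as approximate.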

Search Models

All fragments are truncated to poly-ALA, except for the turns which are poly-GLY, since most turns depend on a GLY residue.
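As an illustration of what truncation to poly-ALA or poly-GLY means in terms of PDB records, the following is a minimal Python sketch. It is not the procedure used to build the library; the function name `truncate_residues` is hypothetical, and it assumes standard fixed-column PDB ATOM records (atom name in columns 13-16, residue name in columns 18-20).

```python
BACKBONE = {"N", "CA", "C", "O"}

def truncate_residues(pdb_lines, target="ALA"):
    """Reduce every residue to poly-ALA or poly-GLY.

    Poly-ALA keeps the backbone atoms plus CB; poly-GLY keeps the
    backbone atoms only. Kept atoms are relabelled with the target
    residue name. Non-ATOM lines are passed through unchanged.
    """
    if target not in ("ALA", "GLY"):
        raise ValueError("target must be 'ALA' or 'GLY'")
    keep = BACKBONE | ({"CB"} if target == "ALA" else set())
    out = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            out.append(line)
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        if atom_name in keep:
            # columns 18-20 hold the residue name
            out.append(line[:17] + target + line[20:])
    return out

# Usage: build a few fixed-column records and truncate them.
def atom(serial, name, resname):
    return "ATOM  %5d %-4s %3s A%4d" % (serial, name, resname, 1)

residue = [atom(1, " N", "LEU"), atom(2, " CA", "LEU"),
           atom(3, " CB", "LEU"), atom(4, " CG", "LEU")]
for record in truncate_residues(residue, "ALA"):
    print(record)
```

Passing `target="GLY"` additionally drops the CB atom, which corresponds to the poly-GLY form used for the turn fragments.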

The following empirical fragments are included in release 1.2:

Helix motif. The representative helix shows regular geometry. This cluster is very compact, with all the members very close to the representative. Using the current similarity metrics, no clear distinction was observed between helices of different pitch.
Beta strand motif. The beta strands form a broad cluster with considerable variation from the representative structure. Curved strands are more common than the averaged `theoretical' strand provided with 1.0. The empirical strand selected by cluster analysis is rather arbitrary in this case.
Turn motif (a). The turn cluster splits into two approximately equal clusters. The two motifs are probably close enough to be used interchangeably at lower resolutions, but the differences may become noticeable at higher resolutions.
Turn motif (b).
Helix-strand junction motif. Most common `unconventional' motif.

The following theoretical fragments were provided with earlier releases of fffear:

Theoretical 10 residue alpha helix based on inspection of the Ramachandran plot.
Theoretical 10 residue beta strand based on inspection of the Ramachandran plot.
Theoretical 5 residue alpha helix based on inspection of the Ramachandran plot.
Theoretical 5 residue beta strand based on inspection of the Ramachandran plot.

The frequencies of the empirical fragments in the database subset are as follows:

Fragment type     Frequency   Frequency (exc. overlaps)
emp-helix-9            5074                         854
emp-strand-9            775                         495
emp-turn_*-9            101                         100
emp-helixend-9          397                         397
Whole database        11068                         n/a

Extended helices and strands give multiple matches at 1-residue displacements along the chain. The frequency excluding overlaps gives the size of the maximal non-overlapped set, which is probably more helpful for most purposes.
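The idea of a maximal non-overlapped set can be sketched as follows. The document does not state how the counts were computed, so this is only one plausible approach: it assumes that two 9-residue matches on the same chain overlap if they share any residue, and it uses a greedy left-to-right selection, which gives a largest possible set when all fragments have the same length. The function name `max_non_overlapping` is hypothetical.

```python
def max_non_overlapping(starts, length=9):
    """Select a largest set of non-overlapping fragment matches.

    `starts` holds the first-residue index of each match along a single
    chain. Matches are scanned from left to right, and a match is kept
    only if it begins at or after the end of the last kept match.
    """
    chosen = []
    next_free = None  # first residue index not covered by a kept match
    for start in sorted(set(starts)):
        if next_free is None or start >= next_free:
            chosen.append(start)
            next_free = start + length
    return chosen

# An 18-residue stretch of helix matches at every 1-residue displacement
# (start positions 0..17), but holds only two non-overlapping fragments.
helix_matches = list(range(18))
print(len(helix_matches), len(max_non_overlapping(helix_matches)))
```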


Kevin D. Cowtan, Department of Chemistry, University of York


  1. K. Cowtan (2001), to be published.
  2. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne (2000) Nucleic Acids Research 28, 235-242. The Protein Data Bank.
  3. L. Holm, C. Sander (1996) Science 273, 595-602. Mapping the protein universe.
  4. T. Oldfield (1992) J. Mol. Graphics 10, 247-252. SQUID - A program for the analysis and display of data from crystallography and molecular-dynamics.


fffear, ffjoin