
FFFEAR fragment library 1.0
NAME
fffear fragment library
- Library of representative 9 residue protein fragments
SYNOPSIS
A small library of common 9 residue protein fragments identified by
cluster analysis of a large representative subset of the PDB chosen
using the FSSP sequence homology database.
REFERENCE
- K. Cowtan (2001), to be published.
DESCRIPTION
The fragment library contains `representative' search models taken
from structures in the PDB. Maximum likelihood search targets are also
provided for 9-residue helices at various resolutions. The
representative fragments were selected by performing cluster analysis
of all possible fragments in a representative subset of the PDB,
chosen using the FSSP sequence homology database, taking into
consideration the structure determination method and resolution. The
clustering was performed on the basis of the values of the
eigenparameters of the CA distance matrix elements. The all-atom
fragments within each cluster were then subjected to cluster analysis
to identify the densest subcluster, from which a representative
fragment was selected. The empirical fragments are therefore
representative rather than average structures.
Maximum Likelihood Targets
The maximum likelihood targets are suitable for the location of
fragments in maps at lower resolutions and in poorly phased maps
(e.g. SIR/SAD). Maximum likelihood targets are provided for a 9
residue helical fragment at resolutions from 4.0 to 8.0 Angstroms. The
files are as follows:
resolution | file |
4.0A | ml-helix-9-4.0.max |
5.0A | ml-helix-9-5.0.max |
6.0A | ml-helix-9-6.0.max |
7.0A | ml-helix-9-7.0.max |
8.0A | ml-helix-9-8.0.max |
There is also a model, ml-helix-9.pdb, which is an average
coordinate model from the same set of fragments from which the
likelihood targets were devised. This model may be supplied on XYZIN
to provide a file of output fragments for visualisation or use in
ffjoin.
Note: These files are standard CCP4 maps with both the mean
and standard deviation of the density packed into a single number
according to the following formula:
map=0.001*(float(nint(1000.0*mean))+stddev)
i.e. the mean density is rounded to 3 decimal places, and the standard
deviation, which must be less than 1, is divided by 1000 and added to
it. Software for this purpose is available from the author.
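The packing formula can be illustrated with a short Python sketch. This is not the author's software: the function names pack, unpack and fortran_nint are hypothetical, and the sketch assumes that stddev is non-negative and that nint rounds halves away from zero, as the Fortran intrinsic does.

```python
import math


def fortran_nint(x):
    """Round to the nearest integer, halves away from zero (like Fortran NINT)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def pack(mean, stddev):
    """Pack mean and stddev into one value, following the formula in the text."""
    # Assumption: stddev lies in [0, 1) so it cannot spill into the mean digits.
    if not 0.0 <= stddev < 1.0:
        raise ValueError("stddev must lie in [0, 1)")
    return 0.001 * (float(fortran_nint(1000.0 * mean)) + stddev)


def unpack(value):
    """Recover (rounded mean, stddev) from a packed value."""
    scaled = 1000.0 * value
    whole = math.floor(scaled)  # integer part carries 1000 * rounded mean
    return 0.001 * whole, scaled - whole  # fractional part carries stddev


if __name__ == "__main__":
    packed = pack(0.123, 0.25)
    mean, stddev = unpack(packed)
    print(packed, mean, stddev)
```

Taking the floor rather than truncating towards zero keeps the unpacking valid for negative mean densities. A standard deviation of exactly zero, or one very close to 1, may not survive the round trip in floating point, and values stored in a map file carry only single precision, so the recovered standard deviation is approximate.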
Search Models
All fragments are truncated to poly-ALA, except for the turns
which are poly-GLY, since most turns depend on a GLY residue.
The following empirical fragments are included in release 1.2:
- emp-helix-9
-
Helix motif. The representative helix shows regular geometry. This
cluster is very compact, with all the members very close to the
representative. Using the current similarity metrics, no clear
distinction was observed between helices of different pitch.
- emp-strand-9
-
Beta strand motif. The beta strands form a broad cluster with
considerable variation from the representative structure. Curved
strands are more common than the averaged `theoretical' strand provided
with release 1.0. The empirical strand selected by cluster analysis is
therefore rather arbitrary in this case.
- emp-turn_a-9
-
Turn motif (a). The turn cluster splits into two approximately equal
clusters. The two motifs are probably close enough to be used
interchangeably at lower resolutions, but the differences may become
noticeable at higher resolutions.
- emp-turn_b-9
-
Turn motif (b).
- emp-helixend-9
-
Helix-strand junction motif. Most common `unconventional' motif.
The following theoretical fragments were provided with earlier
releases of fffear:
- theor-helix-10
-
Theoretical 10 residue alpha helix based on inspection of the Ramachandran plot.
- theor-strand-10
-
Theoretical 10 residue beta strand based on inspection of the Ramachandran plot.
- theor-helix-5
-
Theoretical 5 residue alpha helix based on inspection of the Ramachandran plot.
- theor-strand-5
-
Theoretical 5 residue beta strand based on inspection of the Ramachandran plot.
The frequencies of the empirical fragments in the database subset
are as follows:
Fragment type | Frequency | Frequency (exc. overlaps) |
emp-helix-9 | 5074 | 854 |
emp-strand-9 | 775 | 495 |
emp-turn_*-9 | 101 | 100 |
emp-helixend-9 | 397 | 397 |
Whole database | 11068 | n/a |
Extended helices and strands give multiple matches at 1-residue
displacements along the chain. The frequency excluding overlaps gives
the size of the maximal non-overlapped set, which is probably more
helpful for most purposes.
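The idea of a maximal non-overlapped set can be sketched in Python as follows. This is an illustration only, not the procedure actually used to produce the table: it assumes the matches are given as start residue indices within a single chain, that every match spans 9 residues, and that two matches overlap when they share any residue. The function name max_non_overlapping is hypothetical.

```python
def max_non_overlapping(starts, length=9):
    """Return a largest subset of match start positions with no shared residues.

    For fixed-length fragments, scanning the starts in ascending order and
    keeping each match that begins at or after the end of the last kept match
    gives a maximum-size non-overlapping set.
    """
    chosen = []
    next_free = None  # first residue index not covered by the last kept match
    for start in sorted(starts):
        if next_free is None or start >= next_free:
            chosen.append(start)
            next_free = start + length
    return chosen


if __name__ == "__main__":
    # A 28-residue helix matched at every 1-residue displacement: 20 raw hits.
    helix_hits = list(range(10, 30))
    kept = max_non_overlapping(helix_hits)
    print(len(helix_hits), len(kept), kept)
```

In this example the 20 overlapping hits from one long helix reduce to three independent fragments, which is the kind of reduction seen between the two frequency columns for emp-helix-9.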
AUTHOR
Kevin D. Cowtan, Department of Chemistry, University of York
email: cowtan@ysbl.york.ac.uk
REFERENCES
- K. Cowtan (2001), to be published.
- H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne (2000) Nucleic Acids Research 28, 235-242.
The Protein Data Bank.
- L. Holm, C. Sander (1996) Science 273, 595-602.
Mapping the protein universe.
- T. Oldfield (1992) J. Mol. Graphics 10, 247-252.
SQUID - A program for the analysis and display of data from crystallography
and molecular-dynamics.
SEE ALSO
fffear, ffjoin