RESTRAIN (CCP4: Supported Program)


restrain - refinement program including restraints, rigid body refinement, non-crystallographic symmetry, atomic and group isotropic, anisotropic and TLS thermal parameters, group and coupled occupancies etc.


restrain XYZIN foo_in.brk TLSIN foo_in.tls HKLIN foo_in.mtz XYZOUT foo_out.brk TLSOUT foo_out.tls HKLOUT foo_out.mtz
[Keyworded input]


                      RESTRAIN version 4.6


                      STRUCTURE AMPLITUDES
                      INTERATOMIC DISTANCES
                         GROUP PLANARITY

                         with respect to

                      OVERALL SCALE FACTOR
                     BULK SOLVENT PARAMETERS
                       ATOMIC COORDINATES


Major Contributors:
N Borkakoti
S A Butler
H P C Driessen
M I J Haneef
G W Harris
B Howlin
G Khan
R Laskowski
A J Morffew
D S Moss
A Sali
I J Tickle
Laboratory of Molecular Biology
Department of Crystallography
Birkbeck College, Malet Street
London WC1E 7HX, UK

Contact: Ian Tickle (


      2.3.1 The options available
      2.3.2 Initial refinement from MIR and MIRAS models
      2.3.3 Initial refinement from molecular replacement models
      2.3.4 Initial refinement of macromolecule-ligand complexes
      2.3.5 Refinement at intermediate resolution
      2.3.6 Group isotropic and anisotropic thermal parameters
      2.3.7 Individual atomic anisotropic thermal parameters
      2.3.8 Occupancy refinement
      2.4.1 Structure amplitude weighting
      2.4.2 Phase weighting
      2.4.3 Energy weighting
      2.4.4 Thermal parameter restraint weighting
      3.1.1 Description of control data
      3.1.2 List of steering data
      3.1.3 Full description of the steering data
      3.4.1 Description of the data records in the TLSIN file



RESTRAIN is a computer program for the least-squares refinement of protein and nucleic acid structures using X-ray or neutron single crystal diffraction data.  It incorporates facilities for
  • restrained geometry refinement
  • restrained isotropic & anisotropic thermal parameters
  • rigid body refinement
  • non-crystallographic symmetry refinement
  • use of amplitude and phase observations
  • individual anisotropic thermal parameters
  • group anisotropic thermal parameters (TLS)
  • disordered bulk solvent corrections
  • accumulation of full normal matrix for e.s.d.'s

The design and implementation follow papers by Waser (1963), Rollett (1969), Moss (1981), Moss & Morffew (1982), Haneef et al. (1985) and Driessen et al. (1989).

The function minimised is of the form:

M =  SUM [w(f) (|Fo| - G.|Fc|)^2]
   + SUM [w(p) (PHIo - PHIc)^2]
   + SUM [w(d) (d(t) - d(c))^2]
   + SUM [w(b) (b(o) - b(min))^2]
   + SUM [w(U) (delta-U)^2]
   + SUM [w(Ua) (delta-Ua)^2]
   + SUM [w(v) V]
   + SUM [w(c) (d(t) - d(c))^2]            (1)

where
w(f) = weight for structure amplitude,
Fo = observed structure amplitude,
G = scale factor,
Fc = calculated structure amplitude,
w(p) = weight for phase,
PHIo = estimated phase (from isomorphous and/or anomalous data),
PHIc = calculated phase,
w(d) = weight for restrained distance,
d(t) = target interatomic distance,
d(c) = calculated interatomic distance,
w(b) = weight for non-bonded interactions,
b(o) = observed distance between two non-bonded atoms,
b(min) = minimum distance allowed for such atoms,
w(U) = weight for isotropic thermal parameter difference,
delta-U = isotropic thermal parameter difference for restrained atoms,
w(Ua) = weight for anisotropic thermal parameter difference,
delta-Ua = along-bond component of anisotropic thermal parameter difference for restrained atoms,
w(v) = weight for planarity restraints,
V = mean square deviation from best plane of a planar group of atoms,
w(c) = weight for chirality restraints.

The non-bonded interaction is only operational when b(o) < b(min) and chirality restraints are applied as distance restraints along the edges of chiral tetrahedra. Equation (1) may be written as a function of three terms: M = M(a) + M(b) + M(c).  M(a) is the first term and is the one conventionally found in crystallographic least-squares procedures.  M(b) is the second term which allows the use of estimates of phases from isomorphous and/or anomalous data.  M(c) is the sum of the remaining terms and represents pseudo-potential energy terms.
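
As an illustration only, the structure of equation (1) can be sketched in Python.  The function and argument names are hypothetical and are not part of RESTRAIN.  Each term is passed as a list of tuples, the planarity term is taken as written in equation (1), and no wrapping of phase differences is attempted.

```python
def minimised_function(amplitudes, phases, distances, nonbonded,
                       u_iso, u_aniso, planes, chiral, scale):
    """Return (M(a), M(b), M(c)) of equation (1); illustrative sketch only."""
    # M(a): structure amplitude term, tuples of (w_f, Fo, Fc); scale is G
    m_a = sum(w * (fo - scale * fc) ** 2 for w, fo, fc in amplitudes)
    # M(b): phase term, tuples of (w_p, PHIo, PHIc); no phase wrapping here
    m_b = sum(w * (po - pc) ** 2 for w, po, pc in phases)
    # M(c): pseudo-potential energy terms
    m_c = sum(w * (dt - dc) ** 2 for w, dt, dc in distances)
    # the non-bonded term is only operational when b(o) < b(min)
    m_c += sum(w * (bo - bmin) ** 2 for w, bo, bmin in nonbonded if bo < bmin)
    # isotropic and along-bond anisotropic thermal parameter differences
    m_c += sum(w * du ** 2 for w, du in u_iso)
    m_c += sum(w * dua ** 2 for w, dua in u_aniso)
    # planarity: V is the mean square deviation from the best plane
    m_c += sum(w * v for w, v in planes)
    # chirality: distance restraints along edges of chiral tetrahedra
    m_c += sum(w * (dt - dc) ** 2 for w, dt, dc in chiral)
    return m_a, m_b, m_c
```

The sum of the three returned values is M; keeping them separate mirrors the decomposition M = M(a) + M(b) + M(c) described above.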

The function M may be minimised with respect to a selection of the following parameters:

  • overall scale factor
  • overall atomic isotropic parameter
  • overall atomic anisotropic parameter
  • bulk solvent parameters
  • atomic coordinates
  • rigid body rotations and translations
  • non-crystallographic symmetry operators
  • atomic and group isotropic thermal parameters
  • atomic and group anisotropic thermal parameters
  • group TLS tensors
  • atomic, group and coupled occupancies

Although RESTRAIN has been written primarily for refinement of macromolecular structures, virtually any structure can be refined by the program: a user-defined dictionary supplies the interatomic and planar restraints, and other options allow the user to specify additional interatomic restraints and planes.  The program at present uses a four-Gaussian expansion of scattering factors (INTERNATIONAL TABLES FOR X-RAY CRYSTALLOGRAPHY, Vol. IV).  Coefficients for this expansion suitable for X-ray or neutron diffraction may be read from the dictionary.

The program is completely general and may be used for any number of reflections in any space group.  The program can be used for any size of problem.  The number of atoms which may be refined is only limited by the available memory of the computer used.  Array sizes are increased by a global change of the relevant variables in PARAMETER statements in an INCLUDE file (, followed by re-compilation of the source file.

At Birkbeck College this program has been used for refinement of protein and nucleic acid structures using X-ray or neutron diffraction data.  It has generally been used in conjunction with model building using an interactive graphics system.  The program has been set up so that the input/output interfaces easily with the graphics model building program O (Jones 1991) and FFT programs.  Coordinate files have the standard PDB format.  Reflection input files may be either formatted or unformatted (CCP4 MTZ).


RESTRAIN has been written in standard FORTRAN 77 (ANSI X3.9-1978) with the sole exception of the INCLUDE facility for inserting the common blocks in the individual subroutines.  The program has been designed to take advantage of vector or scalar processing computers.  To obtain the highest speed, space group specific versions of SG0001 have been written for some of the most common space groups.  However, some options of RESTRAIN (NCS, anisotropic and TLS) are not available when using them.  There is no difference in the steering parameters when using them.  Currently there are subroutines available for:
P2 & C2 (nos. 3 & 5)
P21 (4)
P222 (16), C222 (21), F222 (22) & I222 (23)
P2221 (17) & C2221 (20)
P21212 (18)
P212121 (19) & I212121 (24)
P42 (77)
P41212 (92)
P42212 (94)
P43212 (96)
P61 (169)
P65 (170)
P6122 (178)
P6522 (179)

Note that some of these subroutines (the monoclinic and orthorhombic ones) may be used for a higher symmetry space group provided it is a super-group with the same origin.  For example P213 is a super-group of P212121 with no origin shift.

User friendliness of input/output has been an important criterion in the design of RESTRAIN.  No preparation programs need be used.  The authors have endeavoured to print sensible error messages on job failure, and to intercept lethal input.  Any suggestions for improvement will be welcome.



In order to run RESTRAIN you will need either 3, 4 or 5 input files.  These are listed below together with the names by which they are referenced in the RESTRAIN output.
   File                   File name        Explanation

 - Script with control      none           section 3.1
   and steering data
 - Dictionary              DICTION         section 3.2
 - Coordinates              XYZIN          section 3.3
 - Optional group           TLSIN          section 3.4
   thermal parameters
 - Optional reflections  HKLIN or REFIN    section 3.5

Alternatively you may have the control and steering data in an input file separate from the job script.

Care must be taken in preparing the coordinates for refinement.  After each polymer chain a TER record must be inserted.  This includes breaks in the chain due to one or more missing residues.  The residues need not be numbered sequentially and the residue labels may contain non-numeric characters at any position.  However, to maintain compatibility with the PDB standard format it is advisable to restrict the use of non-numeric characters to alphabetic characters, and then only in the last character position (residue insertion code).

The C-terminal residue of a protein chain may have an extra O (carboxyl) or N (amide) atom, but it must be put in a separate residue (CAR or CAM) with the atom label OXT or NXT.  All atoms not contained in chains must be supplied as HETATMs.  The atom labels in a residue must correspond exactly (i.e. in case and justification) with the supplied dictionary, and there must be no missing or extra atoms.  Missing atoms can be dealt with by temporarily renaming the residue (e.g. for missing protein side-chain rename to GLY or ALA).  Extra disordered atoms must be supplied after the TER record as HETATMs; extra distance restraints will have to be supplied for these atoms.

The PDB file may contain either Uiso's or Biso's, but the appropriate steering parameter must be specified (ISO=true and BINPUT=false or true respectively).  The file may also contain anisotropic U's in the standard PDB format.

After previous refinement and extensive rebuilding you may want to reset large U or B values for atoms incorrectly positioned before rebuilding (e.g. U > 0.8 Å^2 or B > 64 Å^2) to more reasonable starting values (e.g. U = 0.2 Å^2 or B = 16 Å^2).
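
The example U and B values above are related by the standard identity B = 8.pi^2.U.  The following Python sketch converts between the two and resets large values.  The function names are hypothetical, and the limit and starting value are simply the examples quoted above, not program defaults.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2  # B = 8*pi^2*U, roughly 78.96


def u_to_b(u):
    """Convert an isotropic U (A^2) to the equivalent B (A^2)."""
    return EIGHT_PI_SQ * u


def b_to_u(b):
    """Convert an isotropic B (A^2) to the equivalent U (A^2)."""
    return b / EIGHT_PI_SQ


def reset_large_u(u_values, limit=0.8, start=0.2):
    """Replace every U above `limit` by `start` (example values from the text)."""
    return [start if u > limit else u for u in u_values]
```

With these definitions U = 0.2 corresponds to B of about 15.8, which is why the text pairs U = 0.2 with B = 16.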

The atomic coordinates in the polymer chains need not be ordered in each residue in the same way as the atoms in the residue are ordered in the dictionary.  If they are not they will be re-ordered and the output file of atomic coordinates will then be produced in dictionary order for subsequent cycles.  Alternatively, set TESTIN=true and ORDER=true to use the program to order and analyse the file without carrying out any refinement.

After each run (1 or more cycles) 1 or more output files will be created:

      File                  Filename       Explanation

 - coordinates               XYZOUT        section 4.2
 - TLS parameters            TLSOUT        section 4.3
 - reflections           HKLOUT or REFOUT  section 4.4
 - normal matrix             MATOUT        section 4.5

Furthermore the listing of the run (section 4.1) will have to be examined closely, since the steering data may need to be updated for the next run, especially G, U, SB1 and SB2 (section 3.1.3).  You should update the cycle number CYCNO, so that you keep track of how many cycles you have done, and later relate this to the R-factor. 

If you are refining NCS parameters you will need to supply updated parameters.  You may also want to change the weighting coefficients for the reflections WF(i), section 3.1.3.  All the required parameters are always printed at the end of every log file whenever new values have been computed; these can be pasted into the steering data ready for the next run.  Refined coordinates and group thermal parameters can be read back in by the program without modification.  In order to obtain output reflections define HKLOUT or REFOUT (section 3.1.1).

The input that is necessary and the sections that are relevant to you depend on the application for which you intend to use RESTRAIN.  There are basically two categories:

  • regularising the geometry of a structure according to the dictionary where no reflection data are used;
  • refining a structure where reflection data are used, and restraints are optional.


If you wish to regularise the geometry of your model structure, omit the definition of HKLIN or REFIN, or set FREF=false, which disables structure factor refinement.  Regularisation may be useful after heavy rebuilding of a coordinate set, and may point to gross errors (look at the weighted differences between calculated and observed distances), which need manual correction on the graphics before continuing.  Usually a few cycles suffice before reflections are used.  You may have to raise MFACR to 0.2 or 0.3 to assist the solution of the normal equations, bearing in mind that the larger the value of MFACR the smaller the shifts in your output coordinates will be.


For refinement against X-ray or neutron diffraction data, use the default FREF=true.  Again the input you need will depend on the application for which you intend to use RESTRAIN.  The most common categories are outlined below.  If your category does not appear, or you are not sure what you want to do, seek assistance.

2.3.1 The options available.

The following options are available, either separately or in combination, to refine a set of coordinates from low to high resolution.  Note, however, that some combinations do not make sense and will cause abnormal program termination.  For example, if both RIGID coordinate groups and UISO/UANISO/TLS thermal parameter groups are defined, the thermal parameter groups must be completely contained within the coordinate groups; otherwise application of the refined RIGID body rotations and translations to the thermal parameters would destroy the correlations within the thermal parameter group.

  • refining an overall thermal parameter (ISO=false, ISOREF=false);
  • reading individual atomic isotropic thermal parameters without refining (ISO=true, ISOREF=false);
  • refining individual atomic isotropic thermal parameters (ISOREF=true);
  • refining group isotropic, anisotropic or TLS parameters for selected groups;
  • refining individual atomic anisotropic thermal parameters (ANISO).

  • refining rigid body rotations and translations: CONSTRAINED REFINEMENT (RIGID=true + RIGID group specification(s));
  • refining rigid body rotations and translations and individual atomic positions: CONSTRAINED-RESTRAINED REFINEMENT (RIGID=false + RIGID group specification(s));
  • refining non-crystallographic symmetry: CONSTRAINED REFINEMENT (RIGID=true + NCSYMM specification(s));
  • refining individual atomic positions with averaging of non-crystallographic symmetry: CONSTRAINED-RESTRAINED REFINEMENT (RIGID=false + NCSYMM specification(s));
  • refining individual atomic positions: RESTRAINED REFINEMENT (RIGID=false, no RIGID groups).

  • refining atomic, group and coupled occupancies.

2.3.2 Initial refinement from MIR and MIRAS models.

You will normally start by setting ISO=false to get an overall thermal parameter U and scale factor G.  At this stage you may still want to include the MIR or MIRAS phases in the refinement. Set PHAS=true and make sure that the input reflection file contains these phases.  However, if your low-resolution model is reasonable, you may not want to use these data.

Unless phasing extends to a resolution of better than 3Å, you may find it best to begin by breaking the structure up into rigid body segments and refining these as strictly rigid bodies.  Set RIGID=true and specify RIGID groups; structure outside the rigid groups will not be refined.  Such segments may be as small as one residue or one side chain.

If the bonds between such segments become seriously disrupted during rigid body refinement, then those parts may have to be rebuilt on a graphics system; otherwise the structure may be annealed by restrained refinement.  Remember that refinement cannot usually correct errors which are larger than one third of the high resolution cut-off.

Regions of the structure which are more highly disordered may have to be omitted initially if maps show no clear main chain density.  In this case the structure will have to be broken up into extra chains with TER records at the end of each chain.  If the main chain density is clear and the side chain is unclear or the sequence at this point is uncertain then the residue should be treated as ALA or GLY in the case of proteins.  Remember that the number of atoms in a residue in the coordinates must correspond with the number of atoms in the residue of that name in the dictionary.

Initially the data-parameter ratio will be unfavourable and the normal matrix for the positional parameters ill-conditioned.  In the first cycles at low resolution you will normally get large shifts.

2.3.3 Initial refinement from molecular replacement models.

In the case where only one molecule is present in the asymmetric unit it is best to start by refining the six rigid body parameters from the molecular replacement by using RIGID=true and the RIGID specification to delineate the molecule.  After convergence it may be possible to break the structure up into large chunks, e.g. in the case of domains.  See sections 2.3.2 and 2.3.5 for further information.

In the case where more than one molecule is present in the asymmetric unit, one may want to proceed as with one molecule. However, it is possible at low to intermediate resolution to save on time and parameters by refining the structure making use of non-crystallographic symmetry and then only to rebuild one molecule on the graphics before further refinement.  There are two modes to deal with non-crystallographic symmetry.

The sole purpose of MODE 1 is to enable the refinement of the relative positions of up to 14 identical molecules (or subunits) in the crystallographic asymmetric unit by applying rigid body refinement.  MODE 1 is usually used in the earlier stages of refinement in which case the transformations relating the molecules may come from molecular replacement studies.

The orthogonal coordinates of one molecule are supplied along with the transformations operating on these coordinates which generate the coordinates of up to 14 molecules. 

Set RIGID=true and define one or more RIGID bodies as before, and the molecules will then be refined as independent rigid bodies.  Output will be the refined coordinates of the generated molecules, and the refined transformations, which should be input to the next cycle.

Note that the program will not notify you if the same molecule is generated twice.  This may happen if a dimer is supplied and also generated.  You therefore must make sure that only one molecule and the correct transformation are used by the program.

The purpose of MODE 2 is to assist a user who has more than one molecule (or subunit) in the asymmetric unit and who wishes to refine these molecules while imposing the condition that they are structurally identical.  This is useful in the earlier stages of refinement (possibly after the use of MODE 1) as it saves having to manually adjust the coordinates of more than one molecule. 

Input is the same as for MODE 1 except that RIGID=false and RIGID specifications are absent (see above).  The transformations supplied are used as extra "equivalent positions" and the refinement produces an asymmetric unit where the molecules are identical and tend to an average of the real molecules.

The coordinates of only one molecule are written out and the same transformations are supplied for subsequent cycles.  As in MODE 1, it is important to make sure that you do not generate a molecule that has already been read in.

See sections 2.3.2 and 2.3.5 for further information.

2.3.4 Initial refinement of macromolecule-ligand complexes.

When small errors in isomorphism are present, it may be useful to refine the protein in CONSTRAINED mode before difference Fouriers are calculated.  Set RIGID=true and use RIGID groups.  Use only an overall Uiso.  After difference Fouriers and building in the ligand, it may be advisable to refine the ligand and the macromolecule in CONSTRAINED-RESTRAINED mode by setting RIGID=false and defining RIGID groups (see above).  See sections 2.3.2 and 2.3.5 for further information.

2.3.5 Refinement at intermediate resolution.

How to proceed at intermediate resolution has already been discussed partially in sections 2.3.3 and 2.3.4.  Generally it may be still useful to do some cycles of CONSTRAINED-RESTRAINED refinement before proceeding to RESTRAINED refinement only.  Set RIGID=false and use RIGID records to delineate "rigid" bodies.  This will accelerate convergence.  Finally RESTRAINED refinement is obtained by removing the definitions of any RIGID groups.

It may now be useful to refine individual isotropic thermal parameters.  When these are already present in the input coordinates they can be used and refined using ISO=true together with BINPUT=false (if Uiso's are present in the PDB file), or BINPUT=true (the default, if B's are present), together with the default ISOREF=true.  When not present in your input coordinate data set, use ISO=false and ISOREF=true in the initial run.  Having ISO=true and ISOREF=false will merely indicate that you want to read isotropic thermal parameters, but not refine them.  This can be useful for molecular replacement models.  In order to get meaningful isotropic thermal parameters it is usually necessary to include data higher than 3Å resolution.  Note that MFACR (see section 3.1.3) is used to remove ill-conditioning.  The input Uiso for each atom is checked and reset if necessary.  The lowest allowed Uiso is set with ULIML; the highest by ULIMH.

2.3.6 Group isotropic and anisotropic thermal parameters.

Physical background:

In this option the thermal parameters of atomic groups are refined using the approximation that the groups possess, either partly or wholly, "correlated amplitude" motion.  This is not necessarily the same as "rigid body" motion because the Bragg scattering is sensitive only to the amplitudes of vibrating atoms, not to their relative phases.  Small rigid groups of bonded atoms such as the planar aromatic rings in HIS, PHE, TYR and TRP are likely to vibrate as rigid bodies, because the mean square vibration amplitude of a typical bond is very small (~ 0.002 Å^2).  However, larger groups such as secondary structure elements or domains are likely to have larger internal motions, where sub-structures have vibration amplitudes which are correlated, but whose relative phases are not (e.g. in anti-phase, as opposed to in phase); this correlated amplitude motion will be indistinguishable from true in-phase rigid body motion if only Bragg scattering data are used.

The atomic groups may be whole molecules, units of secondary structure (e.g. alpha helices) or they may be pseudo-rigid side groups such as phenyl rings, imidazole, carboxylate, guanidinium or amide groups.  When units of secondary structure are chosen, there is an option to include main chain atoms only.  For small groups (i.e. < 20 atoms) data at high resolution (e.g. 1.5Å) may be required for success.  It should also be remembered that the model assumes harmonic thermal parameters and this may not be valid for side groups on the surface of a macromolecule.

There are three group thermal parameter options: UISO, UANISO and TLS.  The UISO option refines 1 parameter per group, the UANISO option 5 or 6 per group, and the TLS (translation/libration/screw-rotation) option 19 or 20 per group.  This is still likely to be far fewer than the 6 per atom required in full anisotropic refinement (see section 2.3.7).  The potentially rigid groups in proteins which may be suitable are aromatic rings, the "propellers" of ASP/ASN, GLU/GLN and ARG, ligands such as heme, the secondary structure elements, domains, the entire molecule, or even the entire contents of the asymmetric unit.

For the UANISO and TLS options it is possible to refine the atomic isotropic thermal parameters in addition to the group parameters; this reduces the number of group parameters from 6 to 5 and 20 to 19 respectively (because the isotropic component of the T tensor is then not used, and is set to the mean Uiso).  This is in fact the default if atomic isotropic thermal parameters are refined (ISOREF=true); if this option is not desired it must be deselected (see option NOATOM in the description of parameters).
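
The parameter counts quoted above can be compared with a short Python sketch.  The function names are hypothetical; the counts per group (1, 6 or 20, reduced by one for UANISO and TLS when atomic isotropic parameters are also refined) and the 6 per atom for full anisotropic refinement are those given in the text.

```python
GROUP_PARAMS = {"UISO": 1, "UANISO": 6, "TLS": 20}


def group_thermal_parameters(option, n_groups, n_atoms=0, atomic_iso=False):
    """Count thermal parameters for a group option (illustrative only)."""
    per_group = GROUP_PARAMS[option]
    if atomic_iso:
        if option == "UISO":
            # the text describes the combined mode for UANISO and TLS only
            raise ValueError("combination not described in the text")
        # one group parameter fewer, plus one Uiso per atom
        return n_groups * (per_group - 1) + n_atoms
    return n_groups * per_group


def full_anisotropic_parameters(n_atoms):
    """Six tensor elements per atom in full anisotropic refinement."""
    return 6 * n_atoms
```

For example, 2 TLS groups covering 100 atoms with atomic Uiso refinement give 2 x 19 + 100 = 138 thermal parameters, against 600 for full anisotropic refinement of the same atoms.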

In order to analyse the TLS tensors, the output files may be used as input to the CCP4 program TLSANL.  The resulting anisotropic tensors may be visualised by using the output coordinate file to compute very high atomic resolution (0.7Å) structure factors, and then contouring the Fcalc electron density with a program such as O.

2.3.7 Individual atomic anisotropic thermal parameters.

If your data extend to atomic resolution it will be possible to refine individual atomic anisotropic thermal parameters using a 6 element anisotropic U tensor.  This type of refinement can be started up by defining groups using the ANISO keyword. The isotropic U value of each atom will be put in the diagonal elements of the anisotropic U tensor (U11, U22, U33) to use as a starting value.

After refinement the new anisotropic U tensor (U11 U22 U33 U12 U13 U23) will be written to the coordinate file behind the ATOM record in a separate record identified by ANISOU using the standard PDB format.  These records will then be used in future runs for reading and writing the anisotropic tensors.
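
The starting tensor described above can be sketched in Python as follows.  The function names are hypothetical.  Setting the off-diagonal elements to zero is an assumption (the text only states that the isotropic U is placed on the diagonal), and the equivalent isotropic U is taken as one third of the trace, which holds for a tensor expressed on orthogonal axes.

```python
def initial_aniso_tensor(u_iso):
    """Return (U11, U22, U33, U12, U13, U23) with Uiso on the diagonal.

    Off-diagonal elements are set to zero (an assumption of this sketch).
    """
    return (u_iso, u_iso, u_iso, 0.0, 0.0, 0.0)


def equivalent_u_iso(tensor):
    """One third of the trace of an anisotropic U tensor (orthogonal axes)."""
    u11, u22, u33 = tensor[:3]
    return (u11 + u22 + u33) / 3.0
```

The element order matches the order (U11 U22 U33 U12 U13 U23) in which the tensor is written to the coordinate file.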

2.3.8 Occupancy refinement.

Uncoupled group occupancy refinement may be useful for protein-inhibitor complexes, where the inhibitor is not present in stoichiometric amounts.  The occupancy groups are defined in the control data with records using keyword OCCUp.  The contiguous segment(s) comprising each group is/are specified by the starting atom number as present in the coordinates, the number of atoms in the segment (may be just 1 atom), and the group identifier using free format.  Use as starting occupancy for the atoms in the group a value as suggested by the electron density.

Coupled alternative sites may be most easily created by using extra dictionary entries (see section 3.2).  For example, call the short alternative site residue ASX if it is the alternative site of the side chain for an ASP.  These alternative site residues should then be added to the coordinate data set as ATOM records after chains terminated by TER, and effectively treated as separate protein chains themselves by inserting a TER record.  Both the first and subsequent sites are specified as described above, but with different coupling identifiers appended; the group identifier must be the same for these coupled sites.  It will be useful to use an extra restraint to tie the alternative site(s) down to the atom where it diverges, and extra restraints will also be required between atoms defined as HETATM's (see XTRDIST in section 3.1.1).  Van der Waals repulsion is automatically turned off for coupled groups.  It is always important to study the U values for the atoms in alternative sites because of the strong correlation between occupancy and U.  Too large a U value with a low occupancy either means that the coordinates have been built in the wrong position, or that the site is not "real".  A reasonable starting atomic isotropic U value for the second site is 0.2 Å^2.


Weighting may assume two distinct purposes in the refinement of protein structures.  Firstly it may be used to drive the refinement down the correct minimum in as few cycles as possible.  This will be used in the initial stages of a refinement, where as many errors should be corrected as possible.  This is achieved by coarse resolution cut-off, and/or by using a small amplitude cut-off and/or a SIGMA type cut-off, and by down-weighting higher angle reflections in the remainder.  This may be called convergence weighting.

Secondly in the latter stages of refinement, the weights may be used to reflect the expected discrepancies between observations and target values or functions and the corresponding quantities calculated from the model.  As the model improves, higher resolution data may be included, and the higher angle data and weak reflections may be given higher weighting until the sum of the weighted residual squared over all observations and restraints equals the total number of observations and restraints minus the total number of variable parameters.  This may be called statistical weighting.  The weighting strategies to be adopted in the two cases may be quite different.
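
The criterion for statistical weighting stated above can be checked with a short Python sketch.  The function name is hypothetical; it takes the individual weighted squared residuals (one per observation or restraint) and the number of variable parameters, and returns the ratio that should approach 1 when the criterion is met.

```python
def statistical_weighting_ratio(weighted_sq_residuals, n_parameters):
    """Ratio of SUM(w.delta^2) to (observations + restraints - parameters)."""
    n_terms = len(weighted_sq_residuals)  # observations plus restraints
    dof = n_terms - n_parameters
    if dof <= 0:
        raise ValueError("need more observations and restraints than parameters")
    return sum(weighted_sq_residuals) / dof
```

A ratio well above 1 suggests the weights are too high relative to the actual discrepancies, and a ratio well below 1 that they are too low.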

When applying any weights one has to recall the function that is minimised:

M = SUM [w(f) (|Fo| - G.|Fc|)^2]                         [=M(a)]
  + SUM [w(p) (PHIo - PHIc)^2]                           [=M(b)]
  + SUM [w(d) (d(t) - d(c))^2] + SUM [w(b) (b(o) - b(min))^2]
  + SUM [w(U) (delta-U)^2] + SUM [w(Ua) (delta-Ua)^2]
  + SUM [w(v) |V|^2] + SUM [w(c) (d(t) - d(c))^2]        [=M(c)]

The factors w(f), w(p), w(d), w(U), w(Ua), w(v) and w(c) are the weights, the choice of which determines the relative influence of the terms in the function M which is to be minimised.  It should be noted that only relative weights are significant.  The choice of the absolute value of the weights does not influence the course of refinement.  The relative contributions to the residual will be found in the general weighting analysis table (***ANALYSIS OF FUNCTION MINIMISED***).  The weights are not directly supplied by the user.  Instead weighting coefficients are supplied which are used in a formula to generate the weights.  The formulae and their use are discussed in the sections below.

2.4.1 Structure amplitude weighting.

If the structure factor model perfectly described the diffraction of the macromolecule, the theory of least squares shows that the structure amplitudes should be given weights which are inversely proportional to their variances.  However, due to the disorder present in macromolecular crystals, the structure factor model is always significantly in error.  The final values of residuals and R factors usually owe more to errors in the model than to experimental errors in the diffraction data.

The object of weighting the structure amplitude terms is to ensure that terms heavily affected by model or experimental errors are down-weighted.  Several weighting schemes may be employed.

  • The simplest (SCHEME=1) applies equal weights to all the reflections.  For SCHEME=1 the weight is given by
      w(f) = WF(1).

    This scheme should be employed only in the initial stages.

  • A second scheme (SCHEME=2) is a modified form of one proposed by Rees (1976) and involves the use of the standard deviations of F(obs) which must be supplied on the reflection file.  The weights are given by the formula
      w(f) = WF(1).S^WF(2) / [WF(3).sigma(Fo)^2 + WF(4).Fo^2]
    where S = sin(theta)/lambda

  • Scheme 3 is that of Cruickshank (1965) and can be used when experimental standard deviations are not available or not trusted.
      w(f) = WF(1) / [WF(2) + WF(3).Fo + WF(4).Fo^2]

  • Scheme 4 is derived from Nielsen (1977) and employs a more sophisticated formula than scheme 3.
      w(f) = WF(1) / [WF(2) + WF(3).Fo + WF(4).Fo^2 +
                      WF(5).S + WF(6).S^2 + WF(7).Fo.S]
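
The four schemes above can be sketched in Python as follows.  The function name and argument layout are hypothetical and do not represent the RESTRAIN source.  The scheme 2 numerator is read as S raised to the power WF(2), which is an assumption about the typesetting of the formula, and S = sin(theta)/lambda throughout.

```python
def amplitude_weight(scheme, wf, fo, s=0.0, sigma_fo=0.0):
    """Return w(f) for one reflection (illustrative sketch only).

    wf is indexed like WF(i) in the text: wf[1] .. wf[7]; wf[0] is unused.
    """
    if scheme == 1:
        # equal weights for all reflections
        return wf[1]
    if scheme == 2:
        # modified Rees scheme; assumes S ** WF(2) in the numerator
        return wf[1] * s ** wf[2] / (wf[3] * sigma_fo ** 2 + wf[4] * fo ** 2)
    if scheme == 3:
        # Cruickshank scheme, no experimental sigmas needed
        return wf[1] / (wf[2] + wf[3] * fo + wf[4] * fo ** 2)
    if scheme == 4:
        # Nielsen-type scheme with resolution-dependent terms
        return wf[1] / (wf[2] + wf[3] * fo + wf[4] * fo ** 2
                        + wf[5] * s + wf[6] * s ** 2 + wf[7] * fo * s)
    raise ValueError("SCHEME must be 1, 2, 3 or 4")
```

With WF(5), WF(6) and WF(7) all zero, scheme 4 reduces to scheme 3, which gives a simple consistency check on any chosen set of coefficients.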

Note that the previously suggested procedure of adjusting the WE coefficients on each cycle is not recommended.  The current recommendation is to leave the WE coefficients set at their default values, and adjust the WF coefficients only after a rebuild.  In any case because the structure factor and energy weights are purely relative, adjusting only WF(1) to raise or lower the F weights will give the same effect as simultaneously adjusting the geometry weights.

Alternatively the weighting coefficients can be chosen manually so that the mean values of w(f).(|Fo| - |Fc|)^2 are approximately independent of Fo and/or resolution (within a factor of two or three).  These mean values may be inspected in the tables ***ANALYSIS OF STRUCTURE FACTOR TERMS*** supplied in the output where they are displayed in bins dependent on sin(theta)/lambda and Fo.
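
The manual check described above can be sketched in Python.  The function names and the binning are hypothetical and only mimic the kind of table printed by the program; Fc is assumed to be already on the scale of Fo, and the bins here depend on sin(theta)/lambda only.

```python
def mean_weighted_residual_by_bin(reflections, bin_edges):
    """Mean of w.(Fo - Fc)^2 in bins of S = sin(theta)/lambda.

    reflections: iterable of (s, w, fo, fc); bin i covers
    bin_edges[i] <= s < bin_edges[i + 1].  Empty bins give None.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, w, fo, fc in reflections:
        for i in range(n_bins):
            if bin_edges[i] <= s < bin_edges[i + 1]:
                sums[i] += w * (fo - fc) ** 2
                counts[i] += 1
                break
    return [t / n if n else None for t, n in zip(sums, counts)]


def roughly_flat(means, factor=3.0):
    """True if the non-empty bin means agree within the given factor."""
    values = [m for m in means if m is not None]
    if not values:
        return True
    return max(values) <= factor * min(values)
```

The default factor of 3 corresponds to the "within a factor of two or three" guideline given above.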

It is recommended that the user starts with scheme 1 and then when most of the ordered atoms have been refined, scheme 2 should be selected if standard deviations are available, otherwise use scheme 3 or 4.  The choice of weighting coefficients is not a precise science but the resulting parameters are not likely to be critically dependent on it.

For schemes 2, 3 and 4, the optimum coefficients to make the mean values of w(f).(|Fo| - |Fc|)^2 approximately independent of Fo and/or resolution will be calculated by Nielsen's method before the first refinement cycle if USEWFC is set true, and the same values will then be used for all the cycles in the job.
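
The flatness criterion (mean weighted squared residuals agreeing within a factor of two or three across bins) can also be checked outside the program.  The sketch below is only an illustration of that check, not Nielsen's method and not the program's analysis table; the function names, the tuple layout and the choice of Fo bins are assumptions.

```python
def mean_weighted_residuals(reflections, bin_edges):
    """Mean of w(f).(|Fo| - |Fc|)^2 in each Fo bin.

    reflections -- iterable of (fo, fc, w) tuples
    bin_edges   -- ascending Fo boundaries; bin k holds edges[k] <= Fo < edges[k+1]
    Returns one mean per bin (None for an empty bin).
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for fo, fc, w in reflections:
        for k in range(n_bins):
            if bin_edges[k] <= fo < bin_edges[k + 1]:
                sums[k] += w * (abs(fo) - abs(fc)) ** 2
                counts[k] += 1
                break
    return [total / n if n else None for total, n in zip(sums, counts)]


def roughly_flat(means, factor=3.0):
    """True if all non-empty bin means agree within the given factor."""
    values = [m for m in means if m is not None]
    return not values or max(values) <= factor * min(values)
```

The same binning could be applied to sin(theta)/lambda instead of Fo by passing that quantity in the first tuple position.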

2.4.2 Phase weighting.

Phase observations from isomorphous replacement or anomalous scattering measurements may be weighted using the figure of merit.  The weighting formula is designed to down-weight reflections according to the difference between the observed and calculated phases.  Centric reflections are always given zero weight as they cannot contribute to a refinement.  The formula is

w(p) = WP(1)*FOM*[180 - |PHIo - PHIc|WP(2)]2
The figure of merit (FOM) must be read from the reflection file. The best way to choose WP(1) and WP(2) requires further research. Use the weighting analysis table for guidance.

2.4.3 Energy weighting.

Energy weighting involves the application of geometric restraints to the structure during refinement.  The paucity of reflection data in a macromolecular refinement usually means that large random errors in atomic coordinates occur when an unrestrained refinement is attempted.  These errors result in poor molecular stereochemistry.

Energy weighting uses a dictionary of target interatomic distances and standard deviations which govern the allowed deviations from the target values.  Alternatively, the weights may be controlled by use of weight coefficients (WE) supplied in the steering data.

    Weight            Case                         Ideal r.m.s. deviation

W(d) = WE(1)^2  if d(t) < 2.12Å                 0.02Å
W(d) = WE(2)^2  if 2.12Å < d(t) < 2.625Å        0.04Å
W(d) = WE(3)^2  if d(t) > 2.625Å                0.05Å
W(v) = WE(4)^2  for planar peptide groups       0.01Å
W(c) = WE(5)^2  for all other planar groups     0.01Å
W(c) = WE(6)^2  for edges of chiral tetrahedra  0.02Å

Chiral restraints are applied as distance restraints along the edges of chiral tetrahedra with d(t) <= 2.12Å.  In all cases WE(i)^2 is the weighting coefficient that decides the relative weight of the particular energy restraint and the other terms in the function minimised.
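
The selection of a distance weighting coefficient from the target distance can be written as a small lookup.  This is a sketch only: the function name is illustrative, and the treatment of a target distance exactly equal to a class boundary is an assumption, since the table does not say which class it joins.

```python
# Ideal r.m.s. deviations (Angstrom) for the three distance classes in the table.
IDEAL_RMS = {1: 0.02, 2: 0.04, 3: 0.05}


def distance_weight_index(d_target):
    """Return i such that WE(i) applies to a restrained target distance d(t) in Angstrom."""
    if d_target <= 2.12:     # assumption: a boundary value joins the lower class
        return 1
    if d_target <= 2.625:    # assumption: likewise for the second boundary
        return 2
    return 3
```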

Softer restraints than those suggested above may assist convergence at earlier stages.  Note that application of harder restraints at too early a stage may severely reduce the rate of convergence.  Because the structure factor and geometry weights are purely relative, the effect of reducing all the geometry weights can be obtained by increasing the weight coefficient WF(1).

Relevant information about the weighting can be found in the table under the heading:


2.4.4 Thermal parameter restraint weighting.

There are 2 weighting coefficients (WU(1) and WU(2)) for the thermal parameter restraints which aim to minimise the difference between thermal parameters of pairs of atoms whose interatomic distance is also restrained (i.e. 1-2 and 1-3 bonded atoms), though the two types of restraint can be applied independently.  WU(1) applies to isotropic thermal parameters, and WU(2) to anisotropic thermal parameters (but not group thermal parameters as these are already constrained).

The standard deviation of the half-bond restraint for an atom in the isotropic and anisotropic cases (where d is the interatomic distance) is given by the equations:

   s(iso)   = WU(1).Uiso^2

   s(aniso) = WU(2).d^2

The weight for the restraint on the thermal parameter difference between atoms i and j is then:

   w(ij) = 1/(s(i)^2 + s(j)^2)
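
As a worked illustration, the three expressions can be evaluated as follows.  The function names are illustrative, and the reading of the formulas (Uiso squared, d squared, and a sum of squared half-bond standard deviations) is an assumption about the intended superscripts.

```python
def s_iso(wu1, u_iso):
    """Half-bond standard deviation, isotropic case: WU(1) times Uiso squared."""
    return wu1 * u_iso ** 2


def s_aniso(wu2, d):
    """Half-bond standard deviation, anisotropic case: WU(2) times d squared."""
    return wu2 * d ** 2


def restraint_weight(s_i, s_j):
    """Weight for the restraint between atoms i and j: 1/(s(i)^2 + s(j)^2)."""
    return 1.0 / (s_i ** 2 + s_j ** 2)
```

For example, with WU(2) = 0.01 and a 1.5Å bond, each atom contributes s(aniso) = 0.0225, and the weight is 1/(2 x 0.0225^2).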

The target of the restraint is also different in the two cases; in the isotropic case it is simply the difference between the Uiso's; in the anisotropic case it is the difference between the components of the anisotropic tensors along the line joining the atoms.

There are sound statistical and physical reasons for using different forms of the weight in the isotropic and anisotropic cases.

In the isotropic case the differences are purely statistical in origin: they are almost entirely due to the assumption of isotropy, not to any actual difference in thermal parameters.  In reality atomic vibrations in a macromolecule, in particular in loosely bound regions such as chain termini and side-chains will have large anisotropic and/or librational components, so that the isotropy assumption is only very approximate. 

The distribution of Uiso's is always very skewed, i.e. most cluster near the modal value, but with a long tail of large values.  Consequently an atom with a value near the mode is most likely to find itself next to one with a similar value giving a small difference, whereas one with a value much larger than the mode will also most likely be near one with a value near the mode, giving a large difference.  This leads to a dependence of the r.m.s. difference in Uiso proportional to the square of the mean Uiso, with a proportionality factor found empirically from refinement of high resolution (1Å) structures of ~ 1; this is the weighting coefficient WU(1).

In contrast, in the anisotropic case, where the difference is between the along-bond components of the tensors, the differences are real and reflect the physical situation.  From IR spectroscopy it is found that the mean square amplitude of a typical bond vibration (for a single C-C bond) at ambient temperature is about 0.002Å^2 (equivalent to delta-B ~ 0.16Å^2), so the bond is very rigid in comparison with the atomic vibrations (B typically > 5 to 10Å^2).  The atomic vibrations therefore arise almost entirely as a consequence of bond librations.

In the anisotropic case, therefore, the r.m.s. difference in the thermal tensor components should be independent of the isotropic thermal parameters.  The difference between thermal tensor components will however be larger across bond angles (1-3 restraints), so a dependence on the square of the interatomic distance is used.  The default value of the weighting coefficient WU(2) (0.01) is rather larger than the expected difference (0.002).  This is because if the correct value is used initially the restraints are so tight that the refinement often fails to converge.  It may be possible to use the correct value of WU(2) (0.0007) once convergence has been attained.
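
The equivalence quoted above between a mean square amplitude of 0.002 and a delta-B of about 0.16 follows from the standard relation B = 8.PI^2.U.  A minimal check (function names are illustrative):

```python
import math


def u_to_b(u):
    """Convert a mean square amplitude U (Angstrom^2) to B (Angstrom^2)."""
    return 8.0 * math.pi ** 2 * u


def b_to_u(b):
    """Convert B (Angstrom^2) to U (Angstrom^2)."""
    return b / (8.0 * math.pi ** 2)


delta_b = u_to_b(0.002)   # about 0.158, i.e. ~0.16 as quoted
```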


There are 5 input files to RESTRAIN.
     FILE                        FILE NAME       EXPLANATION

- control and steering data                      section 3.1
- dictionary                     DICTION         section 3.2
- atomic coordinates             XYZIN           section 3.3
- group thermal parameters       TLSIN           section 3.4
- reflections                REFIN or HKLIN      section 3.5


The control and steering data in the standard input data set both consist of a number of optional items.  Within each of these data blocks the order of these items is immaterial.

Any record or part of a record can be temporarily "commented out" by use of the ! or # character; this causes all subsequent characters on the same line to be skipped.

Each record in the control data is identified by a keyword, but only the first 4 characters are significant and case-insensitive.  Any other input required follows immediately in free-format (space-separated) on the same line, with the sole exception of the keyword STEER where the data must follow on the succeeding line(s).  Data records (but not comments) may be continued by finishing a line with a "-".  The keywords available are:


3.1.1 Description of control data.

Each of the keywords DICTION, XYZIN, TLSIN, HKLIN, REFIN, XYZOUT, TLSOUT, HKLOUT, REFOUT, MATOUT and DESOUT specifies a filename.  Files may be also connected using the CCP4 logical names matching these keywords.  The keyword information overrides the logical names.

TITLE (optional)
This is arbitrary text.

If present, sets the harvest directory permission for the first open of $HOME/DepositFiles to rwx------ .  The default is rwxr-xr-x .

If present the harvest file is opened in current working directory.  The default is to output the file to $HOME/DepositFiles/ProjectName/DataSetName.ProgramName .

Project name.  If given with DNAME
Dataset name.  If given with PNAME then harvest will output a file.  This dataset name is the name of one of the diffraction data sets used in a particular project.  No default.

The filename of the dictionary used (often defaulted, but see below for alternatives).

The filename of the input coordinates in PDB format.

The filename of the input group thermal parameters (optional).

The filename of the input unformatted (MTZ) reflection file.

In the case that no HKLIN or REFIN name is given, the only possibility is regularisation.

The input column assignments for the MTZ file.

The program labels are 'H', 'K', 'L', 'FP', 'SIGFP', 'PHIB', 'FOM' and 'FREE' with the conventional meanings.  For conventional amplitude refinement only the FP and SIGFP columns need be assigned.  To calculate free R factors, assign label FREE to a free R flag column generated by `freerflag' (or otherwise).

The filename of the input formatted reflection file; either HKLIN or REFIN may be used, but not both at the same time.

Reflection file format.

This record contains the format for the reflections when using a formatted reflection file (section 3.5).  This record is not required for unformatted reflection files.

The filename of the output coordinates.

The filename of the output group thermal parameters.

The filename of the output unformatted (MTZ) reflection file.

This requires that the input reflection file be also MTZ format.  It is not possible to have an input formatted and an output unformatted file, or vice versa.

The optional output column assignments for the MTZ file.

The program labels are 'H', 'K', 'L', 'FP', 'SIGFP', 'FC', 'PHIC' and 'FREE'.

The filename of the output formatted reflection file.  This requires that the input reflection file be also formatted.

The filename of the output full normal matrix.

This is used to obtain individual standard deviations of the parameters by matrix inversion, which is performed by a separate program (FUMAIN*).
*FUMAIN is not yet a part of CCP4.

Be aware that the process of accumulating the terms of the full normal matrix and then inverting it is extremely CPU and memory intensive!

The filename of the output restraint design matrix.

This feature is experimental.

Individual atomic anisotropic thermal parameter definition.

Each ANISO record defines a contiguous segment of atoms in the coordinate file whose isotropic thermal parameters are to be converted to individual anisotropic parameters by setting each of the diagonal elements of the U tensor to Uiso and the off-diagonal elements to zero.  The new anisotropic tensors will be written to the coordinate file in the standard PDB format.  This option should therefore not be used for atoms that are already defined as anisotropic (unless you really want to reset them).  This option should not be confused with the group thermal parameter option TLSIN.  For each segment atoms may be selected by name or by using various keywords.  Each contiguous segment is specified as:

- Keyword ANISo
- Starting atom identifier of anisotropic segment (character string)
- Ending atom identifier of anisotropic segment (character string)
- Optional selection character string (character string)

An atom identifier is interpreted as a character string, not as an integer, and is matched with the atom number in columns 7 to 11 of the PDB ATOM or HETATM record.  Alternatively (and probably more conveniently, as some programs may change the atom numbers), the atoms may be specified by their residue and atom labels joined by a ".", for example: 34A.CG1 .  If the coordinate file contains chain identifiers, the chain id must be prepended, including the correct number of spaces.  If the resulting string contains any spaces it must be completely enclosed by apostrophes, for example: 'C 13.N'.  The atom name may also be omitted leaving both the residue label and the final "."; in that case the range specified either starts at the first atom of the residue or ends at the last atom.  Atom and residue labels, and also residue names are always case sensitive (usually only capital letters are used).

If the second component of the range specification is omitted or given as a null string (i.e. double apostrophe:  ' ' in the input), it is set equal to the first component, i.e. specifying a single atom or residue.  If both are omitted or given as nulls, the range is set to the entire coordinate file.  Note that if you want to specify the optional selection string, you cannot leave out either of the range components; you must supply both of them, as either non-null or null, so that the selection string is the third item on the line after the keyword.

Beware that the range specification applies to the file AFTER any re-ordering is done, so it is probably safer to re-order first, then check the coordinate file and specify the ANISO ranges in a separate job.

In the optional selection string, atom names have to conform to the PDB convention.  All atom codes found in the PDB atom files can be used.  Additionally, four group codes can also be specified: SDCH, MNCH, ALL and NOT.  MNCH will select all mainchain atoms (' N ', ' CA ', ' C ' and ' O '), SDCH selects all non-mainchain atoms, ALL selects all atoms and NOT negates the selection of atom types on the line.  The order of atom specifiers is not important.  If no atom specifier is given, the default is ALL. 

Non-crystallographic symmetry operator.

Each record contains either the rotation or the translation component of an orthogonal non-crystallographic symmetry operator.  The 9 elements for the rotation matrix are read in ROW-wise (beware other programs which read and write column-wise matrices!).  In the case of two molecules in the asymmetric unit the input would be:

NCSY MATRIX R11 R12 R13 R21 R22 R23 R31 R32 R33 (rotation 1-2)
NCSY TRANS  T1  T2  T3                      (translation 1-2)

For N molecules in the asymmetric unit there would be N-1 pairs of these records altogether.  Alternatively it may be more convenient (and less error-prone!) to use polar angles to specify the rotation component.  The use of Eulerian angles to specify the rotation has not been implemented because there are so many different Eulerian angle conventions in use.

NCSY POLAR  theta phi chi                 (rotation 1-2)
NCSY TRANS  T1   T2   T3                    (translation 1-2)

Note that the identity operator is always assumed and may be omitted. 
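
Because the row-wise element order is easy to get wrong, it may help to generate the record pair from a matrix held in memory.  The following Python sketch is an illustration, not part of RESTRAIN; the function name and the number of decimal places are arbitrary choices.

```python
def ncsy_records(rotation, translation):
    """Format one operator as an NCSY MATRIX / NCSY TRANS record pair.

    rotation    -- 3x3 nested sequence with rotation[i][j] = R(i+1)(j+1)
    translation -- sequence of three numbers (T1, T2, T3)
    """
    if len(rotation) != 3 or any(len(row) != 3 for row in rotation):
        raise ValueError("rotation must be 3x3")
    if len(translation) != 3:
        raise ValueError("translation must have 3 components")
    # Row-wise order: R11 R12 R13 R21 R22 R23 R31 R32 R33.
    elements = [value for row in rotation for value in row]
    matrix = "NCSY MATRIX " + " ".join(f"{v:.6f}" for v in elements)
    trans = "NCSY TRANS " + " ".join(f"{v:.4f}" for v in translation)
    return [matrix, trans]
```

For N molecules, calling this once per operator gives the N-1 record pairs.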

Occupancy segment definition.

Each record defines a contiguous "occupancy segment" by means of a starting atom number, the number of atoms in the segment, an optional segment "group identifier" and an optional segment "coupling identifier".  One or more occupancy segments with the same group and coupling identifiers comprise an "occupancy group".  All atoms belonging to the same occupancy group have the same shift applied during occupancy refinement.  Two or more occupancy groups may be coupled so that the sum of their occupancies is constrained to be constant; this is done by giving the groups the same group identifier but different coupling identifiers.  If occupancy coupling is not required, the coupling identifiers may be omitted.  Note that the content of a group identifier carries no significance; only its equality or inequality as compared with the other group identifiers is significant.  The same applies to the coupling identifiers; in addition their equality or inequality is only significant when they share a common group identifier.

Each contiguous occupancy segment is specified as:

- Keyword OCCUp

- Starting atom number of occupancy segment.  (character string)

Alternatively, the atom may be specified by its residue and atom labels; for details see above under "ANISO".

- Number of atoms in segment (may be 1).  (integer)

- Optional arbitrary group identifier; if occupancies are to be coupled then this will be the same for all the coupled occupancy segment(s).  (character string)

- Optional arbitrary coupling identifier; if occupancies are to be coupled then this is different for each coupled group.  (character string)

Note that non-unit occupancies must be specified in OCCUP records even if they are only to be used in structure factor calculation and not in refinement (in which case OCCREF=false).
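
As an illustration of coupling, the sketch below builds OCCUP records for two alternative conformations whose occupancies should sum to a constant.  The helper name, the atom numbers and the identifiers ALT1, A and B are hypothetical; only the field order is taken from the description above.

```python
def occup_record(start_atom, n_atoms, group_id=None, coupling_id=None):
    """Build one OCCUP record: start atom, atom count, optional identifiers."""
    if n_atoms < 1:
        raise ValueError("a segment holds at least one atom")
    if coupling_id is not None and group_id is None:
        # The fields are positional, so a coupling identifier needs a group identifier.
        raise ValueError("a coupling identifier needs a group identifier")
    fields = ["OCCUP", str(start_atom), str(n_atoms)]
    if group_id is not None:
        fields.append(str(group_id))
    if coupling_id is not None:
        fields.append(str(coupling_id))
    return " ".join(fields)


# Same group identifier, different coupling identifiers: the two groups are coupled.
records = [
    occup_record(1201, 4, "ALT1", "A"),
    occup_record(1205, 4, "ALT1", "B"),
]
```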

Rigid body segment definition.

Each RIGID record defines a contiguous segment of atoms in the coordinate file whose rigid-body parameters (3 rotations and 3 translations) are to be refined.  Each contiguous segment is specified as:

- Keyword RIGId

- Starting atom number of rigid segment (character string)

- Ending atom number of rigid segment (character string)

Alternatively, the atoms may be specified by their residue and atom labels; for details see above under "ANISO".

- Optional identifier of rigid group (character string)

The purpose of the optional identifier is to allow consolidation of several segments into one rigid body, because a rigid body does not necessarily consist of contiguous atoms in the file.  To do this just give the same identifier to segments that are to be part of the same rigid body.  Many rigid bodies may be present, but nesting is not allowed.

General equivalent position.

Each record contains a general equivalent position for the space group typed as in INTERNATIONAL TABLES FOR X-RAY CRYSTALLOGRAPHY, Vol A.  If symmetry information is not given here, the SGROUP parameter is used; if that is not defined, the CRYST1 record in the PDB file is searched for the space group name; if none is found then the symmetry information in the MTZ file (if given) is used.  Use of this option is discouraged as it is very error-prone; it is better to update the "symop.lib" file, and then thoroughly test your modifications.

Extra non-dictionary restraint.

Each record contains an extra interatomic restraint.  This is specified as

- Keyword XTRD
- Atom number 1 (character string)
- Atom number 2 (character string)
- Distance (Å) (real)

The atoms either may be specified by their atom numbers in the coordinate file, or by their residue and atom labels; for details see above under "ANISO".

If the distance is given as negative it is interpreted as a repulsion-only restraint, i.e. it is only applied if the calculated distance is less than the specified target distance.  This explicit extra restraint may be required because the implicit repulsion restraints (when REPEL=true) are not applied to pairs of atoms in the same residue; nor are they applied to an atom involved in any explicit extra restraints, whether repulsion or not.
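
The sign convention can be summarised in a few lines of Python.  This is a sketch of the rule as described, with an illustrative function name, and it takes the target distance of a repulsion-only restraint to be the absolute value of the number given.

```python
def restraint_applies(target, calculated):
    """Decide whether an XTRD restraint is applied.

    target     -- distance from the record; negative means repulsion-only
    calculated -- current interatomic distance (Angstrom)
    """
    if target >= 0:
        return True                    # ordinary restraint, always applied
    return calculated < abs(target)    # repulsion-only: applied when too close
```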

Extra non-dictionary planar restraint definition.

Each record contains an extra plane.  This is specified as:

- keyword XTRP

- First atom number of plane (character string)

Alternatively, the atom may be specified by its residue and atom labels; for details see above under "ANISO".

- Number of atoms in plane (integer)


This keyword introduces the steering data.  It must appear on a line by itself after all of the keywords in the above list.

3.1.2 List of steering data.

After the record with the single keyword STEER, the data follows on the next line and consists of a series of "name=value" specifications separated either by a comma or by the end of the line (a comma at the end of a line is optional) e.g.:

     A=10.8, Gamma =90,  ISO= f , isoref = T, Aniso=    False
       G=2 ,High=2.8, dxyzlm=.02 ,

The read statement makes use of a simulated version of the FORTRAN NAMELIST facility and thus the order in which the variables are given is immaterial.  The letter case and spacing do not matter, and there may be any number of "name=value" specifications per record, up to 80 columns.  However a "name=value" specification may not be split across two or more lines, and the use of the "-" continuation character is not allowed.
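
The format rules can be made concrete with a small parser.  This is an independent Python sketch, not the program's NAMELIST emulation: the function names are illustrative, comment characters are not handled, and the rule that only the first letter of a logical value matters is an assumption based on the example values f, T and False.

```python
def parse_steering(lines):
    """Parse steering records into a dict of upper-cased names and raw value strings.

    Parsing stops at the &EOF variable, which takes no value.
    """
    values = {}
    for line in lines:
        for item in line.split(","):
            item = item.strip()
            if not item:                  # trailing comma or blank line
                continue
            if item.upper() == "&EOF":
                return values
            if "=" not in item:
                raise ValueError(f"not a name=value specification: {item!r}")
            name, value = item.split("=", 1)
            values[name.strip().upper()] = value.strip()
    return values


def as_logical(text):
    """Interpret a logical value (assumption: only the first letter, T or F, matters)."""
    first = text.strip().lstrip(".").upper()[:1]
    if first == "T":
        return True
    if first == "F":
        return False
    raise ValueError(f"not a logical value: {text!r}")
```

Applied to the two example records above, this yields A, GAMMA, ISO, ISOREF, ANISO, G, HIGH and DXYZLM, regardless of letter case, spacing or order.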

Only those items which you want to differ from default values need be entered.  For example cell parameters are not normally supplied in the steering data because the values in the reflection and/or coordinate files are usually the correct ones.  A list of variables which can be input to the program is given below.  A detailed explanation of each variable is given in section 3.1.3.

The steering data may be terminated either by end-of-file, or by a variable name &EOF (without a value).  In either case, this will cause refinement to be initiated.  Additional steering data items (starting on a new line) may follow the &EOF variable.  The refinement will then be restarted from the point that it was terminated.  The values of the variables used will be those at the termination of the original refinement updated by the new supplied values.  This may be repeated as often as desired.

A (see note)
ANISO true
B (see note)
C (see note)
CREACT false
DESMAT false
DICPRI false
DIFS true
FREF true
FULMAT false
G (see note)
ILLCON false
ISO true
LOW 9999
NORMAT false
OFFDIA false
ONLYFC false
ONLYFR false
ORDER false
PHAS false
PRTALL false
REPEL true
RIGID false
RSIZE true
SB1 5
SB2 1.6
SGROUP (see note)
TESTIN false
U 0
UHIGH 0.15
ULOW 0.02
USEFR false
USEWFC false
WATER true
WE(1) 0.02
WE(2) 0.04
WE(3) 0.05
WE(4) 0.01
WE(5) 0.01
WE(6) 0
WF(1) (see note)
WF(2) (see note)
WF(3) (see note)
WF(4) (see note)
WF(5) (see note)
WF(6) (see note)
WF(7) (see note)
WFREF false
WP(1) 20
WP(2) 0.2
WU(1) 1
WU(2) 0.01
&EOF -

Note for table: refer to full explanation of variable in the next section.

3.1.3 Full description of the steering data.

Default values are given in brackets immediately after the variable name.

Cell parameter a (Å). 

Cell parameters default first to those defined by the SCALE matrix in the input PDB coordinate file; if one is not supplied the values given in the steering data are used; if none are supplied the values given on the CRYST1 record in the PDB file are used; finally if one is not given, the values read from the reflection file are used.  If cell parameters cannot be found anywhere the program will terminate abnormally.  Usually it is not necessary to supply cell parameters.  The default orthogonal setting is the standard PDB one, i.e.  x || a  and  z || c* .

ALPHA (90)
Cell parameter alpha (deg.).

ANISO (true)
When set false, disables refinement of individual atomic anisotropic thermal parameters, if supplied or generated.  The default is to refine any anisotropic thermal parameters.

Cell parameter b (Å).

BETA (90)
Cell parameter beta (deg.).

BINPUT (true)
When set false, this causes the isotropic atomic thermal parameters read from XYZIN to be used as read and not converted from B to U (divided by 8.PI^2).

Cell parameter c (Å).

This is the maximum number of conjugate gradient iterations allowed when solving the normal equations for the positional parameters (see also SFTLIM), and also the maximum number of failures allowed for the iterations to converge.

CREACT (false)
When set true this causes the centre of reaction of any TLS groups to be refined.  At present this feature is experimental; if set true it is likely to cause problems with small planar groups (< 6 atoms), because the centre of reaction may be far away from the mean centre.  The default is to keep the local origin of the TLS groups fixed, and compensate by refining 8 components of the screw (S) tensor instead of 5, allowing it to become non-symmetric.

The cycle number which may be specified by the user and is printed out for identification purposes.  It is incremented automatically when multiple cycles are carried out.

DESMAT (false)
When set true the design matrix is written to the file DESOUT.  The output file is used by another program (FUMAIN2*) for estimation of the variance of the least-squares residual.  At present this feature is experimental.
*FUMAIN2 is not yet a part of CCP4.

DICPRI (false)
When set true the dictionary is interpreted and printed as it is read in.  This will be useful in the development of new dictionary entries.

DIFS (true)
When set false reflection output on file HKLOUT or REFOUT (section 4.4) is disabled.  This output can be used for Fourier calculations.  If necessary an extra cycle of structure factor calculation is done.

DXYZLM (0.05)
This is used to control the listing of atomic shifts.  Those less than DXYZLM (Å) are not printed (but see also ULOW and UHIGH).

FLIBR (0.)
This is a factor that controls libration corrections.  At present it is experimental and should not be used.

FMAX (0, defaulting to 1/3 the value of the largest Fo when available)
This governs the distribution of reflection bins as a function of Fo in the weighting analysis table.  The default value is 1/3 the value of the largest observed structure amplitude, Fo, which is printed on each run.  See also HIGH and LOW.

This is included to set a lower limit for |Fo| cut-off.  The default includes all reflections.  See also HIGH and LOW.

Specifies which free R set of the reflections to use in calculation of "Rfree".  This only applies to MTZ input files when the `FREE' program label is assigned to a column generated by `freerflag' or similarly.

FREF (true)
When set false this causes structure factor terms to be excluded from the least-squares minimisation function, so that only geometry regularisation is done.

FULMAT (false)
If set true this causes the full normal matrix for the coordinates to be output to the file specified by MATOUT, for purposes of error estimation.  If necessary an extra cycle of structure factor and derivative calculation is done.  Note that this option is likely to require a large amount of CPU time and memory: for a medium size structure of 300 residues (~ 3000 atoms) there will be ~ 12000 parameters (4 per atom: x, y, z, Uiso), which will require REAL memory (not virtual!) of ~ 300 MB.  DON'T TRY IT UNLESS YOU HAVE THE MEMORY AVAILABLE - if the real memory required is not available, the program will really hammer the swap file!
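
The memory figure can be estimated in advance for a structure of any size.  The sketch below assumes that only one triangle of the symmetric matrix is stored and that each element is a 4-byte real; these assumptions are not stated in this document, but they reproduce the ~ 300 MB quoted for 3000 atoms.

```python
def full_matrix_megabytes(n_atoms, params_per_atom=4, bytes_per_element=4):
    """Estimate storage for the full normal matrix in MB (10^6 bytes).

    Assumes packed storage of one triangle of the symmetric matrix,
    n(n+1)/2 elements for n parameters.
    """
    n = n_atoms * params_per_atom
    elements = n * (n + 1) // 2
    return elements * bytes_per_element / 1.0e6
```

With 3 parameters per atom, as for NORMAT, the same 3000 atoms would need roughly 160 MB under these assumptions.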

This is the overall scale factor and should be updated in each cycle by replacing it with the value obtained from the previous cycle. 

The scale factor G and the overall thermal parameter U may, as an alternative to least squares refinement from initial estimated values, be calculated ab initio by the method of Kraut.  (See the documentation for the CCP4 program FHSCAL for details of the method, noting that FP is to be considered as Fobs and FPH as Fcalc). The initial values of G and U are obtained by the program in an extra structure factor cycle before the coordinate refinement cycle(s) (but if the weight calculation option USEWFC is also set to true only one extra cycle is done).  This option is activated by omitting both the G and U parameters from the input.

GAMMA (90)
Cell parameter gamma (deg.).

This is the maximum number of Gauss-Seidel iterations allowed when solving the normal equations for the positional parameters (see also SFTLIM), and also the maximum number of failures allowed for the iterations to converge.

HIGH (0)
High resolution cut-off.  No reflections of smaller interplanar spacing than specified by HIGH are included (see also FOBMIN and LOW).  The default is not to apply any high resolution cut-off.  HIGH is also used to construct the resolution bins in the weighting analysis table (see also FMAX).

ILLCON (false)
If set true, uses only diagonal matrix elements from restraint derivatives; this is only required when the matrix is severely ill-conditioned and is not normally necessary.

ISO (true)
When false the input overall isotropic thermal parameter U is used in the structure factor calculation.  When true the individual atomic thermal parameters as read from the coordinate file are used (see also ISOREF).

ISOREF (true)
When ISO=false and ISOREF=true the input overall U is converted to individual Uiso's which are then refined independently. When ISO=false and ISOREF=false any isotropic thermal parameters in the coordinate file are totally ignored, and the overall isotropic thermal parameter is refined.  When ISO=true and ISOREF=true individual isotropic thermal parameters are read from the coordinate file and refined.  For most proteins this should not usually be attempted at resolutions lower than 3Å.  When ISO=true and ISOREF=false individual isotropic thermal parameters are read from the coordinate file but only the overall U is refined.  The resulting shift is applied to the individual isotropic thermal parameters which are written to the output coordinate file.
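
The four ISO/ISOREF combinations are easier to compare as a lookup table.  The Python sketch below simply restates the description above; the names and the wording of the table entries are illustrative.

```python
# (ISO, ISOREF) -> (thermal parameters used as input, thermal parameters refined)
THERMAL_MODES = {
    (False, True): ("overall U converted to individual Uiso", "individual Uiso"),
    (False, False): ("overall U", "overall U"),
    (True, True): ("individual Uiso from coordinate file", "individual Uiso"),
    (True, False): ("individual Uiso from coordinate file", "overall U"),
}


def thermal_mode(iso=True, isoref=True):
    """Return (input used, quantity refined) for a given ISO/ISOREF setting."""
    return THERMAL_MODES[(bool(iso), bool(isoref))]
```

In the ISO=true, ISOREF=false case the shift in the overall U is applied to the individual values written to the output coordinate file.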

ISYM (0)
This is the number of the structure factor subroutine to be used, corresponding to the number of the space group for which it is specific.  However any sub-group of your space group, provided it has the same origin, will also work.  Space group no. 1 (P1), being a sub-group of all space groups, always works, and this is the only option when using any anisotropic thermal parameters or TLS or non-crystallographic symmetry.  Presently available options are: 1, 3, 4, 16, 17, 18, 19, 77, 92, 94, 96, 169, 170, 178 & 179.  If ISYM is omitted (usually the best option) or set to 0, the program chooses the subroutine automatically.

Number of lattice points per unit cell: 1 for P cell, 2 for A,B,C and I cells, 3 for R cell indexed on hexagonal axes and 4 for F cells.

LOW (9999)
Low resolution cut-off.  No reflections of larger interplanar spacing than specified by LOW are included (see also FOBMIN and HIGH).  The default is not to apply any low resolution cut-off.  If WATER is set true (solvent background correction, the default) you should not apply this cut-off.  LOW is also used to construct the resolution bins in the weighting analysis table (see also FMAX).
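
The effect of the HIGH and LOW cut-offs on a single reflection can be sketched as below.  The function names are illustrative; the conversion from S = sin(theta)/lambda to interplanar spacing is Bragg's law, d = 1/(2S), and reflections lying exactly on a cut-off are assumed to be kept.

```python
def d_from_s(s):
    """Interplanar spacing d (Angstrom) from S = sin(theta)/lambda."""
    return 1.0 / (2.0 * s)


def within_resolution(d_spacing, high=0.0, low=9999.0):
    """True if a reflection of spacing d is kept by the HIGH and LOW cut-offs.

    The defaults (HIGH=0, LOW=9999) apply no cut-off in practice.
    """
    return high <= d_spacing <= low
```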

This is the number of items to be read from the character reflection file (see section 3.5).

MFACR (0.1)
This is the incremental value of Marquardt's factor, which is used when the positional parameters normal equations become ill-conditioned and the structure may fail to refine, e.g. when the data/parameter ratio is poor, or the structure is disordered. Initially a solution is tried with a factor of 0, then on each failure to solve the equations it is incremented by MFACR, up to GSFACR times (if using the Gauss-Seidel method) or CGFACR times (if using conjugate gradient).  In difficult cases it may be necessary to increase MFACR to 0.2 or 0.3.

This governs the number of reflection and distance calculations skipped in the sampled residual calculations.  Normally the default value should suffice.  However for large structures the sampling error will be smaller, so MODULO can be increased, saving some time.

NCSREF (true)
When set false, no refinement of any non-crystallographic symmetry operators specified by NCSYMM records is done.

NCYC (1)
This is the number of refinement cycles to be carried out in this run.

NORMAT (false)
If set true this causes the full normal matrix for the coordinates to be output to the file specified by MATOUT, for purposes of error estimation.  If necessary an extra cycle of structure factor and derivative calculation is done.  Note that this option is likely to require a large amount of memory (using 3 coordinate parameters per atom).  READ COMMENTS UNDER "FULMAT" ABOVE BEFORE ATTEMPTING THIS!

OCCREF (true)
When set false, no refinement of any atomic or group occupancies specified in OCCUP records will be done.

OFFDIA (false)
If set true, uses contribution to off-diagonal matrix elements from amplitude derivatives; this is not normally necessary.

ONLYFC (false)
If set true no refinement, only structure factor calculation, will be performed.

ONLYFR (false)
When set true only the "free set" of reflections, as defined by FREERFLAG, will be used.  This is not normally a sensible option, and is used only for special purposes (e.g. Rfree statistics).

ORDER (false)
If the coordinates supplied to the program are in a different order than in the dictionary, setting this parameter true will put them in the right order and continue refinement.  At the end the program will write the coordinates out in the correct order (see also TESTIN).

PHAS (false)
If set true phase restraints are used.  Phase information must be present in the reflection file to use this option (see section 3.5).

PRTALL (false)
If set to true then full analyses of structure factors, geometry, etc. are printed every refinement cycle, otherwise this is only done on the last cycle with summaries on the other cycles.

REPEL (true)
If set true non-bonded interactions shorter than the van der Waals contact distances as defined in the dictionary are restrained.  This can be usefully employed when attempting to refine the 'pucker'/stereochemistry of a group with unknown 'pucker'/stereochemistry, or at the beginning of a protein refinement when the side chains may have large deviations from the true position.

RIGID (false)
If set true constrained rigid body refinement is applied to those parts of the molecule defined by RIGID records.  All other coordinates will not move.  There may in fact be no RIGID records, in which case all coordinates are kept fixed.

RMERGE (0.1)
The Rmerge of the data when it was processed.  This is only used to estimate WF(1) when it is omitted.  If a value for WF(1) is supplied then RMERGE is ignored.

RMSMIN (0.03)
This governs the amount of output.  If the r.m.s. deviation from planarity (Å) > RMSMIN then the plane is printed out. If set to a negative value all restrained planes are listed.

RSIZE (true)
If set false sets record size for harvest file to 80, default is 132.

RWDMIN (100)
This governs the amount of output.  Only those reflections are printed for which
  w(f)^(1/2) . DELTA(|F|) > RWDMIN
where DELTA(|F|) is the absolute difference between the calculated and observed structure amplitudes.  If set to a negative value all structure factors are listed.

This performs a similar function to RWDMIN.  If
  w(d)^(1/2) . DELTA(dist) > RWLMIN

where DELTA(dist) is the absolute difference between calculated and observed distances then the distances are printed.  If set to a negative value all distances restrained are listed.

SB1 (5)
Solvent background scale factor.  Disordered solvent makes a significant contribution to the Bragg scattering at low angles.  This is allowed for by applying Babinet's Principle.  Accordingly modified scattering factors f' are used in the structure factor calculations.
  f' = f - SB1*exp(-1/2*SB2*q^2)
where q = 4.PI.sin(theta)/lambda.

The parameters SB1 and SB2 are only used if WATER=true.  Their refined values may be used in subsequent cycles in the same way as G and U.  These parameters are highly correlated and well defined values may not exist.  They may also allow for disordered parts of a macromolecule which do not form part of the model currently being refined.
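
As a worked illustration of the modified scattering factor, the short Python sketch below evaluates f' for a given f and sin(theta)/lambda.  The function name is hypothetical; the defaults are the SB1 and SB2 defaults quoted in this section.

```python
import math

def babinet_scattering_factor(f, sin_theta_over_lambda, sb1=5.0, sb2=1.6):
    """Return f' = f - SB1*exp(-1/2*SB2*q^2), with q = 4.PI.sin(theta)/lambda."""
    q = 4.0 * math.pi * sin_theta_over_lambda
    return f - sb1 * math.exp(-0.5 * sb2 * q * q)
```

At sin(theta)/lambda = 0 the full SB1 is subtracted; at high angles the exponential vanishes and f' tends to f, which is why the correction matters mainly for the low-angle reflections.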

SB2 (1.6)
Solvent background thermal parameter.  See SB1 above.

The number of the amplitude weighting scheme.  The structure amplitude terms in the function minimised are weighted with weights which are calculated from a weighting formula.  Four formulae are available (see section 2.4.1) and these are referred to as SCHEMES 1, 2, 3 and 4.  SCHEME 1 gives a constant weight to all structure amplitudes and provides the maximum rate of convergence, so it is usually tried first.  In the later stages, one of the other schemes should be selected, so that the errors in the amplitudes can be more accurately represented.

SFACR (0.8)
Initial shift factor.  This is automatically adjusted during the sampling calculations.  The frequency of sampling the observations and restraints is governed by the value of MODULO.

SFTLIM (0.02)
The Gauss-Seidel or conjugate gradient iterations terminate either when GSFACR or CGFACR cycles respectively have been carried out or when all differences between successive solutions are less than SFTLIM.
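
The two stopping conditions can be illustrated with a plain Gauss-Seidel iteration.  This is a minimal sketch in Python under stated assumptions: the function name and the starting vector of zeros are hypothetical, max_cycles stands in for GSFACR, and the program's actual solver is not reproduced.

```python
def gauss_seidel(a, b, max_cycles, sftlim=0.02):
    """Iterate until max_cycles are done or every change is below sftlim.

    Returns the solution estimate and the number of cycles carried out.
    """
    n = len(b)
    x = [0.0] * n
    for cycle in range(1, max_cycles + 1):
        largest_change = 0.0
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_value = (b[i] - s) / a[i][i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value  # updated value is used immediately
        if largest_change < sftlim:
            return x, cycle
    return x, max_cycles
```

A smaller sftlim gives a more accurate solution of the normal equations at the cost of more cycles.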

The space group name or number.  If one is not supplied it defaults first to the one given on the CRYST1 record, then to the one in the reflection file.  It requires the SYMOP variable to be defined in order to locate the symop.lib file.  Normally it is not necessary to supply the space group.

This restricts the refinement to those reflections for which Fobs >= SIGMA * (standard deviation of Fobs).  The input reflection file (see section 3.5) should contain standard deviations.

TESTIN (false)
If set true this will cause all subroutines dealing with the structure factors and geometry to be bypassed.  This will be useful when setting up input coordinates for refinement. 

TPREST (true)
If set false, thermal parameter restraints are not used.

TLSREF (true)
If set false, refinement of the group thermal parameters (isotropic, anisotropic or TLS) will be skipped.  The default is to refine the group thermal parameters if they are present in the input.

TSFACR (0.01)
This is used to determine the accuracy of the line search for the minimum.  Smaller values will locate the minimum more accurately, but at the expense of CPU time.

U (0)
U is the overall thermal parameter.  If ISO=false it should be updated in each cycle by replacing it with the value obtained from the previous cycle.  If ISO=true the effect of giving a U > 0 is to adjust all the individual Uiso's so their mean is the input U.  Note that this differs from previous versions: previously an input U was ignored if ISO=true.

UHIGH (0.15)
If an atomic thermal parameter is greater than UHIGH it is printed.  This allows the amount of output to be controlled when used in conjunction with ULOW and DXYZLM.

ULIMH (2.5)
If an atomic thermal parameter is calculated to be greater than ULIMH, it is reset to that value.  A suggested value is the larger of 0.3.HIGH^2 and 1.5.

If an atomic thermal parameter is calculated to be less than ULIML, it is reset to that value.  A suggested value is -0.0005.HIGH^6.

ULOW (0.02)
If an atomic thermal parameter is smaller than ULOW, then it is printed.  This allows the amount of output to be controlled when used in conjunction with UHIGH and DXYZLM.

USEDSD (true)
By default, the individual standard deviations of the distances given in the dictionary are used to weight the distance restraints (weight = 1/(s.d.)^2).  However, if the standard deviation in the dictionary is given as 0 (because no estimate was available), then the weight is obtained from the corresponding WE coefficient (1, 2 or 3, see below).  If USEDSD is set false, all the standard deviations in the dictionary are ignored, and all weights are obtained from the WE coefficients.

USEFR (false)
When set true this causes the "free set" of reflections to be used as normal reflections; this should be only done once the refinement is complete.

USEWFC (false)
If USEWFC is set true, F-weighting scheme coefficients for weighting schemes 2, 3 and 4 are calculated before the first refinement cycle and used in subsequent cycle(s).  The weighting coefficients to be calculated should not be specified in the input; if they are they will be ignored.

WATER (true)
If WATER is set to false, values of SB1 and SB2 are not used (see SB1).  The default is to use and refine SB1 and SB2.  These parameters allow for the bulk solvent scattering in a macromolecular crystal.  You should not apply a low resolution cut-off (LOW) in that case.

WE (0.02, 0.04, 0.05, 0.01, 0.01, 0)
The 6 elements of this array are used for applying weights (= 1/WE(i)^2) on the geometry part of the function that is minimised.

WE(1) is used for restraints on 1-2 distances (< 2.12Å).
WE(2) is used for restraints on 1-3 distances across bond angles (>= 2.12 but < 2.625Å).
WE(3) is used for restraints on non-bonded distances (>= 2.625Å).
WE(4) is used for restraints on peptide planes.
WE(5) is used for restraints on ring and other planes.
WE(6) is used for restraints on the edges of chiral tetrahedra.

If all these variables are set to 0 then no contribution from ideal geometry is included, i.e. the refinement is based solely on the structure amplitudes, thermal parameters and/or phase data.  See also section 2.4.  If USEDSD is set true, and the standard deviation of the distance restraint given in the dictionary is > 0, then the WE coefficient (1, 2 or 3) is not used to obtain the weight.  For compatibility with previous versions of the program, this version will also accept values of WE(i) >= 1, in which case the value used is 1/WE(i).
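
The rules for choosing a distance restraint weight (USEDSD, the three distance classes and the WE(i) >= 1 compatibility rule) can be collected into one small function.  This is a sketch only: the function name is hypothetical, and returning a weight of 0 for a zero coefficient is an assumption based on the statement that zero coefficients remove the geometry contribution.

```python
WE_DEFAULT = (0.02, 0.04, 0.05, 0.01, 0.01, 0.0)

def distance_weight(distance, dictionary_sd, we=WE_DEFAULT, usedsd=True):
    """Return the weight applied to one distance restraint (distance in Å)."""
    if usedsd and dictionary_sd > 0:
        return 1.0 / dictionary_sd ** 2      # dictionary s.d. takes priority
    if distance < 2.12:
        coeff = we[0]                        # 1-2 distances
    elif distance < 2.625:
        coeff = we[1]                        # 1-3 distances across bond angles
    else:
        coeff = we[2]                        # non-bonded distances
    if coeff >= 1:
        coeff = 1.0 / coeff                  # old-style input: value used is 1/WE(i)
    if coeff == 0:
        return 0.0                           # assumed: restraint switched off
    return 1.0 / coeff ** 2
```

For example, a 1.53 Å bond with a dictionary s.d. of 0 falls back to WE(1) = 0.02, giving a weight of 2500.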

WF (0, 0, 0, 0, 0, 0, 0)
The 7 elements of this array are used for applying weights on the structure amplitude part of the function that is minimised. Four weighting schemes are available, one of which incorporates the standard deviations in Fo.  For details see section 2.4 and SCHEME.  If WF(1) is omitted or set to 0 then an estimated value will be used, based on either the default or the supplied value of RMERGE.  However it is to be regarded as very approximate, and it will very likely need to be updated.  The other elements WF(2) ... WF(7) may also be omitted; in which case if SCHEME is not 1, USEWFC will be forced true and the elements of WF (2 ... 7) will be determined automatically before the first refinement cycle.  It will save time if values from the previous run are inserted.  However periodically you should omit them and let the program determine new values.

WFREF (false)
If set true, optimizes value of WF(1), i.e. scale of F-weights relative to restraint weights.  However this is likely to be a very time-consuming process, and should only be attempted after convergence has been attained (use MODULO=1 and TSFACR=0.004).

WP (20, 0.2)
The two elements of this array are used for applying weights on the structure factor phase part of the function that is minimised. These weights are only used when PHAS=true and phases have been read from the input reflection file.  See also section 2.4.

WU (1, 0.01)
The two elements of this array are weighting coefficients for the isotropic and anisotropic thermal parameter restraints respectively.

&EOF (No value)
Input of steering data may be terminated and refinement initiated by either &EOF or end-of-file.  Further steering data on a new line may follow &EOF.


The dictionary is read from the file DICTION.  The use of a user-defined dictionary makes RESTRAIN extremely flexible with respect to the type of structures that can be refined.  The dictionary is divided into two blocks, the first containing all residues and accompanying restraints, the second containing all the information necessary for the program to calculate the scattering factors for each atom type included in the first block.  Keyworded free format input is used throughout, with spaces, tabs or newlines separating items, and with record continuations (max 24) being specified by a "-" at the end of the line.  Character strings containing leading spaces (e.g. atoms with single character atomic symbols) must be enclosed in quotes ("...").  REMARK records may be interspersed freely to make comments.

The first block is organised into residue types, the first entry for each type being "RESI" followed by the residue name as a three letter abbreviation.  Note that these residue names must correspond to those present in your coordinate set (see section 3.3).  Within each residue entry the records may appear in any order.

Following the residue entry record are a series of "DIST" records defining the atom names, and each distance restraint in sequence moving down the residue.  Each restraint is specified by a positional number defining which atom following the current atom it is restrained to, then the distance in Å and its standard deviation. The order of the different atoms in the residue therefore specifies the positional number.  By default the restraint weights are calculated from the standard deviations.  Note that the atom names must correspond to those present in your coordinate file (see section 3.3).

"DIHE" records define the name of each dihedral angle and the four positional numbers of the atoms defining this angle.  Note that the names are not stored in the program.  It is however sensible to use a consistent logical order, since the calculated dihedral angles will be printed in the same order, e.g. phi and psi, chi angles, omega for amino acid residues.

"CHIR" records define the name of each chiral centre and the four positional numbers of the atoms defining this centre.  The order in which these atoms should be given should refer to a right-handed rotation when looking along the bond between the first atom (with the lowest positional number in the table) and the one at the centre of the tetrahedron.  For Calpha chiral centres in amino acids the order therefore is N-Calpha-C-Cbeta.  Note that the names are not stored in the program.

"PLAN" records define the name of each plane, the plane type, an individual plane weight (not used; for future development), and the atom pointers defining these planes.  In this version of RESTRAIN only two types of planes are recognised.  Planes of type 1 in the list will be put in the first category (PLANE1), all of type 2 in the second one (PLANE2).  For amino acid residues the peptide planes therefore are usually put in first position.  The reason for this is that RESTRAIN allows different weighting to be used for the two types of plane (see section 2.4).  Note that the plane names are not stored in the program.

The residue entries in the first block are terminated by a record starting with END.

The second block consists of "ATOM" records and is organised into atom types, the first entry for each type being the atom name.  Note that these atom names must correspond to those present in the first block and in your coordinate set (see section 3.3).  Each atom name is followed by a record containing the 4 constants S(i), the 4 constants E(i), the constant C and the closest van der Waals radius RKL.

These constants will be used for a four-Gaussian expansion of the scattering factor:

f(hkl) = SUM(i=1,4) [ S(i).exp(-E(i).(sin(theta)/lambda)^2) ] + C
These constants can be found in INTERNATIONAL TABLES FOR X RAY CRYSTALLOGRAPHY, Vol. IV.  The van der Waals radius is used for calculation of nearest allowed distances of atoms more than three bond distances apart when REPEL=true.  The second block is terminated by a record starting with END.
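
The four-Gaussian expansion is easily evaluated; the Python sketch below does so for one atom type.  The function name is hypothetical, and the carbon coefficients shown are the commonly tabulated values quoted from memory for illustration only: check them against the published tables before using them in a dictionary.

```python
import math

def scattering_factor(s_coeffs, e_coeffs, c, sin_theta_over_lambda):
    """Evaluate SUM(i) S(i).exp(-E(i).(sin(theta)/lambda)^2) + C."""
    x = sin_theta_over_lambda ** 2
    return sum(s * math.exp(-e * x) for s, e in zip(s_coeffs, e_coeffs)) + c

# Illustrative coefficients for neutral carbon (assumed values; verify before use).
CARBON_S = (2.3100, 1.0200, 1.5886, 0.8650)
CARBON_E = (20.8439, 10.2075, 0.5687, 51.6512)
CARBON_C = 0.2156

f_zero = scattering_factor(CARBON_S, CARBON_E, CARBON_C, 0.0)
```

At sin(theta)/lambda = 0 the sum of the S(i) plus C should be close to the number of electrons of the atom, which is a quick check on a newly typed ATOM record.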

The distributed dictionaries (in $CLIBD) are:

chiral_pep4.dic: Main-chain chiral restraints; 4-atom peptide planes.
chiral_pep5.dic: Ditto, but 5-atom planes.

The first is the default if DICTION isn't assigned.  A program "rdent" is available to generate RESTRAIN dictionary entries from PDB coordinate files.  However, it only makes the distance records (without standard deviations); the user has to work out the other sections, but this is not difficult.

The peptide dictionaries use values published by Engh & Huber (1991).


Cartesian orthogonal coordinates are read from file XYZIN.  The default set of orthogonal axes XO, YO and ZO is defined as follows:
     XO || a
     YO || c* x a
     ZO || c*

If SCALE records are present in the file, these will override the above, as well as any cell parameters given in the steering data.

A CRYST1 record, if present, will override any crystal data (i.e. cell and space group) read from the MTZ file (if used).  However any crystal data given in the steering data will override both the PDB and MTZ files.

The coordinate records must be in the format designed by the Brookhaven Protein Data Bank.  The format expected is:

Record identifier (A6)
ATOM for polymer atoms, HETATM for other atoms such as water.

Atom number (A5)
This may be alphanumeric.

Atom identifier (1X, A4)
The first two characters are the atom type right-justified.  The last two characters are the remoteness indicator and the branch number respectively.  These may be omitted if desired.

Residue name (1X, A3)
The name of the residue or the molecule (e.g. ALA or H2O)

Residue label (1X, A6)
The residue label or number.

Atom coordinates (3X, 3F8.0)
The orthogonal coordinates of the atom.

Occupancy (F6.0)
Atomic occupancy on the scale 0 to 1.

Isotropic thermal parameter (F6.0)
This may be given either as U or B (= 8.PI^2.U).

Chains are terminated with a record with "TER " in the first 6 character positions.  The program can deal with more than one protein chain.  This is particularly useful when refinement is carried out with certain residues missing/removed or when second sites are included. 

Care must be taken in preparing the coordinates for refinement. After each polymer chain a TER record must be inserted.  All atoms not contained in chains must be labelled HETATM.
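
The fixed-column layout listed above can be read with simple string slices.  The following Python sketch is an illustration only (the function name and dictionary keys are hypothetical); it assumes every numeric field is populated, whereas a Fortran F8.0 or F6.0 read would accept a blank field.

```python
def parse_coordinate_record(line):
    """Split one ATOM/HETATM record into the fields described above."""
    line = line.rstrip("\n").ljust(66)
    record = line[0:6].strip()                                   # A6
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record: %r" % record)
    return {
        "record": record,
        "atom_number": line[6:11].strip(),                       # A5, may be alphanumeric
        "atom_id": line[12:16],                                  # 1X, A4 (blanks are significant)
        "residue_name": line[17:20],                             # 1X, A3
        "residue_label": line[21:27].strip(),                    # 1X, A6
        "xyz": tuple(float(line[c:c + 8]) for c in (30, 38, 46)),  # 3X, 3F8.0
        "occupancy": float(line[54:60]),                         # F6.0
        "thermal": float(line[60:66]),                           # F6.0, U or B
    }

# Example record built with the same column widths (values are made up).
example = "%-6s%5s %-4s %-3s %-6s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    "ATOM", "1", " N", "ALA", "A   1", 11.104, 6.134, -6.504, 1.0, 15.79)
```

Note that the atom identifier is kept unstripped, since leading blanks are significant when atom names are compared with the dictionary.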

Note that atomic thermal parameters can be read as either U's or B's (B = 8.PI^2.U); the variable BINPUT must be set accordingly. After previous refinement and extensive rebuilding you may want to reset large U or B values for atoms incorrectly positioned before rebuilding (e.g. U > 0.8 or B > 64 Å^2) to more reasonable starting values (e.g. U = 0.2 or B = 16 Å^2).
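
The conversion between U and B, and the suggested reset of very large values, can be sketched in Python as follows.  The function names are hypothetical; the limit of 0.8 and starting value of 0.2 are the example figures given above.

```python
import math

EIGHT_PI_SQUARED = 8.0 * math.pi ** 2   # about 78.96

def u_to_b(u):
    """B = 8.PI^2.U"""
    return EIGHT_PI_SQUARED * u

def b_to_u(b):
    """U = B / (8.PI^2)"""
    return b / EIGHT_PI_SQUARED

def reset_large_u(u_values, limit=0.8, start=0.2):
    """Replace any U above limit by a more reasonable starting value."""
    return [start if u > limit else u for u in u_values]
```

With these definitions U = 0.2 corresponds to B of about 15.8 Å^2 and U = 0.8 to B of about 63.2 Å^2, consistent with the rounded figures of 16 and 64 quoted above.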

The number of atoms in each residue in the polymer chains must be the same as the number of atoms in that residue in the dictionary.  The names of all atoms must correspond to the names of the atoms in the dictionary.  Blanks (including leading blanks) are significant in assessing an atom name.

The atomic coordinates in the polymer chains must be ordered in each residue in the same way as the atoms in the residue are ordered in the dictionary.  If this is not the case, set ORDER=true in the steering data in the initial cycle.  The output file of atomic coordinates will then be produced in dictionary order for subsequent cycles.  Alternatively, set TESTIN=true and ORDER=true to use the program to order and analyse the file without carrying out any refinement.

For anisotropic thermal parameters the six values defining the U tensor of an atom, U(11) U(22) U(33) U(12) U(13) U(23), are written out (multiplied by 10^4) immediately following the coordinate record of that atom.  The record containing the U tensor is identified by the label ANISOU.  The format used for this record is (A6,22X,6I7).
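
A record in the (A6,22X,6I7) layout can be produced as in the Python sketch below.  The function name is hypothetical, and the 22 skipped columns are simply left blank here; what the program writes in those columns is not specified in this section.

```python
def anisou_record(u_tensor):
    """Format U11 U22 U33 U12 U13 U23 (in Å^2) as an ANISOU record.

    Each element is multiplied by 10^4 and rounded to an integer (I7).
    """
    if len(u_tensor) != 6:
        raise ValueError("six tensor elements are required")
    scaled = [int(round(u * 1.0e4)) for u in u_tensor]
    return "ANISOU" + " " * 22 + "".join("%7d" % v for v in scaled)

record = anisou_record((0.0512, 0.0431, 0.0620, -0.0023, 0.0011, 0.0005))
```

The resulting record is 70 characters long, with the six integers occupying columns 29 to 70.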


All information for the group thermal parameter refinement is contained in the file assigned to TLSIN; the steering data does not contain any information.  Each thermal parameter group is defined by an entry in the TLSIN file. 

The layout of a UISO entry is typically:

UISO    name
RANGE   atom_id_start  atom_id_end  [selection]
RANGE   . . . . . . . . . . . . . . . . . . . .
U       Uiso                                           (Å^2)

The layout of a UANISO entry is typically:

UANISO  name
RANGE   atom_id_start  atom_id_end  [selection]
RANGE   . . . . . . . . . . . . . . . . . . . .
U       U11 U22 U33 U23 U31 U12                            (Å^2)

The layout of a TLS entry is typically:

TLS     name
RANGE   atom_id_start  atom_id_end  [selection]
RANGE   . . . . . . . . . . . . . . . . . . . .
ORIGIN  x y z                                          (Å)
T       T11 T22 T33 T23 T31 T12                            (Å^2)
L       L11 L22 L33 L23 L31 L12                            (deg.^2)
S       S1  S2  S23 S31 S12 S32 S13 S21                     (Å.deg.)

Uij means the element (i,j) of tensor U.  Since X-ray data allow the calculation of only eight of nine S tensor elements, the usual constraint of setting the trace of S to zero is adopted.  This means that the elements S1 and S2 are (S33 - S22) and (S11 - S33) of the S tensor as defined by the equation
U = T + A L A' + A S + S'A' (Johnson and Levy, 1974).

Note that the order of the off-diagonal terms in the group U, T and L tensors is different from that of the U tensor in the coordinate file (the 23 and 12 elements are swapped).
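
Two small conversions follow from the conventions above: recovering the diagonal of S from S1 and S2 under the zero-trace constraint, and reordering a group tensor into the order used in the coordinate file.  The Python sketch below shows both; the function names are hypothetical.

```python
def s_diagonal(s1, s2):
    """Return (S11, S22, S33) given S1 = S33 - S22, S2 = S11 - S33, trace = 0."""
    s33 = (s1 - s2) / 3.0
    s11 = s2 + s33
    s22 = s33 - s1
    return s11, s22, s33

def group_to_coordinate_order(tensor):
    """Reorder (11, 22, 33, 23, 31, 12) to (11, 22, 33, 12, 13, 23).

    The tensor is symmetric, so the 31 element is also the 13 element;
    only the 23 and 12 elements change places.
    """
    t11, t22, t33, t23, t31, t12 = tensor
    return (t11, t22, t33, t12, t31, t23)
```

For instance S1 = 0.366 and S2 = -0.382 give S11 = -0.1327, S22 = -0.1167 and S33 = 0.2493 (to four places), which sum to zero.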

All the records of each entry except the first (UISO, UANISO or TLS) are optional, and can appear in any order.  The data will assume sensible defaults if not supplied (so the TLSIN file may contain only 1 line).  If the U or T record is omitted, the mean isotropic thermal parameter for the group is either used as is for UISO, or converted to the equivalent anisotropic tensor for UANISO or TLS.  ORIGIN specifies the local origin of a TLS group; if omitted it is set to the mean centre of the group.  The L and S tensors if omitted are set to zero. In addition to the keyworded records shown above, the following are also accepted: DEFAULT, NOATOM and RESIDUE (see the next section for details).

3.4.1 Description of the data records in the TLSIN file.

Only the first 4 letters of the keywords are significant and they are case-insensitive.  The format is free, that is, items are separated by one or more spaces.  If items are left blank they default to zero values.

UISO  [name]
Introduces Uiso group.  "name" is optional text used to identify the group in the output.

UANISO  [name]
Introduces Uaniso group.

TLS  [name]
Introduces TLS group.

RANGE  atom_id_start  atom_id_end  [atom_selection]
The RANGE record contains two atom identifiers indicating the start and finish of a segment of the coordinate file followed optionally by the names of atoms to be selected from this segment for inclusion in the group.  There may be any number of RANGE records per entry, including none (in which case the range of the group is the entire coordinate file).  See section 3.1.1 under keyword ANISO for a description of the options available for defining the range and the atom selection.

U       Uiso
U       U11  U22  U33  U23  U31  U12
Group isotropic thermal parameter, if a UISO group, or group anisotropic thermal tensor components (6), if a UANISO group.

ORIGIN  x  y  z 
Coordinates of the local origin of the TLS group.  For an aromatic ring it is usually the C-beta atom; for larger groups such as domains it is usually the mean centre (the default). 

T       T11  T22  T33  T23  T31  T12
T tensor components (6) for TLS group.

L       L11  L22  L33  L23  L31  L12
L tensor components (6) for TLS group.

S       S1   S2   S23  S31  S12  S32  S13  S21
S tensor components (8) for TLS group.  If CREACT=true (refine centre of reaction of all TLS groups), the S tensor is symmetric, so only the first 5 components are needed.

DEFAULT
This specifies that the values in the current group may be overridden if a subsequent group specifies any atoms in common with this group.  Otherwise it is an error to specify groups that have common atoms.  For example, one could specify a default UANISO group for the whole coordinate file; then override it with smaller UANISO or TLS groups.  Any atoms left outside these groups would get the overall Uaniso.

NOATOM
This switches off the default option to refine isotropic thermal parameters for atoms in the current group at the same time as the group parameters.  This is only valid for UANISO and TLS groups.

RESIDUE
This causes all the range(s) specified for the current group to be split up into single residues, each with its own set of parameters of the same type as the parent group, which are then refined independently.

An example of the TLS record specifying a TLS group consisting of two mainchain segments, with atoms in residues 1 to 68 and 129 to 300 is:

TLS   N domain
RANGE    1.   68.  MNCH
RANGE  129.  300.  MNCH
T     .112    .165    .131   -.052   -.003   -.003
L    1.877   2.165   3.471   4.562   6.152   7.313
S     .366   -.382    .147   -.981    .185    .118    .132    .140
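
For checking a TLSIN file before a run, a reader along the following lines may be useful.  This Python sketch is not the program's parser: the function name and the dictionary layout are hypothetical, it applies only the rules stated above (free format, first 4 letters significant, case-insensitive), and any keyword it does not recognise is simply stored as a flag on the current group.

```python
GROUP_KEYS = {"UISO", "UANI", "TLS"}
NUMERIC_KEYS = {"U", "T", "L", "S", "ORIG"}

def parse_tlsin(text):
    """Return a list of groups, one dictionary per UISO/UANISO/TLS entry."""
    groups = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        key = fields[0].upper()[:4]              # only 4 letters are significant
        if key in GROUP_KEYS:
            groups.append({"type": key, "name": " ".join(fields[1:]),
                           "ranges": [], "flags": []})
        elif not groups:
            raise ValueError("record before first group: %s" % raw)
        elif key == "RANG":
            groups[-1]["ranges"].append(tuple(fields[1:]))
        elif key in NUMERIC_KEYS:
            groups[-1][key] = [float(v) for v in fields[1:]]
        else:
            groups[-1]["flags"].append(key)      # other keywords kept as flags
    return groups

EXAMPLE = """TLS   N domain
RANGE    1.   68.  MNCH
RANGE  129.  300.  MNCH
T     .112    .165    .131   -.052   -.003   -.003
L    1.877   2.165   3.471   4.562   6.152   7.313
S     .366   -.382    .147   -.981    .185    .118    .132    .140
"""
groups = parse_tlsin(EXAMPLE)
```

Applied to the example above this yields one TLS group named "N domain" with two ranges, six T and L components and eight S components.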

Warning and error messages:

Where TLS tensors result in a U tensor that is not positive-definite, a warning message is printed out stating the atom name, number and U tensor.

If the L tensor elements are large (>20 deg.^2) and an atom is far away from the centre of origin for the calculation of the TLS tensors (>20Å), then the observed and calculated structure factor amplitudes can be different by several orders of magnitude.  This is a consequence of the numerical instability in calculation of derivatives of the TLS tensors with respect to positional coordinates (on some machines it may also result in an overflow floating point error).  These problems usually appear at the beginning of the TLS refinement of large groups if the user does not set the initial L small enough and origin of the rigid group sufficiently close to the centre of gravity.  Such an error is checked for in two ways.  First, a warning message is printed if the selected origin is more than 10Å away from the gravity centre.  Second, a warning message is printed if more than 30% of elements of U tensors for individual atoms had to be reset to an arbitrary interval [0, ULIMH].

Note that TLS calculations, like all anisotropic calculations, cannot take advantage of space-group specific subroutines.  The general space-group subroutine must be used.


The reflections are read from file REFIN or HKLIN.  These files may contain:

Item         Description                               Formatted  Unformatted

H K L        Miller indices of reflection                  I           R
FOBS         Observed structure factor amplitude           I           R
SIGMA(FOBS)  Standard deviation in observed amplitude      I           R
PHASE        Estimated phase from isomorphous and/or
             anomalous data                                I           R
FOM          Figure of merit for phase
             (on scale of 0-100)                           I           R
FREERFLAG    Free R flag (MTZ only)                                    I

Two file types containing the amplitude and/or phase data are accepted.  Which file type is actually read depends on the keyword REFIN or HKLIN (see section 3.1.1).

When REFIN is used, a formatted reflection file is read and the input depends on the value specified for MAXFMT which must be >=4 and <=7.  When MAXFMT is 5 the items H, K, L, FOBS and SIGMA(FOBS) will be read.  The reflections are read in with the format specified after the steering data.  Note that the format must be consistent with the value for MAXFMT.

When HKLIN is used then the input is read from an unformatted (MTZ) reflection file.  The file has header information containing the crystal data (cell parameters and space group), which means that this information does not normally need to be supplied in the steering data.


Besides line printer output (described below in section 4.1) there are a number of output files depending on the steering data.
           File                        File name    Description

- refined atomic coordinates           XYZOUT       section 4.2
- refined group thermal parameters     TLSOUT       section 4.3
- structure factors               HKLOUT or REFOUT  section 4.4
- full normal matrix                   MATOUT       section 4.5
- design matrix                        DESOUT       section 4.5
There are also 3 scratch files used by RESTRAIN:
           File                          Unit       Description

- coordinates for ordering                12        section 4.6
- reflections for scaling & weighting     14        section 4.6
- normal equations for positional parms.  11        section 4.6


The program is so designed that all possible information that could be required by the user is accessible.  However, to prevent unnecessary output the user can manipulate parameters that control the amount of output (see section 3.1.3).  Obviously a run will produce a limited selection of the output items, depending on the refinement parameters.  The major output items for each cycle are summarised below.  They are subdivided in major blocks indicated by a title between a pair of three asterisks.

1.  The program header stating the version number used.

2.  The array dimensions which have been set using the PARAMETER statements.

3.  The TITLE as supplied by the user in the control data.


4.  The filenames for coordinates input XYZIN, reflections input REFIN, dictionary DICTION, coordinates output XYZOUT and reflections output REFOUT.

5.  FORMAT FOR INPUT: The format specified by the user is printed.



6.  Under this heading there follows a list of all the steering parameters, with their default values and the input values which were specified by the user.  If no value for a parameter has been given, the default value is used, with the exception of the cell parameters and the scale factor G and overall thermal parameter U, which must be supplied by the user.

7.  FRACTIONAL CRYSTALLOGRAPHIC EQUIVALENT POSITIONS.  The general equivalent positions are given in the format of International Tables Vol. A.  It is advisable to check these at the beginning of a refinement.

8.  When refining using non-crystallographic symmetry MODE 2 (RIGID=false) ORTHOGONAL NON-CRYSTALLOGRAPHIC EQUIVALENT POSITIONS will be printed.  These will then be followed by a list of ALL ORTHOGONAL EQUIVALENT POSITIONS including those generated by the non-crystallographic symmetry.

9.  When extra distance restraints are to be used NUMBER OF NON-DICTIONARY RESTRAINTS will be printed.  Six restraints per line are listed.  These restraints are ATOM1 ATOM2 DISTANCE e.g. 190- 638 2.08 means that the distance between atoms 190 and 638 is 2.08Å.  Check that the restraints are correct.

10.  When extra planes are to be used NUMBER OF NON-DICTIONARY PLANES will be printed.  Check that the planes are correct.  E.g.

     1045           6
This means that there are 6 atoms in the extra plane, the first atom being number 1045, the other 5 atoms following sequentially with no atoms being skipped.

11.  When atoms are to have occupancies refined (OCCREF=true) NUMBER OF OCCUPANCY GROUPS will be printed.  The occupancy groups are then listed.

                    OF ATOMS   NUMBER     NUMBER
       910          6          1           1  
       952          5          2           1
      1045          6          1          -1
This shows the two cases:

  1. Coupled occupancies.  Group number 1, containing 6 atoms, occupies two sites, with first atom number 910 and 1045 respectively.  This group has a coupled occupancy for the two sites (as indicated by -1 for the second site).

  2. Partial occupancies.  Group number 2, containing five atoms, with first atom number 952.

The present occupancies as read from the input coordinates are then listed.

     ATOM  910 HAS OCCUPANCY 0.621.
     ATOM 1045 HAS OCCUPANCY 0.379.
     ATOM  952 HAS OCCUPANCY 0.565.
Note that coupled occupancies should add up to 1.


12.  In case DICPRI is true, the contents of the dictionary will be printed as it is read to facilitate the development of new entries.  At the end some overall statistics are printed.


13.  MOLECULAR PARAMETERS.  This is self-explanatory.  Note that (groups of) terminal atoms may be counted as extra residues. This is seen when for the carboxyterminal oxygen a separate residue entry in the dictionary is used.

14.  If there are groups of atoms which are to have their thermal parameters refined by rigid body option, the header ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED ANISOTROPICALLY BY RIGID BODY (TLS) is printed, followed by the description of rigid bodies using the format in section 2.3.6.

15.  If there are groups of atoms which are to have their thermal parameters refined anisotropically the header ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED ANISOTROPICALLY is printed, followed by 10 ranges per line giving first and last atom number (internal counters).

16.  If there are rigid groups, these are listed under the heading ATOMS IN THE FOLLOWING RANGES TO BE REFINED AS RIGID GROUPS. Ten ranges per line are printed giving first and last atom number (internal counters).

17.  When refining using non-crystallographic symmetry MODE 1 (RIGID=true) ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED AS RIGID GROUPS RELATED BY NON-CRYSTALLOGRAPHIC SYMMETRY is printed.  For each molecule the atom ranges (internal counters) are given, followed by a description of the non-crystallographic symmetry operation in terms of a rotation and a screw translation.  This is an aid in visualising the transformation involved.

18.  NUMBER OF PARAMETERS TO BE REFINED.  Compared with the number of observations and restraints, this gives an indication of the stability of the refinement.

19.  The cycle number CYCNO as supplied by the user (or default value 1).

20.  When refining TLS parameters there is a list of those atoms within TLS groups for which the derived anisotropic tensors are not positive definite.  This information is listed below the details of the TLS group concerned.


26.  TITLES READ FROM REFLECTION FILE when a binary reflection file is used.

27.  UNFAVOURABLE AGREEMENTS BETWEEN F(OBS) AND F(CALCS) AS DETERMINED BY RWDMIN.  Under this header structure factors are listed, when their rootweighted (Fo - G.Fc) (DELTA ROOTW) is larger than the user supplied value for RWDMIN.  In the early stages of a refinement it is advisable to print some structure factors, to check whether the amplitudes and/or phases are read correctly, and to see which reflections cause problems.  In later stages this output can then be suppressed.

28.  TABLE OF TOTALS DERIVED FROM THE STRUCTURE FACTORS INCLUDING THE R FACTOR.  This table gives information about the number of reflections (and phases) used.  W DELTA SQ, or SUM w(f).(Fo - G.Fc)^2, is the term being minimised.  Then two residuals and a correlation coefficient are printed.

R     = SUM(||Fo| - G.|Fc||) / SUM(|Fo|)

RDASH = (SUM(W.(|Fo| - G.|Fc|)^2) / SUM(W.|Fo|^2))^(1/2)

C     = (N.SUM(|Fo|.|Fc|) - SUM(|Fo|).SUM(|Fc|)) /
        ((N.SUM(|Fo|^2) - (SUM(|Fo|))^2) .
        (N.SUM(|Fc|^2) - (SUM(|Fc|))^2))^(1/2)
where N is the number of amplitudes used.

The conventional R-factor (R) is self-explanatory.  However, it is the weighted R-factor (RDASH) which gives an indication of the progress of the refinement.  As long as this residual is decreasing, there is hope, even when the unweighted R-factor temporarily increases (which is sometimes seen in the initial cycles of a refinement).  The correlation coefficient may have greater discriminating power than the R-factors when refining potential molecular replacement solutions at low resolution.
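
The three statistics of item 28 can be reproduced outside the program as a cross-check.  The following is a minimal Python sketch, not RESTRAIN's own code; the function name and argument layout are illustrative assumptions, and the conventional R-factor is taken with the absolute value of each difference.

```python
import math


def agreement_statistics(fo, fc, weights, scale):
    """Return (R, RDASH, C) for lists of |Fo|, |Fc| and weights W.

    scale is the overall scale factor G applied to |Fc|.
    All names here are illustrative, not RESTRAIN variables.
    """
    n = len(fo)
    # Conventional R: sum of absolute differences over sum of |Fo|.
    r = sum(abs(o - scale * c) for o, c in zip(fo, fc)) / sum(fo)
    # Weighted R (RDASH): square root of the weighted residual ratio.
    num = sum(w * (o - scale * c) ** 2 for o, c, w in zip(fo, fc, weights))
    den = sum(w * o ** 2 for o, w in zip(fo, weights))
    rdash = math.sqrt(num / den)
    # Linear correlation coefficient between |Fo| and |Fc|.
    so, sc = sum(fo), sum(fc)
    soc = sum(o * c for o, c in zip(fo, fc))
    soo = sum(o * o for o in fo)
    scc = sum(c * c for c in fc)
    corr = (n * soc - so * sc) / math.sqrt(
        (n * soo - so ** 2) * (n * scc - sc ** 2)
    )
    return r, rdash, corr
```

Note that C does not depend on G, which is one reason it can remain informative when the scale is still poorly determined.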


29.  This table prints the mean w.delta^2 values for amplitudes (and phases if PHAS is true) in batches according to resolution (columns) and amplitude (rows).  The weighting formula, as defined by SCHEME and WF(i), is shown above the table, together with the weights.  The table is very useful when judging the effect of these weights.

30.  The values of the refined scale (G) and overall thermal parameter (U).  If WATER=true, the values of the parameters SB1 and SB2 will also be printed.


31.  Under this header restrained interatomic distances are listed when their root-weighted d(t) - d(c) (RWDELTA) is larger than the user-supplied value for RWLMIN.  In the early stages of a refinement it is advisable to print some differences, to check whether the order of the coordinates is correct, and to see which distances cause problems.  In later stages this output can then be suppressed.  This table also gives the r.m.s. deviations from planarity of the peptide and ring planes where they exceed 0.03 Å.  If a chiral centre threatens to reverse hand, or has already done so, the tetrahedral volume will be printed.  If many residues have this tendency, as sometimes happens in the early stages of a refinement, it may be useful to use a dictionary with extra chiral restraints, and to use a value for the weighting coefficient WE(6) < WE(1).

At the right-hand side of this table the torsion angles as calculated from the coordinates are listed in the order as defined by the dictionary.


32.  A table is given of the mean w.delta^2 values for distance and planarity restraints, grouped according to the target distance or plane type.  This table is very useful when judging the effect of the weighting coefficients, which are also printed in this table, with WE(1) to WE(6) from left to right.


33.  Under this heading a table prints the value of the function minimised (see section 1.1), showing the sum of the w.delta^2 values for the amplitudes, phases, distance restraints and planarity restraints, and their relative contributions to the total.  This will be useful in defining the relative weights for each term.  When FREF=true there will be a second table showing the relative residuals as a function of resolution.


34.  This next block of information describes the convergence of the Gauss-Seidel iterative method for solving the normal equations for the positional parameters.  The first table describes the condition of the matrix.

This is followed by a table describing the solution of the normal equations, listing for each iteration: the iteration number I, and MEAN(Q) and MAX(Q), the mean and maximum respectively of the elements of DELTA P(I) - DELTA P(I-1) and of (DELTA P(I) - DELTA P(I-1)) / DELTA P(I), where P(I) = solution vector at iteration I.

The ANGLE BETWEEN SHIFT VECTOR AND DIRECTION OF STEEPEST DESCENT gives an indication of the progress towards the minimum.

If the program cannot solve the normal equations, MFACR will be automatically incremented, and a retry will take place.  If this leads to divergence again, some suggestions are printed.
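
To illustrate what the iteration table of item 34 reports, here is a minimal Python sketch of a plain Gauss-Seidel solver that records the mean and maximum change in the solution vector at each sweep.  It is a generic textbook version under simplifying assumptions (dense matrix, no 3x3 blocking, no damping factors such as GSFACR); the function name and the tolerance are hypothetical and do not correspond to RESTRAIN's implementation.

```python
def gauss_seidel(a, b, iterations=50, tol=1e-10):
    """Solve a.x = b iteratively; return (x, history).

    history holds one (iteration, mean_change, max_change) tuple per
    sweep, analogous to the I, MEAN(Q), MAX(Q) columns of the listing.
    """
    n = len(b)
    x = [0.0] * n
    history = []
    for it in range(1, iterations + 1):
        previous = x[:]
        for i in range(n):
            # Use the most recent values of the other unknowns.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
        changes = [abs(x[i] - previous[i]) for i in range(n)]
        mean_q = sum(changes) / n
        max_q = max(changes)
        history.append((it, mean_q, max_q))
        if max_q < tol:
            break  # converged: the solution no longer changes
    return x, history
```

For a well-conditioned (diagonally dominant or positive definite) matrix the recorded changes fall steadily; changes that grow from sweep to sweep correspond to the divergence mentioned above.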


35.  This table shows the results of the sampled residual calculations using

     Actual shift = SFACR * calculated shift
Sampled residual calculations are made to determine the optimum shift factor (ESTIMATED SHIFT FACTOR).
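
One common way to turn a few sampled residuals into an estimated optimum is to fit a parabola through three (shift factor, residual) pairs and take its minimum.  The Python sketch below shows that idea only; it is an assumption for illustration, since this document does not state how RESTRAIN derives the ESTIMATED SHIFT FACTOR, and the function name is hypothetical.

```python
def estimate_shift_factor(samples):
    """Estimate the shift factor that minimises the residual.

    samples: three (shift_factor, residual) pairs with distinct
    shift factors.  Returns the vertex of the interpolating parabola,
    or the best sampled factor if the parabola has no minimum.
    """
    (x1, y1), (x2, y2), (x3, y3) = samples
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    # Coefficients of residual = a*x^2 + b*x + c through the samples.
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3 ** 2 * (y1 - y2) + x2 ** 2 * (y3 - y1)
         + x1 ** 2 * (y2 - y3)) / denom
    if a <= 0.0:
        # No interior minimum: fall back to the best sampled point.
        return min(samples, key=lambda s: s[1])[0]
    return -b / (2.0 * a)
```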

36.  The r.m.s atomic shift is printed out.  This indicates whether any refinement is still taking place, or if convergence has been reached.

37.  If there are rigid groups, for each group the three translations and a rotation angle around an axis, of which the direction cosines are given, are printed together with the r.m.s atomic shift.  The latter value will give an indication if convergence is being approached.


38.  When refining using non-crystallographic symmetry MODE 1 (RIGID=true) the program will print the new transformation for each molecule, followed by a description of this non-crystallographic symmetry operation in terms of a rotation and a screw translation.  This can then be compared to the input value printed in item 17.


39.  Next is printed a listing of all atoms, to which shifts larger than DXYZLM have been applied, or which have U values not within the range ULOW to UHIGH.  In case of anisotropic atoms the trace is used to determine whether the tensor is printed.  In the case of multiple cycles the shifts refer to the last cycle only.

40.  The r.m.s. atomic shift relative to the original input coordinates is printed out.  This will be different from the one under item 36 when more than one cycle has been run, and/or when constrained-restrained refinement has taken place.

41.  When refining TLS parameters there is a list of the refined TLS groups with the derived anisotropic tensor for each atom in the group.  This is checked for being positive definite.  The results may be compared with those of item 20.
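
The positive-definite check mentioned in items 20 and 41 can be repeated on any tensor taken from the output.  Below is a minimal Python sketch using Sylvester's criterion (all leading principal minors positive).  The component order U11 U22 U33 U12 U13 U23 is an assumption made for this example; check it against the format description before applying it to real output.

```python
def is_positive_definite(u11, u22, u33, u12, u13, u23):
    """True if the symmetric 3x3 tensor is positive definite.

    Uses Sylvester's criterion.  The argument order is an assumed
    convention, not taken from the RESTRAIN file format.
    """
    minor1 = u11
    minor2 = u11 * u22 - u12 * u12
    # Determinant of the full symmetric matrix.
    minor3 = (u11 * (u22 * u33 - u23 * u23)
              - u12 * (u12 * u33 - u23 * u13)
              + u13 * (u12 * u23 - u22 * u13))
    return minor1 > 0.0 and minor2 > 0.0 and minor3 > 0.0
```

A tensor that fails this test does not describe a physically meaningful thermal ellipsoid.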


The refined coordinates are written out to file XYZOUT, in the same format as the input coordinates (see section 3.3).  Atomic anisotropic U tensors are also written to this file, in the format described earlier.  In the next run the file written as XYZOUT should therefore be used as XYZIN.


The refined group thermal parameters are written out to file TLSOUT, in the same format as in file TLSIN (section 3.4), provided the latter was supplied.  In the next run the file written as TLSOUT should therefore be used as TLSIN.


The structure factors are written out to file HKLOUT or REFOUT, and are primarily intended as input for FFT.  Each record contains
H K L  40000(sin(theta)/lambda)^2  Fo/G  SIGMA/G  Fc  PHASE
in the format (3I4,4I6,I4) for REFOUT, or
unformatted for HKLOUT.  When no sigma is read in, 1/sqrt(weight) replaces SIGMA in the output.
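
For users who want to read a formatted REFOUT file in their own scripts, the record layout above maps directly onto fixed-width fields.  The following Python sketch assumes only the Fortran format (3I4,4I6,I4) quoted above; the dictionary keys and function names are invented for this example, and any scaling applied to the integer amplitudes is not described here and is therefore left untouched.

```python
import math

# Field widths from the Fortran format (3I4,4I6,I4).
WIDTHS = (4, 4, 4, 6, 6, 6, 6, 4)
# Illustrative key names; "s2" holds 40000*(sin(theta)/lambda)^2.
NAMES = ("h", "k", "l", "s2", "fo", "sigma", "fc", "phase")


def parse_refout_record(line):
    """Split one formatted REFOUT record into a dict of integers."""
    values = []
    pos = 0
    for width in WIDTHS:
        field = line[pos:pos + width].strip()
        # Fortran reads a blank integer field as zero.
        values.append(int(field) if field else 0)
        pos += width
    return dict(zip(NAMES, values))


def resolution(record):
    """Resolution d = 1 / (2 sin(theta)/lambda) from the s2 field."""
    if record["s2"] == 0:
        return float("inf")
    return 1.0 / (2.0 * math.sqrt(record["s2"] / 40000.0))
```

For example, a record whose fourth field is 2500 corresponds to a reflection at 2.0 Å resolution.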


If FULMAT or NORMAT is set true, the normal matrix is written to the file MATOUT.  This is used for calculating standard deviations of all parameters (FULMAT) or just coordinates (NORMAT).

If DESMAT is set true, the design matrix is written to the file DESOUT.  The output file is used by another program (FUMAIN2*) for estimation of the variance of the least-squares residual. At present this feature is experimental.
*FUMAIN2 is not yet a part of CCP4.


A formatted scratch file (unit 12) is used for temporarily storing the newly ordered coordinates when the option ORDER is true.  Otherwise this scratch file will not be opened.

An unformatted scratch file (unit 14) may be used for temporary reflection storage when initial calculation of the overall scale and thermal parameters, or of the amplitude weighting coefficients, is required.

An unformatted scratch file (unit 11) will be opened to store the approximation to the normal matrix, in which contributions to the off-diagonal terms are included for the energy restraints, and 3x3 blocks are used for the contribution from the positional parameters of the atoms.  All other off-diagonal terms are taken as zero.  This file is read several times during the solving of the normal equations (see variables SFTLIM, CGFACR and GSFACR in section 3.1.3).


RESTRAIN is designed to check the input data, and either to print out a message informing the user what the problem is and what corrective action has been taken, or, in more severe cases, to print out a message and stop, as continuation would be pointless in these cases.  These messages are usually preceded by '***'.  Much care has been taken to make the messages as informative as possible, and thought has gone into the detection of illegal combinations of refinement options (see section 2.3.1).  Obviously it is impossible to allow for all eventualities, so if you find an error that is not covered, or that you do not understand, then please seek assistance.  When starting up a refinement use low values for RWDMIN (the weighted differences between observed and calculated structure factors) and RWLMIN (the weighted differences between observed and ideal distances) to obtain as much information as possible about the input reflections and coordinates respectively (see section 3.1.3).  In the following paragraphs some common errors are described.


Depending on the number of atoms and residues in your structure, a suitably dimensioned version of RESTRAIN will have to be used.  The array dimensions of RESTRAIN dealing with problem-specific variables are set using the PARAMETER statement.  They are printed in each listing immediately after the program title.  If an array bound is exceeded, the program prints a message saying which parameter to increase and then terminates.  Recompilation will then be necessary.  If you are not sure what to do, seek assistance.


Make sure that protein chains are terminated with TER records. 
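
A quick pre-flight check for missing TER records can save a failed run.  The Python sketch below assumes standard Brookhaven column positions (record name in columns 1-6, chain identifier in column 22) and only looks at ATOM records; the function name is hypothetical, and files without chain identifiers are treated as a single chain.

```python
def chains_missing_ter(lines):
    """Return chain ids whose ATOM records are not closed by TER.

    lines: iterable of Brookhaven-format records.  Column positions
    are the standard PDB ones and are an assumption of this sketch.
    """
    missing = []
    open_chain = None  # chain id of ATOM records seen since last TER
    for line in lines:
        record = line[:6].strip()
        if record == "ATOM":
            chain = line[21:22]
            if open_chain is not None and chain != open_chain:
                # A new chain started before the old one was closed.
                missing.append(open_chain)
            open_chain = chain
        elif record == "TER":
            open_chain = None
    if open_chain is not None:
        missing.append(open_chain)
    return missing
```

An empty list means every chain found was terminated; anything else names the chains that need a TER record added.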


Reflection file errors are often caused by format errors when reading formatted files.

When using weighting schemes with the standard deviation or when using MIR or MIRAS phases you must have these present in your reflection file.


Illustration of input.

This is not intended to be a working example; it contains all the commonly used options together in the same script, and is meant to illustrate the available options.  Most restrain scripts are nowhere near as long as this one!  Just change the filenames and column labels, and delete the other bits you don't need.  Note that the script below will apply both geometric and thermal parameter (isotropic or anisotropic as appropriate) restraints by default.

set r=$0:r
time restrain <<EOF
TITLE  Illustrating all the options in one script!
! First define the input and output files (can also do it on command line).
! All input is free format, order and letter case of keywords don't matter.
XYZIN  hexpep.brk      ! Check section 3.3 for preparation guide.
TLSIN  hexpep.tls      ! Needed for group thermal parameters.
                       ! Described in detail below.
HKLIN  hexpepf.mtz
LABIN  FP=FP_hexpep SIGFP=SP_hexpep FREE=FreeR_flag
XYZOUT $r.brk
TLSOUT $r.tls
HKLOUT $r.mtz
LABOUT FC=FC_hexpep PHIC=PC_hexpep
! ANISO creates individual atomic anisotropic thermal tensors (high res.only!).
ANISO  327.CA                 ! This will match either Calpha or calcium.
ANISO   10.  50.              ! Residues 10-50, all atoms.
ANISO  200. 250. ' CA' ' CB'  ! Calpha's (but not calcium!) & Cbeta's only.
ANISO  100. 150. mnch         ! Main chain atoms only.
ANISO  151. 190. sdch         ! Side chain atoms only.
! NCSYMM defines NCS operators (3 molecules/a.u. here; identity is assumed).
NCSY   POLAR   25.563   87.995  127.906  ! Can also say "NCSY MATRIX ...".
NCSY   TRANS  100.076   -3.502    9.137  ! Use lsqkab to get these.
NCSY   POLAR   65.746  117.435  180.153
NCSY   TRANS  119.479   46.151   31.805
! OCCU allows occupancies in PDB file to be used, and creates occupancy groups.
! Here group A consists of 4 atoms with 3 coupled occupancy parameters,
! i.e. their sum is constant.
! Group B consists of 6 atoms with one free occupancy parameter.
OCCU   101.CG  4  A   1  ! First atom id, no. of atoms, group id, coupling id.
OCCU   151.CB  5  B
OCCU    51.SG  1  B
OCCU   251.CG  4  A   2
OCCU   201.CG  4  A   3
! RIGID defines rigid bodies.
RIGID   10.  50.  A  ! Residues 10-50, all atoms, rigid group A.
RIGID  200. 250.  A  ! More atoms in group A.
RIGID  100. 150.  A  ! Yet more.
RIGID  151. 190.  B  ! These are in rigid group B.
! XTRD defines extra distance restraints.
! Here's a real example with a disordered cystine.
XTRD    18.N   618.CB  2.455  0.034  ! Atom 1  Atom 2  d  [sigma(d)]
XTRD    18.CA  618.CB  1.530  0.020  ! Residue 618 is an alternate s/c of 18.
XTRD    18.CA  618.SG  2.822  0.043
XTRD    18.CB  622.SG  3.034  0.059
XTRD    18.SG  622.CB  3.034  0.059
XTRD    18.SG  622.SG  2.030  0.008
XTRD    18.C   618.CB  2.504  0.038
XTRD    22.N   622.CB  2.455  0.034  ! Residue 622 is an alternate s/c of 22.
XTRD    22.CA  622.CB  1.530  0.020
XTRD    22.CA  622.SG  2.822  0.043
XTRD    22.C   622.CB  2.504  0.038
XTRD   618.CB  622.SG  3.034  0.059
XTRD   618.SG  622.CB  3.034  0.059
XTRD   618.SG  622.SG  2.030  0.008
! "Steering data" follows STEER keyword (uses simulated Fortran NAMELIST).
NCYC=8, CYCNO=21, SCHEME=5  ! May want to modify these.
EOF

Here is an illustrative example of a TLSIN file (group thermal parameters):
UANISO                       ! Overall anisotropic tensor.
DEFAULT                      ! Defines default values,
                             ! i.e. may be overridden.

UANISO N-domain              ! Group anisotropic tensor just for one domain.
RANGE    1.  180.            ! Domain consists of 2 contiguous segments.
RANGE   98.  327.

TLS    C-domain              ! TLS tensor for other domain.
RANGE  191.  290.            ! This domain has just one segment.

TLS    A-helix               ! TLS tensor for helix main chain.
RANGE   30.   55.  mnch
NOATOM                       ! Don't refine atomic Uiso's for this group.

RANGE   99. '' sdch          ! U tensor for individual side chain.
NOATOM                       ! Don't refine atomic Uiso's for this group.

UISO                         ! Can also do group isotropic tensors.
RANGE  100.  130.  sdch      ! Side-chains of residues 100-130 will have
RESIDUE                      ! separate group Uiso's.

Note that you don't need to put in any values for the tensor components; the program will supply sensible defaults for any undefined tensors.  Once the job has been run, the refined values of all tensor components will be put in the TLSOUT file ready for the next run.  Here is an example:

! APP ANISO/TLS AT 2.1Å no refine Creact.
! Output from refinement cycle   5

UANISO Polypro helix.
RANGE   1. 9. ALL
U        0.0053 -0.0086  0.0033 -0.0140  0.0044  0.0007
!       (0.0112)(0.0127)(0.0204)<0.0064>(0.0063)(0.0066)

TLS    Alpha helix.
RANGE  13. 32. ALL
ORIGIN -3.103  -8.863   3.788
T        0.2217  0.1885  0.2016 -0.0052  0.0067  0.0045
!       <0.0083><0.0090><0.0145>(0.0053)<0.0052>(0.0052)
L          0.62    1.67    2.23   -0.45   -0.15    0.23
!       (  1.12)<  0.46><  0.87><  0.44>(  0.82)(  0.47)
S        -0.025   0.000   0.045  -0.008   0.048  -0.034  -0.034  -0.066
!       ( 0.057)( 0.070)< 0.034>( 0.049)( 0.052)( 0.052)( 0.054)< 0.033>

If further refinement is necessary, such as after rebuilding, one would normally replace the old XYZIN and TLSIN files with the new ones, and insert new names for the output files.  In the script above the output names are derived from the name of the script itself ($0:r), so this is done automatically when the script is copied to a new name.  Certain parameters would also need to be updated, in particular any NCS operators, the overall scale factor G, the solvent background parameters SB1 and SB2, and the F-weighting coefficients WF(2) ... WF(4).  The new values of all updated parameters are always printed at the end of the standard output, e.g.:


G=5.0972, SB1=3.6804, SB2=10.6158
WF(2)= 1.94021E+03, WF(3)= 1.06121E+01, WF(4)= 1.22706E-02
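
Copying these values by hand into the next script is error-prone, so it can be convenient to extract them from the log.  The Python sketch below parses lines of the form shown above into a dictionary; it assumes only the NAME=value layout of this example output, and the function name is hypothetical.

```python
import re

# A name such as G, SB1 or WF(2), then '=', then a number that may
# use Fortran-style E notation.
_PARAM = re.compile(
    r"([A-Za-z][A-Za-z0-9]*(?:\(\d+\))?)\s*=\s*"
    r"([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)"
)


def parse_updated_parameters(text):
    """Return {name: value} for every NAME=value pair found in text."""
    return {name: float(value) for name, value in _PARAM.findall(text)}
```

The resulting dictionary can then be formatted back into steering data for the next run.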

Unix example script found in $CEXAM/unix/runnable/

VMS example script found in $CEXAM/vms/


  1. Cruickshank D W J (1965) Computing Methods in Crystallography, (J S Rollett, ed.), pp. 112-116, Oxford, Pergamon Press.
  2. Driessen H, Haneef M I J, Harris G W, Howlin B, Khan G and Moss D S (1989) J Appl Cryst., 22, 510-516.
  3. Engh R A and Huber R (1991) Acta Cryst. A, -.
  4. Haneef I, Moss D S, Stanford M J and Borkakoti N (1985) Acta Cryst., A41, 426-433.
  5. Howlin B, Butler S A, Moss D S, Harris G W and Driessen H P C (1993) J. Appl. Crystallogr. 26, 622-624.
  6. Johnson C K and Levy M A (1974) in International Tables for X-ray Crystallography, Vol IV (Ibers, J.A. and Hamilton, W.C., eds.), pp. 320-332.
  7. Jones T A, Zou J Y, Cowan S W and Kjeldgaard M (1991) Acta Cryst., A47, 110-119.
  8. Moss D S (1981) Refinement of protein structures, Proceedings of the Daresbury Study Weekend, (P Machin, ed.), pp. 9-12, Daresbury, SERC.
  9. Moss D S & Morffew A J (1982) Comput Chem, 6, 1-3.
  10. Nielsen K (1977) Acta Cryst., A33, 1009-1010.
  11. Rees B (1976) Acta Cryst., A32, 483-488.
  12. Rollett J S (1965) Computing Methods in Crystallography, pp. 38-56. Oxford, Pergamon Press.
  13. Waser J (1963) Acta Cryst., 16, 1091-1094.


Alternative refinement program: