WATERTIDY (CCP4: Supported Program)


watertidy - rationalise waters at the end of refinement


watertidy xyzin refined-coords.brk distout distang-out.log xyzout tidied-coords.brk
[Keyworded input]


At the end of refinement it is useful to rationalise the H2O naming, for example when you have more than one molecule in the asymmetric unit, or two isomorphous structures, and want to compare their H2O structures.

This program has two purposes.

  1. It moves each H2O to the symmetry-related position nearest to the host molecule.
  2. It attempts to set up an H2O naming scheme that records which residue, and which atom type, a particular H2O is hydrogen bonded to. The user gives the chain IDs of the host chains and assigns an output chain ID to the H2Os bonded to each of them.
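The first purpose, moving a water to the symmetry image nearest its host, can be illustrated with a short Python sketch. DISTANG performs the actual search; the function names below are hypothetical, symmetry operators are assumed to be given as (rotation, translation) pairs acting on fractional coordinates, and the cell is assumed to have 90-degree angles so that rounding each fractional difference gives the shortest lattice shift.

```python
import math

# Hypothetical operators acting on fractional coordinates: (R, t)
IDENTITY = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
INVERSION = ([[-1, 0, 0], [0, -1, 0], [0, 0, -1]], [0.0, 0.0, 0.0])


def apply_op(op, frac):
    """Apply x' = R x + t to a fractional coordinate triple."""
    rot, trans = op
    return [sum(rot[i][k] * frac[k] for k in range(3)) + trans[i] for i in range(3)]


def nearest_image(water_frac, host_fracs, ops, cell):
    """Return (distance, fractional coords) of the symmetry image of a water
    lying closest to any host atom.

    Assumes cell = (a, b, c) in Angstrom with all angles 90 degrees, so the
    per-axis rounding below yields the minimum-distance lattice translation.
    """
    best = None
    for op in ops:
        image = apply_op(op, water_frac)
        for host in host_fracs:
            # Whole-cell shift that brings this image next to the host atom
            shift = [round(h - w) for h, w in zip(host, image)]
            moved = [w + s for w, s in zip(image, shift)]
            dist = math.sqrt(sum(((m - h) * a) ** 2
                                 for m, h, a in zip(moved, host, cell)))
            if best is None or dist < best[0]:
                best = (dist, moved)
    return best
```

For a general (oblique) cell the rounding shortcut is not guaranteed to find the closest image, and a small search over neighbouring lattice translations would be needed.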

The distance search is done by the program DISTANG, which must be run first. WATERTIDY then reads the DISTANG output (the "log file"), which lists all close contacts, and does some preliminary analysis of the H2O contacts (e.g. contacts that are too close, carbon atoms involved in close contacts, number of contacts per chain).
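The kind of preliminary contact analysis described above can be sketched as follows. The DISTANG log format is not reproduced here, so the Contact record is a hypothetical, already-parsed form of one contact, and the 2.2 Angstrom "too close" cut-off is an illustrative value, not the one WATERTIDY uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical record for one water-host contact taken from the DISTANG log."""
    water: str        # water label, e.g. "W 769"
    host_chain: str   # host chain ID, e.g. "A"
    host_res: int     # host residue number
    host_atom: str    # host atom name, e.g. "OE1"
    distance: float   # contact distance in Angstrom


def summarise(contacts, too_close=2.2):
    """Return (too-short contacts, contacts to carbon atoms, contacts per chain).

    too_close is an illustrative cut-off. Carbon is detected from the first
    letter of the atom name, which assumes standard protein atom naming
    (it would misread a calcium ion named CA, for instance).
    """
    short = [c for c in contacts if c.distance < too_close]
    carbon = [c for c in contacts if c.host_atom.startswith("C")]
    per_chain = Counter(c.host_chain for c in contacts)
    return short, carbon, per_chain
```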

Naming each H2O after its host raises a problem: what to do about H2Os that are bonded to more than one host atom? The solution used here is to list such H2Os more than once, giving the site closest to a host atom the input occupancy and all secondary sites occupancy <occw> (default 0.01, see keyword OCCW).
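The duplication rule can be expressed as a small Python sketch. The Site type and expand_sites function are hypothetical; the rule itself (closest host keeps the input occupancy, the others get <occw>, and <occw> = 0.0 suppresses them) is the one described above and under OCCW.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    host: str          # hypothetical host label, e.g. "GLU A 4 OE1"
    distance: float    # water-host distance in Angstrom
    occupancy: float   # occupancy written for this copy of the water


def expand_sites(hosts, input_occ, occw=0.01):
    """Turn one water with several host atoms into one Site per host.

    hosts is a list of (host_label, distance). The closest host keeps the
    input occupancy; every other host gets occw. With occw == 0.0 the
    secondary sites are dropped, mirroring OCCW 0.0.
    """
    ordered = sorted(hosts, key=lambda h: h[1])
    sites = [Site(ordered[0][0], ordered[0][1], input_occ)] if ordered else []
    if occw != 0.0:
        sites += [Site(label, dist, occw) for label, dist in ordered[1:]]
    return sites
```

With distances of about 2.75 and 2.97 Angstrom (computed from the coordinates of atom 772 and its two hosts in the example output), expand_sites([("VAL A 3 N", 2.97), ("GLU A 4 OE1", 2.75)], 1.00) keeps occupancy 1.00 on the OE1 site and gives 0.01 to the N site, matching the O50 WAT P 4 and O00 WAT P 3 entries.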

Run the program first to relabel the H2Os bonded to the protein. A second or third pass (see keyword SHELL) then applies the same rules to the H2Os of the second or third solvent shell, which will not have been renamed in the previous pass.

All atoms that are not relabelled are output exactly as input.

WATERTIDY names the waters with the appropriate output ID and a label containing information about which residue and atom type the water is H-bonded to. An H2O is labelled in the output PDB file as

O<i><j> WAT <chnid> <nres>
where <nres> is the host residue number and <chnid> is the assigned output ID. <i> and <j> are defined as follows:
  1. If the host atom belongs to a protein residue the number <i> (range 0-9) defines the bonding atom type as follows:
          0 for N 
          1 for O
          2 for OG OG1 
          3 for OD1 ND1
          4 for OD2 ND2
          5 for OE OE1 NE1  
          6 for OE2 NE2
          7 for NZ        
          8 for OH OH1 NH1 
          9 for OH2 NH2
    Additional assignments for <i> are made as follows:
          0  for OW
         <n> for O<n> or OW<n> where n=0-9
         <n> for O<n><m> where n,m=0-9
    Where other acceptor atom types (e.g. C or S, see keyword ACCEPT) are allowed, the <i> numbering is extended to carbon atoms as well:
          0 for CA
          1 for C
          2 for CG CG1
          3 for CD CD1
          4 for CD2 CD3
          5 for CE CE1
          6 for CE2 CE3..
          7 for CZ
          8 for CH CH1
          9 for CH2 CH3..
    The number <j> (range 0-3) numbers the contacts of H2Os to the protein atom; up to <hbond> H2Os (default 4, see keyword HBOND) can be bonded to one atom.
  2. If the host atom is another H2O the number <i> will be the same as that of the host atom.
    The number <j> (range 4-6) numbers the contact of the H2O to its host for the second shell; up to 3 H2Os can be bonded to one atom and <j> is offset to the range 4-6 to make it clear which H2Os are in the second shell.
    The number <j> (range 7-9) numbers the contact of the H2O to its host for the third shell; up to 3 H2Os can be bonded to one atom and <j> is offset to the range 7-9 to make it clear which H2Os are in the third shell.
    For molecules with non-crystallographic symmetry there is no guarantee that the <j> value assigned to a water on one chain will match that of the equivalent water on a related chain.
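The naming rules above can be collected into a Python sketch that builds the O<i><j> atom name. The dictionary is transcribed from the tables; the helper names are hypothetical, only the explicitly listed carbon names are included, and shell 1 is assumed to use at most 4 contacts per host atom so that <j> stays in the documented range 0-3.

```python
# <i> digit for protein host atoms, transcribed from the tables above.
# "CE3.." and "CH3.." in the table hint at further names not covered here.
I_DIGIT = {
    "N": 0, "O": 1, "OG": 2, "OG1": 2, "OD1": 3, "ND1": 3,
    "OD2": 4, "ND2": 4, "OE": 5, "OE1": 5, "NE1": 5, "OE2": 6, "NE2": 6,
    "NZ": 7, "OH": 8, "OH1": 8, "NH1": 8, "OH2": 9, "NH2": 9, "OW": 0,
    "CA": 0, "C": 1, "CG": 2, "CG1": 2, "CD": 3, "CD1": 3, "CD2": 4,
    "CD3": 4, "CE": 5, "CE1": 5, "CE2": 6, "CE3": 6, "CZ": 7,
    "CH": 8, "CH1": 8, "CH2": 9, "CH3": 9,
}

# First <j> value and number of slots per shell: 0-3, 4-6 and 7-9
J_BASE = {1: 0, 2: 4, 3: 7}
J_SLOTS = {1: 4, 2: 3, 3: 3}


def i_digit(atom):
    """<i> for a host atom name (protein atom or an already relabelled water)."""
    if atom in I_DIGIT:
        return I_DIGIT[atom]
    # OW<n>, O<n> and O<n><m> all take the first digit after the prefix,
    # so a second-shell water inherits <i> from its host water
    rest = atom[2:] if atom.startswith("OW") else atom[1:]
    if atom.startswith("O") and rest[:1].isdigit():
        return int(rest[0])
    raise KeyError(f"no <i> assignment for atom {atom!r}")


def j_digit(shell, k, hbond=4):
    """<j> for the k-th (0-based) water bonded to one host atom in a shell."""
    slots = min(hbond, J_SLOTS[1]) if shell == 1 else J_SLOTS[shell]
    if not 0 <= k < slots:
        raise ValueError(f"contact {k} exceeds the {slots} slots of shell {shell}")
    return J_BASE[shell] + k


def water_name(host_atom, shell, k, hbond=4):
    """Atom name O<i><j>, e.g. O50 for the first water on an OE1."""
    return f"O{i_digit(host_atom)}{j_digit(shell, k, hbond)}"
```

Applied to the example output, water_name("OE1", 1, 0) gives O50 and water_name("O10", 2, 1) gives O15, the second second-shell water on the water O10 WAT P 4.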

When you have assigned as many shells as you need, re-sort the water atoms of the output PDB file on <chnid>, residue number, etc., using the system sort utility. On Unix, the following command sorts on <chnid> first, then residue number, then atom number:

sort +4 -5 +5 -6 +3.1 -3.3 wat.pdb > wat_sorted.pdb
A VMS example is
$SORT/KEY=(POS:21,SIZE:6)/key=(pos:15,size:1)/key=(pos:16,size:1) -
      /key=(pos:7,size:6) DSCR:DPI047R2.pdb DSCR:DPI047R2.pdb
BEWARE: your CRYST1 and SCALE cards will be scrambled by the sorting.
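As an alternative to the system sort, a short Python sketch can re-sort only the water records and leave the CRYST1 and SCALE cards where they are. The function name is hypothetical; it assumes waters have residue name WAT and that the file uses the standard fixed PDB columns. Unlike the plain sort fields above, it compares residue numbers numerically.

```python
def sort_waters(lines, resname="WAT"):
    """Re-sort water records of a PDB file by chain ID, residue number,
    atom name and serial number.

    Every non-water record (CRYST1, SCALEn, protein atoms, ...) keeps its
    original order ahead of the waters, and an END record stays last.
    Fixed PDB columns are used: serial 7-11, atom name 13-16, residue
    name 18-20, chain ID 22, residue number 23-26.
    """
    def is_water(rec):
        return rec.startswith(("ATOM", "HETATM")) and rec[17:20] == resname

    def is_end(rec):
        return rec[:6].rstrip() == "END"

    others = [rec for rec in lines if not is_water(rec) and not is_end(rec)]
    waters = [rec for rec in lines if is_water(rec)]
    ends = [rec for rec in lines if is_end(rec)]
    waters.sort(key=lambda rec: (rec[21], int(rec[22:26]), rec[12:16], int(rec[6:11])))
    return others + waters + ends
```

To use it on a file, read the records with open(path).read().splitlines() and write "\n".join(sort_waters(records)) to the output file.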



XYZIN

Input coordinate file in PDB format.

DISTOUT

Log file produced by the program DISTANG, read here as input. The program reads the list of distances in the log file and ignores the rest.


XYZOUT

Output coordinate file in PDB format. Water atoms will be relabelled as described above, and may have been moved to a symmetry-related position. Water atoms that bond to more than one host atom will be duplicated, with second and subsequent entries having occupancy <occw>.


Available keywords are:


ACCEPT <id> ...

Specify extra acceptor atom types as single characters (default O N).

CHNID <chainid> [ WATOUTID <id> ] [ RANGE <residue1> <residue2> ]

<chainid>
The host chain ID as it appears in XYZIN, e.g. A or B. Give one CHNID card for each host chain.
WATOUTID <id>
A single-character chain ID for the waters bonded to <chainid>, to be used in XYZOUT.
<residue1> <residue2>
The starting and ending residue numbers for the host chain. This range is necessary if the chain is not numbered 1, 2, 3... or if you have more than one chain.

HBOND <hbond>

Maximum number of waters bonded to one atom, default 4.

OCCW <occw>

Occupancy for secondary sites (default 0.01). If <occw> is set to 0.0 then secondary sites are not written to XYZOUT.

SHELL <shell>

Specify the shell number (up to 3), default 1.

SYMMETRY <SG name> | <SG number> | <operators>

Standard symmetry specification. This must be the same as used for DISTANG.

TITLE <title>

<title> is written to output PDB file as a REMARK.

WATID <id>

Water chain ID: the chain identifier, as it appears in XYZIN, of the unassigned H2Os to be relabelled in this pass.


END

Terminate input.
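WATERTIDY can also be driven from Python. The sketch below builds the keyword input from the keywords documented above and passes it on standard input, assuming a CCP4 environment with watertidy on the PATH. The helper names, space group, chain IDs, residue range and file names are hypothetical placeholders, and the card order used here is an assumption.

```python
import subprocess


def watertidy_keywords(symmetry, hosts, watid, shell=1, title=None, occw=None):
    """Build WATERTIDY keyword input.

    hosts is a list of (chainid, watoutid, residue_range) tuples, where
    residue_range is a (first, last) pair or None; each becomes a CHNID card.
    """
    cards = []
    if title:
        cards.append(f"TITLE {title}")
    cards.append(f"SYMMETRY {symmetry}")
    for chain, out_id, res_range in hosts:
        card = f"CHNID {chain} WATOUTID {out_id}"
        if res_range is not None:
            card += f" RANGE {res_range[0]} {res_range[1]}"
        cards.append(card)
    cards.append(f"WATID {watid}")
    cards.append(f"SHELL {shell}")
    if occw is not None:
        cards.append(f"OCCW {occw}")
    cards.append("END")
    return "\n".join(cards) + "\n"


def run_watertidy(xyzin, distout, xyzout, keywords):
    """Run WATERTIDY with the keywords on stdin (requires CCP4 on PATH)."""
    cmd = ["watertidy", "xyzin", xyzin, "distout", distout, "xyzout", xyzout]
    return subprocess.run(cmd, input=keywords, text=True,
                          capture_output=True, check=True)
```

For a second pass, the same helpers would be called again on the first-pass output with SHELL 2.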


Example of output file

SCALE2       0.00000   0.03820   0.00000        0.00000
SCALE3       0.00000   0.00000   0.01937        0.00000
SCALE1       0.01897   0.00000   0.00099        0.00000
ATOM      1  N   GLY A   1      -8.094   0.714  38.861  1.00 19.52
ATOM     18  C   VAL A   3     -10.635   2.653  34.037  1.00 15.79
ATOM     13  N   VAL A   3      -8.153   2.210  33.953  1.00 16.23
ATOM     25  N   GLU A   4     -10.661   2.145  35.262  1.00 13.58
ATOM     28  O   GLU A   4     -12.831   4.702  36.359  1.00 15.64
ATOM     21  OE1 GLU A   4      -9.572   0.074  36.837  1.00 30.05
ATOM     20  OE2 GLU A   4     -11.042  -1.224  35.968  1.00 32.63
ATOM    769  O00 WAT P   1      -8.453  -1.913  39.350  1.00 45.10
   An H2O bonded to the N of GLY A 1...
ATOM    772  O00 WAT P   3      -7.612  -0.514  34.997  0.01 22.90
   An H2O bonded to the N of VAL A 3...
ATOM    750  O10 WAT P   4     -14.304   4.121  38.925  1.00 25.25
ATOM    772  O50 WAT P   4      -7.612  -0.514  34.997  1.00 22.90
ATOM    795  O04 WAT T   3      -5.847  -2.930  35.432  0.01 30.04
ATOM    749  O14 WAT T   4     -11.391   4.228  40.350  1.00 32.06
ATOM    811  O15 WAT T   4     -14.681   2.966  41.308  1.00 56.74
ATOM    795  O54 WAT T   4      -5.847  -2.930  35.432  0.01 30.04
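Reading the labels back is the reverse operation. This hypothetical Python helper extracts <i>, <j>, chain ID, residue number and occupancy from a relabelled record using the fixed PDB columns, and infers the shell from the <j> ranges 0-3, 4-6 and 7-9. For O54 WAT T 4 it reports a second-shell water inheriting <i> = 5 from its host water O50 WAT P 4.

```python
def decode_water(record):
    """Split a relabelled water record into its naming fields.

    Reads fixed PDB columns (atom name 13-16, chain ID 22, residue number
    23-26, occupancy 55-60) and infers the shell from <j>:
    0-3 first shell, 4-6 second shell, 7-9 third shell.
    """
    name = record[12:16].strip()          # e.g. "O54"
    i, j = int(name[1]), int(name[2])
    shell = 1 if j <= 3 else (2 if j <= 6 else 3)
    return {
        "i": i, "j": j, "shell": shell,
        "chain": record[21], "resseq": int(record[22:26]),
        "occupancy": float(record[54:60]),
    }
```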

An example Unix script can be found in $CEXAM/unix/runnable/


distang, pdbset, sort (1)