MTZ2VARIOUS (CCP4: Supported Program)


mtz2various - produces an ASCII reflection file for MULTAN, SHELX, TNT, X-PLOR/CNS, MAIN, CIF or user-defined format. This may contain amplitudes, intensities or differences.


mtz2various hklin foo_in.mtz hklout foo_out
[Key-worded input file]


This reads an MTZ file (assigned to HKLIN) and produces an ASCII file (assigned to HKLOUT) in a suitable form for MULTAN, SHELX, TNT, X-PLOR/CNS, MAIN or in a user-defined format. For SHELX output all quantities are given as intensities, i.e. F and delF terms are squared. An mmCIF file can also be produced with all the relevant information taken from the MTZ header.
There are many options controlled by the assignments on the LABIN line. The most common requirements are:
Generate a list of h k l F or h k l I. If anomalous data are present, hkl and -h-k-l will be output on separate lines.
If only FP, SIGFP or IP, SIGIP are assigned on LABIN, hkl FP SIGFP or hkl IP SIGIP is output.
If FP, SIGFP and DP, SIGDP are assigned, then F+ and F- are reconstructed, and 2 reflections, hkl and -h-k-l, are output (X-PLOR, SHELX and CIF formats only).
If F(+),SIGF(+) and F(-),SIGF(-) or I(+),SIGI(+) and I(-),SIGI(-) are assigned, then again 2 reflections are output, hkl and -h-k-l .
If FP, SIGFP and FPH, SIGFPH are both assigned, then hkl |FP-FPH| SIG|FP-FPH| is output (not applicable for USER and CIF). This can be useful when solving heavy atom positions via direct methods.
If DP, SIGDP are assigned, and FP, SIGFP are NOT assigned, then hkl |DP| SIGDP is output (not applicable for USER and CIF). This can also be used to solve for anomalous scatterers using direct methods.
The same result can be obtained by assigning FP to FPH(+) and FPH to FPH(-). Then hkl |F(+) -F(-)| SIG|F(+) -F(-)| is output.
There is no guarantee that the reflection count is completely robust. Files have sometimes been slightly corrupted, e.g. DP not present but F(+) and F(-) there. I have TRIED to make sensible decisions in ambiguous cases.
When using OUTPUT USER you should get what you want, no tricks.
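
As an illustration of the F+/F- reconstruction mentioned above, here is a minimal Python sketch. It assumes the usual convention that FP is the mean of F(+) and F(-) and that DP = F(+) - F(-). The function name split_bijvoet is hypothetical and this is not the program's own code. Sigmas are left out because their propagation is not described here.

```python
def split_bijvoet(fp, dp):
    """Return (F+, F-) from a mean amplitude and an anomalous difference.

    Assumption (not stated in this document): fp = (F+ + F-)/2 and
    dp = F+ - F-.
    """
    f_plus = fp + 0.5 * dp
    f_minus = fp - 0.5 * dp
    return f_plus, f_minus


# The pair would be written as two reflections, hkl and -h-k-l.
f_plus, f_minus = split_bijvoet(100.0, 4.0)
print(f_plus, f_minus)
```

Under the same assumption, a zero DP gives F+ = F- = FP.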


The allowed keywords are:

END, EXCLUDE, FREEVAL, FSQUARED, INCLUDE, LABIN, MISS, MONITOR, OUTPUT, RESOLUTION, SCALE


Compulsory input keywords are OUTPUT and LABIN.


OUTPUT <type>

The output types are as follows:
MULTAN
The output file has h, k, l, f, imt in FORMAT(3I4,7X,F7.0,I6), where imt=0 for a good reflection.
SHELX
The output file has the SHELX header followed by all h, k, l, "I", sigma"I", 1 in FORMAT(3I4,2F8.2,I4). Reflections previously excluded from refinement for FreeR analysis are flagged with the word FREE at the end of the line, so they can easily be extracted from the SHELX file if desired.
NB: The SHELX programs expect intensities, so even if you assign input F terms the program will automatically perform the conversion (see the FSQUARED and SCALE keywords). SHELX is usually used to find heavy atom sites. If FP and FPH are assigned, the program calculates the isomorphous difference (Diso) |FP - FPH| and outputs its square, |FP - FPH|^2, with an appropriate sigma. If you wish to use anomalous differences as input, you can EITHER assign FP=FPH(+) and FPH=FPH(-), which signals the program to output |FPH(+) - FPH(-)|^2, OR assign DP=DPH, in which case the program will output DPH^2.
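
To make the fixed SHELX layout concrete, the following Python sketch formats one reflection record as FORMAT(3I4,2F8.2,I4). The function name shelx_line is hypothetical, and the SHELX header and the trailing FREE marker are not reproduced because their exact layout is not given here.

```python
def shelx_line(h, k, l, intensity, sigma):
    """Format one reflection like FORTRAN FORMAT(3I4,2F8.2,I4).

    Note: FORTRAN fills a field with asterisks when a value is too
    wide, whereas Python silently widens the field, so large values
    should be rescaled before formatting.
    """
    return f"{h:4d}{k:4d}{l:4d}{intensity:8.2f}{sigma:8.2f}{1:4d}"


print(shelx_line(1, -2, 3, 1234.5, 12.25))
```

Each record is 32 characters wide as long as every value fits its field.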
TNT
The output file has 'HKL ', h, k, l, F, sig(F), phase, fom in FORMAT(A4,3I4,3F8.1,F8.4), with phase = 1000 and fom = 0, i.e. dummies. Note that files for TNT must be sorted on h, k, l and certain reflection zones are required. You may need to run CAD to re-sort your data. Use keywords INCLUDE FREER <num> and EXCLUDE FREER <num> to generate files for R-free calculation.
There is a maximum likelihood version of TNT from Pannu and Read which requires a free-R flag (in X-PLOR convention). This column will be output if you assign the FREE column in LABIN and do not use the INCLUDE | EXCLUDE FREER options.
CIF <data block header>
CIF output is invoked, where <data block header> is a maximum of 80 characters long, and must begin with the characters "data_" (any mixture of upper and lowercase thereafter). OUTPUT CIF can be used to prepare data (from crystallography or EM) for deposition to the PDB.
Unlike the other output formats, all the reflections from HKLIN are written to HKLOUT. Not all column labels are appropriate for CIF output (see Notes on CIF). Also, only RESO, EXCLUDE SIGP and FREEVAL can be used with OUTPUT CIF. They are used to flag certain reflections but not to reject them. The others are ignored.
XPLOR
The output file has FORMAT(A,3I5,A,F10.1,F10.1,A,F10.2,A,I6...). The exact contents depend on which labels have been specified with the LABIN keyword. See the documentation for FREERFLAG for a table explaining the differences in free R flag conventions.
CNS
Similar to XPLOR output. However, free R flags are left unchanged. To select the correct free R flag in CNS, you will need something like:
  {===>} test_flag=0;
Anomalous data
For CIF, SHELX and XPLOR/CNS ONLY. If the anomalous difference is assigned (see LABIN), then the amplitudes for reflections h,k,l and -h,-k,-l are generated and output as separate reflections. In this case, the column ISYM may also be assigned if it is present: this is a flag from Truncate which is 0 if F comes from both positive (hkl) and negative (-h-k-l) Bijvoet reflections, 1 if only from F+ and 2 if only from F-.
MAIN
This gives output suitable for the MAIN program. The output file contains H K L FP SIGFP and optionally PHIB and FOM if they are specified on the LABIN line. Alternatively, if FC is specified on the LABIN line, then FP and FC are interpreted as the real and imaginary parts respectively of a calculated F, and output as a "COMPLEX" field.
SCALEPACK
This gives pseudo-SCALEPACK output, which is needed as input to the SOLVE package. The output file assigned to HKLOUT is ASCII and contains H K L I(+) SIGI(+) I(-) SIGI(-) in FORMAT(3I4,4F8.1). The output may need to be rescaled to fit this format.
USER <format>
The output file is of the form H K L ? ? ... where the user can specify which columns are to be output, how many and in what format. Ten dummy labels (DUM??) are available to assign to any column and are output as real. There are also ten dummy labels (IDUM??) which are output as integer. The order of the data in the ASCII file is taken from the order of the program labels specified on the LABIN line, e.g. LABIN FP=FP1 DP=DP1 SIGFP=SIG1 SIGDP=SIGDP1 would give the order H K L FP1 DP1 SIG1 SIGDP1 in the output file. The format must either be a FORTRAN format with three initial integer items, the remaining items matching the LABIN assignments, e.g.
  OUTPUT USER '(3I4,2F7.1,I4)'
or be a "*", i.e.
  OUTPUT USER *
to use free formatted output. With free format, however, all columns after H, K and L will be treated as real numbers.
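
The column ordering rule for OUTPUT USER can be sketched in Python as below. This only mimics the rule described above (H K L first, then the file labels in the order given on the LABIN line). The function name user_column_order is hypothetical and the parsing is deliberately simpler than the real keyword parser.

```python
def user_column_order(labin_line):
    """Return the output column order implied by a LABIN line.

    Each assignment has the form <program label>=<file label>; the
    file labels are kept in the order in which they are written.
    """
    tokens = labin_line.split()
    if not tokens or tokens[0].upper() != "LABIN":
        raise ValueError("expected a line starting with LABIN")
    file_labels = []
    for token in tokens[1:]:
        if "=" not in token:
            raise ValueError(f"not an assignment: {token!r}")
        file_labels.append(token.split("=", 1)[1])
    return ["H", "K", "L"] + file_labels


print(user_column_order("LABIN FP=FP1 DP=DP1 SIGFP=SIG1 SIGDP=SIGDP1"))
```

For the LABIN line quoted in the text this reproduces the order H K L FP1 DP1 SIG1 SIGDP1.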

LABIN <program label>=<file label>

Input labels accepted are:

        H, K, L           Indices
        FP, SIGFP         F and Sigma for native
        FPH, SIGFPH       F and Sigma for derivative
        FC, PHIC          F and Phase from model
        FPART, PHIPART    F and Phase from partial structure
        DP, SIGDP         Anomalous difference and Sigma
        I, SIGI           I and Sigma
        F(+), SIGF(+)     F+ and Sigma(F+) 
        F(-), SIGF(-)     F- and Sigma(F-)  used for anomalous output
        I(+), SIGI(+)     I+ and Sigma(I+) 
        I(-), SIGI(-)     I- and Sigma(I-) 
        FPART_BULK_S, PHIPART_BULK_S
                          Partial F and Phase for bulk solvent correction
        W, FOM            Weights
        PHIB              Best phase (experimental)
        HLA,HLB,HLC,HLD   Hendrickson-Lattman coefficients
        FREE              FreeR flag
        ISYM              (see TRUNCATE)
        DUM??             Dummy labels (output as real)
        IDUM??            Dummy labels (output as integer)

Not all columns are used in the various output formats, see Notes on INPUT and OUTPUT. Also, the contents of the columns which are output may depend on which input columns are assigned by LABIN, see DESCRIPTION above.

Note: when using the DUM?? and IDUM?? labels, the program may generate warnings about column type mismatches. This may happen for instance if an anomalous difference (column type D) is assigned to one of the DUM labels (which is nominally of type R, i.e. 'any other real'). These warnings should be ignored, and the output is not affected.


END

End input.


FSQUARED

If this flag is set, the program expects F and SIGF and will output I and SIGI: I = F*F, SIGI = 2*SIGF*F + SIGF*SIGF. These intensities are not necessarily the same as the measured intensities (pre-TRUNCATE); it is better to use the measured values if you have them.
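
The conversion from amplitudes to intensities quoted above can be written as a short Python sketch. The function name f_to_i is hypothetical; the two formulas are the ones given in the text.

```python
def f_to_i(f, sigf):
    """Convert an amplitude and its sigma to an intensity and sigma.

    Uses the formulas quoted in the text:
    I = F*F and SIGI = 2*SIGF*F + SIGF*SIGF.
    """
    i = f * f
    sigi = 2.0 * sigf * f + sigf * sigf
    return i, sigi


print(f_to_i(10.0, 1.0))
```

For F = 10 and SIGF = 1 this gives I = 100 and SIGI = 21.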


MONITOR <Nmon>

Followed by an integer <Nmon>. Every <Nmon>-th reflection within the resolution range is monitored (printed out).

RESOLUTION <resmin> <resmax>

Followed by 2 real numbers, <resmin>, <resmax>. This can be used to restrict the output data to the given resolution range.

SCALE <scale>

The F and SIGF (or I and SIGI) are multiplied by <scale> before output. This may be necessary if you are outputting F_squared into the fixed SHELX format.
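
One way to choose a <scale> before running the program is sketched below in Python. It is an illustration only: it assumes the F8.2 fields of the SHELX format, finite input values, and a power-of-ten scale. The names fits and scale_to_fit are hypothetical and nothing like this is built into mtz2various.

```python
def fits(value, width=8, decimals=2):
    """True if value fits a FORTRAN F<width>.<decimals> field."""
    return len(f"{value:.{decimals}f}") <= width


def scale_to_fit(values, width=8, decimals=2):
    """Return the largest power-of-ten scale (<= 1) for which every
    scaled value fits the fixed-width field."""
    scale = 1.0
    while not all(fits(v * scale, width, decimals) for v in values):
        scale /= 10.0
    return scale


# 250000.00 needs nine characters, so a scale of 0.1 is suggested.
print(scale_to_fit([250000.0, 12.0]))
```

The returned number would then be given on the SCALE line.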

INCLUDE <keyword> <value> ...

The secondary keyword is followed by a number selecting the data to include. The only possible keyword is FREER.
FREER <num>
Include only reflections with FreeRflag = <num>. This is different from the FREEVAL keyword which specifies the freeR set. This will only be applicable if you have assigned the FREE column.

EXCLUDE <keyword> <value> ...

Each secondary keyword is followed by a number setting the appropriate limit for excluding data. Possible keywords are SIGP, SIGH, DIFF, FPMAX, FPHMAX, FREER. If DP is assigned without FP then the exclusion criterion for DIFF is applied to |DP|.
SIGP <Nsig1>, SIGH <Nsig2>
Reflections are excluded if FP < <Nsig1>*SIGFP or FPH < <Nsig2>*SIGFPH. Formerly, for MULTAN output such reflections were only flagged and other formats were unaffected; now they are not output in any format.
DIFF <difference_limit>
Reflections are excluded if |FP-FPH| (or |DP|) > <difference_limit>
FPMAX <maximum>
Reflections are excluded if FP > <maximum>.
FPHMAX <maximum>
Reflections are excluded if FPH > <maximum>.
FREER <num>
Omit reflections with FreeRflag = <num>. This is different from the FREEVAL keyword which specifies the freeR set. This will only be applicable if you have assigned the FREE column.
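
The SIGP, SIGH and DIFF tests can be summarised in a small Python sketch. It is not the program's code: the function name is_excluded and its argument names are hypothetical, and the SIGH condition is read as FPH < <Nsig2>*SIGFPH by analogy with SIGP.

```python
def is_excluded(fp, sigfp, fph, sigfph,
                nsig_p=None, nsig_h=None, diff_limit=None):
    """Apply the EXCLUDE SIGP, SIGH and DIFF tests to one reflection.

    A limit left as None means that sub-keyword was not given.
    """
    if nsig_p is not None and fp < nsig_p * sigfp:
        return True   # SIGP: native amplitude too weak
    if nsig_h is not None and fph < nsig_h * sigfph:
        return True   # SIGH: derivative amplitude too weak (assumed form)
    if diff_limit is not None and abs(fp - fph) > diff_limit:
        return True   # DIFF: isomorphous difference too large
    return False


print(is_excluded(5.0, 10.0, 100.0, 5.0, nsig_p=1.0))
```

A reflection is dropped as soon as any one of the requested tests fails.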


FREEVAL <num>

The reflections with FreeRflag = <num> are treated as the freeR set: the default is 0 if FREE is assigned. This is important if you want to include a free-R test in your XPLOR/CNS or SHELX refinement, or you are using the Pannu-Read version of TNT. The FREE column must be assigned with LABIN.

MISS <valm>

By default, if any data associated with a reflection are missing, i.e. are represented in HKLIN by a Missing Number Flag (MNF), then that reflection will not appear in the output. However, if the keyword MISS is given then these reflections will be output, but with the MNFs converted to <valm>. The latter need not be given, and defaults to 0.0. The other exclusions are still effective.

Also, if MISS is present then when producing isomorphous data, i.e. |FPH-FP|, if either FPH or FP is a MNF then the difference is set to zero and the sigma is twice the measured sigma. For example, if FP=MNF, SIGFP=MNF, FPH=100 and SIGFPH=10, then FPH-FP = 0 and SIG = 20.
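
The MISS rule for isomorphous differences is sketched below in Python, with None standing in for a Missing Number Flag. The function name iso_difference is hypothetical. When both values are present the sigmas are combined in quadrature, which is an assumption of this sketch and is not stated in this document; the case where both values are missing is not covered by the text and is rejected here.

```python
import math


def iso_difference(fp, sigfp, fph, sigfph):
    """Return (|FPH-FP|, sigma) following the MISS rule in the text.

    None represents a Missing Number Flag (MNF).
    """
    if fp is None and fph is None:
        raise ValueError("both FP and FPH missing: not covered by the text")
    if fp is None:
        return 0.0, 2.0 * sigfph   # difference zero, twice measured sigma
    if fph is None:
        return 0.0, 2.0 * sigfp
    # Assumption: quadrature sum of sigmas when both are measured.
    return abs(fph - fp), math.hypot(sigfp, sigfph)


print(iso_difference(None, None, 100.0, 10.0))
```

The printed pair matches the worked example: a difference of 0 with a sigma of 20.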

Notes on INPUT and OUTPUT

Not all INPUT columns are accepted with a particular OUTPUT format. If one has OUTPUT <subkw> then the allowed input columns are given below (see LABIN and OUTPUT) :
subkw = USER
accepts all input columns. Remember that the format must match the column assignments, i.e. assignments to IDUM must be output as integers; all others are treated as real. Warnings about mismatched column types when using DUM or IDUM labels can be ignored; see LABIN keyword.
subkw = XPLOR [or CNS]
accepts all input columns except DUM1 to DUM10 and IDUM1 to IDUM10 and I+, SIGI+, I- and SIGI-.
subkw = SHELX
accepts columns H to SIGFPH and FREE, and also DP, SIGDP when FP is not assigned.
subkw = MULTAN
is like SHELX but will only use FREE to include or exclude reflections.
subkw = TNT
is like SHELX except for the use of FREE: if the INCLUDE FREER or EXCLUDE FREER keywords are specified then FREE is used to include or exclude reflections, otherwise the FREE column (if assigned) is output.
subkw = MAIN
accepts H, K, L, FP, SIGFP, PHIB, FOM, FC
subkw = CIF

You may still have trouble getting exactly the output you want. You can use the unix utilities cut(1) or sed(1) to manipulate the mtz2various output.

Notes on CIF

All reflections in the MTZ input file will be output to the CIF file. However, there are ways to flag certain reflections with the data type _refln.status. Observed reflections will be flagged with 'o'. Unobserved reflections, i.e. those flagged as missing, will be flagged as 'x'; these reflections will not be added to _reflns.number_obs. The 'free' reflections will be flagged as 'f'. The keyword FREEVAL can be used to indicate this set. Systematically absent reflections are flagged with '-'.

If the RESO keyword is specified then reflections at higher or lower resolution than the limits given will be written with _refln.status 'h' or 'l' respectively. The limits will be written to the CIF as the values of _refine.ls_d_res_high and _refine.ls_d_res_low.

If EXCLUDE SIGP is given then reflections for which F < <value>*sigma(F), and which satisfy the resolution limits (if given), will be written with _refln.status '<'. The value of _reflns.number_obs excludes all reflections which do not satisfy the condition on sigma(F). All other sub-keywords of EXCLUDE are ignored for CIF output.
N.B. The translation of the RESOLUTION and EXCLUDE SIGP conditions to _refln.status values does not imply that the use of these conditions is good crystallographic practice. Be prepared to justify why you have excluded any data from your final refinement!
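
The status flags described in these notes can be gathered into one Python sketch. Only the flag characters and the rule that the sigma test applies to reflections inside the resolution limits come from the text. The order in which the other tests are tried, the function name refln_status and its arguments are assumptions made for illustration.

```python
def refln_status(f, sigf, d, free_flag, *, freeval=0,
                 d_high=None, d_low=None, nsig=None, absent=False):
    """Return a _refln.status character for one reflection.

    f is None for a missing observation; d is the d-spacing in
    Angstrom; d_high/d_low are the RESO limits, if given.
    """
    if absent:
        return "-"                      # systematically absent
    if f is None:
        return "x"                      # unobserved (missing)
    if d_high is not None and d < d_high:
        return "h"                      # beyond the high resolution limit
    if d_low is not None and d > d_low:
        return "l"                      # beyond the low resolution limit
    if nsig is not None and f < nsig * sigf:
        return "<"                      # fails EXCLUDE SIGP
    if free_flag == freeval:
        return "f"                      # member of the free set
    return "o"                          # observed


print(refln_status(100.0, 5.0, 3.0, 1))
```

Whatever the status, the reflection is still written to the CIF file.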

If DP is assigned, anomalous mode is selected, and reflections for which DP has been measured are written out as (hkl)/(-h-k-l) pairs. In this case, if intensities are available, then both I+ and I- must be assigned, or a warning will be printed, and the output CIF will not contain intensities.

Below is a list of the items output to the CIF file:





These items are the ones per reflection.

 _refln.wavelength_id     Always written
 _refln.crystal_id        Always written
 _refln.scale_group_code  Always written
 _refln.index_h           Always written
 _refln.index_k           Always written
 _refln.index_l           Always written
 _refln.status            Always written
 _refln.F_meas_au         FP
 _refln.F_meas_sigma_au   SIGFP
 _refln.F_calc            FC
 _refln.phase_calc        PHIC
 _refln.phase_meas        PHIB
 _refln.fom               FOM
 _refln.intensity_meas    I+
 _refln.intensity_sigma   SIGI+
 _refln.ebi_F_xplor_bulk_solvent_calc        FPART_BULK_S
 _refln.ebi_phase_xplor_bulk_solvent_calc    PHIPART_BULK_S

mmCIF (at least at version 0.8) makes no provision for the output of derivative data in the same data block as native data. For more information about what these mmCIF categories are, check out the mmCIF dictionary.


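In the two fragments below the keyworded input is incomplete: a full run also needs the compulsory OUTPUT and LABIN keywords, and the here-document must be closed with EOF.

    mtz2various HKLIN nicona HKLOUT dell.hkl << EOF
    RESOLUTION 10000 2
    EXCLUDE SIGP 0.01   # to exclude unmeasured refl.

    mtz2various HKLOUT $CCP4_SCR/toxd.hkl HKLIN $CEXAM/toxd/toxd << EOF
    RESOLUTION 100 4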

A runnable unix example script is in $CEXAM/unix/runnable/
A non-runnable unix example script, which demonstrates mtz2various used to output anomalous data, is in $CEXAM/unix/non-runnable/


See also: mtzdump, f2mtz, cut(1), sed(1)


Eleanor Dodson, York University