mmCIF FORMAT: (CCP4: Formats)


mmCIF format for CCP4 - the mmCIF format as used in CCP4


The macromolecular Crystallographic Information File (mmCIF) format was developed by a working group of the IUCr formed in 1990. It extends the CIF format used by small-molecule crystallographers, which is also used for automatic submission to Acta Crystallographica C. mmCIF files are text files with a flexible format built from either <data_name> <data_value> pairs or a loop structure, which acts like a table: a list of data names followed by rows of values. A wide variety of data items is supported (as defined in the mmCIF dictionary), and character data values may be lengthy and descriptive. This removes many of the restrictions of the traditional PDB format.
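The two constructs can be illustrated with a small reader. The following Python sketch is not part of CCP4 and is not a full CIF parser: it assumes whole-line comments only, ignores multi-line text fields delimited by semicolons, and keeps every value as a string. The function names and the sample data block are made up for the illustration; the data names are taken from the standard mmCIF dictionary.

```python
import re

# One token per match: 'single quoted', "double quoted", or a bare word.
# A closing quote only counts when followed by whitespace or end of line.
TOKEN = re.compile(
    r"'(.*?)'(?=\s|$)"    # 'single quoted'
    r'|"(.*?)"(?=\s|$)'   # "double quoted"
    r"|(\S+)"             # bare word
)

SAMPLE = """\
data_demo
# cell and symmetry as name/value pairs
_cell.length_a                  58.39
_cell.length_b                  86.70
_symmetry.space_group_name_H-M  'P 21 21 21'
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
1 N   10.100
2 CA  11.500
"""


def tokenize(text):
    """Yield (value, is_quoted) pairs, skipping whole-line '#' comments."""
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        for match in TOKEN.finditer(line):
            bare = match.group(3)
            if bare is not None:
                yield bare, False
            elif match.group(1) is not None:
                yield match.group(1), True
            else:
                yield match.group(2), True


def parse_simple_mmcif(text):
    """Return (items, loops): a dict of name/value pairs and a list of tables."""
    tokens = list(tokenize(text))
    items, loops = {}, []

    def is_name(k):
        value, quoted = tokens[k]
        return not quoted and value.startswith("_")

    def is_keyword(k):
        value, quoted = tokens[k]
        return not quoted and (value == "loop_" or value.startswith("data_"))

    i = 0
    while i < len(tokens):
        value = tokens[i][0]
        if is_keyword(i) and value != "loop_":
            i += 1  # data block header: ignored in this sketch
        elif is_keyword(i):
            i += 1
            names = []
            while i < len(tokens) and is_name(i):
                names.append(tokens[i][0])
                i += 1
            if not names:
                raise ValueError("loop_ without data names")
            cells = []
            while i < len(tokens) and not is_name(i) and not is_keyword(i):
                cells.append(tokens[i][0])
                i += 1
            # Cut the flat list of values into rows, one dict per row.
            loops.append([dict(zip(names, cells[j:j + len(names)]))
                          for j in range(0, len(cells), len(names))])
        elif is_name(i):
            items[value] = tokens[i + 1][0]
            i += 2
        else:
            raise ValueError(f"unexpected token: {value!r}")
    return items, loops


items, loops = parse_simple_mmcif(SAMPLE)
print(items["_symmetry.space_group_name_H-M"])
print(loops[0][1]["_atom_site.label_atom_id"])
```

Single items end up in a dictionary keyed by data name, while each loop becomes a list of rows, which is the "table" view described above.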

Full details of the mmCIF format can be found on the IUCr mmCIF Page. Central to the format is the dictionary of allowed data items, which are grouped into categories. As of January 2002, the mmCIF dictionary is at version 2.0.03. The dictionary is designed to be extensible, and new data items are added in new versions.

An mmCIF dictionary is distributed with the CCP4 suite as $CCP4/lib/data/cif_mm.dic. It consists of the standard mmCIF dictionary together with some additional data items required for data harvesting and some data items for TLS refinement. The CCIF software library uses a binary symbol table representation of this dictionary, which is generated during the CCP4 build.

The mmCIF format is currently used in the CCP4 Suite in the following ways:

  • Data Harvesting: A limited number of programs write out data harvesting files into a subdirectory of HARVESTHOME (which defaults to the home directory) for subsequent transfer to deposition sites at the time of structure deposition. These files are in mmCIF format.
  • The CCP4 distribution includes Peter Keller's CCIF software library for reading and writing mmCIF files. Some of the harvest files are produced using this library.
  • The CCP4 distribution also includes a set of library routines that do for mmCIF files what the rwbrook library does for the PDB format.
  • Reflection files in mmCIF format can be created by the program MTZ2VARIOUS. This format is suitable for deposition of structure factors.
  • Version 2.7 of RASMOL will read and display coordinate files in mmCIF format.
  • An emacs lisp file for a CIF major mode is distributed as $CCP4/include/cif.el.
  • The refinement program REFMAC (from version 5 onwards) stores restraint information and other intermediate files in mmCIF format.
  • The forthcoming MMDB software library for coordinate data will read and write coordinate files in mmCIF format, as well as PDB format and an internal binary format.

Overview of some useful mmCIF categories

The following categories cover the information typically held in CCP4-PDB files:
CELL
cell dimensions (replacing CRYST1 card).
SYMMETRY
spacegroup name or number (not always included in CCP4-PDB files).
ATOM_SITES
cell transformations (replacing SCALEx cards).
ATOM_SITE
atom site information (replacing ATOM, HETATM, ANISOU cards).
ATOM_SITES_ALT
pointed to by _atom_site.label_alt_id; gives more information on alternative conformations. _atom_site.label_alt_id alone is sufficient for programs in their current form.
ATOM_SITE_ANISOTROP
contains data items that are alternate_exclusive to those in category ATOM_SITE; in general, it is simpler to use the latter. However, when there is anisotropic U data for only a small subset of atoms, e.g. for metal ions only, it may be more convenient to use a separate category.
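As an illustration of how a PDB card maps onto mmCIF data items, the Python sketch below converts a CRYST1 card into name/value pairs. It is not taken from any CCP4 program. It assumes the fixed column layout of the PDB format for CRYST1 and the _cell.* and _symmetry.space_group_name_H-M data names of the standard mmCIF dictionary; the function name and the example cell are invented.

```python
# Column ranges (0-based, end-exclusive) of the CRYST1 fields in the PDB format.
CRYST1_COLUMNS = [
    ("_cell.length_a", 6, 15),
    ("_cell.length_b", 15, 24),
    ("_cell.length_c", 24, 33),
    ("_cell.angle_alpha", 33, 40),
    ("_cell.angle_beta", 40, 47),
    ("_cell.angle_gamma", 47, 54),
]


def cryst1_to_mmcif(card):
    """Convert one PDB CRYST1 card into mmCIF name/value lines."""
    if not card.startswith("CRYST1"):
        raise ValueError("not a CRYST1 card")
    card = card.ljust(70)  # pad short cards so the slices below are safe
    lines = []
    for name, start, end in CRYST1_COLUMNS:
        # Keep the text as written on the card to preserve its precision.
        lines.append(f"{name:<31} {card[start:end].strip()}")
    space_group = card[55:66].strip()
    if space_group:
        # The name contains spaces, so it has to be quoted in mmCIF.
        lines.append(f"{'_symmetry.space_group_name_H-M':<31} '{space_group}'")
    return "\n".join(lines)


# Build an example card with the standard CRYST1 field widths.
card = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    58.39, 86.7, 46.27, 90.0, 90.0, 90.0, "P 21 21 21", 4)
print(cryst1_to_mmcif(card))
```

The same pattern of fixed columns in, named items out would apply to the SCALEx and ATOM cards, with the data items drawn from the corresponding categories.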

In addition, the following categories are also useful:

AUDIT
information on how the file was created and subsequently modified.
ENTITY
defines polymer/non-polymer/water entities.
ENTITY_POLY_SEQ
sequence information. Ideally, this should correspond to the sequence in the ATOM_SITE category, although there are exceptions, e.g. if the latter describes a temporary poly-ALA model.
STRUCT_ASYM
describes the contents of the asymmetric unit.
STRUCT_CONN
describes disulphides, salt bridges and hydrogen bonds. The first of these would be useful for PROTIN.
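The correspondence between the declared sequence and the modelled residues can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the declared sequence is taken to be held in _entity_poly_seq items and the model in _atom_site items of the standard mmCIF dictionary, and both have already been read into lists of dictionaries keyed by data name. The function name and the example data are hypothetical.

```python
def sequence_mismatches(entity_poly_seq, atom_site):
    """List residues whose modelled name differs from the declared sequence.

    Returns tuples of ((entity_id, seq_num), declared_name, modelled_name).
    """
    declared = {
        (row["_entity_poly_seq.entity_id"], int(row["_entity_poly_seq.num"])):
            row["_entity_poly_seq.mon_id"]
        for row in entity_poly_seq
    }
    modelled = {}
    for atom in atom_site:
        seq = atom["_atom_site.label_seq_id"]
        if seq in (".", "?"):
            continue  # non-polymer atoms (e.g. waters) carry no sequence number
        key = (atom["_atom_site.label_entity_id"], int(seq))
        modelled[key] = atom["_atom_site.label_comp_id"]
    return [
        (key, declared.get(key), name)
        for key, name in sorted(modelled.items())
        if declared.get(key) != name
    ]


# Declared sequence MET-ALA-LYS, modelled as a temporary poly-ALA chain.
entity_poly_seq = [
    {"_entity_poly_seq.entity_id": "1", "_entity_poly_seq.num": str(n),
     "_entity_poly_seq.mon_id": name}
    for n, name in enumerate(["MET", "ALA", "LYS"], start=1)
]
atom_site = [
    {"_atom_site.label_entity_id": "1", "_atom_site.label_seq_id": str(n),
     "_atom_site.label_comp_id": "ALA", "_atom_site.label_atom_id": atom_name}
    for n in (1, 2, 3) for atom_name in ("N", "CA")
]
atom_site.append(
    {"_atom_site.label_entity_id": "2", "_atom_site.label_seq_id": ".",
     "_atom_site.label_comp_id": "HOH", "_atom_site.label_atom_id": "O"}
)

for mismatch in sequence_mismatches(entity_poly_seq, atom_site):
    print(mismatch)
```

For a poly-ALA model the reported mismatches are expected and harmless; for a finished model they would point to residues that need attention before deposition.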

See Also

The imgCIF Dictionary
The image CIF dictionary (imgCIF) is a CIF dictionary of data names required by the Crystallographic Binary File (CBF) image representation project. imgCIF/CBF is an initiative to extend the IUCr CIF concept to cover efficient storage of 2-D area detector data and other large datasets.
The Symmetry CIF Dictionary
The symmetry CIF dictionary (symCIF) is a supplement to the Core dictionary designed to provide the data names required to describe crystallographic symmetry.