THE UPACK PROGRAM PACKAGE

Bouke P. van Eijck, Department of Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands. E-mail: vaneyck@chem.uu.nl This is version 5.4, January 2004.

Download

You can obtain the program via anonymous ftp. It is distributed in the form of a compressed tar file (download). See under Installation for further information.
Before deciding to use UPACK it is advised to read some recent lecture notes, which can be viewed immediately in PDF format. This should provide background on the possibilities and limitations of crystal structure prediction.

Contents

Introduction Overview Installation

Procedure

Detailed instructions for the preliminary stage Force fields Detailed instructions for the search stage References
 
 
 

Introduction

The program suite UPACK (Utrecht Crystal Packer) is made available under the following conditions: The programs are under constant development. Please report any problems encountered, including the version date.

The program package is designed to construct crystal structures of low potential energy, using a molecular force field. It is at present restricted to space groups with triclinic, monoclinic or orthorhombic symmetry. The programs have been designed with simple carbohydrates as target molecules, but should be usable for other chemical classes if the user is prepared to extend some default parameter files. For reviews on crystal structure prediction see Refs. [1] and [2].
 
 

Definitions

The substance to be studied is characterized by a group of separate molecules. Such a group is the smallest possible repeating unit, for instance ABB in a dihydrate where A is a certain molecule and B is water. The constituent molecules are building blocks for the generation of hypothetical crystal structures. Generally several distinct molecular conformations are thinkable. It is possible to specify their geometries precisely, and to investigate a list of those that are deemed likely to occur. Alternatively, one may rely on automatic generation of possible conformations. This is especially useful for hydroxyl groups where the possible crystal conformations are hard to predict.

In the group a number of bonds may occur about which internal rotation is possible. We shall denote the number of rotatable hydroxyl groups by Nh and the number of other rotatable bonds (not connecting a hydrogen atom) by Nz. The symbol  will be used for the dihedral angles. Furthermore, let the group contain Mg molecules and Ag atoms. Of course, all these properties of the group are independent of the actual molecular conformation and crystallographic environment.

To generate a crystal, a number (Gi ) of independent groups is placed in a certain space group which has S general equivalent positions. We shall exclude the possibility that a symmetry element of the group coincides with a symmetry element of the lattice: that lattice symmetry element is disregarded, and the crystal structure is described in a space group with lower symmetry. Thus the Gi groups always coincide with the asymmetric unit, and the total number of groups is S Gi . The number of really nonequivalent molecules in a unit cell will be denoted by Z''. In the example of the dihydrate ABB it can be seen that Z'' must be between 2 and 3Gi .
 
 

Procedure

Possible crystal structures are generated by placing molecules with varying positions and orientations into a unit cell of varying dimensions. This gives at most twelve (P-1) and at least eight (e.g. Pna21) parameters to be varied; this number can be considerably larger if the molecules are flexible or if more than one independent molecule must be placed. One of the cell parameters can be eliminated by using an approximate crystal density.

In the older versions of UPACK a systematic grid search was used [3]. However, we have now found that a random search performs at least equally well, and is easier to handle in practice [16]. Therefore, the main part of this manual covers only the random search method, and the grid search is treated separately. Both approaches are fairly time-consuming, and require a simple force field with a low cutoff radius for nonbonded interactions. At the end a preliminary list of possible structures is obtained. In a final stage the energy minimizations can be continued with a better and more elaborate force field.

The calculations must be repeated for all probable space groups. Fortunately, relatively few space groups account for about 90% of the actually observed structures for organic molecules with Z'' = 1: P21/c, P-1, P212121, P21, Pbca, C2/c, Pna21, Cc, Pca21, C2, P1, Pbcn, Pc. If the substance is chiral, only P1, P21, P212121, and C2 remain. For a group with internal symmetry the generated structures often belong to a space group with higher symmetry. The determination of the actual space group of these structures is not possible in UPACK; programs like PLATON [4] or CERIUS [5] may be employed.
 
 

Coordinate systems

Except for input and output, where fractional coordinates may be used, the programs work with Cartesian coordinates in nanometer units. The crystallographic a-axis coincides with the Cartesian x-axis and the b-axis is positioned in the x-y plane. The crystallographic axes are projected on the Cartesian axes and the components ax, bx, by, cx, cy, and cz are used to characterize the unit cell.

The crystal is built from rigid molecules, defined in a local axes system with aid of three reference atoms for each molecule. The axes system is chosen in such a way that:

The position of this molecule in the crystal is determined by translation and rotation of the local coordinate system. More precisely, any atomic position vector (x, y, z) in the crystal is found by rotating the corresponding vector (x0, y0, z0) in the free molecule over an angle  around the z-axis, over around the x-axis, over  around the z-axis, and finally translating over X, Y, Z: with
 

Energy calculations

Energy minimization is the central technique of the method. The energy is always reported in kJ/mol for one group (although other units may be used in the force field files). The programs were developed with routines from the GROMOS molecular dynamics simulation program package [6] as basis, and we are grateful to Professor W. F. van Gunsteren for his permission to do so.

As discussed above, in the searching stage fast calculations are essential. A Lennard-Jones 12-6-1 potential is used here (possibly even without atomic charges, to speed up the calculations). In the final stage Ewald summation may be used for the Coulomb and attractive r -6 dispersion terms, and a Buckingham-type exponential repulsion may be used instead of the r -12 term. Atomic polarizabilities may also be included.

To allow fairly reliable calculations of the Coulomb energy without Ewald summation, the molecules can be split up into charge groups of (almost) zero net charge. The cutoff criterion is based on charge groups rather than on individual atoms, thus making the electrostatic interaction decaying at long distance with r -3 rather than r -1. To allow for small errors in the force-field setup, the total molecular charge is set to zero by adding equal amounts to each atomic charge.

The energy minimizations are performed by a number of steepest descents steps, followed by a more or less exhaustive conjugate gradients procedure. The use of pair lists for nonbonded interactions (as well as lists of vectors in reciprocal space for Ewald summations) is essential for stable convergence [7]. These lists are only updated upon a significant change of any geometric parameter.

Unavoidably, the same structures will be encountered repeatedly, probably with different choices for the unit cell parameters. Of course, it is very important to eliminate such equivalent structures by a clustering procedure. A general algorithm, based on comparison of radial distribution curves, is available.
 
 

Overview

The general course of a crystal structure prediction is as follows.

1. For every substance the molecule(s) in the group must be defined through a topology file, which also assigns a code to every force field contribution (like bond angles, atomic charges, and so on). To this end the program codes can be used. Later the actual values of the force field parameters are introduced by selecting a force field file.

2. For every conformation to be studied separately, a coordinate file for the constituent free molecule(s) has to be prepared. The program prep may be helpful. Note that often one input conformation can suffice, since rotational freedom about single bonds can be taken care of automatically.

3. The structures are generated by a random search for one or more independent molecules by the program pack12. Each molecule is placed with random orientation, random position, and (possibly) random rotatable dihedral angles in a crystal with random cell axes and cell angles. The energy of the generated structures is provisionally minimized with full geometry relaxation (non-rigid molecules).

4. The results are clustered by the program dist. This program also determines the number of crystallographically independent molecules (Z'').

5. The non-rigid minimization is continued with the program pack3. There is the possibility of relaxing the space group symmetry by expanding to P1, and to perform a primitive molecular dynamics "shake-up". In this way the stability of the obtained crystal structures can be investigated. It is also possible to study the effect of lattice vibrations on thermodynamical quantities [17].

6. Clustering delivers results which may be considered as final, or used for further study in a subsequent stage with a more sophisticated force field (e.g. explicit hydrogen atoms, more accurate Ewald summation, inclusion of polarization; or, using other program packages, atomic multipoles and ab-initio energies). A word of warning: it cannot always be taken for granted that there is a one-to-one correspondence between the crystal structures in different force fields.

If the experimental structure is known, one will want to know whether or not it was found by the packing procedure. The program distcan be used for this purpose too.

Molecular dynamics calculations can be carried out individually for one structure with aid of the program crysmd.
 
 

Script files

Files for input and output have fixed names in the various programs, to be described in detail below. These file names should be linked to actual files present on disk. To this end a number of C-shell scripts is available which link the necessary files and execute the programs. In this manual a certain convention for directory structure and file names will be used; other users may prefer to create a different system.

File names are given in brown color. Instructions to be given for executing the script files are red. Many programs require code-directed input: on each input line a code word is followed by a number of numerical values which are identified by colored names. To aid in understanding the Fortran sources, these names generally (but not quite always) correspond to identifiers used in the programs. The symbol # indicates that a certain number must be inserted in that place.

A script file to execute a certain program typically looks like this:
runprog [u/a] [s/d] [filenum] [subst] [coord] [ffname]
Here runprog is one of the execute scripts, to be discussed in detail later. Some or all of the following parameters must be specified; entries between square brackets [...] are optional. A short explanation:
u/a: use a force field for united/all atoms
s/d: calculate in single/double precision
filenum: a number (#) referring to a pack-type file (vide infra)
subst: a chosen name for the substance under study
coord: the name of a coordinate file
ffname: the name of the force field to be used
 
 

Input files

Most input files must have a precisely defined structure, but some are code-directed: parameters then have default values unless specified by a line ("card") starting with one of a well-defined set of code words. An overview is given here, while a detailed description is given in a later section. Some explanation is also given as comments at the beginning of the subroutines.

The following files are more or less general, and have to be updated only incidentally:

For every substance the following files must be prepared: Optionally, the default specifications can be changed by entries in the files: If an experimental structure is used for comparison, we also need: Output files

The generated structures are collected in parameter files pack.#, the number # indicating the stage of the calculation in which the file was prepared:

# = 10: first stage (from pack12)
# = 20, 30, ...: subsequent stages (from pack3)

These files contain general information about the generated structures. Detailed atomic coordinates are delivered in corresponding files pack.19,pack.29, ..., respectively. These files generally contain equivalent solutions, which can be removed by the clustering program dist. This increases the file numbers by 3 (e.g., pack.30 is clustered to pack.33).

It is also possible to produce individual coordinate files of the hypothetical crystal structures in formats mdi, spf,fdat and/or cssr (for use in other modeling packages).

General output information is given on a file out.pri, which serves as a record of the simulation conditions. This file is usually linked to pack.pr#, clus.pr, and so on. There is also output to the screen to signalize error conditions, and to obtain (optionally) more output for debugging purposes.
 
 
 

Installation

The following directions apply to Silicon Graphics UNIX systems. Little testing on other systems has been done, but all programs are in simple Fortran77 and should be fairly portable. Deviations from the standard are: If the program package is delivered in an uu-encoded file upack.tar.gz.uu, decode it:
uudecode upack.tar.gz.uu
This delivers upack.tar.gz. Decompress this file:
gunzip upack.tar.gz
Finally, unpack the resulting file in your home directory:
tar xvf upack.tar
This should create a subdirectory upack directly under your home directory, with subdirectories progs, scripts, tops, and data. Note that any existing files with the same names are overwritten without warning!

The subdirectory progs contains:

The subdirectory scripts contains: To be able to call these scripts from other directories, upack/scripts should be in the path.

To compile and link the programs, call
compall-unix
or
compall-linux
in subdirectories progs as well as in progs/double . This should create various object files (*.o) and executables (*.e) in these directories.

The subdirectory tops contains the force field files (ending with .ff ), the codes file all.cod, and the space group file all.spg.

The subdirectory data is meant to contain your data files. It should contain a subdirectory subst for every substance to be investigated, and here the topology files subst.tpu and/or subst.tpa should be placed. The best policy is to create sub-subdirectories for every space group to be considered. If you wish to use experimental data for comparison, they should be placed in a sub-subdirectory exp.

Two subdirectories under data, named after their CSD refcodes, are already included for demonstration and testing purposes. The example for the random search is ethanol which has two independent molecules in space group Pc. It is an exceptionally favourable case where the structure generation proceeds quickly. See below for a detailed prescription.

The other example is in data/glucsa/p212121, and is meant to demonstrate the grid search. The substance is -D-glucose, and the computing time is artificially and dishonestly reduced by considering only eight Euler angles close to the experimental structure rather than scanning all 504 orientations which would be considered in a real search. Also the maximum number of structures used in later stages is unrealistically low.
 
 
 

List of files containing programs
 
ycodes.f Creates framework for topology files
ypack1.f Generates structures with one independent molecule
ypack2.f Generates hydroxyl torsional angles
ypack12.f Generates structures with more than one independent molecule
ypack3.f Minimizes the energy for non-rigidmolecules
yclus.f Removes equivalent structures with one independent molecule
yzoek.f Searches for one equivalent structure with one independent molecule
ydist.f Combines the latter two functions for the general case
yprep.f Minimizes the energy for one structure
ycrysmd.f Performs molecular dynamics calculations for one structure.

List of files containing subroutines or functions
 
ybonde.f cbond, choek, cdiha: calculates intramolecular energies
ycart.f cart: constructs coordinates for a free molecule
ych.f ch: calculates eigenvalues and eigenvectors (from EISPACK)
yconnect.f connect: tries to assign atoms from mislabelled coordinates
ycongra.f congra: performs conjugate gradients energy minimization
ydclus.f dclus: compares possibly equivalent structures
ydens.f dens: estimates initial density for pack12
ydiff.f diff: compares possibly equivalent structures
ydihan.f dihan (function): returns the value of a selected dihedral angle
ydlist.f dlist: makes list of nonbonded distances for dclus
yexpsym.f expsym: expands symmetry operations to the full group
yfiles.f files, fdat: makes mdi, spf, and/or dat type of coordinate file
yfunc0.f func0: calls energy routines for a free rigid molecule
yfunc1.f func1: calls energy routines for a rigid molecule in the crystal
yfunc2.f func2: calls energy routines for a rigid molecule with rotatable hydroxyl groups in the crystal
yfunc3.f func3: calls energy routines for a flexible molecule in the crystal
ygeom.f geom: calculates and compares geometric parameters
yintre.f intre: calculates intramolecular energy
yintrot.f introt: rotates a selected hydroxyl group
ykwald.f kwald: calculates reciprocal part of Ewald correction
ylatvib.f latvib: calculates lattice vibrations
yleesc.f inpt (function): reads and stores card-directed input
yleesmd.f leesmd: reads data for molecular dynamics run
yleesr.f leesr: reads general space group information
yleess.f leess: reads coordinates and expands equivalent positions
yleest.f leest: reads molecular topology information
yleesx.f leesx: reads geometry information, places hydrogen atoms
ylist.f list: calculates nonbonded energy and forces from pair list
ylistk.f listk: calculates reciprocal part of Ewald correction from list
ylistp.f listp: as list, with polarization
yltw.f nword (function), realstr, intstr: internal read procedure
ynewchg.f newchg: read charges from file  charges.cssr 
ynonbe.f nonbe: calculates nonbonded energy and forces
ynonbep.f nonbep: as nonbe, with polarization
yopth.f opth, listh: calculates energy for various hydroxyl torsional angles
yrunmd.f runmd, stopcm: performs molecular dynamics run
ysecder.f secder: calculates second derivatives for lattice vibrations
yshake.f shake: keeps bond lengths constant during MD run
ysort.f sort: sorts data in ascending order
ystedes.f stedes: performs steepest descents energy minimization
ysubs.f random (function), norm, prod: various utilities
ytest.f test: numerical check on calculation of derivatives
ytrans.f trans: transforms to standard axes system
ytransf.f transf, matmat, invbox: transforms to another crystal setting
ytri.f tri: transforms to oblique coordinates and back

 

Procedure

For each substance all work should be done in a subdirectory data/subst. The detailed formats for the various input files are given in later sections.

Preliminaries

1. Prepare the connectivity file(s) subst.con which are helpful to obtain the molecular topology file(s) by running the program codes:
runcod subst.con
If parameters turn out to be missing, update the codes file all.cod and try again. The output file out.tp is a framework which, after slight editing, will be needed as a topology file for most of the following programs. Rename the edited file to subst.tpu (for united-atom force fields) or to subst.tpa (for all-atom force fields).

2. Select or prepare the force field files (extension .ff). If the force field file name in the following scripts is omitted, the indicated default file is chosen (e.g. gromnoh.ff in the first application discussed below under 4).

3. If necessary, update the space group file all.spg.

4. Decide which conformations are to be used for crystal packing, and prepare a coordinate file cor.# for each of them (# = 001, 002, 003, ...). This can be done by your favorite molecule builder or by the UPACK program prep which uses the united-atom topology file subst.tpu to construct the geometry of a free molecule, followed by an energy minimization. To this end the desired dihedral angles must be given on a tors card in subst.pp:
tors (1)...(Nz+Nh)
The first Nz dihedral angles define the conformation; usually the Nh hydroxyl dihedral angles are unknown and allowed to take any value. Run the program:
runpp subst [gromnoh]
Check the resulting geometry, especially the dihedral angles. Repeat the procedure for every desired conformation, and rename the output file(s) 00000.mdi or 00000.spf to cor.#, the coordinate file(s) to be used in the crystal structure generation.
 
 

The search

Decide which space groups are to be investigated. Create subdirectories named after these space groups (with omission of slashes, e.g. subst/p21c). In every subdirectory a search must be carried out. Usually the default settings are adequate; otherwise input files subst.pa# must be created. Copy the coordinate files cor.# into each subdirectory. The random search consists of the following steps:

1. Generating and clustering structures:
runpa12 u s subst [unitat3]
Output files are pack.10 (parameters) and pack.19 (coordinates); the parameter file must be clustered to pack.13:
rundist u 10 subst

2. Energy minimization: for instance, to continue with an all-atom force field in double precision:
runpa3 a d 13 subst [opls]
rundist a 20 subst
with final results on files pack.23 and pack.29.
 
 

The experimental structure

The program prep can also be used to minimize the energy of a given individual crystal packing. The crystal cell data and atomic coordinates are given in an input file subst.cor. The space group information can be taken from this file or communicated through a codes file subst.ppu or subst.ppa.
Run the program:
runpp u d subst [unitat3] (for united-atom topologies subst.tpu)
runpp a d subst [opls] (for all-atom topologies subst.tpa)
Output coordinate files are given in various formats.

An important application is the minimization of the energy of the experimental structure. This gives an indication about the quality of the force field. Also, the experimental structure should be present in the parameter files from the search procedure. To this end the program dist can also be used:
rundist u/a filenum subst coord
where coord is the name of the file with the experimental coordinates (type spf or mdi). Since interatomic distances are strictly compared, this structure must be energy-minimized in the same force field as was used for the search.
 

Example: ethanol

Go to upack/data/etanol/exp . The experimental structure is etanol.cor. Perform energy minimization:
runpp a d etanol
This creates the energy-minimized structure in four 00001.* files. Observe that 00001.spf equals prep.spf.a (all-atom force field). Now create a subdirectory upack/data/etanol/test,copy the files cor.001 and etanol.pa* from upack/data/etanol/pc, and run:
runrand etanol
This is a standard PACK-run, which takes less than 10 minutes on a reasonably modern PC. It should find the ethanol structure (Z' = 2 in space group Pc), using the random search method. Compare the resulting files with the corresponding ones in upack/data/etanol/pc. Do not worry about slight discrepancies, the results are sensitive to processor-dependent roundoff errors.
 
 
 

Detailed instructions for the preliminary stage

For every coloured name a numerical value must be present in the various files. Multiple entries should be separated by one or more spaces. A square indicates that the corresponding entry must be given on a new line.

Space group file

The space group file upack/tops/all.spg is read by the subroutine leesr in various programs. It must have the following entries for every space group:

cell axes: 1: ax 2: by 3: cz
cell angles: 4: bx 5: cx 6: cy
molecular centers: 7: X 8: Y 9: Z
molecular orientations: 10:  11:  12: 
itest(1) = 1: enables skipping of cells with ax > by
itest(2) = 1: enables skipping of cells with by > cz
itest(3) = 1: enables skipping of cells with ax > cz
Codes file

The file upack/tops/all.codis needed for the program codes. It defines symbols for various types of atoms and code numbers for terms in the force field. It has to be prepared only once for a certain class of molecules; if desired, several files ending with the extension .cod may be present. It must contain the following blocks of data, separated by one blank line (=), where A1, A2, ... indicate the fixed symbols for certain atom types:

regel: one line of arbitrary text
=
A1 amass mpol: atom symbol, atomic mass (a. u.), code for polarizability (800-899).
=
A1 A2 code for bond stretching (100-199)
=
A1 A2 A3 code for angle bending (200-299)
=
A1 A2 A3 A4 code for improper dihedral deformation (300-399)
=
A1 A2 A3 A4 code for dihedral angle torsion (400-499)
The symbol X is allowed in codes 300-499 and stands for any atom type.
=
(Optional): any number of sets of two lines:

Van der Waals parameters are entered directly in the force field file.

Force field file

The force field file upack/tops/ffname.ff is read by the subroutine leest in various programs. To allow an easy change of force fields, the relevant information is transmitted through code numbers as defined in the file all.cod (vide supra). Upon reading a force field file these code numbers are translated into actual numerical values. The force field files must have the following structure:


Additionally,  cha is an optional charge increment, to be subtracted from the charge at the first atom and added to the charge at the second atom. Take care to have these two atom types in the desired order in the topology file (preferably through all.cod).
These terms are necessary to define the chirality around united-atom CH groups and to prevent the deformation of rings and planar groups.

where  is the relative dielectric constant, and qi is multiplied internally by the factor 11.787 needed to obtain the energy in kJ/mol. The 1...4 interactions are multiplied by sc14c.

The van der Waals parameters for nonbonded interactions are treated differently since they are based strictly on atom types rather than on individual atoms. The cards start with two atom types A1 A2 , followed by numerical data. Entries for the combination of two equal atoms must be present; if other interaction parameters are absent (or are given the value -1), they are found from a combination rule. For the combination of atom types i and j this rule is generally the geometric average:

but sometimes the arithmetic average is used:

Intramolecular 1...2 and 1...3 interactions are always excluded. For the Lennard-Jones interaction it is possible to specify separate parameters for 1...4 interactions. Missing (or negative) parameters for the combination of two equal atoms are taken from the corresponding general parameters. For missing mixed 1...4 parameters the rule is:

sc14lj < 0: set equal to the corresponding general parameter
sc14lj > 0: use the combination rule for 1...4 parameters
Subsequently, all 1...4 interactions are multiplied by | sc14lj|.

The following input formats are used:

These are the parameters for the Lennard-Jones interaction: The geometric combination rule applies. As above, but now C12 and C6 are found indirectly: The geometric combination rule applies for E and the arithmetic rule for R. This precludes the use of Ewald summation for the attractive terms. These are the parameters for the Buckingham interaction: There are no special 1...4 interaction terms in this case. The geometric combination rule applies for A and C, and the arithmetic rule for B.
where Ei is the electric field strength at that atom. Molecular topology file

For every substance subst a topology file subst.tpu and/or subst.tpa must be prepared. This can be done manually, but it is more easy to run the program codes:
runcod subst.con [codname]

This program needs two input files:

It produces the output file: Hopefully the contents are self-explaining. Usually some editing is needed: After such editing, rename out.tp to subst.tpu (for united-atom force fields) or to subst.tpa (for all-atom force fields).

Examples are to be found in the directories data/glucsa and data/etanol. It is instructive to compare the files out.tp with the final topology files. For ethanol the all-atom carbohydrate OPLS force field (vide infra) is not adequate, and manual correction was necessary.

Molecular connectivity file

The molecular connectivity file subst.con must contain the following data:

Molecular geometry files

For the crystal structure prediction molecular geometry files are needed for every conformation to be studied. They are named cor.001,cor.002, ... Allowed are the so-called spf, mdi, and cssr formats. The program prep may be helpful for the construction of these files (see also under Preliminaries).

The mdi format is compatible with the gromos program package. It uses fixed format:

The spf format is easier to use since it is in code-directed free format: Lines not starting with any of these code words are skipped.

These files are read by the procedure leesx in various programs. Each atom name must correspond with the name panm chosen in the connectivity file all.cod, except for hydrogen atoms which are selected from the coordinate file on the basis of a distance criterion. Missing H atoms are automatically generated, providing an easy change from a united-atom force field to an all-atom force field. If more than one group of atoms is present, the same sequence of atoms must be used for each group.

The cssr format is useful for interfacing with other program packages, for example TINKER  or CERIUS. It is in fixed format:

Force fields

As discussed above, the force field is defined by the file upack/tops/ffname.ff. As an example for the Buckingham potential, the Williams and Starr [10] force field for benzene can be found in arom.ff. Here we shall discuss Lennard-Jones type force fields for carbohydrates in some detail. For the first stage (program pack12) we recommend the force field UNITAT, which was developed from GROMOS87 [4] by optimizing geometric crystal data for 23 carbohydrate molecules [7]. In the final stage one may wish to use an all-atom force field.  In this way the computations take less time; but note that some additional all-atom structures may be found if that force field is used right from the beginning.

United-atom force fields for monosaccharides and polyalcohols

Atom types:
HO hydroxyl H
OA hydroxyl O
OS ether O in ring and exocyclic OCH3 group
CS0  C atom without H in sugar ring
CS1 united CH1 group in sugar ring
CS2  united CH2 group in sugar ring and exocyclic CH2OH group
CH3 united CH3 group
Entries in the force field files refer to the Lennard-Jones potential; energy units are kcal/mol, distance units Angstrom. Every C-C bond separates two of the following charge groups:
CH3 0.00
CS2 0.00
C-O-H 0.15, -0.50, 0.35
C-O-C  0.18, -0.36, 0.18
C-O-C-O-H 0.20, -0.36, 0.31, -0.50, 0.35
C-O-C-O-CH3 0.15, -0.36, 0.17, -0.36, 0.17
C-O-C-O-CS2 0.16, -0.36, 0.17, -0.36, 0.16
C-O-C-O-C-O-C 0.14, -0.36, 0.40, -0.36, 0.40, -0.36, 0.14

All-atom force fields for monosaccharides and polyalcohols

Many force fields depend on atomic charges from ab-initio quantumchemical calculations. For carbohydrates, with their high conformational flexibility, this is rather unpractical. Carbohydrate force fields with fixed standard charges have been proposed by Ha et al[11], Kouwijzer et al [12] and Damm et al[13]. We have compared the rankings of the experimental structures in a test set of 15 molecules for these force fields [7], and found that the third force field (OPLS) gives the best results. Nevertheless, it is not superior to UNITAT and we have not been able to develop an all-atom force field that performs better. Polarization, although theoretically necessary to account for the cooperativity of hydrogen bonds, gave no improvement at all.

The OPLS parameters can be found in file opls.ff, the two other force fields in the files ha.ff and milou.ff. All these force fields use the Lennard-Jones potential; in the OPLS force field all 1...4 interactions are scaled by 0.5. A cutoff of at least 10 A or, preferably, Ewald summation should be used. No improper dihedrals are necessary.

Atom types:
HO hydroxyl H
HC aliphatic H
OA hydroxyl oxygen
OS ether oxygen in ring
OE hydroxyl oxygen in exocyclic CH2OH group
CA anomeric carbon atom
CS other ring carbon atoms
CE exocyclic carbon atom

The charges in the OPLS force field are found by the following rule: each bond carries a "bond increment" which contributes equal positive and negative charges to the two atoms involved.
Negative side Positive side Bond increment
OA (anomeric) HO 0.435
OA (other) HO 0.418
OA C 0.265
OS, OE C 0.200
C C 0 (dividing charge groups)
CA HC 0.10
C (with OH group) HC 0.06
C (other) HC 0.03

All-atom force fields for carboxylic acids

As discussed in Ref. [18], we have made a few changes and extensions in the OPLS force field for the modelling of carboxylic acids (the "OPLS-AC" force field).  This extension is only a first try, and is by no means fully optimized. The parameters are included in the file opls.ff; charges are as published in Ref. [18].

Atom types:
C C in COOH
O  =O in COOH
OZ -O in COOH
HZ H in COOH
CS aliphatic C
HC H on C or CS
OA hydroxyl O
HO hydroxyl H

All-atom force fields for aromatic systems

Due to conjugation, each aromatic system is unique in principle: geometry parameters and charges have to be derived from theoretical chemistry. For simple chloro- and bromo substituted benzenes a few standard parameters are available in UPACK [19]. For structure generation Lennard-Jones parameters can be taken from the file opls.ff. Buckingham parameters, mostly taken from the publications of Williams, are available in the file will.ff. In both files intramolecular parameters were taken rather arbitrarily.

Atom types:
CR61 united-atom benzene CH group
CR united-atom central C atom or all-atom aromatic C
HR aromatic H
Cl chlorine
Br bromine

Every C-C bond separates two charge groups:
CR61   0.00
CR-HR -0.10, +0.10  (+/- 0.15 in will.ff)
CR-Cl +0.10, -0.10
CR-Br +0.10, -0.10

 

Detailed instructions for the search stage

In the following the standard file setup will be assumed. The space group file is always taken from upack/tops. Topology files and force field files are first sought in the directory where the execute script is called. If they are not found there, they are sought in a reference directory (upack/data/subst and upack/tops, respectively).

All programs have code-directed input. After each code word one or more numerical entries (separated by one or more spaces) may be required or optional. If a certain code word is not specified, the parameters keep their default values which are given between parentheses {...}. The same goes for unspecified optional parameters (indicated by [...]). Code words that are unknown, or preceded by one or more spaces, are disregarded.

A number of code words occurs in several programs. Some are discussed here, and will be mentioned in the individual input descriptions without repeated explanation.

cutoff cutoff [ alphac alpha6 hmaxc hmax6 tole tolk]
Overrides the cutoff value given in the force field file, and possibly also the settings for Ewald summation .

inner inner
A surface charge on polar crystals causes a shape-dependent energy term. Normally it is assumed that this term is compensated by external charges; this corresponds to omitting the h=0 term in Ewald summation [14]. If the same situation is to be simulated without Ewald summation, a term 2p2 / (3 V) must be added. In UPACK this is done if inner > 0; if alphac> 0 inner is set to zero. In lattice vibrational calculations, the contribution of this term to the second derivatives is omitted unless inner = 2.

pol npp [ poltol schaal gamma nnn mmm nte nttt]
Include polarization, possibly overriding the default values given in the force field file.

sd# ncy#sd tol#sd stp#sd nre#sd shm#sd
Specifies steepest descents energy minimization in various stages (#) of a program: the minimization is stopped after ncy#sd cycles or earlier if the energy decreases less than tol#sd. The initial step size is determined by stp#sd. A new pair list is made after nre#sd cycles or if any parameter changes more than shm#sd. The steepest descents minimization is usually followed by:

cg# ncy#cg tol#cg stp#cg dep#cg shm#cg
Specifies conjugate gradients energy minimization in various stages (#) of a program: the minimization is stopped after ncy#cg cycles, or earlier if the root mean square value of the energy derivatives is less than tol#cg and a recalculation with a new pair list does not change the energy more than dep#cg. The initial step size is determined by stp#cg. A restart with a new pair list is made if any parameter changes more than shm#cg.

files ifils ifilm ifild ifilc
Produce output files if these parameters are larger than 0:
ifils = 1: a separate file (format spf) is made for each structure
ifils = 2: the results are also written to one continuous file (all.spf)
ifils = 3: the results are only written to one continuous file (all.spf)
ifilm = 1: format mdi.
ifild = 1: format fdat.
ifilc = 1: format cssr.
ifilc = 2: format cssr, the charges are written out too.
 
 
 

Program pack12

This program generates structures with any number of groups of molecules. It performs the same action as the combined grid search programs pack1 and pack2, but in quite a different way. Rigid molecules are placed with random positions and random orientations in a cell with randomly chosen crystallographic parameters. Dihedral angles may be set to random values too. The resulting structures are optimized with respect to energy, assuming fully non-rigid molecules. An arbitrary number of conformations can be studied consecutively.
It is possible to make use of partial knowledge by limiting the search range for the cell axes and cell angles. It is also possible to take fixed starting values for known positions and orientations of molecules, as well as for dihedral angles.

Execute script: runpa12 u/a s/d subst [ffname]

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tpu for united-atom force fields or to upack/data/subst/subst.tpa for all-atom force fields
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff (default ffname=unitat3 or ffname=opls, respectively)
in.pa (unit 13), code-directed input, linked to subst.pa12
cor.001,cor.002, cor.003, ... (unit 14), coordinates for various conformations.
charges.cssr (unit 14, optional): a cssr-type file containing charges to superside the charges from the topology file (if icoul = 2).

Output files:

out.pri (unit 16), general results, linked to pack.pr1
pack.10 (unit 10), structure parameters
pack.19 (unit 19), coordinates

The number of distinct groups of molecules (Gi) is called ngr. This number must certainly not be larger than 4 to avoid excessively long and usually fruitless computations. Each energy calculation is discontinued immediately if any repulsion term exceeds the value ebig. This will occur in the majority of cases. If not, ncysd cycles of steepest descents and ncycg cycles of conjugate gradients energy minimization follow.

The coordinate file(s) must contain the atoms in exactly the same order as the topology file. If ngr is larger than one, the coordinates can either be specified for each group individually or for a number of groups. In the latter case ngr must be a multiple of that number, and the given coordinates are copied as many times as necessary.
 

Input file

The input data discussed above can be manipulated by cards in file in.pa.

ngr ngr{1}
The number of groups of molecules, Gi.

rand nrand{5000}
The search continues until nrand random structures are found (for every conformation).

spgr naamsp
The space group name (upper or lower case, slashes may be omitted). This card must always be present.

emax emax {40}
Structures with energy higher than emax (with respect to the lowest energy found at that time) are not included in the output file.

axes amin amax bmin bmax cmin cmax {0.4 2.5 0.4 2.5 0.4 2.5}
The cell axes are scanned: ax = amin ... amax, by = bmin ... bmax, cz = cmin ... cmax.

angles bxmin bxmax cxmin cxmax cymin cymax {0 1 0 1 0 1}
The cell angles, represented by the projections of the cell axes on the Cartesian axes, are also varied:
bx=(bxmin ... bxmax) ax, cx=(cxmin... cxmax) ax, cy=(cymin ... cymax) by.

grvfix xxfix yyfix zzfix (1...MgGi )
Fixed starting centers of gravity for each molecule (but random for each value > 999)

eulfix phifix thefix psifix (1...MgGi )
Fixed starting Euler angles for each molecule (but random for each value > 999)

hydfix hydfix ((1...Nz+Nh),1...Gi )) {1000...1000}
Each dihedral angle is set to the corresponding value of hydfix, except:
if hydfix > 999, random values are assigned to dihedral angles
if hydfix < -999, the dihedral angles from the input geometry are retained

celopt icopt {1}
Cell axes and angles are kept fixed if icopt  = 0.

sd ncysd tolsd stpsd nresd shmsd {100 1 0.001 10 1}
cg ncycg tolcg stpcg depcg shmcg {1000 1 0.01 1 0.05}

ebig ebig {2000}

dexp dexp {-100} [volcor {1.10}]
dexp > 0: The experimental density dexp (in gram cm-3) is used to find ax.
dexp = 0: No such restriction on ax. However, this is usually very inefficient since much time is wasted on unproductive starting structures.
dexp < 0: The density is estimated from -dexp random structures (for each conformation) before starting the calculations. Its value is found from the smallest volume encountered.
volcor: The smallest volume encountered is multiplied by volcor to facilitate finding new structures.

inner inner {1}

seed iseed {281197}
The integer iseed initializes the random number sequence.

append iseed
If this keyword is present, the nrand random structures are appended to the results of a previous search (pack.10 and pack.19).  It is necessary to redefine seed iseed here to prevent just repeating the previous calculation! If the previous calculation was stopped uncleanly, take care that pack.10 and pack.19 end with the same entry.

coul icoul {1}
If icoul = 0 no Coulomb interactions are calculated.
If icoul = 2 the charges from the topology file are overruled by those from the file charges.cssr.

intra intra {1}
intra = 0/1: disable/enable the calculation of intramolecular nonbonded interactions

pres pres {0}
The pressure in atmospheres.

print iprint {0}
Minimal output on the screen and on out.pri is given for iprint = 0. A larger value gives more output.

test dtest {0}
If dtest > 0, the analytical energy derivatives are compared with numerically calculated values (using parameter shifts dtest).

cutoff cutoff [ alphac alpha6 hmaxc hmax6 tole tolk]
The default values from unitat3.ff and opls.ff (0.8 3 3 2 2 0.02 0.02) are recommended: Ewald summation with a still fairly small cutoff radius and elimination of quite a lot of terms in the pair list.

stc stcin(1...4) {1 1 1 1}
Steepest descents weights for cell axes, cell axis projections, and atomic coordinates, respectively.
 

Output files

The file pack.10 contains first data concerning space group symmetry, torsional degrees of freedom and atomic coordinates of the free molecule. For explanation of the symbols see the space group and topology files. The following lines occur:

- ispgr icode
Space group number and sequence code. The sequence code is -1 to indicate that the file is generated by program pack12.

- ndihz ndihh: Nz, Nh (= 0). For every dihedral 1...Nz:
- iome jome kome lome omega

- nrp ireftr(1...3): Ag, followed by the three numbers of the reference atoms.
The for every atom i = 1... nrp:
- i panm iac xr(1...3),
where panm is the atom name corresponding to the topology file, iac is the sequence number of the atom type code, and xr gives the Cartesian coordinates xi, yi, zi in nm for the first geometry file (cor.001).

- cel(1...12): starting values for the parameters

- iref(1...12) to control the optimization of the 12 parameters

After these initial data, each line characterizes a certain structure:

- code1: N1, the sequence number of this structure in the random series (after the density test but before the energy test)
code2: N2, the sequence number in this file.
This is kept as identification number in all subsequent calculations.
epot: the energy (kJ/mol)
par(1...): ax, by, cz, bx, cx, cy (pm) unless constrained by symmetry (i.e, the corresponding iref must not be zero).
ntest: sequence number in the random search (before the density test)
en1 vol: the energy value at the initial calculation, and the molecular volume (both multiplied by 1000 to obtain an integer value)
tau(1...): the values of the dihedral angles
1 (the multiplicity, for later reference)

Clustering by dist yields the file pack.13 which has the same structure. Only code1 now gives the sequence number in the clustered file (code2 still refers to N2) and two new data are added:

zflag: the number of independent molecules (Z'')
imult: the cumulative number of structures that have been clustered together for this entry

The file pack.19 contains Cartesian atomic coordinates for every structure:

- icod2 box(1...6): N2, ax, by, cz, bx, cx, cy
- x(1...3Ag): x1, y1, z1,x2,y2, z2, ...
 
 

Program pack3

This program performs the calculations in subsequent stages of the crystal structure prediction: energy minimization with full structure relaxation. Thus intramolecular as well as intermolecar energy contributions are taken into account.

Execute script: runpa3 u/a s/d filenum subst [ffname]

To illustrate the use of the file number, we take filenum = 13 (the first instance where pack3 is usually called). As an alternative example between {...} we take filenum = 42; this should make the general principle clear.

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tpu for united-atom force fields or to upack/data/subst/subst.tpa for all-atom force fields
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff (default ffname=unitat3 or ffname=opls, respectively)
in.pa (unit 13), code-directed input, linked to subst.pa2{5}
in.par (unit 21), structure parameters, linked to pack.13{42}
in.cor (unit 29), coordinates, linked to pack.19{49}
in.add (unit 14), optional set of additional coordinates, linked to file extra
in.md (unit 15), input for molecular dynamics, linked to subst.md
charges.cssr (unit 14, optional): a cssr-type file containing charges to superside the charges from the topology file (if icoul = 2).

Output files:

out.pri (unit 16), general results, linked to pack.pr2 {5}
out.par (unit 30), generated structure parameters, linked to pack.20 {50}
out.cor (unit 39), generated coordinates, linked to pack.29 {59}
For every structure (identified with a five-digit number #) there is the possibility for coordinate output in four formats (unit 17), linked to #.spf, #.mdi, #.dat, #.cssr.
Additional possible output from molecular dynamics:
#.en (unit.15), energy snapshots
#.co (unit.17), coordinate snapshots
#.ve (unit.18), velocity snapshots
#.xvo (unit.19), final coordinates and velocities

The program automatically detects the number of molecular groups from the number of atoms on the coordinate file. It is also possible to read an (additional)
hand-made file in.add with cell parameters and atom coordinates.

It is possible to expand the geometry to independent molecules in space group P1. But the energy still refers to one molecular group. The objective is to see whether the symmetry is destroyed when it is no longer enforced; if so, the structure is in fact unstable. In this situation a significant change in energy occurs, and the resulting symmetry-averaged coordinates do not make sense.

The same objective can be reached by performing a primitive MD-run before (or instead of) the energy minimization. Here too one has the choice whether or not to enforce the space group symmetry. It is obviously better not to do so; in fact, to make the simulation less primitive one should consider more than one unit cell. That option is available in the program crysmd.

It is possible to calculate the lattice vibrations and the associated thermodynamical quantities. Here the occurrence of imaginary frequencies is an indication that the crystal structure is not a stable one.

Subsequent runs

To continue the energy minimization, subsequent runs can be done after clustering. For instance:
rundist 20
creates pack.23, and
runpa3 a d 23 subst [ffname]
uses the parameter file pack.23 , together with input file pack.pa3 and coordinate file pack.29, to produce all-atom results in pack.pr3, pack.30 and pack.39. This procedure can be repeated at will up to pack.pr9.

Input file

The options mentioned above can be manipulated by cards in file in.pa:

expand expand {0}
If this code word is present, the coordinates are expanded to P1. Random position disturbances (with maximum expand) can be added to move the system slightly away from a possible saddle point; a reasonable value would be 0.0005.

maxinp maxinp {5000}
The maximum number of input structures to be read from in.par. To select certain structures from this file, set maxinp < 0 and then give on the same line any desired quantity of sequence numbers N2. If no sequence numbers are given, the program disregards the information from in.par and operates on all entries from in.cor.

emax emax1[emax2]
The highest allowed energy: emax1 is relative to the first entry on the input file in.par,emax2 is the absolute value.

celopt icopt {1}
Cell axes and angles are kept fixed if icopt  = 0.

sd ncysd tolsd stpsd nresd shmsd {100 0.01 0.0001 5 0.02}
cg ncycg tolcg stpcg depcg shmcg {5000 0.0005 0.005 0.3 0.02}
These values ensure good convergence, but this will cost a lot of computer time. If the number of structures is large, an intermediate calculation with fewer cycles and a larger tolerance will be more efficient. However, equivalent structures may then not be recognized in the clustering procedure.

extra [ncycsd tolsd ncyccg tolcg]
If a file in.add is present to provide additional structures, new values for the minimization parameters can be given here. The format of this file may be mdi or spf; in the latter case the data for the different molecules must be separated by an end card. If the file title starts with a number that has not been used before, that number identifies the structure. Otherwise a new number is assigned, starting with 90001.

inner inner {2}

icoul icoul {1}
If icoul = 0 no Coulomb interactions are calculated.
If icoul = 2 the charges from the topology file are overruled by those from the file charges.cssr.

intra intra {1}
intra=0/1: disable/enable the calculation of intramolecular nonbonded interactions

pres pres {0}
The pressure in atmospheres.

print iprint {0}
Minimal output on the screen and on out.pri is given for iprint = 0. A larger value gives more output.

test dtest {0}
If dtest> 0, the analytical energy derivatives are compared with numerically calculated values (using parameter shifts dtest).

cutoff cutoff [alphac alpha6 hmaxc hmax6 tole tolk]
The default values from unitat3.ff and opls.ff are 0.8 3 3 2 2 0.02 0.02: Ewald summation with a still fairly small cutoff radius and elimination of quite a lot of terms in the pair list. This is a reasonable compromise between speed and precision. But note that some calculations may not converge, or even give absurd results. Redo these (using the maxinp -1 ... option to save time) with, for instance, cutoff 1.2 3 3 2 2 0 0. Also, in the last stages of the analysis it is better to set tole = tolk = 0, and to experiment with increasing the other parameters.

pol npp[poltol schaal gamma nnn mmm nte nttt]

files ifils ifilm ifild ifilc {0 0 0 0}

stc stcin(1...3) {1 1 1}
Steepest descents weights for cell axes, cell axis projections, and atomic coordinates, respectively.

If molecular dynamics calculations are desired:

mdrun nstlim {0} [nrun {1}]
Do molecular dynamics. This is done by subroutine runmd, which has been taken from the gromos program package and adapted for use in crystal simulations. See ref. [15] for background. Time units are ps.
The card mdrun must give nstlim, the number of MD steps. Optionally the number of runs (nrun) may be specified. The following cards may also be used:

mdcut cutoff [alphac alpha6 hmaxc hmax6 tole tolk]
Only needed if different from the values used for the energy minimization

mdtsc ntt tempi tautp {1 0 0.10}
Controls temperature scaling:
ntt = 0: disable temperature scaling
tempi: if tempi > 0 the initial velocities are taken from a Maxwellian distribution with T = tempi
tautp: relaxation time for temperature scaling

mdtemp temp0 {298}
temp0: reference temperature for velocity scaling

mdpsc ntp comp taup{4 0.00076 5.0}
Controls pressure scaling:
ntp = 0: no pressure scaling
ntp = 1: isotropic pressure scaling
ntp = 2: orthorhombic pressure scaling
ntp = 3: monoclinic pressure scaling
ntp = 4: triclinic pressure scaling
comp: compressibility (Gromos units), essentially only a scaling factor
taup: relaxation time for pressure scaling

mdpres pres0
pres0: reference pressure (atm)
Only needed if different from the pressure used in the energy minimization

mdshak ntc {3} [tol {0.00001}]
ntc = 1: disable the use of bond length constraints by the shake procedure.
tol: the geometric tolerance for that procedure

mddt dt {0.002}
dt: the time step in picoseconds

mdnscm nscm {500}
Stop translation of the center of mass after every nscm steps

mdnsnb nsnb {10}
Make a new pair list after everynsnb steps

mdpri ntpr ntwx ntwv ntwe ntpw {0 0 0 0 2}
Write output to screen and out.pri after every ntpr steps (the quantity of output being determined by iprint).
If ntwx > 0, write coordinates to #.co (unit 17) after every ntwx steps.
If ntwv > 0, write velocities to #.ve (unit 18) after every ntwv steps.
If ntwe > 0, write various results to#.en (unit 15) after every ntwe steps.
If ntpw = 0, the coordinate and velocity files are unformatted.
 

If lattice vibration calculations are desired:

latvib [nq ...] {0}
If this keyword is present, lattice vibration calculations are performed. To this end double precision is required; it is advisable to do energy minimization first. The calculations are first done for q=0, and subsequently for nq vectors in q-space. Possible modes of input:

vibamp
If this card is present, eigenfunctions are calculated in order to find the vibrational amplitudes.

temp vtemp(1....) {298, undefined ...}
The thermodynamical quantities (and the vibrational amplitudes, if requested) are calculated at temperatures vtemp(1), vtemp(2), vtemp(3), ...
 

Output files

The file pack.20 contains first data concerning space group symmetry, torsional degrees of freedom and atomic coordinates of the free molecule. This is quite similar to the output from pack12, but note that now icode is equal to the sequence number of the call of pack3. After these initial data, each line again characterizes a certain structure:

- code1: N1, the sequence number in this file
code2: N2, the sequence number from pack12. This is kept as identification number in all subsequent calculations.
epot: the energy (kJ/mol)
par(1...): ax, by, cz, bx, cx, cy, X, Y, Z,,, unless constrained by symmetry.
(i.e, the corresponding iref must not be zero). Lengths in pm, angles in degrees multiplied by 10.
omega(1...Nh+Nz): torsional angles (degrees)
imult: the number of structures that have been clustered together for this entry in previous runs.

If more than one independent molecule is present, only the cell parameters are given (i.e., centers of gravity, Euler angles and torsional angles are not included).
 

The file pack.29contains Cartesian atomic coordinates, in exactly the same format as pack.19.
 

The molecular dynamics output file #.en contains for every selected time step:

ener(1...46):
1...10: energy components: total, kinetic, potential; ax, by, cz, bx, cx, cy, volume
11...15: energy components: bonds, angles, impropers, torsions, nonbonded
15...20: not yet assigned
21...30: nine pressure components, total pressure
31...40: nine virial components, total virial
41...46: a, b, c (nm),(degrees)
 
 
 
 

Program dist

This program compares possibly equivalent crystal structures on the basis of a comparison of interatomic distances. It can be used to eliminate equivalent structures ("clustering mode") or to search for the presence of a certain structure (usually the experimental one; "search mode"). Fairly well-refined structures are required: if the geometrical differences become too large, the structural equivalency may be no longer recognized.

Execute script: rundist u/a [-]filenum [subst [coord]]

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tpu for united-atom force fields or to upack/data/subst/subst.tpa for all-atom force fields
in.spg (unit 11), space group, linked to upack/tops/all.spg.
in.dcl (unit 13; optional), code-directed input, linked to:
- clustering mode: subst.clu (united atoms) or subst.cla (all atoms);
- search mode: subst.zku (united atoms) or subst.zka (all atoms).
parameter file(s) ( fort.201, fort.202, fort.203, ...): type pack.20.
coordinate file(s) ( fort.101, fort.102, fort.103, ...): type pack.29
For instance, if filenum = 41 then fort.201 is linked to pack.41 and fort.101 is
linked to pack.49. If filenum = -41, all corresponding files in directories at the same level are linked.
in.spf (unit 14, optional): one continuous spf-type file, linked to extra.spf.
in.exp (unit 15, optional): an spf-type file containing one reference structure, linked to the file coord.

Output files:

out.pri (unit 16), linked to distclu.pr#, distcla.pr#, distzku.pr# or distzka.pr#.
new parameter file(s) (fort.301, fort.302, fort.303, ...): type pack.20.
a combined list of surviving structures: struclist.#.

Two structures are considered equivalent if the radial distribution curves for each atom in the asymmetric unit are the same. This is even more specific than comparing powder diffraction diagrams, since not only the distances but also the atom types must correspond. This type is normally equal to the atom type, but it can be set more specific than just, say, "hydroxyl O": in glycerol the two end oxygen atoms are equivalent, but the middle one is different.

The structures to be compared are characterized by their space group symmetry, but there is no necessity that this space group is the same as required by crystallographic conventions. For instance, a P21/cstructure could also have been generated in P21 with two quasi-independent molecules or even (with cell doubling) in P1 with eight. So the numbers of molecules in the various structures to be compared may differ. The program first determines for every structure the number of crystallographically independent molecules (Z'') before starting the actual comparison of structures. There, of course, only structures with the same Z'' can be really equivalent. The determination of the actual space group of these structures is quite another matter, for which programs like PLATON [4] orCERIUS [5] may be employed.
 

Structural information

The program tries first to read cell parameters and atomic coordinates from a set of pack-type coordinate files, one for each space group to be considered. They must be assigned to files fort.101, fort.102, ..., and the corresponding pack-type parameter files (fort.201, fort.202, ...) must be present to select the desired structures, and to carry information on space groups and atom types. These files have the standard UPACK format, as described under pack2. In the script file rundist the second parameter gives the number of the file to be considered (e.g. 30 for the file pack.30); if this number is given as -30 all pack.30 files present in directories at the same level (usually a directory for every space group) will also be linked.

After that, the program tries to find additional structures from a file in.spf which contains an arbitrary number of spf-type entries. Each entry must contain at least the cards:

CELL box(1...6): a, b,c (Angstrom), ,(degrees)
and cards with fractional non-orthogonal coordinates:
ATOM panm x(1...3): atom name, x, y, z (one line for each atom). The atoms must be ordered as in the topology file, and it must be possible to form molecules from their coordinates without using symmetry operations.
The information for every structure is concluded by the card:
END

Obviously, at least either this file or one pair of pack-type files must be present. Optional information can be given by a title card containing one line of arbitrary text:
TITL regel
and the energy can be transmitted by a card:
ENER ener
This is used for selection of structures to be compared; in the absence of this card no such selection is done for this entry.
 

Symmetry information

In each pack-type file the space group number is given on the first line. In the additional file in.spf the space group information is presented for each entry separately, either by a space group name, e.g.:
SPGR P21/m
or by detailing a set of symmetry operations separated by commas:
SYMM -x, y+1/2, -z
SYMM x, -y+1/2, z
where the redundant card
SYMM -x, -y, -z
is optional. If both types of cards are present, the SYMM information is used.

Lines not starting with any of the described code words are skipped. When only symm cards are used to introduce space group information, the file in.spg needs not to be present. In that case the parameter filenum is a dummy argument, and all space groups can be defined. But if the space group of any entry is defined otherwise, the space group file in.spg is needed and only triclinic, monoclinic and orthorhombic systems can be handled.

Distance selection

It may be necessary or desirable to modify the standard information on atom types as given in the files discussed above for one or more of the following reasons:
- Normally only intermolecular distances are considered. So, if the list of atoms contains distinct molecules, each last atom of a molecule must be marked.
- Atoms may have the same type (vide supra).
- The coordinates of some atoms, for instance hydrogen atoms bound to carbon, carry hardly any significant information. Deleting such atoms can lead to a substantial increase in speed.
The specification of this information is done through an atom card (vide infra) in the file in.dcl.

Mode selection.

In the standard mode of operation ("clustering") the program removes equivalent structures. Output, sorted to energy, is written to new pack files with the number increased by three: for instance, pack.31 is clustered to pack.34.

Alternatively, a file in.exp may be present which defines a reference structure (in the spf format described above). In this case, the program switches to the "search" mode where each structure of the list(s) is only compared with the reference structure.
 

Details of the calculation can be manipulated by cards in file in.dcl:

cutoff cutoff {0.5}
The cutoff distance (nm) for the radial distribution curves

margin margin {0}
The number of contacts per molecule that are allowed not to be found

sym isymm {1}
isymm = 0: the space group is changed to P1. This overrules the space group information given on the pack-files.
isymm = 1: space group information is used.

dener dener {1.0}
The comparison between two structures is skipped if their energies differ by more than dener. This does not apply to entries from in.spf where no energies are available. Also it applies only to the clustering mode.

rvol rvol {0.01}
The comparison between two structures is skipped if the ratio between their molecular volumes deviates from 1 by more than rvol. This applies only to the clustering mode.

tol diff {0.005 in clustering mode, 0.02 in search mode}
Maximum allowable deviation (nm) between two distances that are being compared.

print iprint {0}
Regulates the quantity of output: larger iprint gives more output to the screen and to out.pri.

select isel {0}
Rules details of the clustering:
isel = 0: discard the structure with lowest energy
isel = 1: discard the structure from the list with the lowest fort number
isel = 2: discard the structure from the list with the highest fort number

intra intri(1) intrj(1) intri(2) intrj(2) (...)
Consider also intramolecular contacts between atom pairs numbered intri and intrj. These pairs should be chosen to discriminate between several possible conformations of flexible molecules; it is a probably superfluous safeguard. In the absence of this card all intramolecular distances are omitted.

atom
This card, if present, must be the last one in the file in.dcl and must be followed by nrp cards (one for each atom) containing:
- k: a sequence number (1... nrp); a negative k denotes the last atom of a molecule.
panm: the atom name
type: an integer number to indicate the atom type. These numbers can be arbitrarily assigned between 0 and 99. If type = 0 the corresponding atom is skipped.
In the absence of an atom card, the program takes the types from the topology file in.tp. Lacking that, it searches for the pack-type file fort.201 and sets the type equal to the type found there. If that file is not present either, then all atoms are given a different type.  Note that in the last two cases all atoms are considered to belong to one molecule, and for cases with Z">1 an atom card is necessary.
 
 
 

Program prep

This program calculates the energy of one or more single (crystal) structures, and optimizes their geometries. The input data are obtained from an spf, mdi or cssr type coordinate file. It is possible to expand the geometry to independent molecules in space group P1 or to consider more than one independent molecule in the asymmetric unit. Lattice dynamical calculations are also possible.

Execute script: runpp u/a s/d subst [coord] [ffname]

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tpu for united-atom force fields, or upack/data/subst/subst.tpa for all-atom force fields.
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff (default ffname=unitat3 or ffname=opls, respectively)
in.pp (unit 13), code-directed input, linked to subst.ppu or subst.ppa, respectively.
in.cor (unit 29), one or more sets of coordinates, linked to coord (default coord=subst.cor)
charges.cssr (unit 14, optional): a cssr-type file containing charges to superside the charges from the topology file (if icoul = 2).

Output files:

out.pri (unit 16), general results, linked to prep.pru or prep.pra, respectively
parfile (unit 15), parameter summary
out.exp (unit 18), experimental data to be retrieved by the search procedure
out.tgt (unit 19), same data, but after energy minimization. For the experimental structure this is a target function to be retrieved "exactly".
For every structure (identified with a five-digit number #) there is the possibility for coordinate output in four formats (unit 17), linked to #.spf, #.mdi, #.dat, #.cssr.
 

Variant for a free molecule

This is useful to prepare starting geometries for the search procedure.

Execute script: runpp subst [coord] [ffname]

File definitions, if different from above:
in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tpu
in.ff (unit 12), force field, linked to upack/tops/ffname.ff
(default ffname=gromnoh)
in.pp (unit 13), code-directed input, linked to subst.pp
The output coordinates are transformed to the reference axes system and written to the files 00000.spf and 00000.mdi.
The input coordinate file in.cor is now optional if a united-atom force field is provided where sufficient improper dihedrals are available to define all chiral centers and ring puckerings. In that case the parameter coord is omitted, the target values for the dihedral angles can be specified in in.pp (default values 180 ).  It may be necessary to specify a few more of these than would be minimally required.There are two possibilities:
- if no energy minimization is done afterwards, these values are enforced as well as possible.
- if energy minimization is done, the dihedral angles are allowed to find the nearest minima for the free molecule. Hydroxyl dihedral angles are usually irrelevant in this stage.
Only one group of molecules can be treated in this way. Coordinate files containing groups in different conformations must be made by repeated calls of the program, followed by cutting and pasting.

The input data for both variants can be manipulated by cards in file in.pp:

spgr naamsp {0}
If this card is absent or if naamsp = 0 the calculation is done for a free molecule. However, this information can be overridden by SPGR cards on an spf-type coordinate file, which may be different for each subsequent structure.

inner inner{2}

intra intra{1}
intra=0/1: disable/enable the calculation of intramolecular nonbonded interactions

flex iflex {1}
iflex = 0: rigid molecules, only intermolecular nonbonded interactions
iflex = 1: fully flexible molecules
iflex = 2: rigid molecules with rotatable OH groups, only nonbonded interactions The option naamsp = 0 is incompatible with iflex = 2.

coul icoul {1}
If icoul = 0 no Coulomb interactions are calculated.
If icoul = 2 the charges from the topology file are overruled by those from the file in.cor (which must then be in CSSR format);  if that file does not provide charge information, the file charges.cssr is used instead.

pres pres {0}
The pressure in atmospheres.

ngr ngr {0}
The number of independent groups of molecules, Gi.
ngr >= 0: the coordinate file need not be ordered, the non-hydrogen atoms are recognized from their name in the topology file and the hydrogen atoms are recognized from the distance to their carrying atom. All atoms must be given in the same order for each group. If ngr > 1 only the names of the first group are used. If ngr = 0 the number of groups is determined from the number of occurrences of the same sequence of names.
ngr < 0: the coordinate list must contain - ngr groups of molecules. Now that file must contain the atoms in exactly the same order as in the topology file, and the atom names given in the coordinate file are disregarded.

expand expand {0} [nx ny nz] {1 1 1}
If this code word is present, the coordinates are expanded to P1. Random position disturbances (with maximum expand) can be added to facilitate the minimization leaving a saddle point; a reasonable value might be 0.0005. Optionally, the structure is further expanded to nx, ny, nz unit cells in the a, b, c directions, respectively.

sd ncysd tolsd stpsd nresd shmsd {100 0.01 0.0001 5 0.02}
cg ncycg tolcg stpcg depcg shmcg {5000 0.0005 0.005 0.3 0.02}

print iprint {0}
Minimal output on the screen and on out.pri is given for iprint = 0. A larger value gives more output.

test dtest {0}
If dtest > 0, the analytical energy derivatives are compared with numerically calculated values (using parameter shifts dtest).

reset
If this card is present, the C-H and O-H distances are reset to values from the topology file.

cutoff cutoff[alphac alpha6 hmaxc hmax6 tole tolk]

pol npp [poltol schaal gamma nnn mmm nte nttt]

roth dihin(1...Nh)
Reset dihedrals for hydroxyl groups to dihin (note: dihedral i corresponds to (Nz+i))

Vn Vn (1...Nh) in kcal/mol {0.3...0.3}

ntal ntal (1...Nh) {3...3}
The latter two cards may be used if iflex = 2 (cf. pack2)

stc stcin(1...6) {1 1 1 1 100 1}
Steepest descents weights for cell axes, cell axis projections, centers of gravity, Euler angles, hydroxyl torsional angles, and atomic coordinates, respectively.

files ifils ifilm ifild ifilc {1 1 1 2}
 

For a free molecule only:

tors dihin(1...Nz+Nh) {0...0}
Target values for dihedrals.

scan iscan dihin(iscan) dscan escan
This card is only operative if tors is present. The dihedral iscan is scanned from the given value dihin(iscan) (which supersedes the corresponding entry from tors) in steps of dscan up to and including escan.
 

If lattice vibration calculations are desired:
 

latvib [nq ...] {0}
If this keyword is present, lattice vibration calculations are performed. To this end double precision is required; it is advisable to do energy minimization first. The calculations are first done for q=0, and subsequently for nq vectors in q-space. Possible modes of input:

vibamp
If this card is present, eigenfunctions are calculated in order to find the vibrational amplitudes.

temp vtemp(1....) {298, undefined ...}
The thermodynamical quantities (and the vibrational amplitudes, if requested) are calculated at temperatures vtemp(1), vtemp(2), vtemp(3), ...

celopt icopt dbox(1....) {1  0.01}
if icopt  = 0, the cell axes and angles are kept fixed
if icopt = 1, the cell axes and angles are optimized
if icopt  = 2, the cell axes and angles are optimized followed by a search for the global minimum in the free energy. This is done by a grid search of 3 points along each axis, separated by dbox(1). For determination of the free energy lowering, this is usually sufficient. For thermal expansion more iterations are needed, and to this end more entries dbox(2...) can be given. In that case the search is iterated once for every entry, around the previous minimum using the corresponding new grid separation. Since the minimum depends on temperature, this is only done for the first temperature vtemp(1).

boxfix box(1...6)
This card starts the grid search for the global free energy minimum directly at the given box parameters ax, by, cz, bx,cx, cy, again for the first temperature only.
 
 

Program crysmd

This program performs a molecular dynamics simulation for one structure. The coordinates of that structure are given in file in.xvi which should be in mdi format (the coordinates may be followed by the three velocities). After the simulation the output file out.xvo can be used as input file for a continuation run.

Execute script: runmd u/a s/d subst coord [ffname]

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tpu for united-atom force fields ,or to upack/data/subst/subst.tpa for all-atom force fields.
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff (default ffname=unitat3 or ffname=opls, respectively)
in.md (unit 13), code-directed input, linked to subst.mdu or subst.mda, respectively.
in.xvi (unit 21), coordinates, linked to coord

Output files:

out.pri (unit 16), general results, linked to md.pru or md.pra, respectively
out.en (unit 15), energy snapshots
out.co (unit 17), coordinate snapshots
out.ve (unit 18), velocity snapshots
out.xvo (unit 19), final coordinates and velocities
out.ave (unit 14), average results in MDI format

The input specifications are given in file in.md, using the same code words as under the option mdrun in program pack3. At least mdrunnstlim must be specified. Additional code words are now:

spgr ispgr {0}
The number (not the name!) of the space group

ngr ngr {1}
The number of independent molecules in the unit cell

expand [nx ny nz] {1 1 1}
If this card is present, the structure is expanded to P1. Optionally, the structure is further expanded to nx, ny, nz unit cells in the a, b, c directions, respectively.

mdskip ig {0}
The number of runs to be skipped before block averages of the remaining ones are taken.

print iprint {0}
 
 
 

References

1. A. Gavezzotti, Ed, Theoretical Aspects and Computer Modeling of the Molecular Solid State, Chapters 4 and 6. John Wiley and Sons, Chicester, 1997.

2. P. Verwer and F. J. J. Leusen, Computer Simulation to Predict Possible Crystal Polymorphs. Rev. Comp. Chem. 12 (1998) 327-365.

3. B. P. van Eijck, W. T. M. Mooij and J. Kroon, Acta Cryst. B51 (1995) 99-103.

4. A. L. Spek, Acta Cryst. A46 (1990) C34.

5. Cerius2 (1997), Molecular Simulations Inc.

6. W. F. van Gunsteren and H. J. C. Berendsen, GROMOS, Groningen molecular simulation package, University of Groningen, 1987.

7. B. P. van Eijck and J. Kroon, J. Comput. Chem. 20 (1999) 799-812.

8. B. P. van Eijck and J. Kroon, J. Comput. Chem. 18 (1997) 1036-1042.

9. D. N. Bernardo, Y. Ding, K. Krogh-Jespersen and R. M. Levy, J. Phys. Chem. 98 (1994) 4180-4187.

10. D. E. Williams and T. L. Starr, Comput. Chem. 1 (1977) 173-177.

11. S. N. Ha, A. Giammona, M. Field and J. W. Brady, Carboh. Res. 180 (1988) 207-221.

12. M. L. C. E. Kouwijzer, B. P. van Eijck, S. J. Kroes and J. Kroon, J. Comput. Chem. 14 (1993) 1281-1289.

13. W. Damm, A. Frontera, J. Tirado-Rives and W. L. Jorgensen, J. Comput. Chem. 16 (1997) 1955-1970.

14. B. P. van Eijck and J. Kroon, J. Phys. Chem. B101 (1997) 1096-1100.

15. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem. Int. Ed. Engl. 29 (1990) 992-1023.

16. B. P. van Eijck and J. Kroon, Acta Cryst. B56 (2000) 535-542.

17. B. P. van Eijck, J. Comput. Chem. 22 (2001) 816-826.

18. B. P. van Eijck, J. Comput. Chem. 23 (2002) 456-462.

19. B. P. van Eijck, Phys. Chem. Chem. Phys.  4 (2002) 4789-4794.

Back to beginning