The program package is designed to construct crystal structures of low
potential energy, using a molecular force field. It is at present restricted
to space groups with triclinic, monoclinic or orthorhombic symmetry. The
programs have been designed with simple carbohydrates as target molecules,
but should be usable for other chemical classes if the user is prepared
to extend some default parameter files. For reviews on crystal structure
prediction see Refs. [1] and [2].
The substance to be studied is characterized by a group of separate molecules. Such a group is the smallest possible repeating unit, for instance ABB in a dihydrate where A is a certain molecule and B is water. The constituent molecules are building blocks for the generation of hypothetical crystal structures. Generally several distinct molecular conformations are thinkable. It is possible to specify their geometries precisely, and to investigate a list of those that are deemed likely to occur. Alternatively, one may rely on automatic generation of possible conformations. This is especially useful for hydroxyl groups where the possible crystal conformations are hard to predict.
In the group a number of bonds may occur about which internal rotation
is possible. We shall denote the number of rotatable hydroxyl groups by
Nh
and the number of other rotatable bonds (not connecting a hydrogen atom)
by Nz. The symbol
will
be used for the dihedral angles. Furthermore, let the group contain
Mg
molecules and Ag atoms. Of course, all these properties
of the group are independent of the actual molecular conformation and crystallographic
environment.
To generate a crystal, a number (Gi ) of independent
groups is placed in a certain space group which has S general equivalent
positions. We shall exclude the possibility that a symmetry element of
the group coincides with a symmetry element of the lattice: that lattice
symmetry element is disregarded, and the crystal structure is described
in a space group with lower symmetry. Thus the Gi groups
always coincide with the asymmetric unit, and the total number of groups
is S Gi . The number of really nonequivalent molecules
in a unit cell will be denoted by Z''. In the example of the dihydrate
ABB it can be seen that Z'' must be between 2 and 3Gi .
Possible crystal structures are generated by placing molecules with varying positions and orientations into a unit cell of varying dimensions. This gives at most twelve (P-1) and at least eight (e.g. Pna21) parameters to be varied; this number can be considerably larger if the molecules are flexible or if more than one independent molecule must be placed. One of the cell parameters can be eliminated by using an approximate crystal density.
In the older versions of UPACK a systematic grid search was used [3]. However, we have now found that a random search performs at least equally well, and is easier to handle in practice [16]. Therefore, the main part of this manual covers only the random search method, and the grid search is treated separately. Both approaches are fairly time-consuming, and require a simple force field with a low cutoff radius for nonbonded interactions. At the end a preliminary list of possible structures is obtained. In a final stage the energy minimizations can be continued with a better and more elaborate force field.
The calculations must be repeated for all probable space groups. Fortunately,
relatively few space groups account for about 90% of the actually observed
structures for organic molecules with Z'' = 1: P21/c,
P-1,
P212121,
P21,
Pbca,
C2/c,
Pna21,
Cc,
Pca21,
C2,
P1,
Pbcn,
Pc.
If the substance is chiral, only P1,
P21,
P212121,
and C2 remain. For a group with internal symmetry the generated
structures often belong to a space group with higher symmetry. The determination
of the actual space group of these structures is not possible in UPACK;
programs like
PLATON
[4]
or CERIUS
[5]
may be employed.
Except for input and output, where fractional coordinates may be used, the programs work with Cartesian coordinates in nanometer units. The crystallographic a-axis coincides with the Cartesian x-axis and the b-axis is positioned in the x-y plane. The crystallographic axes are projected on the Cartesian axes and the components ax, bx, by, cx, cy, and cz are used to characterize the unit cell.
The crystal is built from rigid molecules, defined in a local axes system with aid of three reference atoms for each molecule. The axes system is chosen in such a way that:
around the z-axis, over
around the
x-axis,
over
around the z-axis, and
finally translating over X, Y, Z:


Energy minimization is the central technique of the method. The energy is always reported in kJ/mol for one group (although other units may be used in the force field files). The programs were developed with routines from the GROMOS molecular dynamics simulation program package [6] as basis, and we are grateful to Professor W. F. van Gunsteren for his permission to do so.
As discussed above, in the searching stage fast calculations are essential. A Lennard-Jones 12-6-1 potential is used here (possibly even without atomic charges, to speed up the calculations). In the final stage Ewald summation may be used for the Coulomb and attractive r -6 dispersion terms, and a Buckingham-type exponential repulsion may be used instead of the r -12 term. Atomic polarizabilities may also be included.
To allow fairly reliable calculations of the Coulomb energy without Ewald summation, the molecules can be split up into charge groups of (almost) zero net charge. The cutoff criterion is based on charge groups rather than on individual atoms, thus making the electrostatic interaction decaying at long distance with r -3 rather than r -1. To allow for small errors in the force-field setup, the total molecular charge is set to zero by adding equal amounts to each atomic charge.
The energy minimizations are performed by a number of steepest descents steps, followed by a more or less exhaustive conjugate gradients procedure. The use of pair lists for nonbonded interactions (as well as lists of vectors in reciprocal space for Ewald summations) is essential for stable convergence [7]. These lists are only updated upon a significant change of any geometric parameter.
Unavoidably, the same structures will be encountered repeatedly, probably
with different choices for the unit cell parameters. Of course, it is very
important to eliminate such equivalent structures by a clustering procedure.
A general algorithm, based on comparison of radial distribution curves,
is available.
1. For every substance the molecule(s) in the group must be defined through a topology file, which also assigns a code to every force field contribution (like bond angles, atomic charges, and so on). To this end the program codes can be used. Later the actual values of the force field parameters are introduced by selecting a force field file.
2. For every conformation to be studied separately, a coordinate file for the constituent free molecule(s) has to be prepared. The program prep may be helpful. Note that often one input conformation can suffice, since rotational freedom about single bonds can be taken care of automatically.
3. The structures are generated by a random search for one or more independent molecules by the program pack12. Each molecule is placed with random orientation, random position, and (possibly) random rotatable dihedral angles in a crystal with random cell axes and cell angles. The energy of the generated structures is provisionally minimized with full geometry relaxation (non-rigid molecules).
4. The results are clustered by the program dist. This program also determines the number of crystallographically independent molecules (Z'').
5. The non-rigid minimization is continued with the program pack3. There is the possibility of relaxing the space group symmetry by expanding to P1, and to perform a primitive molecular dynamics "shake-up". In this way the stability of the obtained crystal structures can be investigated. It is also possible to study the effect of lattice vibrations on thermodynamical quantities [17].
6. Clustering delivers results which may be considered as final, or used for further study in a subsequent stage with a more sophisticated force field (e.g. explicit hydrogen atoms, more accurate Ewald summation, inclusion of polarization; or, using other program packages, atomic multipoles and ab-initio energies). A word of warning: it cannot always be taken for granted that there is a one-to-one correspondence between the crystal structures in different force fields.
If the experimental structure is known, one will want to know whether or not it was found by the packing procedure. The program distcan be used for this purpose too.
Molecular dynamics calculations can be carried out individually for
one structure with aid of the program crysmd.
Files for input and output have fixed names in the various programs, to be described in detail below. These file names should be linked to actual files present on disk. To this end a number of C-shell scripts is available which link the necessary files and execute the programs. In this manual a certain convention for directory structure and file names will be used; other users may prefer to create a different system.
File names are given in brown color. Instructions to be given for executing the script files are red. Many programs require code-directed input: on each input line a code word is followed by a number of numerical values which are identified by colored names. To aid in understanding the Fortran sources, these names generally (but not quite always) correspond to identifiers used in the programs. The symbol # indicates that a certain number must be inserted in that place.
A script file to execute a certain program typically looks like this:
runprog [u/a] [s/d] [filenum] [subst] [coord]
[ffname]
Here runprog is one of the execute scripts,
to be discussed in detail later. Some or all of the following parameters
must be specified; entries between square brackets [...] are optional.
A short explanation:
u/a: use a force field for united/all
atoms
s/d: calculate in single/double precision
filenum: a number (#) referring to a pack-type
file (vide infra)
subst: a chosen name for the substance
under study
coord: the name of a coordinate file
ffname: the name of the force field to
be used
Most input files must have a precisely defined structure, but some are code-directed: parameters then have default values unless specified by a line ("card") starting with one of a well-defined set of code words. An overview is given here, while a detailed description is given in a later section. Some explanation is also given as comments at the beginning of the subroutines.
The following files are more or less general, and have to be updated only incidentally:
The generated structures are collected in parameter files pack.#, the number # indicating the stage of the calculation in which the file was prepared:
# = 10: first stage (from pack12)
# = 20, 30, ...: subsequent stages (from pack3)
These files contain general information about the generated structures. Detailed atomic coordinates are delivered in corresponding files pack.19,pack.29, ..., respectively. These files generally contain equivalent solutions, which can be removed by the clustering program dist. This increases the file numbers by 3 (e.g., pack.30 is clustered to pack.33).
It is also possible to produce individual coordinate files of the hypothetical crystal structures in formats mdi, spf,fdat and/or cssr (for use in other modeling packages).
General output information is given on a file out.pri,
which serves as a record of the simulation conditions. This file is usually
linked to pack.pr#, clus.pr,
and so on. There is also output to the screen to signalize error conditions,
and to obtain (optionally) more output for debugging purposes.
The subdirectory progs contains:
To compile and link the programs, call
compall-unix
or
compall-linux
in subdirectories progs
as well as in progs/double
.
This should create various object files (*.o)
and executables (*.e)
in these directories.
The subdirectory tops contains the force field files (ending with .ff ), the codes file all.cod, and the space group file all.spg.
The subdirectory data is meant to contain your data files. It should contain a subdirectory subst for every substance to be investigated, and here the topology files subst.tpu and/or subst.tpa should be placed. The best policy is to create sub-subdirectories for every space group to be considered. If you wish to use experimental data for comparison, they should be placed in a sub-subdirectory exp.
Two subdirectories under data, named after their CSD refcodes, are already included for demonstration and testing purposes. The example for the random search is ethanol which has two independent molecules in space group Pc. It is an exceptionally favourable case where the structure generation proceeds quickly. See below for a detailed prescription.
The other example is in data/glucsa/p212121,
and is meant to demonstrate the grid search. The
substance is
-D-glucose, and the computing time
is artificially and dishonestly reduced by considering only eight Euler
angles close to the experimental structure rather than scanning all 504
orientations which would be considered in a real search. Also the maximum
number of structures used in later stages is unrealistically low.
List of files containing programs
| ycodes.f | Creates framework for topology files |
| ypack1.f | Generates structures with one independent molecule |
| ypack2.f | Generates hydroxyl torsional angles |
| ypack12.f | Generates structures with more than one independent molecule |
| ypack3.f | Minimizes the energy for non-rigidmolecules |
| yclus.f | Removes equivalent structures with one independent molecule |
| yzoek.f | Searches for one equivalent structure with one independent molecule |
| ydist.f | Combines the latter two functions for the general case |
| yprep.f | Minimizes the energy for one structure |
| ycrysmd.f | Performs molecular dynamics calculations for one structure. |
List of files containing subroutines or
functions
| ybonde.f | cbond, choek, cdiha: calculates intramolecular energies |
| ycart.f | cart: constructs coordinates for a free molecule |
| ych.f | ch: calculates eigenvalues and eigenvectors (from EISPACK) |
| yconnect.f | connect: tries to assign atoms from mislabelled coordinates |
| ycongra.f | congra: performs conjugate gradients energy minimization |
| ydclus.f | dclus: compares possibly equivalent structures |
| ydens.f | dens: estimates initial density for pack12 |
| ydiff.f | diff: compares possibly equivalent structures |
| ydihan.f | dihan (function): returns the value of a selected dihedral angle |
| ydlist.f | dlist: makes list of nonbonded distances for dclus |
| yexpsym.f | expsym: expands symmetry operations to the full group |
| yfiles.f | files, fdat: makes mdi, spf, and/or dat type of coordinate file |
| yfunc0.f | func0: calls energy routines for a free rigid molecule |
| yfunc1.f | func1: calls energy routines for a rigid molecule in the crystal |
| yfunc2.f | func2: calls energy routines for a rigid molecule with rotatable hydroxyl groups in the crystal |
| yfunc3.f | func3: calls energy routines for a flexible molecule in the crystal |
| ygeom.f | geom: calculates and compares geometric parameters |
| yintre.f | intre: calculates intramolecular energy |
| yintrot.f | introt: rotates a selected hydroxyl group |
| ykwald.f | kwald: calculates reciprocal part of Ewald correction |
| ylatvib.f | latvib: calculates lattice vibrations |
| yleesc.f | inpt (function): reads and stores card-directed input |
| yleesmd.f | leesmd: reads data for molecular dynamics run |
| yleesr.f | leesr: reads general space group information |
| yleess.f | leess: reads coordinates and expands equivalent positions |
| yleest.f | leest: reads molecular topology information |
| yleesx.f | leesx: reads geometry information, places hydrogen atoms |
| ylist.f | list: calculates nonbonded energy and forces from pair list |
| ylistk.f | listk: calculates reciprocal part of Ewald correction from list |
| ylistp.f | listp: as list, with polarization |
| yltw.f | nword (function), realstr, intstr: internal read procedure |
| ynewchg.f | newchg: read charges from file charges.cssr |
| ynonbe.f | nonbe: calculates nonbonded energy and forces |
| ynonbep.f | nonbep: as nonbe, with polarization |
| yopth.f | opth, listh: calculates energy for various hydroxyl torsional angles |
| yrunmd.f | runmd, stopcm: performs molecular dynamics run |
| ysecder.f | secder: calculates second derivatives for lattice vibrations |
| yshake.f | shake: keeps bond lengths constant during MD run |
| ysort.f | sort: sorts data in ascending order |
| ystedes.f | stedes: performs steepest descents energy minimization |
| ysubs.f | random (function), norm, prod: various utilities |
| ytest.f | test: numerical check on calculation of derivatives |
| ytrans.f | trans: transforms to standard axes system |
| ytransf.f | transf, matmat, invbox: transforms to another crystal setting |
| ytri.f | tri: transforms to oblique coordinates and back |
1. Prepare the connectivity file(s) subst.con
which are helpful to obtain the molecular topology file(s) by running the
program codes:
runcod subst.con
If parameters turn out to be missing, update
the codes file all.cod
and try again. The output file out.tp
is a framework which, after slight editing, will
be needed as a topology file for most of the following programs. Rename
the edited file to subst.tpu
(for united-atom force fields) or to subst.tpa
(for all-atom force fields).
2. Select or prepare the force field files (extension .ff). If the force field file name in the following scripts is omitted, the indicated default file is chosen (e.g. gromnoh.ff in the first application discussed below under 4).
3. If necessary, update the space group file all.spg.
4. Decide which conformations are to be used for
crystal packing, and prepare a coordinate file cor.#
for each of them (# = 001, 002, 003, ...). This can be done by your favorite
molecule builder or by the UPACK program prep
which uses the united-atom topology file subst.tpu
to construct the geometry of a free molecule, followed by an energy minimization.
To this end the desired dihedral angles must be given on a tors
card in subst.pp:
tors
(1)...
(Nz+Nh)
The first Nz dihedral angles
define the conformation; usually the Nh hydroxyl dihedral
angles are unknown and allowed to take any value. Run the program:
runpp subst [gromnoh]
Check the resulting geometry, especially the
dihedral angles. Repeat the procedure for every desired conformation, and
rename the output file(s)
00000.mdi
or 00000.spf
to cor.#, the
coordinate file(s) to be used in the crystal structure generation.
Decide which space groups are to be investigated. Create subdirectories named after these space groups (with omission of slashes, e.g. subst/p21c). In every subdirectory a search must be carried out. Usually the default settings are adequate; otherwise input files subst.pa# must be created. Copy the coordinate files cor.# into each subdirectory. The random search consists of the following steps:
1. Generating and clustering structures:
runpa12 u s subst [unitat3]
Output files are pack.10
(parameters) and pack.19
(coordinates); the parameter file must be clustered to pack.13:
rundist u 10 subst
2. Energy minimization: for instance, to continue
with an all-atom force field in double precision:
runpa3 a d 13 subst [opls]
rundist a 20 subst
with final results on files pack.23
and
pack.29.
The program prep
can also be used to minimize the energy of a given individual crystal packing.
The crystal cell data and atomic coordinates are given in an input file
subst.cor.
The space group information can be taken from this file or communicated
through a codes file subst.ppu
or subst.ppa.
Run the program:
runpp u d subst [unitat3]
(for united-atom topologies subst.tpu)
runpp a d subst [opls]
(for all-atom topologies
subst.tpa)
Output coordinate files are given in various
formats.
An important application is the minimization of
the energy of the experimental structure. This gives an indication about
the quality of the force field. Also, the experimental structure should
be present in the parameter files from the search procedure. To this end
the program dist
can also be used:
rundist u/a filenum subst coord
where coord
is the name of the file with the experimental coordinates (type spf
or
mdi). Since
interatomic distances are strictly compared, this structure must be energy-minimized
in the same force field as was used for the search.
Go to upack/data/etanol/exp
.
The experimental structure is etanol.cor.
Perform energy minimization:
runpp a d etanol
This creates the energy-minimized structure in
four 00001.*
files. Observe that 00001.spf
equals prep.spf.a
(all-atom force field). Now create a subdirectory upack/data/etanol/test,copy
the files cor.001
and etanol.pa*
from upack/data/etanol/pc,
and run:
runrand etanol
This is a standard PACK-run, which takes less
than 10 minutes on a reasonably modern PC. It should find the ethanol structure
(Z' = 2 in space group Pc), using the random search method. Compare
the resulting files with the corresponding ones in upack/data/etanol/pc.
Do not worry about slight discrepancies, the results are sensitive to processor-dependent
roundoff errors.
The space group file upack/tops/all.spg is read by the subroutine leesr in various programs. It must have the following entries for every space group:
| cell axes: | 1: ax | 2: by | 3: cz |
| cell angles: | 4: bx | 5: cx | 6: cy |
| molecular centers: | 7: X | 8: Y | 9: Z |
| molecular orientations: | 10: ![]() |
11: ![]() |
12: ![]() |
The file upack/tops/all.codis needed for the program codes. It defines symbols for various types of atoms and code numbers for terms in the force field. It has to be prepared only once for a certain class of molecules; if desired, several files ending with the extension .cod may be present. It must contain the following blocks of data, separated by one blank line (=), where A1, A2, ... indicate the fixed symbols for certain atom types:
regel: one line of
arbitrary text
=
A1 amass mpol: atom
symbol, atomic mass (a. u.), code for polarizability (800-899).
=
A1 A2 code for bond
stretching (100-199)
=
A1 A2 A3 code for
angle bending (200-299)
=
A1 A2 A3 A4 code
for improper dihedral deformation (300-399)
=
A1 A2 A3 A4 code
for dihedral angle torsion (400-499)
The symbol X is allowed in codes 300-499 and
stands for any atom type.
=
(Optional): any number of sets of two lines:
The force field file upack/tops/ffname.ff is read by the subroutine leest in various programs. To allow an easy change of force fields, the relevant information is transmitted through code numbers as defined in the file all.cod (vide supra). Upon reading a force field file these code numbers are translated into actual numerical values. The force field files must have the following structure:
:
= epsil
if
iddc = 0
= epsil
*
r
if iddc = 1 (note:
r
in Angstroms!)
Now the force field data are read, in arbitrary
order and possibly mixed with blank lines. The programs use the units nanometer
for length and kJ/mol for energy. However, in the force field file different
units may be used, which are converted by the factors fact
and
facr. Angles
are in degrees, but angle force constants
and
are in energy units rad-2.
The following contributions to the force field are implemented:
,
0
in the harmonic angle bending potential:
,
0 in the harmonic improper
dihedral angle potential:

where
is the relative
dielectric constant, and qi is multiplied internally
by the factor 11.787 needed to obtain the energy in kJ/mol. The 1...4 interactions
are multiplied by sc14c.
The van der Waals parameters for nonbonded interactions are treated differently since they are based strictly on atom types rather than on individual atoms. The cards start with two atom types A1 A2 , followed by numerical data. Entries for the combination of two equal atoms must be present; if other interaction parameters are absent (or are given the value -1), they are found from a combination rule. For the combination of atom types i and j this rule is generally the geometric average:
but sometimes the arithmetic average is used:
Intramolecular 1...2 and 1...3 interactions are always excluded. For the Lennard-Jones interaction it is possible to specify separate parameters for 1...4 interactions. Missing (or negative) parameters for the combination of two equal atoms are taken from the corresponding general parameters. For missing mixed 1...4 parameters the rule is:
The following input formats are used:
i
(volume units), the polarizability of atom i in:
For every substance
subst a topology file
subst.tpu
and/or subst.tpa
must be prepared. This can be done manually, but it is more easy to run
the program codes:
runcod subst.con [codname]
This program needs two input files:
or +35
,
and so for every asymmetric atom the choice between codes 301 and 302 must
be made.
or -60
, codes 303/304) can be used to
fix one of the two possible chair forms in six-membered rings.Examples are to be found in the directories data/glucsa and data/etanol. It is instructive to compare the files out.tp with the final topology files. For ethanol the all-atom carbohydrate OPLS force field (vide infra) is not adequate, and manual correction was necessary.
The molecular connectivity file subst.con must contain the following data:
For the crystal structure prediction molecular geometry files are needed for every conformation to be studied. They are named cor.001,cor.002, ... Allowed are the so-called spf, mdi, and cssr formats. The program prep may be helpful for the construction of these files (see also under Preliminaries).
The mdi format is compatible with the gromos program package. It uses fixed format:
,
,
(degrees)
,
,
(degrees)These files are read by the procedure leesx in various programs. Each atom name must correspond with the name panm chosen in the connectivity file all.cod, except for hydrogen atoms which are selected from the coordinate file on the basis of a distance criterion. Missing H atoms are automatically generated, providing an easy change from a united-atom force field to an all-atom force field. If more than one group of atoms is present, the same sequence of atoms must be used for each group.
The cssr format is useful for interfacing with other program packages, for example TINKER or CERIUS. It is in fixed format:
,
,
(degrees),
space group number, space group nameUnited-atom force fields for monosaccharides and polyalcohols
Atom types:
| HO | hydroxyl H |
| OA | hydroxyl O |
| OS | ether O in ring and exocyclic OCH3 group |
| CS0 | C atom without H in sugar ring |
| CS1 | united CH1 group in sugar ring |
| CS2 | united CH2 group in sugar ring and exocyclic CH2OH group |
| CH3 | united CH3 group |
| CH3 | 0.00 |
| CS2 | 0.00 |
| C-O-H | 0.15, -0.50, 0.35 |
| C-O-C | 0.18, -0.36, 0.18 |
| C-O-C-O-H | 0.20, -0.36, 0.31, -0.50, 0.35 |
| C-O-C-O-CH3 | 0.15, -0.36, 0.17, -0.36, 0.17 |
| C-O-C-O-CS2 | 0.16, -0.36, 0.17, -0.36, 0.16 |
| C-O-C-O-C-O-C | 0.14, -0.36, 0.40, -0.36, 0.40, -0.36, 0.14 |
All-atom force fields for monosaccharides and polyalcohols
Many force fields depend on atomic charges from ab-initio quantumchemical calculations. For carbohydrates, with their high conformational flexibility, this is rather unpractical. Carbohydrate force fields with fixed standard charges have been proposed by Ha et al[11], Kouwijzer et al [12] and Damm et al[13]. We have compared the rankings of the experimental structures in a test set of 15 molecules for these force fields [7], and found that the third force field (OPLS) gives the best results. Nevertheless, it is not superior to UNITAT and we have not been able to develop an all-atom force field that performs better. Polarization, although theoretically necessary to account for the cooperativity of hydrogen bonds, gave no improvement at all.
The OPLS parameters can be found in file opls.ff, the two other force fields in the files ha.ff and milou.ff. All these force fields use the Lennard-Jones potential; in the OPLS force field all 1...4 interactions are scaled by 0.5. A cutoff of at least 10 A or, preferably, Ewald summation should be used. No improper dihedrals are necessary.
Atom types:
| HO | hydroxyl H |
| HC | aliphatic H |
| OA | hydroxyl oxygen |
| OS | ether oxygen in ring |
| OE | hydroxyl oxygen in exocyclic CH2OH group |
| CA | anomeric carbon atom |
| CS | other ring carbon atoms |
| CE | exocyclic carbon atom |
The charges in the OPLS force field are found by the following rule: each bond carries a "bond increment" which contributes equal positive and negative charges to the two atoms involved.
| Negative side | Positive side | Bond increment | |
| OA (anomeric) | HO | 0.435 | |
| OA (other) | HO | 0.418 | |
| OA | C | 0.265 | |
| OS, OE | C | 0.200 | |
| C | C | 0 (dividing charge groups) | |
| CA | HC | 0.10 | |
| C (with OH group) | HC | 0.06 | |
| C (other) | HC | 0.03 |
All-atom force fields for carboxylic acids
As discussed in Ref. [18], we have made a few changes and extensions in the OPLS force field for the modelling of carboxylic acids (the "OPLS-AC" force field). This extension is only a first try, and is by no means fully optimized. The parameters are included in the file opls.ff; charges are as published in Ref. [18].
Atom types:
| C | C in COOH |
| O | =O in COOH |
| OZ | -O in COOH |
| HZ | H in COOH |
| CS | aliphatic C |
| HC | H on C or CS |
| OA | hydroxyl O |
| HO | hydroxyl H |
All-atom force fields for aromatic systems
Due to conjugation, each aromatic system is unique in principle: geometry parameters and charges have to be derived from theoretical chemistry. For simple chloro- and bromo substituted benzenes a few standard parameters are available in UPACK [19]. For structure generation Lennard-Jones parameters can be taken from the file opls.ff. Buckingham parameters, mostly taken from the publications of Williams, are available in the file will.ff. In both files intramolecular parameters were taken rather arbitrarily.
Atom types:
| CR61 | united-atom benzene CH group |
| CR | united-atom central C atom or all-atom aromatic C |
| HR | aromatic H |
| Cl | chlorine |
| Br | bromine |
Every C-C bond separates two charge groups:
| CR61 | 0.00 |
| CR-HR | -0.10, +0.10 (+/- 0.15 in will.ff) |
| CR-Cl | +0.10, -0.10 |
| CR-Br | +0.10, -0.10 |
All programs have code-directed input. After each code word one or more numerical entries (separated by one or more spaces) may be required or optional. If a certain code word is not specified, the parameters keep their default values which are given between parentheses {...}. The same goes for unspecified optional parameters (indicated by [...]). Code words that are unknown, or preceded by one or more spaces, are disregarded.
A number of code words occurs in several programs. Some are discussed here, and will be mentioned in the individual input descriptions without repeated explanation.
cutoff cutoff
[ alphac alpha6 hmaxc hmax6 tole tolk]
Overrides the cutoff value given in the force
field file, and possibly also the settings for Ewald summation
.
inner inner
A surface charge on polar crystals causes a shape-dependent
energy term. Normally it is assumed that this term is compensated by external
charges; this corresponds to omitting the h=0 term in Ewald summation
[14].
If the same situation is to be simulated without Ewald summation, a term
2
p2 / (3
V) must be added.
In UPACK this is done if inner
> 0; if alphac>
0 inner is set
to zero. In lattice vibrational calculations, the contribution of this
term to the second derivatives is omitted unless inner
= 2.
pol npp
[ poltol schaal gamma nnn mmm nte nttt]
Include polarization, possibly
overriding the default values given in the force field file.
sd# ncy#sd tol#sd
stp#sd nre#sd shm#sd
Specifies steepest descents energy minimization
in various stages (#) of a program: the minimization is stopped after
ncy#sd cycles or earlier if the energy decreases
less than tol#sd.
The initial step size is determined by stp#sd.
A new pair list is made after nre#sd
cycles or if any parameter changes more than shm#sd.
The steepest descents minimization is usually followed by:
cg# ncy#cg tol#cg
stp#cg dep#cg shm#cg
Specifies conjugate gradients energy minimization
in various stages (#) of a program: the minimization is stopped after
ncy#cg cycles, or earlier if the root mean
square value of the energy derivatives is less than tol#cg
and a recalculation with a new pair list does not change the energy more
than dep#cg.
The initial step size is determined by stp#cg.
A restart with a new pair list is made if any parameter changes more than
shm#cg.
files ifils ifilm
ifild ifilc
Produce output files if these parameters are
larger than 0:
ifils = 1: a separate
file (format spf)
is made for each structure
ifils = 2: the results
are also written to one continuous file (all.spf)
ifils = 3: the results
are only written to one continuous file (all.spf)
ifilm = 1: format
mdi.
ifild = 1: format
fdat.
ifilc = 1: format
cssr.
ifilc = 2: format
cssr,
the charges are written out too.
This program generates structures with any number
of groups of molecules. It performs the same action as the combined grid
search programs pack1
and pack2, but
in quite a different way. Rigid molecules are placed with random positions
and random orientations in a cell with randomly chosen crystallographic
parameters. Dihedral angles may be set to random values too. The resulting
structures are optimized with respect to energy, assuming fully non-rigid
molecules. An arbitrary number of conformations can be studied consecutively.
It is possible to make use of partial knowledge
by limiting the search range for the cell axes and cell angles. It is also
possible to take fixed starting values for known positions and orientations
of molecules, as well as for dihedral angles.
Execute script: runpa12 u/a s/d subst [ffname]
Input files:
in.tp (unit 20),
molecular topology, linked to upack/data/subst/subst.tpu
for united-atom force fields or to upack/data/subst/subst.tpa
for all-atom force fields
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
(default
ffname=unitat3 or
ffname=opls, respectively)
in.pa (unit 13),
code-directed input, linked to subst.pa12
cor.001,cor.002,
cor.003, ... (unit 14), coordinates for various
conformations.
charges.cssr (unit
14, optional): a cssr-type
file containing charges to superside the charges from the topology file
(if icoul = 2).
Output files:
out.pri (unit 16),
general results, linked to pack.pr1
pack.10 (unit 10),
structure parameters
pack.19 (unit 19),
coordinates
The number of distinct groups of molecules (Gi) is called ngr. This number must certainly not be larger than 4 to avoid excessively long and usually fruitless computations. Each energy calculation is discontinued immediately if any repulsion term exceeds the value ebig. This will occur in the majority of cases. If not, ncysd cycles of steepest descents and ncycg cycles of conjugate gradients energy minimization follow.
The coordinate file(s) must contain the atoms
in exactly the same order as the topology file. If
ngr is larger than one, the coordinates can
either be specified for each group individually or for a number of groups.
In the latter case ngr
must be a multiple of that number, and the given coordinates are copied
as many times as necessary.
Input file
The input data discussed above can be manipulated by cards in file in.pa.
ngr ngr{1}
The number of groups of molecules, Gi.
rand nrand{5000}
The search continues until
nrand random structures are found (for every
conformation).
spgr naamsp
The space group name (upper or lower case, slashes
may be omitted). This card must always be present.
emax emax
{40}
Structures with energy higher than
emax (with respect to the lowest energy found
at that time) are not included in the output file.
axes amin amax bmin
bmax cmin cmax {0.4 2.5 0.4 2.5 0.4 2.5}
The cell axes are scanned: ax
= amin ...
amax, by =
bmin ... bmax,
cz
= cmin ...
cmax.
angles bxmin bxmax
cxmin cxmax cymin cymax {0 1 0 1 0 1}
The cell angles, represented by the projections
of the cell axes on the Cartesian axes, are also varied:
bx=(bxmin
... bxmax) ax,
cx=(cxmin...
cxmax)
ax,
cy=(cymin
...
cymax) by.
grvfix xxfix yyfix
zzfix (1...MgGi )
Fixed starting centers of gravity for each molecule
(but random for each value > 999)
eulfix phifix thefix
psifix (1...MgGi )
Fixed starting Euler angles for each molecule
(but random for each value > 999)
hydfix hydfix
((1...Nz+Nh),1...Gi ))
{1000...1000}
Each dihedral angle is set to the corresponding
value of hydfix,
except:
if hydfix
> 999, random values are assigned to dihedral angles
if hydfix
< -999, the dihedral angles from the input geometry are retained
celopt icopt
{1}
Cell axes and angles are kept fixed if
icopt = 0.
sd ncysd tolsd stpsd
nresd shmsd {100 1 0.001 10 1}
cg ncycg tolcg stpcg
depcg shmcg {1000 1 0.01 1 0.05}
ebig ebig {2000}
dexp dexp
{-100} [volcor
{1.10}]
dexp > 0: The experimental
density dexp
(in gram cm-3) is used to find ax.
dexp = 0: No such
restriction on ax. However, this is usually very inefficient
since much time is wasted on unproductive starting structures.
dexp < 0: The
density is estimated from -dexp
random structures (for each conformation) before starting the calculations.
Its value is found from the smallest volume encountered.
volcor: The smallest
volume encountered is multiplied by volcor
to facilitate finding new structures.
inner inner {1}
seed iseed
{281197}
The integer iseed
initializes the random number sequence.
append iseed
If this keyword is present, the nrand
random structures are appended to the results of a previous search (pack.10
and pack.19).
It is necessary to redefine seed
iseed here to prevent just repeating the previous
calculation!
If the previous calculation was stopped uncleanly, take care that pack.10
and pack.19 end
with the same entry.
coul icoul {1}
If icoul
= 0 no Coulomb interactions are calculated.
If icoul
= 2 the charges from the topology file are overruled by those from the
file charges.cssr.
intra intra {1}
intra = 0/1: disable/enable
the calculation of intramolecular nonbonded interactions
pres pres {0}
The pressure in atmospheres.
print iprint {0}
Minimal output on the screen and on
out.pri is given for iprint
= 0. A larger value gives more output.
test dtest {0}
If dtest
> 0, the analytical energy derivatives are compared with numerically calculated
values (using parameter shifts
dtest).
cutoff cutoff
[ alphac alpha6 hmaxc hmax6 tole tolk]
The default values from
unitat3.ff and
opls.ff (0.8 3 3 2 2 0.02 0.02) are recommended:
Ewald summation with a still fairly small cutoff radius and elimination
of quite a lot of terms in the pair list.
stc stcin(1...4)
{1 1 1 1}
Steepest descents weights for cell axes, cell
axis projections, and atomic coordinates, respectively.
Output files
The file pack.10 contains first data concerning space group symmetry, torsional degrees of freedom and atomic coordinates of the free molecule. For explanation of the symbols see the space group and topology files. The following lines occur:
- ispgr icode
Space group number and sequence code. The sequence
code is -1 to indicate that the file is generated by program pack12.
- ndihz ndihh:
Nz,
Nh
(= 0). For every dihedral 1...Nz:
- iome jome kome
lome omega
- nrp ireftr(1...3):
Ag,
followed by the three numbers of the reference atoms.
The for every atom i = 1...
nrp:
- i panm iac xr(1...3),
where panm
is the atom name corresponding to the topology file,
iac is the sequence number of the atom type
code, and xr
gives the Cartesian coordinates xi, yi,
zi
in nm for the first geometry file (cor.001).
- cel(1...12): starting values for the parameters
- iref(1...12) to control the optimization of the 12 parameters
After these initial data, each line characterizes a certain structure:
- code1:
N1,
the sequence number of this structure in the random series (after the density
test but before the energy test)
code2: N2,
the sequence number in this file.
This is kept as identification number in all
subsequent calculations.
epot: the energy
(kJ/mol)
par(1...): ax,
by,
cz,
bx,
cx,
cy
(pm) unless constrained by symmetry (i.e, the corresponding
iref must not be zero).
ntest: sequence
number in the random search (before the density test)
en1 vol: the energy
value at the initial calculation, and the molecular volume (both multiplied
by 1000 to obtain an integer value)
tau(1...): the values
of the dihedral angles
1 (the multiplicity, for later reference)
Clustering by dist yields the file pack.13 which has the same structure. Only code1 now gives the sequence number in the clustered file (code2 still refers to N2) and two new data are added:
zflag: the number
of independent molecules (Z'')
imult: the cumulative
number of structures that have been clustered together for this entry
The file pack.19 contains Cartesian atomic coordinates for every structure:
- icod2 box(1...6):
N2,
ax,
by,
cz,
bx,
cx,
cy
- x(1...3Ag):
x1,
y1,
z1,x2,y2,
z2,
...
Program pack3
This program performs the calculations in subsequent stages of the crystal structure prediction: energy minimization with full structure relaxation. Thus intramolecular as well as intermolecar energy contributions are taken into account.
Execute script: runpa3 u/a s/d filenum subst [ffname]
To illustrate the use of the file number, we take filenum = 13 (the first instance where pack3 is usually called). As an alternative example between {...} we take filenum = 42; this should make the general principle clear.
Input files:
in.tp (unit 20),
molecular topology, linked to upack/data/subst/subst.tpu
for united-atom force fields or to upack/data/subst/subst.tpa
for
all-atom force fields
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
(default
ffname=unitat3 or
ffname=opls, respectively)
in.pa (unit 13),
code-directed input, linked to subst.pa2{5}
in.par (unit 21),
structure parameters, linked to pack.13{42}
in.cor (unit 29),
coordinates, linked to pack.19{49}
in.add (unit 14),
optional set of additional coordinates, linked to file
extra
in.md (unit 15),
input for molecular dynamics, linked to subst.md
charges.cssr (unit
14, optional): a cssr-type
file containing charges to superside the charges from the topology file
(if icoul = 2).
Output files:
out.pri (unit 16),
general results, linked to pack.pr2 {5}
out.par (unit 30),
generated structure parameters, linked to
pack.20 {50}
out.cor (unit 39),
generated coordinates, linked to pack.29 {59}
For every structure (identified with a five-digit
number #) there is the possibility for coordinate output in four formats
(unit 17), linked to #.spf,
#.mdi, #.dat,
#.cssr.
Additional possible output from molecular dynamics:
#.en (unit.15),
energy snapshots
#.co (unit.17),
coordinate snapshots
#.ve (unit.18),
velocity snapshots
#.xvo (unit.19),
final coordinates and velocities
The program automatically detects the number of
molecular groups from the number of atoms on the coordinate file. It is
also possible to read an (additional)
hand-made file in.add
with cell parameters and atom coordinates.
It is possible to expand the geometry to independent molecules in space group P1. But the energy still refers to one molecular group. The objective is to see whether the symmetry is destroyed when it is no longer enforced; if so, the structure is in fact unstable. In this situation a significant change in energy occurs, and the resulting symmetry-averaged coordinates do not make sense.
The same objective can be reached by performing a primitive MD-run before (or instead of) the energy minimization. Here too one has the choice whether or not to enforce the space group symmetry. It is obviously better not to do so; in fact, to make the simulation less primitive one should consider more than one unit cell. That option is available in the program crysmd.
It is possible to calculate the lattice vibrations and the associated thermodynamical quantities. Here the occurrence of imaginary frequencies is an indication that the crystal structure is not a stable one.
Subsequent runs
To continue the energy minimization, subsequent
runs can be done after clustering. For instance:
rundist 20
creates pack.23,
and
runpa3 a d 23 subst [ffname]
uses the parameter file pack.23
, together with input file pack.pa3 and
coordinate file pack.29,
to produce all-atom results in pack.pr3,
pack.30 and pack.39.
This procedure can be repeated at will up to
pack.pr9.
Input file
The options mentioned above can be manipulated by cards in file in.pa:
expand expand
{0}
If this code word is present, the coordinates
are expanded to P1. Random position disturbances (with maximum
expand) can be added to move the system slightly
away from a possible saddle point; a reasonable value would be 0.0005.
maxinp maxinp
{5000}
The maximum number of input structures to be
read from in.par.
To select certain structures from this file, set
maxinp < 0 and then give on the same line
any desired quantity of sequence numbers N2. If no sequence
numbers are given, the program disregards the information from
in.par and operates on all entries from
in.cor.
emax emax1[emax2]
The highest allowed energy: emax1
is relative to the first entry on the input file
in.par,emax2
is the absolute value.
celopt icopt
{1}
Cell axes and angles are kept fixed if
icopt = 0.
sd ncysd tolsd stpsd
nresd shmsd {100 0.01 0.0001 5 0.02}
cg ncycg tolcg stpcg
depcg shmcg {5000 0.0005 0.005 0.3 0.02}
These values ensure good convergence, but this
will cost a lot of computer time. If the number of structures is large,
an intermediate calculation with fewer cycles and a larger tolerance will
be more efficient. However, equivalent structures may then not be recognized
in the clustering procedure.
extra [ncycsd
tolsd ncyccg tolcg]
If a file in.add
is present to provide additional structures, new values for the minimization
parameters can be given here. The format of this file may be
mdi or spf;
in the latter case the data for the different molecules must be separated
by an end card.
If the file title starts with a number that has not been used before, that
number identifies the structure. Otherwise a new number is assigned, starting
with 90001.
inner inner {2}
icoul icoul
{1}
If icoul
= 0 no Coulomb interactions are calculated.
If icoul
= 2 the charges from the topology file are overruled by those from the
file charges.cssr.
intra intra
{1}
intra=0/1: disable/enable the calculation of
intramolecular nonbonded interactions
pres pres
{0}
The pressure in atmospheres.
print iprint
{0}
Minimal output on the screen and on
out.pri is given for iprint
= 0. A larger value gives more output.
test dtest
{0}
If dtest>
0, the analytical energy derivatives are compared with numerically calculated
values (using parameter shifts
dtest).
cutoff cutoff
[alphac alpha6 hmaxc hmax6 tole tolk]
The default values from
unitat3.ff and
opls.ff are 0.8 3 3 2 2 0.02 0.02: Ewald summation
with a still fairly small cutoff radius and elimination of quite a lot
of terms in the pair list. This is a reasonable compromise between speed
and precision. But note that some calculations may not converge, or even
give absurd results. Redo these (using the
maxinp -1 ... option to save time) with, for
instance, cutoff
1.2 3 3 2 2 0 0. Also, in the last stages of the analysis it is better
to set
tole
=
tolk
=
0, and to experiment with increasing the other parameters.
pol npp[poltol schaal gamma nnn mmm nte nttt]
files ifils ifilm ifild ifilc {0 0 0 0}
stc stcin(1...3)
{1 1 1}
Steepest descents weights for cell axes, cell
axis projections, and atomic coordinates, respectively.
If molecular dynamics calculations are desired:
mdrun nstlim
{0} [nrun {1}]
Do molecular dynamics. This is done by subroutine
runmd, which has been taken from the
gromos program package and adapted for use
in crystal simulations. See ref. [15] for background.
Time units are ps.
The card mdrun
must give nstlim,
the number of MD steps. Optionally the number of runs (nrun)
may be specified. The following cards may also be used:
mdcut cutoff
[alphac alpha6 hmaxc hmax6 tole tolk]
Only needed if different from the values used
for the energy minimization
mdtsc ntt tempi tautp
{1 0 0.10}
Controls temperature scaling:
ntt = 0: disable
temperature scaling
tempi: if
tempi > 0 the initial velocities are taken
from a Maxwellian distribution with T = tempi
tautp: relaxation
time for temperature scaling
mdtemp temp0
{298}
temp0: reference
temperature for velocity scaling
mdpsc ntp comp taup{4
0.00076 5.0}
Controls pressure scaling:
ntp = 0: no pressure
scaling
ntp = 1: isotropic
pressure scaling
ntp = 2: orthorhombic
pressure scaling
ntp = 3: monoclinic
pressure scaling
ntp = 4: triclinic
pressure scaling
comp: compressibility
(Gromos units), essentially only a scaling factor
taup: relaxation
time for pressure scaling
mdpres pres0
pres0: reference
pressure (atm)
Only needed if different from the pressure used
in the energy minimization
mdshak ntc
{3} [tol {0.00001}]
ntc = 1: disable
the use of bond length constraints by the
shake procedure.
tol: the geometric
tolerance for that procedure
mddt dt
{0.002}
dt: the time step
in picoseconds
mdnscm nscm
{500}
Stop translation of the center of mass after
every nscm steps
mdnsnb nsnb
{10}
Make a new pair list after everynsnb
steps
mdpri ntpr ntwx ntwv
ntwe ntpw {0 0 0 0 2}
Write output to screen and
out.pri after every
ntpr steps (the quantity of output being determined
by iprint).
If ntwx
> 0, write coordinates to #.co
(unit 17) after every ntwx
steps.
If ntwv
> 0, write velocities to #.ve
(unit 18) after every ntwv
steps.
If ntwe
> 0, write various results to#.en
(unit 15) after every ntwe
steps.
If ntpw
= 0, the coordinate and velocity files are unformatted.
If lattice vibration calculations are desired:
latvib [nq
...] {0}
If this keyword is present, lattice vibration
calculations are performed. To this end double precision is required; it
is advisable to do energy minimization first. The calculations are first
done for q=0, and subsequently for nq
vectors in q-space. Possible modes of input:
latvib n1 n2 n3
Calculations are done for nq
= n1 * n2
* n3 q-vectors,
filling a regular grid in q-space.
latvib f1 f2 f3 nq
Calculations are done for the vectors defined
by f1,
f2,
f3
divided in nq
equal parts.
temp vtemp(1....)
{298, undefined ...}
The thermodynamical quantities (and the vibrational
amplitudes, if requested) are calculated at temperatures vtemp(1),
vtemp(2), vtemp(3), ...
Output files
The file pack.20 contains first data concerning space group symmetry, torsional degrees of freedom and atomic coordinates of the free molecule. This is quite similar to the output from pack12, but note that now icode is equal to the sequence number of the call of pack3. After these initial data, each line again characterizes a certain structure:
- code1:
N1,
the sequence number in this file
code2: N2,
the sequence number from pack12.
This is kept as identification number in all subsequent calculations.
epot: the energy
(kJ/mol)
par(1...): ax,
by,
cz,
bx,
cx,
cy,
X,
Y,
Z,
,
,
,
unless constrained by symmetry.
(i.e, the corresponding
iref must not be zero). Lengths in pm, angles
in degrees multiplied by 10.
omega(1...Nh+Nz):
torsional angles (degrees)
imult: the number
of structures that have been clustered together for this entry in previous
runs.
If more than one independent molecule is present,
only the cell parameters are given (i.e., centers of gravity, Euler angles
and torsional angles are not included).
The file pack.29contains
Cartesian atomic coordinates, in exactly the same format as pack.19.
The molecular dynamics output file #.en contains for every selected time step:
ener(1...46):
1...10: energy components: total, kinetic, potential;
ax,
by,
cz,
bx,
cx,
cy,
volume
11...15: energy components: bonds, angles, impropers,
torsions, nonbonded
15...20: not yet assigned
21...30: nine pressure components, total pressure
31...40: nine virial components, total virial
41...46: a, b, c (nm),
,
,
(degrees)
This program compares possibly equivalent crystal structures on the basis of a comparison of interatomic distances. It can be used to eliminate equivalent structures ("clustering mode") or to search for the presence of a certain structure (usually the experimental one; "search mode"). Fairly well-refined structures are required: if the geometrical differences become too large, the structural equivalency may be no longer recognized.
Execute script: rundist u/a [-]filenum [subst [coord]]
Input files:
in.tp (unit 20),
molecular topology, linked to upack/data/subst/subst.tpu
for united-atom force fields or to upack/data/subst/subst.tpa
for
all-atom force fields
in.spg (unit 11),
space group, linked to upack/tops/all.spg.
in.dcl (unit 13;
optional), code-directed input, linked to:
- clustering mode:
subst.clu (united atoms) or
subst.cla (all atoms);
- search mode: subst.zku
(united atoms) or subst.zka
(all atoms).
parameter file(s) (
fort.201, fort.202,
fort.203, ...): type
pack.20.
coordinate file(s) (
fort.101, fort.102,
fort.103, ...): type
pack.29
For instance, if
filenum = 41 then
fort.201 is linked to
pack.41 and fort.101
is
linked to pack.49.
If filenum =
-41, all corresponding files in directories at the same level are linked.
in.spf (unit 14,
optional): one continuous spf-type
file, linked to extra.spf.
in.exp (unit 15,
optional): an spf-type
file containing one reference structure, linked to the file
coord.
Output files:
out.pri (unit 16),
linked to distclu.pr#,
distcla.pr#, distzku.pr#
or
distzka.pr#.
new parameter file(s) (fort.301,
fort.302, fort.303,
...): type pack.20.
a combined list of surviving structures: struclist.#.
Two structures are considered equivalent if the radial distribution curves for each atom in the asymmetric unit are the same. This is even more specific than comparing powder diffraction diagrams, since not only the distances but also the atom types must correspond. This type is normally equal to the atom type, but it can be set more specific than just, say, "hydroxyl O": in glycerol the two end oxygen atoms are equivalent, but the middle one is different.
The structures to be compared are characterized
by their space group symmetry, but there is no necessity that this space
group is the same as required by crystallographic conventions. For instance,
a P21/cstructure could also have been generated
in P21 with two quasi-independent molecules or even (with
cell doubling) in P1 with eight. So the numbers of molecules in
the various structures to be compared may differ. The program first determines
for every structure the number of crystallographically independent molecules
(Z'') before starting the actual comparison of structures. There, of course,
only structures with the same Z'' can be really equivalent. The determination
of the actual space group of these structures is quite another matter,
for which programs like
PLATON
[4]
orCERIUS
[5]
may be employed.
Structural information
The program tries first to read cell parameters and atomic coordinates from a set of pack-type coordinate files, one for each space group to be considered. They must be assigned to files fort.101, fort.102, ..., and the corresponding pack-type parameter files (fort.201, fort.202, ...) must be present to select the desired structures, and to carry information on space groups and atom types. These files have the standard UPACK format, as described under pack2. In the script file rundist the second parameter gives the number of the file to be considered (e.g. 30 for the file pack.30); if this number is given as -30 all pack.30 files present in directories at the same level (usually a directory for every space group) will also be linked.
After that, the program tries to find additional structures from a file in.spf which contains an arbitrary number of spf-type entries. Each entry must contain at least the cards:
CELL box(1...6):
a,
b,c
(Angstrom),
,
,
(degrees)
and cards with fractional non-orthogonal coordinates:
ATOM panm x(1...3):
atom name, x, y, z (one line for each atom). The atoms
must be ordered as in the topology file, and it must be possible to form
molecules from their coordinates without using symmetry operations.
The information for every structure is concluded
by the card:
END
Obviously, at least either this file or one pair
of pack-type
files must be present. Optional information can be given by a title card
containing one line of arbitrary text:
TITL regel
and the energy can be transmitted by a card:
ENER ener
This is used for selection of structures to be
compared; in the absence of this card no such selection is done for this
entry.
Symmetry information
In each pack-type
file the space group number is given on the first line. In the additional
file in.spf the
space group information is presented for each entry separately, either
by a space group name, e.g.:
SPGR P21/m
or by detailing a set of symmetry operations
separated by commas:
SYMM -x, y+1/2,
-z
SYMM x, -y+1/2,
z
where the redundant card
SYMM -x, -y, -z
is optional. If both types of cards are present,
the SYMM information
is used.
Lines not starting with any of the described code words are skipped. When only symm cards are used to introduce space group information, the file in.spg needs not to be present. In that case the parameter filenum is a dummy argument, and all space groups can be defined. But if the space group of any entry is defined otherwise, the space group file in.spg is needed and only triclinic, monoclinic and orthorhombic systems can be handled.
Distance selection
It may be necessary or desirable to modify the
standard information on atom types as given in the files discussed above
for one or more of the following reasons:
- Normally only intermolecular distances are
considered. So, if the list of atoms contains distinct molecules, each
last atom of a molecule must be marked.
- Atoms may have the same
type (vide supra).
- The coordinates of some atoms, for instance
hydrogen atoms bound to carbon, carry hardly any significant information.
Deleting such atoms can lead to a substantial increase in speed.
The specification of this information is done
through an atom
card (vide infra) in the file in.dcl.
Mode selection.
In the standard mode of operation ("clustering") the program removes equivalent structures. Output, sorted to energy, is written to new pack files with the number increased by three: for instance, pack.31 is clustered to pack.34.
Alternatively, a file
in.exp may be present which defines a reference
structure (in the spf format
described above). In this case, the program switches to the "search" mode
where each structure of the list(s) is only compared with the reference
structure.
Details of the calculation can be manipulated by cards in file in.dcl:
cutoff cutoff
{0.5}
The cutoff distance (nm) for the radial distribution
curves
margin margin
{0}
The number of contacts per molecule that are
allowed not to be found
sym isymm
{1}
isymm = 0: the space
group is changed to P1. This overrules the space group information
given on the pack-files.
isymm = 1: space
group information is used.
dener dener
{1.0}
The comparison between two structures is skipped
if their energies differ by more than dener.
This does not apply to entries from in.spf
where no energies are available. Also it applies only to the clustering
mode.
rvol rvol
{0.01}
The comparison between two structures is skipped
if the ratio between their molecular volumes deviates from 1 by more than
rvol. This applies only to the clustering
mode.
tol diff
{0.005 in clustering mode, 0.02 in search mode}
Maximum allowable deviation (nm) between two
distances that are being compared.
print iprint
{0}
Regulates the quantity of output: larger
iprint gives more output to the screen and
to out.pri.
select isel
{0}
Rules details of the clustering:
isel = 0: discard
the structure with lowest energy
isel = 1: discard
the structure from the list with the lowest
fort number
isel = 2: discard
the structure from the list with the highest
fort number
intra intri(1)
intrj(1) intri(2)
intrj(2) (...)
Consider also intramolecular contacts between
atom pairs numbered intri
and intrj. These
pairs should be chosen to discriminate between several possible conformations
of flexible molecules; it is a probably superfluous safeguard. In the absence
of this card all intramolecular distances are omitted.
atom
This card, if present, must be the last one in
the file in.dcl
and must be followed by nrp
cards (one for each atom) containing:
- k:
a sequence number (1... nrp);
a negative k
denotes the last atom of a molecule.
panm: the atom name
type: an integer
number to indicate the atom type. These numbers can be arbitrarily assigned
between 0 and 99. If type
= 0 the corresponding atom is skipped.
In the absence of an
atom card, the program takes the types from
the topology file in.tp.
Lacking that, it searches for the pack-type
file fort.201 and
sets the type equal to the type found there. If that file is not present
either, then all atoms are given a different
type. Note that in the last two cases
all atoms are considered to belong to one molecule, and for cases with
Z">1 an atom card
is necessary.
This program calculates the energy of one or more single (crystal) structures, and optimizes their geometries. The input data are obtained from an spf, mdi or cssr type coordinate file. It is possible to expand the geometry to independent molecules in space group P1 or to consider more than one independent molecule in the asymmetric unit. Lattice dynamical calculations are also possible.
Execute script: runpp u/a s/d subst [coord] [ffname]
Input files:
in.tp (unit 20),
molecular topology, linked to upack/data/subst/subst.tpu
for united-atom force fields, or upack/data/subst/subst.tpa
for all-atom force fields.
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
(default
ffname=unitat3 or
ffname=opls, respectively)
in.pp (unit 13),
code-directed input, linked to subst.ppu
or subst.ppa,
respectively.
in.cor (unit 29),
one or more sets of coordinates, linked to
coord (default
coord=subst.cor)
charges.cssr (unit
14, optional): a cssr-type
file containing charges to superside the charges from the topology file
(if icoul = 2).
Output files:
out.pri (unit 16),
general results, linked to prep.pru
or prep.pra,
respectively
parfile (unit 15),
parameter summary
out.exp (unit 18),
experimental data to be retrieved by the search procedure
out.tgt (unit 19),
same data, but after energy minimization. For the experimental structure
this is a target function to be retrieved "exactly".
For every structure (identified with a five-digit
number #) there is the possibility for coordinate output in four formats
(unit 17), linked to #.spf,
#.mdi, #.dat,
#.cssr.
Variant for a free molecule
This is useful to prepare starting geometries for the search procedure.
Execute script: runpp subst [coord] [ffname]
File definitions, if different from above:
in.tp (unit 20),
molecular topology, linked to upack/data/subst/subst.tpu
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
(default ffname=gromnoh)
in.pp (unit 13),
code-directed input, linked to subst.pp
The output coordinates are transformed to the
reference axes system and written to the files 00000.spf
and 00000.mdi.
The input coordinate file
in.cor is now optional if a united-atom force
field is provided where sufficient improper dihedrals are available to
define all chiral centers and ring puckerings. In that case the parameter
coord is omitted, the target values for the
dihedral angles can be specified in in.pp
(default values 180
). It may be
necessary to specify a few more of these than would be minimally required.There
are two possibilities:
- if no energy minimization is done afterwards,
these values are enforced as well as possible.
- if energy minimization is done, the dihedral
angles are allowed to find the nearest minima for the free molecule. Hydroxyl
dihedral angles are usually irrelevant in this stage.
Only one group of molecules can be treated in
this way. Coordinate files containing groups in different conformations
must be made by repeated calls of the program, followed by cutting and
pasting.
The input data for both variants can be manipulated by cards in file in.pp:
spgr naamsp
{0}
If this card is absent or if
naamsp = 0 the calculation is done for a free
molecule. However, this information can be overridden by SPGR
cards on an spf-type
coordinate file, which may be different for each subsequent structure.
inner inner{2}
intra intra{1}
intra=0/1: disable/enable the calculation of
intramolecular nonbonded interactions
flex iflex
{1}
iflex = 0: rigid
molecules, only intermolecular nonbonded interactions
iflex = 1: fully
flexible molecules
iflex = 2: rigid
molecules with rotatable OH groups, only nonbonded interactions The option
naamsp = 0 is incompatible with
iflex = 2.
coul icoul
{1}
If icoul
= 0 no Coulomb interactions are calculated.
If icoul
= 2 the charges from the topology file are overruled by those from the
file in.cor (which
must then be in CSSR format); if that file does not provide charge
information, the file charges.cssr
is
used instead.
pres pres
{0}
The pressure in atmospheres.
ngr ngr
{0}
The number of independent groups of molecules,
Gi.
ngr >= 0: the coordinate
file need not be ordered, the non-hydrogen atoms are recognized from their
name in the topology file and the hydrogen atoms are recognized from the
distance to their carrying atom. All atoms must be given in the same order
for each group. If ngr
> 1 only the names of the first group are used. If
ngr = 0 the number of groups is determined
from the number of occurrences of the same sequence of names.
ngr < 0: the
coordinate list must contain - ngr
groups of molecules. Now that file must contain the atoms in exactly the
same order as in the topology file, and the atom names given in the coordinate
file are disregarded.
expand expand
{0} [nx ny nz]
{1 1 1}
If this code word is present, the coordinates
are expanded to P1. Random position disturbances (with maximum
expand) can be added to facilitate the minimization
leaving a saddle point; a reasonable value might be 0.0005. Optionally,
the structure is further expanded to nx, ny,
nz unit cells in the a, b, c directions,
respectively.
sd ncysd tolsd stpsd
nresd shmsd {100 0.01 0.0001 5 0.02}
cg ncycg tolcg stpcg
depcg shmcg {5000 0.0005 0.005 0.3 0.02}
print iprint
{0}
Minimal output on the screen and on
out.pri is given for iprint
= 0. A larger value gives more output.
test dtest
{0}
If dtest
> 0, the analytical energy derivatives are compared with numerically calculated
values (using parameter shifts
dtest).
reset
If this card is present, the C-H and O-H distances
are reset to values from the topology file.
cutoff cutoff[alphac alpha6 hmaxc hmax6 tole tolk]
pol npp [poltol schaal gamma nnn mmm nte nttt]
roth dihin(1...Nh)
Reset dihedrals for hydroxyl groups to
dihin (note: dihedral i corresponds
to
(Nz+i))
Vn Vn (1...Nh) in kcal/mol {0.3...0.3}
ntal ntal
(1...Nh) {3...3}
The latter two cards may be used if
iflex = 2 (cf.
pack2)
stc stcin(1...6)
{1 1 1 1 100 1}
Steepest descents weights for cell axes, cell
axis projections, centers of gravity, Euler angles, hydroxyl torsional
angles, and atomic coordinates, respectively.
files ifils ifilm
ifild ifilc {1 1 1 2}
For a free molecule only:
tors dihin(1...Nz+Nh)
{0...0}
Target values for dihedrals.
scan iscan dihin(iscan)
dscan escan
This card is only operative if tors
is present. The dihedral iscan is
scanned from the given value dihin(iscan)
(which
supersedes
the corresponding entry from tors)
in steps of dscan up
to and including escan.
If lattice vibration calculations are desired:
latvib [nq
...] {0}
If this keyword is present, lattice vibration
calculations are performed. To this end double precision is required; it
is advisable to do energy minimization first. The calculations are first
done for q=0, and subsequently for nq
vectors in q-space. Possible modes of input:
latvib n1 n2 n3
Calculations are done for nq=
n1
* n2 * n3q-vectors,
filling a regular grid in q-space.
latvib f1 f2 f3 nq
Calculations are done for the vectors defined
by f1,
f2,
f3
divided in nq
equal parts.
temp vtemp(1....)
{298, undefined ...}
The thermodynamical quantities (and the vibrational
amplitudes, if requested) are calculated at temperatures vtemp(1),
vtemp(2), vtemp(3), ...
celopt icopt dbox(1....)
{1
0.01}
if icopt
= 0, the cell axes and angles are kept fixed
if icopt
= 1, the cell axes and angles are optimized
if icopt
= 2, the cell axes and angles are optimized followed by a search for the
global minimum in the free energy. This is done by a grid search of 3 points
along each axis, separated by dbox(1).
For determination of the free energy lowering, this is usually sufficient.
For thermal expansion more iterations are needed, and to this end more
entries dbox(2...)
can be given. In that case the search is iterated once for every entry,
around the previous minimum using the corresponding new grid separation.
Since the minimum depends on temperature, this is only done for the first
temperature vtemp(1).
boxfix box(1...6)
This card starts the grid search for the global
free energy minimum directly at the given box parameters ax,
by,
cz,
bx,cx,
cy,
again for the first temperature only.
This program performs a molecular dynamics simulation for one structure. The coordinates of that structure are given in file in.xvi which should be in mdi format (the coordinates may be followed by the three velocities). After the simulation the output file out.xvo can be used as input file for a continuation run.
Execute script: runmd u/a s/d subst coord [ffname]
Input files:
in.tp (unit 20),
molecular topology, linked to upack/data/subst/subst.tpu
for united-atom force fields ,or to upack/data/subst/subst.tpa
for all-atom force fields.
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
(default
ffname=unitat3 or
ffname=opls, respectively)
in.md (unit 13),
code-directed input, linked to subst.mdu
or subst.mda,
respectively.
in.xvi (unit 21),
coordinates, linked to coord
Output files:
out.pri (unit 16),
general results, linked to md.pru
or md.pra, respectively
out.en (unit 15),
energy snapshots
out.co (unit 17),
coordinate snapshots
out.ve (unit 18),
velocity snapshots
out.xvo (unit 19),
final coordinates and velocities
out.ave (unit 14),
average results in MDI format
The input specifications are given in file in.md, using the same code words as under the option mdrun in program pack3. At least mdrunnstlim must be specified. Additional code words are now:
spgr ispgr
{0}
The number (not the name!) of the space group
ngr ngr
{1}
The number of independent molecules in the unit
cell
expand [nx ny nz]
{1 1 1}
If this card is present, the structure is expanded
to P1. Optionally, the structure is further expanded to nx,
ny, nz unit cells in the a, b, c directions,
respectively.
mdskip ig
{0}
The number of runs to be skipped before block
averages of the remaining ones are taken.
print iprint
{0}
2. P. Verwer and F. J. J. Leusen, Computer Simulation to Predict Possible Crystal Polymorphs. Rev. Comp. Chem. 12 (1998) 327-365.
3. B. P. van Eijck, W. T. M. Mooij and J. Kroon, Acta Cryst. B51 (1995) 99-103.
4. A. L. Spek, Acta Cryst. A46 (1990) C34.
5. Cerius2 (1997), Molecular Simulations Inc.
6. W. F. van Gunsteren and H. J. C. Berendsen, GROMOS, Groningen molecular simulation package, University of Groningen, 1987.
7. B. P. van Eijck and J. Kroon, J. Comput. Chem. 20 (1999) 799-812.
8. B. P. van Eijck and J. Kroon, J. Comput. Chem. 18 (1997) 1036-1042.
9. D. N. Bernardo, Y. Ding, K. Krogh-Jespersen and R. M. Levy, J. Phys. Chem. 98 (1994) 4180-4187.
10. D. E. Williams and T. L. Starr, Comput. Chem. 1 (1977) 173-177.
11. S. N. Ha, A. Giammona, M. Field and J. W. Brady, Carboh. Res. 180 (1988) 207-221.
12. M. L. C. E. Kouwijzer, B. P. van Eijck, S. J. Kroes and J. Kroon, J. Comput. Chem. 14 (1993) 1281-1289.
13. W. Damm, A. Frontera, J. Tirado-Rives and W. L. Jorgensen, J. Comput. Chem. 16 (1997) 1955-1970.
14. B. P. van Eijck and J. Kroon, J. Phys. Chem. B101 (1997) 1096-1100.
15. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem. Int. Ed. Engl. 29 (1990) 992-1023.
16. B. P. van Eijck and J. Kroon, Acta Cryst. B56 (2000) 535-542.
17. B. P. van Eijck, J. Comput. Chem. 22 (2001) 816-826.
18. B. P. van Eijck, J. Comput. Chem. 23 (2002) 456-462.
19. B. P. van Eijck, Phys. Chem. Chem. Phys. 4 (2002) 4789-4794.