Bouke P.
van Eijck, Department of Crystal and Structural
Chemistry, Bijvoet Center for Biomolecular
Research, Utrecht University, Padualaan 8, 3584 CH
Utrecht, The Netherlands. Email: vaneyck@chem.uu.nl
or vaneijck@xs4all.nl.
Introduction
Procedure
Example: glucose
Detailed instructions for the search stage
This part
of the manual describes the grid search facility in UPACK. It is now largely
replaced by the random search method, but it may still be useful for some
special applications, for instance rigid molecules. Definitions and general
introductions are given in the main part of the
manual.
The grid search is only
possible for structures with one molecule in the asymmetric unit (G_{i}_{ }= 1, M_{g}
= 1).
A systematic search is carried out by the program pack1.
Each conformation is considered as a rigid molecule, usually using a
unitedatom force field. Hydroxyl groups with unknown orientation may be
treated as united OH atoms. The results are clustered by the program clus. This is a very fast
routine [1] which can only be used for structures with one
molecule in the asymmetric unit.
Explicit hydroxyl groups
can be introduced in the program pack2. Their
torsional angles are optimized, considering the molecule to be built from rigid
fragments. (Of course, totally rigid molecules can also be studied.)
Optionally, the calculation is followed by an energy minimization where all
atoms (or united pseudoatoms) are allowed to find their optimum positions.
Again, the results are clustered.
The nonrigid minimization
is continued with the program pack3, as
discussed in the main part of the manual.
If the experimental
structure is known, one may want to know whether or not it was found by the
packing procedure. To this end the programs zoek and dist are available, which can be used in
each stage of the analysis.
The programs were developed
aiming at structure prediction of carbohydrates. Here for the first stage (program
pack1) we recommend the standard GROMOS87
force field [2] with all charges set to zero, so hydrogen
atoms are effectively ignored. This is roughly compensated by using OA...OA nonbonded parameters that mimick
a hydrogen bond without the hydrogen atom (minimum energy 5.2 kcal/mol at 2.8 A). In this stage rigid molecules are used, so
only the nonbonded parameters are in effect. The
relevant force field file is gromnoh.ff,
with a cutoff radius of 8 A.
For a subsequent calculation
we recommend the force field UNITAT, which was developed from GROMOS87 by
optimizing geometric crystal data for 23 carbohydrate molecules [3].
It may be used in pack2 with a cutoff radius
of 7 A to save computer time (file unitat2.ff).
In subsequent stages Ewald summation is recommended (unitat3.ff).
These are unitedatom force
fields; a suitable allatom force field for carbohydrates is OPLS [4].
1. Decide which
conformations are to be used for crystal packing, and prepare a coordinate file
cor.# for each of them (# = 001, 002, 003,
...). Allowed are the spf,
mdi, and cssr formats.
2. First stage. Generate
structures (united atoms, no hydroxyl groups)
runpa1 u subst gromnoh (the actual
search, united atoms, no hydroxyl groups)
Output files are pack.001, pack.002, ... which must be clustered to pack.101, pack.102,...
by:
runcl 0
3. Second stage. Place
hydroxyl hydrogen atoms:
runpa2 u subst unitat2
Output files are pack.20 (parameters) and pack.29 (coordinates); the parameter file must be
clustered to pack.21:
runcl 20
4. Continue the energy
minimization. For instance:
runpa3 u 21 subst unitat3
Output files are pack.30 (parameters) and pack.39 (coordinates); the parameter file must be
clustered to pack.31:
runcl 30
5. Further refinement is
possible, files being numbered consecutively following the same principle. For
instance, to continue in an allatom force field:
runpa3 a 31 subst opls
runcl 40
with final results on files pack.41 and pack.49.
An important application is
the energy minimization of the experimental structure. This gives an indication
about the quality of the force field. Also, the energyminimized experimental
structure should be present in the parameter files from the search procedure.
As indicated in the main
part of the manual, the program prep can be
used to minimize the energy of a given individual crystal packing. Let the
experimental structure be given in the input file subst.cor. Run the program:
runpp u subst unitat3 (for
unitedatom topologies subst.tpu)
runpp a subst opls
(for allatom topologies subst.tpa)
Output coordinate files are given in various formats. In this case also the
files prep.exp and prep.tg# are produced, which contain a
summary of parameters for the input structure and the energyminimized
structure, respectively. These files should be placed in the subdirectory subst/exp. Then the search can be directly carried out
by the program zoek:
runzk filenum prep.tg#
where filenum gives
the number of the pack file (0 and 1
representing the sets of files pack.001, pack.002, ... and pack.101,
pack.102, ..., respectively).
The substance
is αDglucose, and the computing time is artificially and dishonestly
reduced by considering only eight Euler angles close to the experimental
structure rather than scanning all 504 orientations which would be considered
in a real search. Also the maximum number of structures used in the next stage
is unrealistically low.
Go to upack/data/glucsa. Reproduce the topology files by running:
runcod glucsa.con.u unitat
(allatom topology)
runcod glucsa.con.a opls
(unitedatom topology)
and compare the resulting files out.tp with glucsa.tpu and glucsa.tpa, respectively. Note the discrepancies for glucsa.tpu: the improper dihedrals have been adjusted manually for
this specific hexapyranose, and the improper
dihedrals are not automatically conform the GROMOS force field in the present
UPACK version.
Go to upack/data/glucsa/exp. In the database we find the neutron diffraction
structure GLUCSA10.CIF. Perform energy
minimization:
runpp u glucsa GLUCSA10.CIF unitat3
runpp a glucsa GLUCSA10.CIF opls
This creates the energyminimized structure in four 0001.* files.
Compare 0001.spf with prep.spf.u or prep.spf.a. There are also the target structures prep.tgu and prep.tga.
Now create a
subdirectory upack/data/glucsa/test, copy the files cor.* and glucsa.pa* from upack/data/glucsa/p212121, and run:
runpa1 u glucsa gromnoh (the actual
search, united atoms, no hydroxyl groups)
runcl 0 (clustering)
runpa2 u glucsa unitat2 (determine OH orientations)
runcl 20 (second clustering)
runpa3 u 21 glucsa unitat3
(first energy minimization, united atoms)
runcl 30 (final clustering)
runzk 31 prep.tgu (search for experimental unitedatom structure)
runpa3 a 31 glucsa opls (second energy
minimization, allatoms)
runcl 40 (last clustering)
runzk 41 prep.tga (search for experimental allatom structure)
This should
find the glucose structure. Alternatively, that structure can also be
recognized by:
rundist u 31 glucsa prep.spf.u
rundist a 41 glucsa prep.spf.a
Compare the
resulting files with the corresponding ones in upack/data/glucsa/p212121. There will probably be discrepancies, since the
results are sensitive to processordependent roundoff
errors. In this case
the rankings for OPLS are better than for UNITAT (about 3 and 8, respectively).
A number of
code words occurs in several programs. They were already discussed in the main
part of the manual, and are repeated here for convenience.
spgr naamsp
The space group name (upper or lower case, slashes may be omitted).
inner inner
A surface charge on polar crystals causes a shapedependent energy term.
Normally it is assumed that this term is compensated by external charges; this
corresponds to omitting the h=0 term in Ewald summation [5].
If the same situation is to be simulated without Ewald summation, a term
2πp^{2} / (3 V) must be added. In UPACK this is done
if inner > 0; if alphac > 0 inner
is set to zero.
coul icoul
If icoul = 0 no
Coulomb interactions are calculated.
intra intra
intra = 0/1: disable/enable the calculation of
intramolecular nonbonded interactions
pres pres
The pressure in atmospheres.
print iprint
Minimal output on the screen and on out.pri is given for iprint = 0. A larger value gives more
output.
test dtest
If dtest > 0,
the analytical energy derivatives are compared with numerically calculated
values (using parameter shifts dtest).
cutoff cutoff [alphac alpha6 hmaxc hmax6 tole tolk]
Overrides the cutoff value given in the force field file, and possibly also the
settings for Ewald summation.
pol npp [poltol schaal gamma nnn mmm nte nttt]
Include polarization, possibly overriding the default values given in the force
field file.
sd# ncy#sd tol#sd stp#sd
nre#sd shm#sd
Specifies steepest descents energy minimization in various stages (#) of a
program: the minimization is stopped after ncy#sd cycles or earlier if the energy decreases
less than tol#sd.
The initial step size is determined by stp#sd. A new pair list is made after nre#sd cycles or if any
parameter changes more than shm#sd.
The steepest descents minimization is usually followed by:
cg# ncy#cg
tol#cg stp#cg dep#cg shm#cg
Specifies conjugate gradients energy minimization in various stages (#) of a
program: the minimization is stopped after ncy#cg cycles, or earlier if the root mean square
value of the energy derivatives is less than tol#cg and a recalculation with a new pair
list does not change the energy more than dep#cg. The initial step size is determined by stp#cg. A restart with
a new pair list is made if any parameter changes more than shm#cg.
stc stcin
Weight parameter with which a certain derivative in the steepest descents
procedure is multiplied.
files ifils
ifilm ifild ifilc
Produce output files if these parameters are larger than 0:
ifils = 1: a
separate file (format spf)
is made for each structure
ifils = 2: the results
are also written to one continuous file (all.spf)
ifils = 3: the
results are only written to one continuous file (all.spf)
ifilm = 1: format mdi.
ifild = 1: format fdat.
ifilc = 1: format cssr.
ifilc = 2: format cssr, the charges are
written out too.
This program varies the
structure parameters and calculates the intermolecular energy for each
generated structure, assuming rigid molecules. The parameters can be varied by
performing a systematic grid search, or by taking random values. In both cases,
an arbitrary number of conformations can be studied consecutively.
Execute script:
runpa1 s/a/... subst ffname
Input files:
in.tp (unit 20), molecular topology,
linked to upack/data/subst/subst.tp#
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
in.pa (unit 13), codedirected input, linked
to subst.pa1
cor.001, cor.002, cor.003, ... (unit 14), coordinates for various
conformations.
Output files:
out.pri
(unit 16), general results, linked to pack.pr1
pack.# (unit 10), generated structure
parameters
Grid search
In the grid search a
maximum of twelve parameters is varied. We shall denote the lowest value, the
step size, and the highest value by:
lowest (step) highest.
First the Euler angles are
varied in the following way:
φ= phimin (dphi) phimax, θ = themin (dthe) themax, ψ= psimin (dψ) psimax
where dψ is (roughly) the input value dpsi divided by sin
θ. Since an equivalent crystal can be obtained by a rotation of 180° about
each of the three Cartesian axes, followed by a translation and resetting to
the standard orientation, the search range can be divided by four. For example, phimax = 360°, themax = 90° , psimax = 180° . If the free molecule has an nfold
rotational symmetry axis parallel to the zaxis, the φscan can be
limited to 360° / n.
The cell axes are also
scanned:
a_{x} = amin
(dax) amax, b_{y}
= bmin (dby) bmax, c_{z} = cmin (dcz) cmax.
Upper limits must be estimated according to the dimensions of the molecule and
the number of molecules in the unit cell. If spacegroup symmetry permits,
structures with a_{x} > b_{y}, b_{y}
> c_{z} and/or a_{x}
> c_{z} are skipped (as indicated
by itest in all.spg). It is strongly
recommended to use the density in order to calculate a_{x} from
the cell volume V: a_{x} = V / (b_{y }×
c_{z}). The density may be
experimentally known; if not, it can be estimated from an initial test run.
In the triclinic system the
cell angles, represented by the projections of the cell axes on the Cartesian
axes, are also varied:
b_{x}= (bxmin (dbx) bxmax) × a_{x}, c_{x}=
(cxmin (dcx) cxmax) × a_{x}, c_{y}=
(cymin (dcy) cymax) × b_{y}.
Upper limits are taken as 1. Undoubtedly, this generates too many equivalent
structures. For the monoclinic space groups only c_{x} is
varied. Here we have not been able to prove that every possibility for space
groups like P2_{1}/c and C2/c will always
be covered. Comments on this subject will be welcomed.
Finally, the centers of
gravity are varied:
X = (xmin (dx) xmax) × a_{x}, Y = (ymin (dy) ymax) × b_{y},
Z = (zmin (dz) zmax) × c_{z}
Again, this is not done for parameters that can be freely chosen in the space group
under consideration. For the upper limits ½ is sufficient: except in space
group Fdd2, a shift of half a cell axis is always possible.
Energy minimization
Only nonbonded
intermolecular energy is calculated. Each energy calculation is discontinued immediately
if any repulsion term exceeds the value ebig. This will occur in the majority of cases. If
not, and if the resulting energy is less than emax1,
a first cycle of energy minimization follows:
ncy1sd cycles of steepest descents and
ncy1cg cycles of conjugate gradients. If the resulting energy is less
than emax2, a new series of ncy2sd and ncy2cg
cycles of energy minimization is performed. For every set of Euler angles, all
results with energy less than emax3 are
clustered together and only the structures with lowest energy are retained.
After a final ncy3sd and ncy3cg cycles of energy minimization, the
structures are collected in output files pack.001, pack.002, pack.003,
...
Appropriate values for the
three emax#
parameters are chosen automatically (unless these values are provided by the
user). To this end a preliminary calculation is carried out starting from nmax# randomly chosen
points in parameter space for every conformation to be studied. Although in
principle every conformation has a different internal energy, the
intermolecular energies are considered only and taken as one ensemble. For each
threshold the corresponding value of emax# is chosen in such a way that a given fraction frac# of the energy
calculations leads to an energy lower than emax#. After the threshold values are determined,
the actual grid search is started.
Input file
The input data discussed
above can be manipulated by cards in file in.pa:
spgr naamsp {this card must always be present}
emax emax1 emax2 emax3
{only if no automatic adjustment is desired}
axes amin
dax amax bmin dby bmax
cmin dcz cmax
{0.4 0.15 2.5 0.4 0.15 2.5 0.4 0.15 2.5}
eulers phimin dphi phimax themin dthe
themax psimin dpsi psimax
{0 20 359 0 20 89 0 20 179}
angles bxmin
dbx bxmax cxmin dcx cxmax cymin dcy cymax
{0 0.2 0.999 0 0.2 0.999 0 0.2 0.999}
gravs xmin dx xmax ymin dy ymax
zmin dz zmax
{0 0.1 0.499 0 0.1 0.499 0 0.1 0.499}
sd1 ncy1sd tol1sd stp1sd
nre1sd shm1sd { 5 10
0.005 50 1}
sd2 ncy2sd tol2sd
stp2sd nre2sd shm2sd {10 1 0.002 50 1}
sd3 ncy3sd tol3sd
stp3sd nre3sd shm3sd {10 0.1 0.001 10 1}
cg1 ncy1cg tol1cg
stp1cg dep1cg shm1cg { 0 10 0.05 1 1}
cg2 ncy2cg tol2cg
stp2cg dep2cg shm2cg {10 1 0.05 1 1}
cg3 ncy3cg tol3cg
stp3cg dep3cg shm3cg {10 0.1 0.01 1 0.05}
stat nmax1 nmax2 nmax3
frac1 frac2 frac3
{20000 15000 10000 0.020 0.015 0.014}
ebig ebig {2000}
dexp dexp {100}
dexp > 0: The
experimental density dexp
(in gram cm^{3}) is used to find a_{x}.
dexp = 0: No such
restriction on a_{x}. However, this is usually very inefficient
since much time is wasted on unproductive starting structures.
dexp < 0: The
density is estimated from dexp
random structures (for each conformation) before starting the calculations. Its
value is found from the smallest volume encountered.
rand nrand {0}
If nrand > 0,
the search is based on nrand
random structures (for every conformation) rather than by a grid search.
Additional possible
input:
inner inner {0}
seed iseed {281197}
The integer iseed
initializes the random number sequence.
coul icoul {0}
pres pres {0}
print iprint {0}
test dtest {0}
cutoff cutoff [tole] {change default from force field
file; no Ewald summation}
We advise cutoff = 0.8, and tole = 0.1 for a force field
without charges. But watch out for a large number of error messages: it may be
necessary to increase cutoff and/or to
decrease tole.
clus dclin(1...4) {0.05 0.05 0.05 10}
Clustering limits for cell axes, cell axis projections, centers of gravity, and
Euler angles, respectively.
stc stcin(1...4) {1 1 1 1}
Steepest descents weights for the same four groups of parameters.
Output files
The files pack.001, pack.002,
... contain first data concerning space group symmetry, torsional degrees of freedom
and atomic coordinates of the free molecule. For explanation of the symbols see
the space group and topology files. The following lines occur:
 ispgr icode
Space group number and sequence code. The sequence code starts here at 1 and is
increased by 1 in every subsequent calculation.  ndihz ndihh: N_{z}, N_{h}
(= 0). For every dihedral 1...N_{z}:
 iome jome kome lome
omega
 nrp ireftr(1...3):
A_{g}, then the three numbers of the reference atoms. For every
atom i = 1... nrp:
 i panm
iac xr(1...3),
where panm is the
atom name corresponding to the topology file, iac is the sequence number of the atom type code,
and xr gives the
Cartesian coordinates x_{i}, y_{i},
z_{i} in nm.
 cel(1...12): starting
values for the parameters
 iref(1...12) to control
the optimization of the 12 parameters
After these initial data,
each line characterizes a certain structure:

code1: the number of times this structure occurred before clustering
code2: N_{1}, the sequence
number in this file
en4: the final nonbonded
energy (kJ/mol)
par(1...): a_{x}, b_{y},
c_{z}, b_{x},
c_{x}, c_{y}, X, Y, Z, φ,
θ, ψ, unless constrained by symmetry (i.e,
the corresponding iref
must not be zero). Lengths in pm, angles in degrees multiplied by 10.
ntry: sequence
number in the search
en1 en2 en3 vol: the
energy values at the three thresholds, and the cell volume (all multiplied by
1000 to obtain an integer value).
Clustering yields the files pack.101, pack.102,
... which have a similar structure. Only code1
now gives the sequence number in the clustered file (code2
still refers to N_{1}) and the five data following the
parameters are replaced by one: the number of structures that have been
clustered together for this entry.
This program scans the
possible orientations of hydroxyl groups that were left undetermined in pack1. This is a semirigid energy minimization,
where only the hydroxyl dihedral angles and the crystallographic parameters are
optimized. In the end a full structure relaxation may be performed.
Execute script:
runpa2 s/a/... subst ffname
Input files:
in.tp (unit 20), molecular topology,
linked to upack/data/subst/subst.tp#
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
in.pa (unit 13), codedirected input, linked
to subst.pa2
pack.101, pack.102,
...(unit 10), structure parameters as produced by
pack1.
Output files:
out.pri
(unit 16), general results, linked to pack.pr2
out.par (unit 20),
generated structure parameters, linked to pack.20
out.cor (unit 29),
generated coordinates, linked to pack.29
For every structure (identified with a fivedigit number #) there is the
possibility for coordinate output in four formats (unit 17), linked to #.spf, #.mdi, #.dat, #.cssr.
Preliminary energy
calculations are done for various settings of the N_{h}
hydroxyl groups: each of the dihedral angles is varied systematically by
internal rotations about 360/nsteph
degrees. The starting angles dihin
can be chosen for testing purposes, but are normally set to zero since they are
unknown in an actual search. A barrier to internal rotation Vn with periodicity ntal can be specified,
the value from the force field being inoperative in the semirigid calculations.
To speed up the calculations only Coulomb energies are calculated, so a force
field with a zero van der Waals energy from hydrogen atoms should be used.
After this preliminary
calculation only structures with energy less than
emax1 are retained. Then ncy1sd
cycles of steepest descents energy minimization are performed. If this brings
the energy below emax2, ncy1cg cycles of conjugate gradient minimization
follow. In this stage a certain dihedral angle is often reached more than once,
so the results are clustered with a tolerance of tclus degrees.
A second semirigid energy
minimization (ncy2sd and ncy2cg cycles) may be performed for the remaining
structures with an energy less than emax3.
Finally, the energies can be further minimized with full geometry relaxation (ncy3sd and ncy3cg
cycles). In any case, the final energy refers to a complete force field,
including all intramolecular contributions.
The threshold energies can
be determined automatically before the actual calculation. To this end the first
nmax1 structures are read, and emax1 is adjusted to retain approximately f × nmax1
structures after the preliminary calculation. Here, f is a number which is larger than 1 but smaller
than the total number of generated structures which is the product of the N_{h} factors nsteph. It is found that f should increase with N_{h}
and it is taken as f = frac1 × N_{h},
where frac1 is an adjustable parameter.
Likewise, the threshold emax2 is determined by asking that it should be
passed by a fraction frac2 of nmax2 structures that have passed emax1. Finally, emax3
is adjusted in the same way to retain a fraction
frac3 of nmax3 structures that have
passed emax2. Since these limits are found
from the first structures on the input file, which are biased towards low energy,
the statistics may come out not too well, and after the actual calculation a
revised estimate is given to start again in case not enough structures have
been produced. The energy range that makes the difference between obtaining
insufficient structures and wasting computer time in producing an unmanageable
multitude is rather small!
Variant for geometry
optimization only.
Alternatively, the program
can skip the search for hydroxyl group orientations and only perform semirigid
and/or complete geometry optimizations. This is useful for molecules without
hydroxyl groups or for continued energy minimization of the clustered results
from the previous run. No threshold energies are operative.
Execute script:
runpa2 u/a/... filenum subst ffname
Input files:
For ease of discussion, we
take filenum = 21, the
result of a previous run with pack2 followed
by clustering. Then:
in.tp (unit 20), molecular topology,
linked to upack/data/subst/subst.tp#
in.spg (unit 11),
space group, linked to upack/tops/all.spg
in.ff (unit 12),
force field, linked to upack/tops/ffname.ff
in.pa (unit 13), codedirected input, linked
to subst.pa3
in.par (unit 10),
parameter file, linked to pack.21
pack.101, pack.102,
...(unit 10), structure parameters as produced by
pack1, are still necessary in this variant.
Output files: as above, but:
out.pri
(unit 16), general results, is now linked to
pack.pr3
out.par (unit 20),
generated structure parameters, is now linked to
pack.30
out.cor (unit 29),
generated coordinates, is now linked to pack.39
Input file
The input data discussed
above can be manipulated by cards in file in.pa:
steph nsteph(1...N_{h})
{3...3}
roth dihin(1...N_{h})
{0...0}
Vn Vn (1...N_{h}) in kcal/mol {0.3...0.3}
ntal ntal (1...N_{h}) {3...3}
clus tclus {10}
emax emax1 emax2 emax3
{only if no automatic adjustment is desired}
sd1 ncy1sd tol1sd stp1sd
nre1sd shm1sd {25 0.6
0.005 8 10}
sd2 ncy2sd tol2sd
stp2sd nre2sd shm2sd { 0 0.1 0.001 10 0.02}
sd3 ncy3sd tol3sd
stp3sd nre3sd shm3sd {10 0.1 0.001 10 0.02}
cg1 ncy1cg tol1cg
stp1cg dep1cg shm1cd {10 0.1 0.01 10 0.02}
cg2 ncy2cg tol2cg
stp2cg dep2cg shm2cg { 0 0.5 0.01 0.3 0.02}
cg3 ncy3cg tol3cg
stp3cg dep3cg shm3cg {10 0.01 0.01 0.3 0.02}
stat nmax1 nmax2 nmax3
frac1 frac2 frac3 {1000
500 100 10 0.1 0.9}
Additional possible input:
inner inner {0}
intra intra {1}
coul icoul {1}
pres pres {0}
print iprint {0}
maxinp maxinp {2000}
The maximum number of input structures to be read from each file pack.101, pack.102,
... To select certain structures from these files, set
maxinp < 0 and then give on the same line
any desired quantity of sequence numbers N_{1}. (As this option
is normally only used for testing with one file, these numbers are simply
selected for every input file.)
cutoff cutoff [alphac alpha6 hmaxc hmax6 tole tolk]
The default values from unitat2.ff are 0.7 0
0 0 0 0 0: no Ewald summation and a very small cutoff radius. This is fast, but
may cause instabilities in the energy minimization: watch out for a large
number of error messages and, if in doubt, increasecutoff.
pol npp [poltol schaal gamma nnn mmm nte nttt]
test dtest1 dtest2 {0 0}
Test derivatives for semirigid and fullbody minimization, respectively.
stc stcin(1...6) {1 1 1 1 100 1}
Steepest descents weights for cell axes, cell axis projections, centers of
gravity, Euler angles, hydroxyl torsional angles, and atomic coordinates,
respectively.
files ifils
ifilm ifild ifilc
{0 0 0 0}
Output files
The file pack.20 contains first data concerning space group
symmetry, torsional degrees of freedom and atomic coordinates of the free
molecule. This is quite similar to the output from
pack1, but note that only the coordinates from
pack.001 are given in pack.20. After
these initial data, each line again characterizes a certain structure:

code1: N_{1}, the sequence number from pack1.
The same value may occur more than once: in groups, since one structure can
give rise to quite a few different combinations of hydroxyl dihedrals, and with
intervals, since pack.101, pack.102, ... are combined to this one file.
code2: N_{2}, the sequence
number in this file.
This is kept as identification number in all subsequent calculations.
epot: the energy
(kJ/mol)
par(1...): a_{x}, b_{y},
c_{z}, b_{x},
c_{x}, c_{y}, X, Y, Z,
φ, θ, ψ, unless constrained by symmetry (i.e,
the corresponding iref
must not be zero). Lengths in pm, angles in degrees multiplied by 10.
omega(1...N_{h}+N_{z}):
torsional angles (degrees)
idum en1 en2 en3: the ranking in the input file, and
the energy values at the three thresholds (multiplied by 1000 to obtain an
integer value).
imult: the number
of structures that have been clustered together for this entry in previous
runs. (It is set to 1 for the first call of pack2).
Clustering yields the file pack.21 which has the same structure. Only code1 now gives the sequence number in the
clustered file (code2 still refers to N_{2})
and the four data following the torsional angles are replaced by two:
imult1: the number of structures that have
been collected together in this clustering
imult2: the cumulative number of structures
that have been clustered together for this entry in previous runs. (It is set
to 1 after the first call of pack2 since
clustered sets may well originate from one structure made in pack1).
The file pack.29 contains Cartesian atomic coordinates for
every structure:

icod2 box(1...6): N_{2}, a_{x}, b_{y},
c_{z}, b_{x},
c_{x}, c_{y}
 x(1...3A_{g}): x_{1},
y_{1}, z_{1}, x_{2}, y_{2},
z_{2}, ...
This program removes
identical solutions from pack files and
prepares new files, sorted to energy. A preselection may be done, which uses a
straightforward comparison of cell parameters and torsional angles, but in many
cases this turns out to take more computer time than is gained later. The main
selection is done following the method of Ref. [1]. It is
limited to structures with one molecule in the asymmetric unit, and the program
is not (yet) suitable for Fcentered lattices. For a more general problem the
(slower) program dist
should be used.
Execute script:
runcl 0 (to cluster pack.0#
to pack.1#), or, for instance:
runcl 30 (to cluster pack.30
to pack.31), etc
Input files:
in.spg
(unit 11), space group, linked to upack/tops/all.spg
in.clu (unit 13), codedirected
input, linked to all.cl
pack file(s) (unit 10)
Output files:
out.pri
(unit 16), linked to clus.pr
pack file(s) (unit 20). The format of these
output files has been discussed above: see pack1
and pack2.
cluslist (unit 18)
contains the remaining structures, sorted to energy.
Input from keyboard:
ifs
ifs < 2: read all files pack.001, pack.002,
...
write to files pack.101, pack.102, ...
ifs > 1: read the file pack. ifs
write to file pack.(ifs+1)
The calculation can be
manipulated by cards in file in.clu:
tol toln tolb tolz tolh
Maximum allowable deviations:
toln {0.20} from
integer value in matrix N
tolb {0.05} in
cell parameters and centers of gravity
tolz {10} in
dihedrals without Hatoms
tolh {30} in
dihedrals with Hatoms
pre [der(1...4)]
Do preselection, with maximum allowable crystallographic deviations:
der(1) {0.10} in cell axes (a_{x},
b_{y}, c_{z})
der(2) {0.08} in orthogonality (b_{x}, c_{x}, c_{y})
der(3) {0.08} in centers of gravity (X,
Y, Z)
der(4) {5} Euler angles φ, θ,
ψ
If the card pre is not present, no
preselection is done.
sym nrot {1} msym(1...3), msym(1...3), ....
Effects of symmetry in the basis molecule: every msym(1...3) is +1 or 1, depending on the sign
change in each coordinate upon executing that symmetry operation. The molecular
zaxis in the free molecule has nrotfold rotational symmetry.
sort isort {1}
If isort = 0,
input data are not sorted to energy. This is useful if data from different
force fields are clustered.
sel n1 n2 {1
100000}
Select input data: n1 is the number of the
first entry to be read from every input pack
file, n2 the last one.
maxinp maxinp {50000}
Maximum number of entries to be considered after sorting; must be less than mxdata (which is
default 100000, defined in file params).
block nblok {100000}
The clustering is done in blocks with nblok entries, which is necessary if there are more
structures then mxdata.
One might lower the value of nblok
to gain some computing time. The results are stored in scratch files fort.21, fort.22,
..., from which they are retrieved in a final sorting and clustering operation
of at most mxdata
structures.
main imain {1}
imain = 0: omit
the main selection.
imain = 1: omit
the main selection only when making scratch files
imain = 2: do the main
selection in every stage.
The last option will produce the largest number of structures, but at the cost
of some computer time  and that many structures cannot be managed anyway.
print iprint {0}
form iform {0}
iform = 0/1: use
unformatted/formatted scratch files.
scr iscrfl {0}
If scratch files are already prepared, use them up to and including file number iscrfl.
This program compares
solutions from pack files with the
experimental structure. The algorithm is described in Ref. [1].
It is limited to structures with one molecule in the asymmetric unit, and the
program is not (yet) suitable for Fcentered lattices. For a more general
problem the (slower) program dist
should be used.
Execute script:
runzk 0/1 coord (to search in all
files pack.0# /
pack.1#) or:
runzk filenum coord
(to search in an other specified pack file)
Input files:
in.spg
(unit 11), space group, linked to upack/tops/all.spg
in.exp (unit 12),
data for the experimental structure, linked to coord
(If coord is not
found in the working directory, it is sought in upack/data/subst/exp)
in.zk (unit 13),
codedirected input, linked to all.zk
pack file(s) (unit 10)
Output files:
zoek
file(s) (unit 20): a packtype parameter file
with only the hits.
Input from keyboard:
ifs
ifs = 0: read all files pack.001, pack.002,
...
ifs = 1: read all files pack.101, pack.102,
...
ifs > 1: read the file pack. ifs
Output of the "hits" in the same format to corresponding files zoek.#.
Input from file in.exp:
This file, which may be
prepared by program prep, must contain the
three following lines:

box(1...6): experimental crystal cell data, a_{x}, b_{y},
c_{z}, b_{x},
c_{x}, c_{y}
 box(7...12): experimental crystal contents
data, X, Y, Z, φ, θ, ψ
 ntau tau(1... ntau): experimental
dihedral angles ω (1...N_{z}+N_{h}),
if present.
The calculation can be
manipulated by cards in file in.zk:
tol toln tolb tolz tolh
Maximum allowable deviations:
toln {0.40} from
integer value in matrix N
tolb {0.20} in cell
parameters and centers of gravity
tolz {30} in
dihedrals without Hatoms
tolh {60} in
dihedrals with Hatoms
The default tolerances are very large, so as not to miss any conceivable
candidate. Therefore, each "hit" should be inspected for its quality.
If toln < 0, tolz = 60, tolh > 180 all
structures with one specific conformation (determined by in.exp) are selected.
sym nrot {1} msym(1...3), msym(1...3), ....
Effects of symmetry in the basis molecule: every msym(1...3) is +1 or 1, depending on the sign
change in each coordinate upon executing that symmetry operation. The molecular
zaxis in the free molecule has nrotfold rotational symmetry.
print iprint {1}
1. B. P. van Eijck and J. Kroon, J. Comput. Chem. 18 (1997) 10361042.
2. W. F. van Gunsteren and H. J. C. Berendsen, GROMOS, Groningen molecular simulation package,
University of Groningen, 1987.
3. B. P. van Eijck and J. Kroon, J. Comput. Chem. 20 (1999) 799812.
4. W. Damm, A. Frontera,
J. TiradoRives and W. L. Jorgensen, J. Comput. Chem.
16 (1997) 19551970.
5. B. P. van Eijck and J. Kroon, J.
Phys. Chem. B101 (1997) 10961100.