THE UPACK PROGRAM PACKAGE - GRID SEARCH

Bouke P. van Eijck, Department of Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands. E-mail: vaneyck@chem.uu.nl or vaneijck@xs4all.nl.
 

Contents

Introduction
Procedure
Example: glucose

Detailed instructions for the search stage

Program pack1

Program pack2

Program clus

Program zoek

References

Introduction

This part of the manual describes the grid search facility in UPACK. It is now largely replaced by the random search method, but it may still be useful for some special applications, for instance rigid molecules. Definitions and general introductions are given in the main part of the manual.

The grid search is only possible for structures with one molecule in the asymmetric unit (Gi = 1, Mg = 1).
A systematic search is carried out by the program pack1. Each conformation is considered as a rigid molecule, usually using a united-atom force field. Hydroxyl groups with unknown orientation may be treated as united OH atoms. The results are clustered by the program clus. This is a very fast routine [1] which can only be used for structures with one molecule in the asymmetric unit.

Explicit hydroxyl groups can be introduced in the program pack2. Their torsional angles are optimized, considering the molecule to be built from rigid fragments. (Of course, totally rigid molecules can also be studied.) Optionally, the calculation is followed by an energy minimization where all atoms (or united pseudo-atoms) are allowed to find their optimum positions. Again, the results are clustered.

The non-rigid minimization is continued with the program pack3, as discussed in the main part of the manual.

If the experimental structure is known, one may want to know whether or not it was found by the packing procedure. To this end the programs zoek and dist are available, which can be used in each stage of the analysis.

The programs were developed aiming at structure prediction of carbohydrates. Here for the first stage (program pack1) we recommend the standard GROMOS87 force field [2] with all charges set to zero, so hydrogen atoms are effectively ignored. This is roughly compensated by using OA...OA nonbonded parameters that mimick a hydrogen bond without the hydrogen atom (minimum energy 5.2 kcal/mol at 2.8 A). In this stage rigid molecules are used, so only the nonbonded parameters are in effect. The relevant force field file is gromnoh.ff, with a cutoff radius of 8 A.

For a subsequent calculation we recommend the force field UNITAT, which was developed from GROMOS87 by optimizing geometric crystal data for 23 carbohydrate molecules [3]. It may be used in pack2 with a cutoff radius of 7 A to save computer time (file unitat2.ff). In subsequent stages Ewald summation is recommended (unitat3.ff).

These are united-atom force fields; a suitable all-atom force field for carbohydrates is OPLS [4].
 

Procedure

1. Decide which conformations are to be used for crystal packing, and prepare a coordinate file cor.# for each of them (# = 001, 002, 003, ...). Allowed are the  spf, mdi, and cssr formats.

2. First stage. Generate structures (united atoms, no hydroxyl groups)
runpa1 u subst gromnoh (the actual search, united atoms, no hydroxyl groups)
Output files are pack.001, pack.002, ... which must be clustered to pack.101, pack.102,... by:
runcl 0

3. Second stage. Place hydroxyl hydrogen atoms:
runpa2 u subst unitat2
Output files are pack.20 (parameters) and pack.29 (coordinates); the parameter file must be clustered to pack.21:
runcl 20

4. Continue the energy minimization. For instance:
runpa3 u 21 subst unitat3
Output files are pack.30 (parameters) and pack.39 (coordinates); the parameter file must be clustered to pack.31:
runcl 30

5. Further refinement is possible, files being numbered consecutively following the same principle. For instance, to continue in an all-atom force field:
runpa3 a 31 subst opls
runcl 40
with final results on files pack.41 and pack.49.

The experimental structure

An important application is the energy minimization of the experimental structure. This gives an indication about the quality of the force field. Also, the energy-minimized experimental structure should be present in the parameter files from the search procedure.

As indicated in the main part of the manual, the program prep can be used to minimize the energy of a given individual crystal packing. Let the experimental structure be given in the input file subst.cor. Run the program:
runpp u subst unitat3 (for united-atom topologies subst.tpu)
runpp a subst opls (for all-atom topologies subst.tpa)
Output coordinate files are given in various formats. In this case also the files prep.exp and prep.tg#  are produced, which contain a summary of parameters for the input structure and the energy-minimized structure, respectively. These files should be placed in the subdirectory subst/exp. Then the search can be directly carried out by the program zoek:
runzk filenum prep.tg#
where filenum gives the number of the pack file (0 and 1 representing the sets of files pack.001, pack.002, ... and pack.101, pack.102, ..., respectively).

Example: glucose

The substance is α-D-glucose, and the computing time is artificially and dishonestly reduced by considering only eight Euler angles close to the experimental structure rather than scanning all 504 orientations which would be considered in a real search. Also the maximum number of structures used in the next stage is unrealistically low.

Go to upack/data/glucsa. Reproduce the topology files by running:
runcod glucsa.con.u unitat (all-atom topology)
runcod glucsa.con.a opls    (united-atom topology)
and compare the resulting files out.tp with glucsa.tpu and glucsa.tpa, respectively. Note the discrepancies for glucsa.tpu: the improper dihedrals have been adjusted manually for this specific hexapyranose, and the improper dihedrals are not automatically conform the GROMOS force field in the present UPACK version.

Go to upack/data/glucsa/exp. In the database we find the neutron diffraction structure GLUCSA10.CIF. Perform energy minimization:
runpp u glucsa GLUCSA10.CIF unitat3
runpp a glucsa GLUCSA10.CIF opls
This creates the energy-minimized structure in four 0001.* files. Compare  0001.spf with prep.spf.u or prep.spf.a. There are also the target structures prep.tgu and prep.tga.

Now create a subdirectory upack/data/glucsa/test, copy the files cor.* and glucsa.pa* from upack/data/glucsa/p212121, and run:
runpa1  u glucsa gromnoh (the actual search, united atoms, no hydroxyl groups)
runcl 0 (clustering)
runpa2 u glucsa unitat2 (determine OH orientations)
runcl 20 (second clustering)
runpa3 u 21 glucsa unitat3 (first energy minimization, united atoms)
runcl 30 (final clustering)
runzk 31 prep.tgu (search for experimental united-atom structure)
runpa3 a 31 glucsa opls (second energy minimization, all-atoms)
runcl 40 (last clustering)
runzk 41 prep.tga (search for experimental all-atom structure)

This should find the glucose structure. Alternatively, that structure can also be recognized by:
rundist u 31 glucsa prep.spf.u
rundist a 41 glucsa prep.spf.a

Compare the resulting files with the corresponding ones in upack/data/glucsa/p212121. There will probably be discrepancies, since the results are sensitive to processor-dependent roundoff errors. In this case the rankings for OPLS are better than for UNITAT (about 3 and 8, respectively).

Detailed instructions for the grid search

A number of code words occurs in several programs. They were already discussed in the main part of the manual, and are repeated here for convenience.

spgr naamsp
The space group name (upper or lower case, slashes may be omitted).

inner inner
A surface charge on polar crystals causes a shape-dependent energy term. Normally it is assumed that this term is compensated by external charges; this corresponds to omitting the h=0 term in Ewald summation [5]. If the same situation is to be simulated without Ewald summation, a term 2πp2 / (3 V) must be added. In UPACK this is done if inner > 0; if alphac > 0 inner is set to zero.

coul icoul
If icoul = 0 no Coulomb interactions are calculated.

intra intra
intra = 0/1: disable/enable the calculation of intramolecular nonbonded interactions

pres pres
The pressure in atmospheres.

print iprint
Minimal output on the screen and on out.pri is given for iprint = 0. A larger value gives more output.

test dtest
If dtest > 0, the analytical energy derivatives are compared with numerically calculated values (using parameter shifts dtest).

cutoff cutoff [alphac alpha6 hmaxc hmax6 tole tolk]
Overrides the cutoff value given in the force field file, and possibly also the settings for Ewald summation.

pol npp [poltol schaal gamma nnn mmm nte nttt]
Include polarization, possibly overriding the default values given in the force field file.

sd# ncy#sd tol#sd stp#sd nre#sd shm#sd
Specifies steepest descents energy minimization in various stages (#) of a program: the minimization is stopped after ncy#sd cycles or earlier if the energy decreases less than tol#sd. The initial step size is determined by stp#sd. A new pair list is made after nre#sd cycles or if any parameter changes more than shm#sd. The steepest descents minimization is usually followed by:

cg# ncy#cg tol#cg stp#cg dep#cg shm#cg
Specifies conjugate gradients energy minimization in various stages (#) of a program: the minimization is stopped after ncy#cg cycles, or earlier if the root mean square value of the energy derivatives is less than tol#cg and a recalculation with a new pair list does not change the energy more than dep#cg. The initial step size is determined by stp#cg. A restart with a new pair list is made if any parameter changes more than shm#cg.

stc stcin
Weight parameter with which a certain derivative in the steepest descents procedure is multiplied.

files ifils ifilm ifild ifilc
Produce output files if these parameters are larger than 0:
ifils = 1: a separate file (format spf) is made for each structure
ifils = 2: the results are also written to one continuous file (all.spf)
ifils = 3: the results are only written to one continuous file (all.spf)
ifilm = 1: format mdi.
ifild = 1: format fdat.
ifilc = 1: format cssr.
ifilc = 2: format cssr, the charges are written out too.
 

Program pack1

This program varies the structure parameters and calculates the intermolecular energy for each generated structure, assuming rigid molecules. The parameters can be varied by performing a systematic grid search, or by taking random values. In both cases, an arbitrary number of conformations can be studied consecutively.

Execute script: runpa1  s/a/...  subst  ffname

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tp#
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff
in.pa (unit 13), code-directed input, linked to subst.pa1
cor.001, cor.002, cor.003, ... (unit 14), coordinates for various conformations.

Output files:

out.pri (unit 16), general results, linked to pack.pr1
pack.# (unit 10), generated structure parameters

Grid search

In the grid search a maximum of twelve parameters is varied. We shall denote the lowest value, the step size, and the highest value by:
lowest (step) highest.

First the Euler angles are varied in the following way:
φ= phimin (dphi) phimax, θ = themin (dthe) themax, ψ= psimin () psimax
where is (roughly) the input value dpsi divided by sin θ. Since an equivalent crystal can be obtained by a rotation of 180° about each of the three Cartesian axes, followed by a translation and resetting to the standard orientation, the search range can be divided by four. For example, phimax = 360
°, themax = 90° , psimax = 180° . If the free molecule has an n-fold rotational symmetry axis parallel to the z-axis, the φ-scan can be limited to 360° / n.

The cell axes are also scanned:
ax = amin (dax) amax, by = bmin (dby) bmax, cz = cmin (dcz) cmax.
Upper limits must be estimated according to the dimensions of the molecule and the number of molecules in the unit cell. If space-group symmetry permits, structures with ax > by, by > cz and/or ax > cz are skipped (as indicated by itest in all.spg). It is strongly recommended to use the density in order to calculate ax from the cell volume V: ax = V / (by × cz). The density may be experimentally known; if not, it can be estimated from an initial test run.

In the triclinic system the cell angles, represented by the projections of the cell axes on the Cartesian axes, are also varied:
bx= (bxmin (dbx) bxmax) × ax, cx= (cxmin (dcx) cxmax) × ax, cy= (cymin (dcy) cymax) × by.
Upper limits are taken as 1. Undoubtedly, this generates too many equivalent structures. For the monoclinic space groups only cx is varied. Here we have not been able to prove that every possibility for space groups like P21/c and C2/c will always be covered. Comments on this subject will be welcomed.

Finally, the centers of gravity are varied:
X = (xmin (dx) xmax) × ax, Y = (ymin (dy) ymax) × by, Z = (zmin (dz) zmax) × cz
Again, this is not done for parameters that can be freely chosen in the space group under consideration. For the upper limits ½ is sufficient: except in space group Fdd2, a shift of half a cell axis is always possible.

Energy minimization

Only nonbonded intermolecular energy is calculated. Each energy calculation is discontinued immediately if any repulsion term exceeds the value ebig. This will occur in the majority of cases. If not, and if the resulting energy is less than emax1, a first cycle of energy minimization follows: ncy1sd cycles of steepest descents and ncy1cg cycles of conjugate gradients. If the resulting energy is less than emax2, a new series of ncy2sd and ncy2cg cycles of energy minimization is performed. For every set of Euler angles, all results with energy less than emax3 are clustered together and only the structures with lowest energy are retained. After a final ncy3sd and ncy3cg cycles of energy minimization, the structures are collected in output files pack.001, pack.002, pack.003, ...

Appropriate values for the three emax# parameters are chosen automatically (unless these values are provided by the user). To this end a preliminary calculation is carried out starting from nmax# randomly chosen points in parameter space for every conformation to be studied. Although in principle every conformation has a different internal energy, the intermolecular energies are considered only and taken as one ensemble. For each threshold the corresponding value of emax# is chosen in such a way that a given fraction frac# of the energy calculations leads to an energy lower than emax#. After the threshold values are determined, the actual grid search is started.

Input file

The input data discussed above can be manipulated by cards in file in.pa:

spgr naamsp {this card must always be present}

emax emax1 emax2 emax3 {only if no automatic adjustment is desired}

axes amin dax amax bmin dby bmax cmin dcz cmax
{0.4 0.15 2.5 0.4 0.15 2.5 0.4 0.15 2.5}

eulers phimin dphi phimax themin dthe themax psimin dpsi psimax
{0 20 359 0 20 89 0 20 179}

angles bxmin dbx bxmax cxmin dcx cxmax cymin dcy cymax
{0 0.2 0.999 0 0.2 0.999 0 0.2 0.999}

gravs xmin dx xmax ymin dy ymax zmin dz zmax
{0 0.1 0.499 0 0.1 0.499 0 0.1 0.499}

sd1 ncy1sd tol1sd stp1sd nre1sd shm1sd { 5 10 0.005 50 1}
sd2 ncy2sd tol2sd stp2sd nre2sd shm2sd {10 1 0.002 50 1}
sd3 ncy3sd tol3sd stp3sd nre3sd shm3sd {10 0.1 0.001 10 1}
cg1 ncy1cg tol1cg stp1cg dep1cg shm1cg { 0 10 0.05 1 1}
cg2 ncy2cg tol2cg stp2cg dep2cg shm2cg {10 1 0.05 1 1}
cg3 ncy3cg tol3cg stp3cg dep3cg shm3cg {10 0.1 0.01 1 0.05}

stat nmax1 nmax2 nmax3 frac1 frac2 frac3
{20000 15000 10000 0.020 0.015 0.014}

ebig ebig {2000}

dexp dexp {-100}
dexp > 0: The experimental density dexp (in gram cm-3) is used to find ax.
dexp = 0: No such restriction on ax. However, this is usually very inefficient since much time is wasted on unproductive starting structures.
dexp < 0: The density is estimated from -dexp random structures (for each conformation) before starting the calculations. Its value is found from the smallest volume encountered.

rand nrand {0}
If nrand > 0, the search is based on nrand random structures (for every conformation) rather than by a grid search.

Additional possible input:

inner inner {0}

seed iseed {281197}
The integer iseed initializes the random number sequence.

coul icoul {0}

pres pres {0}

print iprint {0}

test dtest {0}

cutoff cutoff [tole] {change default from force field file; no Ewald summation}
We advise cutoff = 0.8, and tole = 0.1 for a force field without charges. But watch out for a large number of error messages: it may be necessary to increase cutoff and/or to decrease tole.

clus dclin(1...4) {0.05 0.05 0.05 10}
Clustering limits for cell axes, cell axis projections, centers of gravity, and Euler angles, respectively.

stc stcin(1...4) {1 1 1 1}
Steepest descents weights for the same four groups of parameters.

Output files

The files pack.001, pack.002, ... contain first data concerning space group symmetry, torsional degrees of freedom and atomic coordinates of the free molecule. For explanation of the symbols see the space group and topology files. The following lines occur:

- ispgr icode
Space group number and sequence code. The sequence code starts here at 1 and is increased by 1 in every subsequent calculation. - ndihz ndihh: Nz, Nh (= 0). For every dihedral 1...Nz:
- iome jome kome lome omega

- nrp ireftr(1...3): Ag, then the three numbers of the reference atoms. For every atom i = 1... nrp:
- i panm iac xr(1...3),
where panm is the atom name corresponding to the topology file, iac is the sequence number of the atom type code, and xr gives the Cartesian coordinates xi, yi, zi in nm.

- cel(1...12): starting values for the parameters

- iref(1...12) to control the optimization of the 12 parameters

After these initial data, each line characterizes a certain structure:

- code1: the number of times this structure occurred before clustering
code2: N1, the sequence number in this file
en4: the final nonbonded energy (kJ/mol)
par(1...): ax, by, cz, bx, cx, cy, X, Y, Z, φ, θ, ψ, unless constrained by symmetry (i.e, the corresponding iref must not be zero). Lengths in pm, angles in degrees multiplied by 10.
ntry: sequence number in the search
en1 en2 en3 vol: the energy values at the three thresholds, and the cell volume (all multiplied by 1000 to obtain an integer value).

Clustering yields the files pack.101, pack.102, ... which have a similar structure. Only code1 now gives the sequence number in the clustered file (code2 still refers to N1) and the five data following the parameters are replaced by one: the number of structures that have been clustered together for this entry.
 
 

Program pack2

This program scans the possible orientations of hydroxyl groups that were left undetermined in pack1. This is a semi-rigid energy minimization, where only the hydroxyl dihedral angles and the crystallographic parameters are optimized. In the end a full structure relaxation may be performed.

Execute script: runpa2  s/a/...  subst  ffname

Input files:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tp#
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff
in.pa (unit 13), code-directed input, linked to subst.pa2
pack.101, pack.102, ...(unit 10), structure parameters as produced by pack1.

Output files:

out.pri (unit 16), general results, linked to pack.pr2
out.par (unit 20), generated structure parameters, linked to pack.20
out.cor (unit 29), generated coordinates, linked to pack.29
For every structure (identified with a five-digit number #) there is the possibility for coordinate output in four formats (unit 17), linked to #.spf, #.mdi, #.dat, #.cssr.

Preliminary energy calculations are done for various settings of the Nh hydroxyl groups: each of the dihedral angles is varied systematically by internal rotations about 360/nsteph degrees. The starting angles dihin can be chosen for testing purposes, but are normally set to zero since they are unknown in an actual search. A barrier to internal rotation Vn with periodicity ntal can be specified, the value from the force field being inoperative in the semi-rigid calculations. To speed up the calculations only Coulomb energies are calculated, so a force field with a zero van der Waals energy from hydrogen atoms should be used.

After this preliminary calculation only structures with energy less than emax1 are retained. Then ncy1sd cycles of steepest descents energy minimization are performed. If this brings the energy below emax2, ncy1cg cycles of conjugate gradient minimization follow. In this stage a certain dihedral angle is often reached more than once, so the results are clustered with a tolerance of tclus degrees.

A second semi-rigid energy minimization (ncy2sd and ncy2cg cycles) may be performed for the remaining structures with an energy less than emax3. Finally, the energies can be further minimized with full geometry relaxation (ncy3sd and ncy3cg cycles). In any case, the final energy refers to a complete force field, including all intramolecular contributions.

The threshold energies can be determined automatically before the actual calculation. To this end the first nmax1 structures are read, and emax1 is adjusted to retain approximately f  × nmax1 structures after the preliminary calculation. Here, f  is a number which is larger than 1 but smaller than the total number of generated structures which is the product of the Nh factors nsteph. It is found that f  should increase with Nh and it is taken as  f = frac1 × Nh, where frac1 is an adjustable parameter.

Likewise, the threshold emax2 is determined by asking that it should be passed by a fraction frac2 of nmax2 structures that have passed emax1. Finally, emax3 is adjusted in the same way to retain a fraction frac3 of nmax3 structures that have passed emax2. Since these limits are found from the first structures on the input file, which are biased towards low energy, the statistics may come out not too well, and after the actual calculation a revised estimate is given to start again in case not enough structures have been produced. The energy range that makes the difference between obtaining insufficient structures and wasting computer time in producing an unmanageable multitude is rather small!
 
 

Variant for geometry optimization only.

Alternatively, the program can skip the search for hydroxyl group orientations and only perform semi-rigid and/or complete geometry optimizations. This is useful for molecules without hydroxyl groups or for continued energy minimization of the clustered results from the previous run. No threshold energies are operative.

Execute script: runpa2  u/a/...  filenum  subst  ffname

Input files:

For ease of discussion, we take filenum = 21, the result of a previous run with pack2 followed by clustering. Then:

in.tp (unit 20), molecular topology, linked to upack/data/subst/subst.tp#
in.spg (unit 11), space group, linked to upack/tops/all.spg
in.ff (unit 12), force field, linked to upack/tops/ffname.ff
in.pa (unit 13), code-directed input, linked to subst.pa3
in.par (unit 10), parameter file, linked to pack.21
pack.101, pack.102, ...(unit 10), structure parameters as produced by pack1, are still necessary in this variant.

Output files: as above, but:

out.pri (unit 16), general results, is now linked to pack.pr3
out.par (unit 20), generated structure parameters, is now linked to pack.30
out.cor (unit 29), generated coordinates, is now linked to pack.39

Input file

The input data discussed above can be manipulated by cards in file in.pa:

steph nsteph(1...Nh) {3...3}

roth dihin(1...Nh) {0...0}

Vn Vn (1...Nh) in kcal/mol {0.3...0.3}

ntal ntal (1...Nh) {3...3}

clus tclus {10}

emax emax1 emax2 emax3 {only if no automatic adjustment is desired}

sd1 ncy1sd tol1sd stp1sd nre1sd shm1sd {25 0.6 0.005 8 10}
sd2 ncy2sd tol2sd stp2sd nre2sd shm2sd { 0 0.1 0.001 10 0.02}
sd3 ncy3sd tol3sd stp3sd nre3sd shm3sd {10 0.1 0.001 10 0.02}
cg1 ncy1cg tol1cg stp1cg dep1cg shm1cd {10 0.1 0.01 10 0.02}
cg2 ncy2cg tol2cg stp2cg dep2cg shm2cg { 0 0.5 0.01 0.3 0.02}
cg3 ncy3cg tol3cg stp3cg dep3cg shm3cg {10 0.01 0.01 0.3 0.02}

stat nmax1 nmax2 nmax3 frac1 frac2 frac3 {1000 500 100 10 0.1 0.9}

Additional possible input:

inner inner {0}

intra intra {1}

coul icoul {1}

pres pres {0}

print iprint {0}

maxinp maxinp {2000}
The maximum number of input structures to be read from each file pack.101, pack.102, ... To select certain structures from these files, set maxinp < 0 and then give on the same line any desired quantity of sequence numbers N1. (As this option is normally only used for testing with one file, these numbers are simply selected for every input file.)

cutoff cutoff [alphac alpha6 hmaxc hmax6 tole tolk]
The default values from unitat2.ff are 0.7 0 0 0 0 0 0: no Ewald summation and a very small cutoff radius. This is fast, but may cause instabilities in the energy minimization: watch out for a large number of error messages and, if in doubt, increasecutoff.

pol npp [poltol schaal gamma nnn mmm nte nttt]

test dtest1 dtest2 {0 0}
Test derivatives for semi-rigid and full-body minimization, respectively.

stc stcin(1...6) {1 1 1 1 100 1}
Steepest descents weights for cell axes, cell axis projections, centers of gravity, Euler angles, hydroxyl torsional angles, and atomic coordinates, respectively.

files ifils ifilm ifild ifilc {0 0 0 0}

Output files

The file pack.20 contains first data concerning space group symmetry, torsional degrees of freedom and atomic coordinates of the free molecule. This is quite similar to the output from pack1, but note that only the coordinates from pack.001 are given in pack.20. After these initial data, each line again characterizes a certain structure:

- code1: N1, the sequence number from pack1.
The same value may occur more than once: in groups, since one structure can give rise to quite a few different combinations of hydroxyl dihedrals, and with intervals, since pack.101, pack.102, ... are combined to this one file.
code2: N2, the sequence number in this file.
This is kept as identification number in all subsequent calculations.
epot: the energy (kJ/mol)
par(1...): ax, by, cz, bx, cx, cy, X, Y, Z, φ, θ, ψ, unless constrained by symmetry (i.e, the corresponding iref must not be zero). Lengths in pm, angles in degrees multiplied by 10.
omega(1...Nh+Nz): torsional angles (degrees)
idum en1 en2 en3: the ranking in the input file, and the energy values at the three thresholds (multiplied by 1000 to obtain an integer value).
imult: the number of structures that have been clustered together for this entry in previous runs. (It is set to 1 for the first call of pack2).

Clustering yields the file pack.21 which has the same structure. Only code1 now gives the sequence number in the clustered file (code2 still refers to N2) and the four data following the torsional angles are replaced by two:

imult1: the number of structures that have been collected together in this clustering
imult2: the cumulative number of structures that have been clustered together for this entry in previous runs. (It is set to 1 after the first call of pack2 since clustered sets may well originate from one structure made in pack1).

The file pack.29 contains Cartesian atomic coordinates for every structure:

- icod2 box(1...6): N2, ax, by, cz, bx, cx, cy
- x(1...3Ag): x1, y1, z1, x2, y2, z2, ...
 
 

Program clus

This program removes identical solutions from pack files and prepares new files, sorted to energy. A preselection may be done, which uses a straightforward comparison of cell parameters and torsional angles, but in many cases this turns out to take more computer time than is gained later. The main selection is done following the method of Ref. [1]. It is limited to structures with one molecule in the asymmetric unit, and the program is not (yet) suitable for F-centered lattices. For a more general problem the (slower) program dist should be used.

Execute script:
runcl 0 (to cluster pack.0# to pack.1#), or, for instance:
runcl 30 (to cluster pack.30 to pack.31), etc

Input files:

in.spg (unit 11), space group, linked to upack/tops/all.spg
in.clu (unit 13), code-directed input, linked to all.cl
pack file(s) (unit 10)

Output files:

out.pri (unit 16), linked to clus.pr
pack file(s) (unit 20). The format of these output files has been discussed above: see pack1 and pack2.
cluslist (unit 18) contains the remaining structures, sorted to energy.
 

Input from keyboard:

ifs
ifs < 2: read all files pack.001, pack.002, ...
write to files pack.101, pack.102, ...
ifs > 1: read the file pack. ifs
write to file pack.(ifs+1)

The calculation can be manipulated by cards in file in.clu:

tol toln tolb tolz tolh
Maximum allowable deviations:
toln {0.20} from integer value in matrix N
tolb {0.05} in cell parameters and centers of gravity
tolz {10} in dihedrals without H-atoms
tolh {30} in dihedrals with H-atoms

pre [der(1...4)]
Do preselection, with maximum allowable crystallographic deviations:
der(1) {0.10} in cell axes (ax, by, cz)
der(2) {0.08} in orthogonality (bx, cx, cy)
der(3) {0.08} in centers of gravity (X, Y, Z)
der(4) {5} Euler angles φ, θ, ψ
If the card pre is not present, no preselection is done.

sym nrot {1} msym(1...3), msym(1...3), ....
Effects of symmetry in the basis molecule: every msym(1...3) is +1 or -1, depending on the sign change in each coordinate upon executing that symmetry operation. The molecular z-axis in the free molecule has nrot-fold rotational symmetry.

sort isort {1}
If isort = 0, input data are not sorted to energy. This is useful if data from different force fields are clustered.

sel n1 n2 {1 100000}
Select input data: n1 is the number of the first entry to be read from every input pack file, n2 the last one.

maxinp maxinp {50000}
Maximum number of entries to be considered after sorting; must be less than mxdata (which is default 100000, defined in file params).

block nblok {100000}
The clustering is done in blocks with nblok entries, which is necessary if there are more structures then mxdata. One might lower the value of nblok to gain some computing time. The results are stored in scratch files fort.21, fort.22, ..., from which they are retrieved in a final sorting and clustering operation of at most mxdata structures.

main imain {1}
imain = 0: omit the main selection.
imain = 1: omit the main selection only when making scratch files
imain = 2: do the main selection in every stage.
The last option will produce the largest number of structures, but at the cost of some computer time - and that many structures cannot be managed anyway.

print iprint {0}

form iform {0}
iform = 0/1: use unformatted/formatted scratch files.

scr iscrfl {0}
If scratch files are already prepared, use them up to and including file number iscrfl.
 
 

Program zoek

This program compares solutions from pack files with the experimental structure. The algorithm is described in Ref. [1]. It is limited to structures with one molecule in the asymmetric unit, and the program is not (yet) suitable for F-centered lattices. For a more general problem the (slower) program dist should be used.

Execute script:
runzk 0/1 coord (to search in all files pack.0# / pack.1#) or:
runzk filenum coord (to search in an other specified pack file)

Input files:

in.spg (unit 11), space group, linked to upack/tops/all.spg
in.exp (unit 12), data for the experimental structure, linked to coord
(If coord is not found in the working directory, it is sought in upack/data/subst/exp)
in.zk (unit 13), code-directed input, linked to all.zk
pack file(s) (unit 10)

Output files:

zoek file(s) (unit 20): a pack-type parameter file with only the hits.

Input from keyboard:

ifs
ifs = 0: read all files pack.001, pack.002, ...
ifs = 1: read all files pack.101, pack.102, ...
ifs > 1: read the file pack. ifs
Output of the "hits" in the same format to corresponding files zoek.#.

Input from file in.exp:

This file, which may be prepared by program prep, must contain the three following lines:

- box(1...6): experimental crystal cell data, ax, by, cz, bx, cx, cy
- box(7...12): experimental crystal contents data, X, Y, Z, φ, θ, ψ
- ntau tau(1... ntau): experimental dihedral angles ω (1...Nz+Nh), if present.

The calculation can be manipulated by cards in file in.zk:

tol toln tolb tolz tolh
Maximum allowable deviations:
toln {0.40} from integer value in matrix N
tolb {0.20} in cell parameters and centers of gravity
tolz {30} in dihedrals without H-atoms
tolh {60} in dihedrals with H-atoms
The default tolerances are very large, so as not to miss any conceivable candidate. Therefore, each "hit" should be inspected for its quality.
If toln < 0, tolz = 60, tolh > 180 all structures with one specific conformation (determined by in.exp) are selected.

sym nrot {1} msym(1...3), msym(1...3), ....
Effects of symmetry in the basis molecule: every msym(1...3) is +1 or -1, depending on the sign change in each coordinate upon executing that symmetry operation. The molecular z-axis in the free molecule has nrot-fold rotational symmetry.

print iprint {1}
 
 

References


1. B. P. van Eijck and J. Kroon, J. Comput. Chem. 18 (1997) 1036-1042.
2. W. F. van Gunsteren and H. J. C. Berendsen, GROMOS, Groningen molecular simulation package, University of Groningen, 1987.
3. B. P. van Eijck and J. Kroon, J. Comput. Chem. 20 (1999) 799-812.
4. W. Damm, A. Frontera, J. Tirado-Rives and W. L. Jorgensen, J. Comput. Chem. 16 (1997) 1955-1970.
5. B. P. van Eijck and J. Kroon, J. Phys. Chem. B101 (1997) 1096-1100.
 

Back to beginning