align


SYNOPSIS

align  [type={ATOM | PHAR | MIXED | RANDOM}; defaults to ATOM]  \
    [{object_list=<comma/hyphen separated list or ALL>  \
    | most_active_percent=<percent of most active compounds used as alignment templates>; defaults to 25  \
    [ref_y_var=<reference Y variable used to find most active compounds>]  \
    [criterion={LOWEST | HIGHEST; defaults to HIGHEST}]}]  \
    [hybrid={YES | NO; defaults to NO}]  \
    [merge={YES | NO; defaults to NO}]  \
    [template={SINGLE [keep_best_template={YES | NO}; defaults to NO] | MULTI}; defaults to SINGLE]  \
    [candidate={SINGLE [file=<SDF file with candidate conformations to be aligned>] | MULTI}; defaults to SINGLE]  \
    [{conf_dir=<directory from which SDF conformational databases are retrieved>; defaults to qmd_dir, if defined  \
    | template_conf_dir=<directory from which SDF conformational databases to be used for template objects are retrieved>  \
    candidate_conf_dir=<directory from which SDF conformational databases to be used for candidate objects are retrieved>}]  \
    [align_dir=<directory where SDF databases of aligned compounds are stored>]


DESCRIPTION

The align keyword aligns ligands to one or more templates using different algorithms. Single- or multi-conformational alignments may be carried out. If multi-conformational candidate ligands are used (candidate=MULTI), the conformer which best fits the template(s) is picked. If multi-conformational templates are chosen (template=MULTI), each of the available conformations is used as a template, generating an SDF database of aligned ligands for each of the template conformers.

If a multi-conformational alignment is selected, either for the template(s) or for the candidates, then in addition to the SDF file imported through the import type=SDF command, an SDF conformational database must be available for each of the ligands. Such conformational databases may have been generated by the qmd keyword or by means of third-party software (e.g., MOE, OMEGA). Conformational database filenames consist of the object ID number (which usually coincides with the object number, unless objects have been removed using the remove_object keyword) formatted according to the %04d format of the printf function, followed by the .sdf extension, e.g. 0001.sdf, 0002.sdf, etc.

The folder where template and/or candidate conformational databases are located may be specified through the conf_dir parameter; it defaults to qmd_dir if a QMD conformational search has previously been carried out with the qmd keyword in the same Open3DALIGN session (see the qmd keyword for more details). If one wishes to use different conformational databases for template and candidate objects (e.g., to use only a subset of conformations as templates), different folders may be specified through the template_conf_dir and candidate_conf_dir parameters, respectively.
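As an illustration of the filename convention described above, the following Python sketch builds the expected database name for an object ID and reports which databases are missing from a folder before an alignment is launched. It is not part of Open3DALIGN; the function names conf_db_name and missing_conf_dbs are hypothetical helpers introduced only for this example.

```python
from pathlib import Path


def conf_db_name(object_id: int) -> str:
    """Return the expected conformational database filename for an object ID."""
    # printf-style %04d: zero-padded to at least four digits
    return "%04d.sdf" % object_id


def missing_conf_dbs(conf_dir: str, object_ids: list[int]) -> list[int]:
    """List the object IDs whose conformational database is absent from conf_dir."""
    root = Path(conf_dir)
    return [oid for oid in object_ids if not (root / conf_db_name(oid)).is_file()]
```

Running missing_conf_dbs on the folder passed as conf_dir (or template_conf_dir / candidate_conf_dir) is a quick way to spot objects lacking a database, for example after objects have been removed and IDs no longer coincide with object numbers.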

If candidate=SINGLE, it is possible to align an external SDF file (specified through the file parameter) instead of the currently loaded dataset, while using the latter only for templates.

If template=SINGLE and multiple template objects are chosen, it is possible to request that, in addition to the individual databases aligned on each of the template objects, a "best" database also be generated, in which each candidate molecule is aligned on the best-fitting template among all those available (keep_best_template=YES). This option is useful whenever multiple X-ray structures are available for a certain target protein with different co-crystallized ligands; in this case Open3DALIGN chooses, for each of the dataset molecules, the template which gives the best fit (i.e., the one the molecule most closely resembles). It is very important that all the template objects be pre-aligned to one another, or the "best" database will end up being constituted by multiple clusters, where objects are consistently aligned within each cluster but misaligned across different clusters.
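The selection performed with keep_best_template=YES can be pictured with the following Python sketch. It is only a conceptual illustration, not Open3DALIGN code: the nested-dictionary layout of the scores, the function name best_template_per_candidate and the assumption that a larger score means a better fit are all hypothetical.

```python
def best_template_per_candidate(scores):
    """Map each candidate to the template giving its best alignment score.

    scores: {candidate_id: {template_id: score}} (hypothetical layout).
    """
    best = {}
    for candidate, by_template in scores.items():
        # Assumption: a larger score means a better fit
        best[candidate] = max(by_template, key=by_template.get)
    return best


scores = {
    5: {1: 0.62, 2: 0.81},  # candidate 5 fits template 2 best
    6: {1: 0.77, 2: 0.40},  # candidate 6 fits template 1 best
}
print(best_template_per_candidate(scores))  # prints {5: 2, 6: 1}
```

In this example candidates 5 and 6 end up aligned on different templates, which is why the templates themselves must share a common frame of reference.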

The folder where aligned SDF databases are written may be specified through the align_dir parameter; if not specified, it defaults to a folder named O3A.####.align_dir.######, where the #'s stand for random alphanumeric characters, located in the same folder from which the dataset SDF file was imported by the import type=SDF command or an Open3DALIGN .dat file was loaded with the load command. Objects to be used as templates may be specified as object numbers or object IDs through the object_list or id_list parameters, respectively. Alternatively, if the SDF database originally imported by the import type=SDF command, or the Open3DALIGN .dat file loaded with the load command, included biological activities, a percentage of the compounds, set through the most_active_percent parameter, may be used as templates: namely, those having the highest (criterion=HIGHEST, the default) or the lowest (criterion=LOWEST) values of biological activity. If multiple single-conformation templates are chosen, it is also possible to select, for each of the candidate objects, the template which gives the best alignment score (keep_best_template=YES); in this case the templates need to be aligned to one another, or the aligned candidates might occupy different regions of Cartesian space depending on the template they have been aligned to.

Structure alignment can be carried out using different methods; namely, an atom-based method inspired by the LAMDA algorithm [1] (type=ATOM, the default), a pharmacophore-based method implemented using Pharao [2] routines (type=PHAR) or a combination of the latter two (type=MIXED). A fourth method, useful mostly for validation purposes, can be invoked with type=RANDOM: this parameter triggers scrambling of the currently loaded dataset by random rigid-body roto-translation of each structure.

Two Pharao-specific parameters can be specified, namely hybrid and merge. The hybrid parameter controls whether hybrid pharmacophores are generated (YES) or not (NO, the default) by merging closely positioned pharmacophoric centers; see the Pharao documentation for further details about hybrid pharmacophores. The merge parameter controls whether neighboring pharmacophore points of the same category should be merged (see the Pharao documentation for further details); this parameter may prove useful to save some CPU time when dealing with large molecules characterized by many pharmacophoric points.
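To clarify what a random rigid-body roto-translation (type=RANDOM) means geometrically, the following Python sketch applies one random rotation and one random translation to all atoms of a structure. It is a generic illustration and makes no claim about how Open3DALIGN implements the scrambling; the function names and the max_shift parameter are hypothetical.

```python
import math
import random


def random_rotation_matrix(rng):
    """Return a random 3x3 rotation matrix built from a random unit quaternion."""
    # Normalizing four Gaussian samples gives a uniformly distributed unit quaternion
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def scramble(coords, rng, max_shift=10.0):
    """Apply the same random rotation and translation to every atom of a structure."""
    rot = random_rotation_matrix(rng)
    # max_shift (in coordinate units) is an arbitrary choice for this sketch
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [
        tuple(sum(rot[i][j] * atom[j] for j in range(3)) + shift[i] for i in range(3))
        for atom in coords
    ]
```

Since the transformation is rigid, all interatomic distances are preserved: only the position and orientation of each structure change, which is what makes a scrambled dataset a convenient starting point for checking that an alignment method can recover a known superposition.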

Regarding templates, one may choose whether compounds belonging to the dataset should be best-fitted to selected compounds (that is, to all conformers available for each compound) through the object_list parameter, or rather to a percentage of the most active compounds belonging to the training set using the most_active_percent parameter. In the latter case, it is necessary to import into Open3DALIGN a text file with activity data by the import type=DEPENDENT keyword, or alternatively to include activity data in the SDF file imported by the import type=SDF command. It is possible to decide whether the most active compounds are the ones with the highest activity values (criterion=HIGHEST, the default) or rather the lowest (criterion=LOWEST).
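The template selection driven by most_active_percent and criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the actual Open3DALIGN logic: the function name select_templates is hypothetical, and the rounding rule (round up, keep at least one compound) is an assumption, since the exact rule is not stated here.

```python
import math


def select_templates(activities, most_active_percent=25.0, criterion="HIGHEST"):
    """Return the object IDs of the most active compounds.

    activities: {object_id: activity value}
    """
    if criterion not in ("HIGHEST", "LOWEST"):
        raise ValueError("criterion must be HIGHEST or LOWEST")
    # Assumption: round up and always keep at least one template
    n = max(1, math.ceil(len(activities) * most_active_percent / 100.0))
    # HIGHEST ranks by descending activity, LOWEST by ascending activity
    ranked = sorted(activities, key=activities.get, reverse=(criterion == "HIGHEST"))
    return sorted(ranked[:n])


activities = {1: 5.2, 2: 7.9, 3: 6.1, 4: 8.4, 5: 4.3, 6: 6.8, 7: 7.1, 8: 5.9}
print(select_templates(activities))                      # prints [2, 4]
print(select_templates(activities, criterion="LOWEST"))  # prints [1, 5]
```

criterion=LOWEST is the natural choice when activities are expressed so that smaller values mean higher potency (e.g., IC50 values), whereas HIGHEST suits values such as pIC50.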

By default, the align module operates in parallel fashion on multiprocessor machines, using all the CPUs available in the system; if one wishes to run the computation on a lower number of CPUs, this may be specified before calling align with the env n_cpus keyword.

Since alignments can be time-consuming, especially when large, multi-conformational datasets and multiple templates are used, unfinished runs which have been stopped by CTRL-C, logout, shutdown, etc. can be very easily restarted using the same keywords as in the interrupted run, simply specifying the align_dir where SDF aligned databases have been previously written; Open3DALIGN will be able to automatically restart the alignment from the point where it had been interrupted.

EXAMPLES

# the following command best-fits the currently loaded dataset with the atom-based method onto each of the 25% most active compounds used as templates; results are stored in a folder named O3A.####.align_dir.######
align

# the following command best-fits with the mixed method all conformations available for the currently loaded dataset (taken from the ligand_databases folder) onto the 10% most active compounds of the currently loaded dataset, using the currently loaded conformations as templates; results are stored in a folder named O3A.####.align_dir.######. Most active compounds are those with the smallest activity values; all available CPUs are used
align  type=MIXED  most_active_percent=10  candidate=MULTI  \
    conf_dir=ligand_databases  criterion=LOWEST

# the following command best-fits with the mixed method all conformations available for the currently loaded dataset (taken from the candidate_databases folder) onto all conformers available for compounds 3, 7, 11, 17, 21 (taken from the template_databases folder), storing results in the aligned_databases folder
align  type=MIXED  object_list=3,7,11,17,21  candidate=MULTI  candidate_conf_dir=candidate_databases  \
    template=MULTI  template_conf_dir=template_databases  align_dir=aligned_databases


REFERENCES

  1. Richmond, N. J.; Willett, P.; Clark, R. D. J. Mol. Graph. Model. 2004, 23, 199-209.
  2. Taminau, J.; Thijs, G.; De Winter, H. J. Mol. Graph. Model. 2008, 27, 161-169.


Last update:
May 31. 2015 20:39:55
