nuri.tools

TM-tools

from nuri.tools import tm as tmtools

This module provides a ground-up reimplementation of the TM-align algorithm, based on the original TM-align code (version 20220412) by Yang Zhang. The implementation aims to reproduce the results of the original code while providing an improved user interface and better maintainability. Refer to the following paper for details of the algorithm. [1]

All input structures must have only a single atom per residue (usually the CA atom), as the original TM-align algorithm assumes this.

class nuri.tools.tm.TMAlign
__init__(self: TMAlign, query: object, templ: object, query_ss: str | None = None, templ_ss: str | None = None, *, gapless: bool = True, sec_str: bool = True, local_sup: bool = True, local_with_ss: bool = True, fragment_gapless: bool = True) None

Prepare TM-align algorithm with the given structures.

Parameters:
  • query – The query structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (N, 3), where N is the number of residues.

  • templ – The template structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (M, 3), where M is the number of residues.

  • query_ss – The secondary structure of the query structure. When provided, must be an ASCII string of length N.

  • templ_ss – The secondary structure of the template structure. When provided, must be an ASCII string of length M.

  • gapless – Enable gapless threading.

  • sec_str – Enable secondary structure assignment.

  • local_sup – Enable local superposition. Note that this is the most expensive initialization method due to the exhaustive pairwise distance calculation. Consider disabling this flag if alignment takes too long.

  • local_with_ss – Enable local superposition with secondary structure-based alignment.

  • fragment_gapless – Enable fragment gapless threading.

Raises:

ValueError

If:

  • The query or template structure has fewer than 5 residues.

  • The secondary structure of the query or template structure has a different length than the structure.

  • No initialization flag is set.

  • The initialization fails (for any other reason).

Note

If the secondary structure is not provided, it will be assigned using the approximate secondary structure assignment algorithm defined in the TM-align code. When neither the sec_str nor the local_with_ss flag is set, the secondary structures are ignored.
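The documented ValueError conditions can be checked before constructing the object. The sketch below mirrors the first three conditions in plain Python and imports nuri only when the constructor is actually called. The helper names check_tmalign_inputs and make_aligner are hypothetical, and the checks are an approximation of the documented rules rather than the library's own validation.

```python
INIT_FLAGS = ("gapless", "sec_str", "local_sup", "local_with_ss", "fragment_gapless")


def check_tmalign_inputs(query, templ, query_ss=None, templ_ss=None, **flags):
    """Return a list of problems matching the documented ValueError conditions."""
    problems = []
    for name, coords, ss in (("query", query, query_ss), ("templ", templ, templ_ss)):
        if len(coords) < 5:
            problems.append(f"{name} has fewer than 5 residues")
        if ss is not None and len(ss) != len(coords):
            problems.append(f"{name}_ss length does not match {name}")
    # Every initialization flag defaults to True in the documented signature.
    if not any(flags.get(flag, True) for flag in INIT_FLAGS):
        problems.append("no initialization flag is set")
    return problems


def make_aligner(query, templ, query_ss=None, templ_ss=None, **flags):
    """Construct TMAlign only if the pre-flight check passes (requires nuri)."""
    problems = check_tmalign_inputs(query, templ, query_ss, templ_ss, **flags)
    if problems:
        raise ValueError("; ".join(problems))
    from nuri.tools import tm as tmtools  # imported lazily
    return tmtools.TMAlign(query, templ, query_ss, templ_ss, **flags)
```

For large structures, passing local_sup=False to make_aligner skips the initialization method that the parameter list describes as the most expensive.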

static from_alignment(query: object, templ: object, alignment: object = None, keep_alignment: bool = True) TMAlign

Prepare TM-align algorithm with the given structures and user-provided alignment.

Parameters:
  • query – The query structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (N, 3), where N is the number of residues.

  • templ – The template structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (M, 3), where M is the number of residues.

  • alignment – Pairwise alignment of the query and template structures. Must be in a form representable as a 2D numpy array of shape (L, 2), in which each row contains a (query index, template index) pair. If not provided, the query and template must have the same length and are assumed to be aligned in order.

  • keep_alignment – Whether to keep the given alignment without pruning during the realignment step (-i vs -I in the original TM-align program). For compatibility with the TM-score program (and thus tm_score()), this is set to True by default.

Returns:

A TMAlign object initialized with the given alignment.

Raises:

ValueError

If:

  • The query or template structure has fewer than 5 residues.

  • The alignment contains out-of-range indices.

  • Alignment is not provided and the query and template structures have different lengths.

  • The initialization fails (for any other reason).

Tip

When initialized with keep_alignment set to True, the result is equivalent to the “TM-score” program in the TM-tools suite.

Note

Duplicate values in alignment are not checked and may result in invalid alignment.
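Because duplicates are not checked by the library, a caller may want to validate an alignment before passing it in. The following is a minimal sketch of such a check. The function name alignment_problems is hypothetical, and the sketch assumes zero-based indices, which the text above does not state explicitly.

```python
def alignment_problems(alignment, n_query, n_templ):
    """Report out-of-range and repeated indices in (query, template) index pairs."""
    problems = []
    seen_query, seen_templ = set(), set()
    for row, (qi, ti) in enumerate(alignment):
        # Assumption: indices are zero-based, as is usual for numpy arrays.
        if not 0 <= qi < n_query or not 0 <= ti < n_templ:
            problems.append(f"row {row}: index out of range")
        if qi in seen_query:
            problems.append(f"row {row}: query index {qi} repeated")
        if ti in seen_templ:
            problems.append(f"row {row}: template index {ti} repeated")
        seen_query.add(qi)
        seen_templ.add(ti)
    return problems


# Row 2 reuses query index 1 and row 3 points past the end of the query.
issues = alignment_problems([(0, 0), (1, 2), (1, 3), (9, 4)], n_query=5, n_templ=5)
```

An empty list means the alignment passed both checks and each residue appears in at most one pair.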

aligned_pairs(self: TMAlign) ndarray[numpy.int32]

Get pairwise alignment of the query and template structures.

Returns:

A 2D numpy array of shape (L, 2), where L is the number of aligned pairs. Each row is a (query index, template index) pair.

Tip

This will always return the same alignment once the TMAlign object is created.

Note

If the TMAlign object was created via from_alignment() with keep_alignment set to False, the returned pairs may differ from the input alignment.

This is because the TM-align algorithm filters out pairs that are far apart when computing the final alignment, which matches the behavior of the original TM-align program when run with the -i flag.
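The (L, 2) array of pairs lends itself to simple post-processing. The sketch below computes coverage and lists unaligned query residues from such pairs. The function name summarize_pairs is hypothetical, and the sketch again assumes zero-based indices. It accepts either a nested list or a 2D numpy array, since it only iterates over rows.

```python
def summarize_pairs(pairs, n_query, n_templ):
    """Summarize (query index, template index) pairs from an alignment."""
    pairs = [(int(q), int(t)) for q, t in pairs]
    query_to_templ = dict(pairs)
    return {
        "n_aligned": len(pairs),
        "query_coverage": len(pairs) / n_query,
        "templ_coverage": len(pairs) / n_templ,
        # Query residues that have no partner in the template.
        "unaligned_query": [i for i in range(n_query) if i not in query_to_templ],
    }


summary = summarize_pairs([(0, 1), (1, 2), (3, 4)], n_query=4, n_templ=6)
```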

rmsd(self: TMAlign) float

The RMSD of the aligned pairs.

score(self: TMAlign, l_norm: int | None = None, *, d0: float | None = None) tuple[ndarray[numpy.float64], float]

Calculate TM-score using the current alignment.

Parameters:
  • l_norm – Length normalization factor. If not specified, the length of the template structure is used.

  • d0 – Distance scale factor. If not specified, calculated based on the length normalization factor.

Returns:

A pair of the transformation tensor and the TM-score of the alignment.
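To make the roles of l_norm and d0 concrete, the sketch below evaluates the standard TM-score formula from the TM-score literature: the sum of 1 / (1 + (d_i / d0)^2) over aligned pairs, divided by l_norm, with d0 = 1.24 * (l_norm - 15)^(1/3) - 1.8. The function names are hypothetical. The clamp to 0.5 for short chains follows the original TM-align code as far as the editor is aware and is not confirmed for this implementation. The distances are taken after superposition, which the sketch does not perform.

```python
def default_d0(l_norm):
    """Distance scale from the normalization length (assumed clamp for short chains)."""
    if l_norm <= 21:
        return 0.5
    return 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8


def tm_score_from_distances(distances, l_norm, d0=None):
    """TM-score for already-superposed aligned pairs, given their distances."""
    if d0 is None:
        d0 = default_d0(l_norm)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_norm


# 50 perfectly superposed pairs normalized by two different lengths.
full = tm_score_from_distances([0.0] * 50, l_norm=50)
half = tm_score_from_distances([0.0] * 50, l_norm=100)
```

The same alignment yields different scores for different l_norm values, which is why score() accepts l_norm and d0 per call instead of fixing them at construction time.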

nuri.tools.tm.tm_align(query: object, templ: object, l_norm: int | None = None, query_ss: str | None = None, templ_ss: str | None = None, *, d0: float | None = None, gapless: bool = True, sec_str: bool = True, local_sup: bool = True, local_with_ss: bool = True, fragment_gapless: bool = True) tuple[ndarray[numpy.float64], float]

Run TM-align algorithm with the given structures and parameters.

Parameters:
  • query – The query structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (N, 3), where N is the number of residues.

  • templ – The template structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (M, 3), where M is the number of residues.

  • l_norm – Length normalization factor. If not specified, the length of the template structure is used.

  • query_ss – The secondary structure of the query structure. When provided, must be an ASCII string of length N.

  • templ_ss – The secondary structure of the template structure. When provided, must be an ASCII string of length M.

  • d0 – Distance scale factor. If not specified, calculated based on the length normalization factor.

  • gapless – Enable gapless threading.

  • sec_str – Enable secondary structure assignment.

  • local_sup – Enable local superposition. Note that this is the most expensive initialization method due to the exhaustive pairwise distance calculation. Consider disabling this flag if alignment takes too long.

  • local_with_ss – Enable local superposition with secondary structure-based alignment.

  • fragment_gapless – Enable fragment gapless threading.

Returns:

A pair of the transformation tensor and the TM-score of the alignment.

Raises:

ValueError

If:

  • The query or template structure has fewer than 5 residues.

  • The secondary structure of the query or template structure has a different length than the structure.

  • No initialization flag is set.

  • The initialization fails (for any other reason).

Tip

If you want to calculate the TM-score for multiple l_norm or d0 values, or need more details such as the RMSD or the aligned pairs, consider using the TMAlign object directly.

Note

If the secondary structure is not provided, it will be assigned using the approximate secondary structure assignment algorithm defined in the TM-align code. When neither the sec_str nor the local_with_ss flag is set, the secondary structures are ignored.
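The returned transformation can be applied to the query coordinates to superpose them onto the template. The sketch below assumes the transformation tensor is a 4x4 homogeneous matrix that maps query coordinates into the template frame. This layout is an assumption and should be checked against the actual return value. The helper names apply_transform and superpose_query are hypothetical, and nuri is imported only when superpose_query is called.

```python
def apply_transform(xform, coords):
    """Apply a 4x4 homogeneous transform (assumed layout) to rows of [x, y, z]."""
    moved = []
    for x, y, z in coords:
        # The first three rows hold the rotation (columns 0-2) and translation (column 3).
        moved.append([row[0] * x + row[1] * y + row[2] * z + row[3] for row in xform[:3]])
    return moved


def superpose_query(query, templ, **kwargs):
    """Run tm_align and return the moved query coordinates with the TM-score."""
    from nuri.tools import tm as tmtools  # imported lazily; requires nuri
    xform, tm = tmtools.tm_align(query, templ, **kwargs)
    return apply_transform(xform, query), tm


# Rotation by 90 degrees about z followed by a translation of (1, 2, 3).
demo_xform = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved = apply_transform(demo_xform, [[1.0, 0.0, 0.0]])
```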

nuri.tools.tm.tm_score(query: object, templ: object, alignment: object = None, l_norm: int | None = None, *, d0: float | None = None, keep_alignment: bool = True) tuple[ndarray[numpy.float64], float]

Run the TM-align algorithm with the given structures and alignment. This is equivalent to the “TM-score” program in the TM-tools suite, from which the function takes its name.

Parameters:
  • query – The query structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (N, 3), where N is the number of residues.

  • templ – The template structure, in which each residue is represented by a single atom (usually CA). Must be representable as a 2D numpy array of shape (M, 3), where M is the number of residues.

  • alignment – Pairwise alignment of the query and template structures. Must be in a form representable as a 2D numpy array of shape (L, 2), in which each row contains a (query index, template index) pair. If not provided, the query and template must have the same length and are assumed to be aligned in order.

  • l_norm – Length normalization factor. If not specified, the length of the template structure is used.

  • d0 – Distance scale factor. If not specified, calculated based on the length normalization factor.

  • keep_alignment – Whether to keep the given alignment without pruning during the realignment step (-i vs -I in the TM-align program). For compatibility with the TM-score program, this is set to True by default.

Returns:

A pair of the transformation tensor and the TM-score of the alignment.

Raises:

ValueError

If:

  • The query or template structure has fewer than 5 residues.

  • The alignment contains out-of-range indices.

  • Alignment is not provided and the query and template structures have different lengths.

  • The initialization fails (for any other reason).

Tip

If you want to calculate the TM-score for multiple l_norm or d0 values, or need more details such as the RMSD or the aligned pairs, consider using the TMAlign object directly.

Note

Duplicate values in alignment are not checked and may result in invalid alignment.
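As a sketch of the tip above, the following helper builds a TMAlign object once via from_alignment() and then evaluates score() for several normalization lengths. The helper names identity_alignment and scores_for_norms are hypothetical. identity_alignment spells out the in-order alignment that is assumed when alignment is omitted, using zero-based indices as an assumption. nuri is imported only when scores_for_norms is called.

```python
def identity_alignment(n):
    """In-order alignment of two structures with n residues each."""
    return [[i, i] for i in range(n)]


def scores_for_norms(query, templ, l_norms, alignment=None):
    """Return {l_norm: TM-score} for one fixed alignment (requires nuri)."""
    from nuri.tools import tm as tmtools  # imported lazily
    aligner = tmtools.TMAlign.from_alignment(query, templ, alignment)
    # score() returns (transformation, TM-score); only the score is kept here.
    return {l_norm: aligner.score(l_norm)[1] for l_norm in l_norms}


pairs = identity_alignment(3)
```

Building the object once avoids repeating the initialization for every l_norm value.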

GAlign

This module provides a Python interface to the GAlign flexible molecular alignment algorithm. The paper describing the GAlign algorithm is under preparation and will be cited here once available.

class nuri.tools.galign.GAlign
__init__(self: GAlign, templ: Molecule, *, conf: int | None = None, vdw_scale: float = 0.8, hetero_scale: float = 0.7, dcut: int = 6) None

Prepare GAlign algorithm with the given template structure.

Parameters:
  • templ – The template structure. Must have at least 3 atoms and 3D coordinates.

  • conf – The conformation index to use as the template. If not provided, the first conformation is used.

  • vdw_scale – The scale factor for van der Waals radii when calculating shape overlap score.

  • hetero_scale – The scale factor for atom type mismatch when calculating shape overlap score.

  • dcut – The distance cutoff for neighbor search, in angstroms.

Raises:
  • ValueError – If the template structure has fewer than 3 atoms or no 3D conformation, or if invalid parameters are provided (e.g., negative dcut).

  • IndexError – If the provided conformation index is out of range.

align(self: GAlign, query: Molecule, flexible: bool = True, max_confs: int = 1, *, conf: int | None = None, max_translation: float = 2.5, max_rotation: float = 2.0943951023931953, max_torsion: float = 2.0943951023931953, rigid_min_msd: float = 9.0, rigid_max_confs: int = 4, pool_size: int = 10, sample_size: int = 30, max_generations: int = 50, patience: int = 5, n_mutation: int = 5, p_mutation: float = 0.5, opt_ftol: float = 0.01, opt_max_iters: int = 300) list[GAlignResult]

Align the given query molecule to the template structure.

Parameters:
  • query – The query molecule to be aligned. Must have at least one 3D conformation.

  • flexible – Whether to perform flexible alignment. When False, only rigid alignment is performed and the flexible alignment parameters are ignored.

  • max_confs – The maximum number of alignment results to return.

  • conf – The conformation index to use as the query structure. If not provided, the first conformation is used.

  • vdw_scale – The scale factor for van der Waals radii when calculating shape overlap score.

  • hetero_scale – The scale factor for atom type mismatch when calculating shape overlap score.

  • dcut – The distance cutoff for neighbor search, in angstroms.

  • max_translation – The maximum translation allowed during flexible alignment, in angstroms.

  • max_rotation – The maximum rotation allowed during flexible alignment, in radians.

  • max_torsion – The maximum torsion angle change allowed during flexible alignment, in radians.

  • rigid_min_msd – The minimum mean-squared deviation between conformations for them to be considered distinct during rigid alignment.

  • rigid_max_confs – The maximum number of conformations to consider for initial rigid alignment. Ignored in rigid mode; set max_confs instead.

  • pool_size – The size of the population pool during flexible alignment.

  • sample_size – The number of new trial conformations to sample in each generation.

  • max_generations – The maximum number of generations to run.

  • patience – The number of generations to wait for improvement before early stopping.

  • n_mutation – The number of mutation operations to perform when generating new trial conformations.

  • p_mutation – The probability of mutation when generating new trial conformations.

  • opt_ftol – The function tolerance for the Nelder-Mead optimization.

  • opt_max_iters – The maximum number of iterations for the Nelder-Mead optimization.

Returns:

At most max_confs alignment results as a list of GAlignResult objects, sorted by their alignment scores in descending order.

Raises:
  • ValueError – If the query molecule has no 3D conformation, or if invalid parameters are provided (e.g., negative max_translation).

  • IndexError – If the provided conformation index is out of range.
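The angular limits are given in radians, and the default 2.0943951023931953 corresponds to 120 degrees. The sketch below converts more familiar units into keyword arguments for align(). The helper name galign_limits is hypothetical. Treating rigid_min_msd as the square of an RMSD threshold is an assumption based on the parameter name in the signature.

```python
import math


def galign_limits(max_rotation_deg=120.0, max_torsion_deg=120.0, rigid_min_rmsd=3.0):
    """Build keyword arguments for align() from degrees and an RMSD threshold."""
    return {
        "max_rotation": math.radians(max_rotation_deg),
        "max_torsion": math.radians(max_torsion_deg),
        # Assumption: a mean-squared deviation is the square of the RMSD.
        "rigid_min_msd": rigid_min_rmsd ** 2,
    }


limits = galign_limits()
tight = galign_limits(max_rotation_deg=30.0, max_torsion_deg=60.0, rigid_min_rmsd=2.0)
```

The resulting dictionary can be unpacked into the call, for example align(query, **tight).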

class nuri.tools.galign.GAlignResult
property pos

A copy of the aligned conformation as a 2D numpy array of shape (N, 3), where N is the number of atoms in the query molecule.

property score

The alignment score (shape overlap) of this result.
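Results can be filtered and summarized using only the two documented properties. The sketch below uses a hypothetical stand-in type (StandInResult) in place of real GAlignResult objects so that it runs without nuri. The helper names centroid and summarize are also hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the two documented properties, pos and score.
StandInResult = namedtuple("StandInResult", ["pos", "score"])


def centroid(pos):
    """Geometric center of an (N, 3) collection of coordinates."""
    n = len(pos)
    return [sum(p[i] for p in pos) / n for i in range(3)]


def summarize(results, min_score=0.0):
    """Keep results scoring at least min_score and report (score, centroid)."""
    return [(r.score, centroid(r.pos)) for r in results if r.score >= min_score]


results = [
    StandInResult(pos=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], score=0.8),
    StandInResult(pos=[[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]], score=0.4),
]
summary = summarize(results, min_score=0.5)
```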

nuri.tools.galign.galign(query: Molecule, templ: Molecule, flexible: bool = True, max_confs: int = 1, *, qconf: int | None = None, tconf: int | None = None, vdw_scale: float = 0.8, hetero_scale: float = 0.7, dcut: int = 6, max_translation: float = 2.5, max_rotation: float = 2.0943951023931953, max_torsion: float = 2.0943951023931953, rigid_min_msd: float = 9.0, rigid_max_confs: int = 4, pool_size: int = 10, sample_size: int = 30, max_generations: int = 50, patience: int = 5, n_mutation: int = 5, p_mutation: float = 0.5, opt_ftol: float = 0.01, opt_max_iters: int = 300) list[GAlignResult]

Align the given query molecule to the template structure.

Parameters:
  • query – The query molecule to be aligned. Must have at least one 3D conformation.

  • templ – The template structure. Must have at least 3 atoms and 3D coordinates.

  • flexible – Whether to perform flexible alignment. When False, only rigid alignment is performed and the flexible alignment parameters are ignored.

  • max_confs – The maximum number of alignment results to return.

  • qconf – The conformation index to use as the query structure. If not provided, the first conformation is used.

  • tconf – The conformation index to use as the template structure. If not provided, the first conformation is used.

  • vdw_scale – The scale factor for van der Waals radii when calculating shape overlap score.

  • hetero_scale – The scale factor for atom type mismatch when calculating shape overlap score.

  • dcut – The distance cutoff for neighbor search, in angstroms.

  • max_translation – The maximum translation allowed during flexible alignment, in angstroms.

  • max_rotation – The maximum rotation allowed during flexible alignment, in radians.

  • max_torsion – The maximum torsion angle change allowed during flexible alignment, in radians.

  • rigid_min_msd – The minimum mean-squared deviation between conformations for them to be considered distinct during rigid alignment.

  • rigid_max_confs – The maximum number of conformations to consider for initial rigid alignment. Ignored in rigid mode; set max_confs instead.

  • pool_size – The size of the population pool during flexible alignment.

  • sample_size – The number of new trial conformations to sample in each generation.

  • max_generations – The maximum number of generations to run.

  • patience – The number of generations to wait for improvement before early stopping.

  • n_mutation – The number of mutation operations to perform when generating new trial conformations.

  • p_mutation – The probability of mutation when generating new trial conformations.

  • opt_ftol – The function tolerance for the Nelder-Mead optimization.

  • opt_max_iters – The maximum number of iterations for the Nelder-Mead optimization.

Returns:

At most max_confs alignment results as a list of GAlignResult objects, sorted by their alignment scores in descending order.

Raises:
  • ValueError – If the query or template molecule is invalid, or if any of the parameters are invalid (e.g., negative max_translation).

  • IndexError – If the provided conformation index is out of range.
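When many queries are aligned to the same template, one option is to construct a GAlign object once and call align() per query instead of calling galign() repeatedly. The sketch below shows that pattern. The helper name align_many and its aligner_cls parameter are hypothetical. The parameter exists only so the pattern can be exercised without nuri, and whether reusing the object saves time is an assumption that this reference does not confirm.

```python
def align_many(queries, templ, aligner_cls=None, **align_kwargs):
    """Align each query to one template and keep the top result per query."""
    if aligner_cls is None:
        # Imported lazily; requires nuri to be installed.
        from nuri.tools.galign import GAlign as aligner_cls
    aligner = aligner_cls(templ)
    best = []
    for query in queries:
        results = aligner.align(query, **align_kwargs)
        # Results are documented as sorted by score in descending order.
        best.append(results[0] if results else None)
    return best
```

For example, align_many(queries, templ, flexible=False, max_confs=3) would request rigid alignment only and keep the best of up to three results for each query.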