Assigners

lahuta.core.assigners

Defines the abstract base class (ABC) for assigning atom types to protein atoms, together with two child classes that implement its abstract compute method.

Classes:

ProteinTypeAssignerBase: Abstract base class for assigning atom types to proteins.
VectorizedProteinTypeAssigner: Efficient, vectorized assignment of atom types.
LegacyProteinTypeAssigner: Traditional, loop-based assignment of atom types.

ProteinTypeAssignerBase

Bases: ABC

Abstract Base Class for assigning atom types to proteins.

It defines the interface that child classes must provide: the constructor stores the protein atom group, and each child class implements compute to fill a sparse array of atom types for that group.

Attributes:

protein_ag (AtomGroupType): Group of atoms in a protein that will be assigned atom types.

Child classes
VectorizedProteinTypeAssigner: Efficient, vectorized assignment of atom types.
LegacyProteinTypeAssigner: Traditional, loop-based assignment of atom types.
Source code in lahuta/core/assigners.py
class ProteinTypeAssignerBase(ABC):
    """Abstract Base Class for assigning atom types to proteins.

    This abstract base class (ABC) outlines the necessary structure and interface for child classes that handle
    the assignment of atom types to proteins.

    Attributes:
        protein_ag (AtomGroupType): Group of atoms in a protein that will be assigned atom types.

    Child classes:
        ```
        VectorizedProteinTypeAssigner: Efficient, vectorized assignment of atom types.
        LegacyProteinTypeAssigner: Traditional, loop-based assignment of atom types.
        ```
    """

    def __init__(self, protein_ag: AtomGroupType) -> None:
        self.protein_ag = protein_ag

    @abstractmethod
    def compute(self, atypes_array: dok_matrix) -> dok_matrix:
        """Abstract method to compute atom types.

        Must be implemented by child classes.

        Args:
            atypes_array (dok_matrix): Sparse array of atom types.

        Raises:
            NotImplementedError: If the base implementation is called directly, e.g. via super().compute().
        """
        raise NotImplementedError

compute abstractmethod

compute(atypes_array)

Abstract method to compute atom types.

Must be implemented by child classes.

Parameters:

atypes_array (dok_matrix): Sparse array of atom types. Required.

Raises:

NotImplementedError: If the base implementation is called directly, for example through super().compute(). A child class that does not override compute cannot be instantiated at all.

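
The contract described above can be illustrated with a minimal sketch. It does not use lahuta itself: AssignerBase is a simplified stand-in for ProteinTypeAssignerBase, FirstColumnAssigner is a hypothetical child class, a plain list of atom indices plays the role of the atom group, and a dict keyed by (row, column) stands in for the sparse dok_matrix. All of these names are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class AssignerBase(ABC):
    """Simplified stand-in for ProteinTypeAssignerBase (assumption)."""

    def __init__(self, protein_ag):
        self.protein_ag = protein_ag

    @abstractmethod
    def compute(self, atypes_array):
        raise NotImplementedError


class FirstColumnAssigner(AssignerBase):
    """Hypothetical child class that flags column 0 for every atom index."""

    def compute(self, atypes_array):
        for index in self.protein_ag:
            atypes_array[(index, 0)] = 1
        return atypes_array


def can_instantiate(cls, *args):
    """Return True if cls(*args) succeeds, False if it raises TypeError."""
    try:
        cls(*args)
    except TypeError:
        return False
    return True


# A dict keyed by (row, column) stands in for the sparse dok_matrix.
result = FirstColumnAssigner([0, 2]).compute({})
```

Because compute is decorated with @abstractmethod, can_instantiate(AssignerBase, [0]) returns False, while the child class that overrides compute can be created and used normally.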

VectorizedProteinTypeAssigner

Bases: ProteinTypeAssignerBase

Assigns atom types to proteins in a vectorized manner.

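Child class of ProteinTypeAssignerBase that replaces per-atom loops with NumPy array operations: it joins each atom's residue name and atom name into a label, then tests all labels against each PROT_ATOM_TYPES entry with np.isin.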

Source code in lahuta/core/assigners.py
class VectorizedProteinTypeAssigner(ProteinTypeAssignerBase):
    """Assigns atom types to proteins in a vectorized manner.

    Child class of ProteinTypeAssignerBase that uses NumPy array manipulations
    for efficient assignment of atom types.

    """

    def compute(self, atypes_array: dok_matrix) -> dok_matrix:
        """Compute atom types in a vectorized manner.

        Uses NumPy array manipulations for efficient assignment of atom types.

        Args:
            atypes_array (dok_matrix): Sparse array of atom types.

        Returns:
            (dok_matrix): Sparse array of assigned atom types.
        """
        resname_str = self.protein_ag.resnames.astype(str)
        atom_name_str = self.protein_ag.names.astype(str)
        atype_names = [member.lower() for member in list(ATypes)]

        # Build one "RESNAME+ATOMNAME" label per atom. np.char exposes the same
        # functions as np.core.defchararray; np.core is deprecated since NumPy 2.0.
        atom_id_labels: NDArray[np.str_] = np.char.add(
            np.char.strip(resname_str),
            np.char.strip(atom_name_str),
        )

        prot_atom_types_array = [list(PROT_ATOM_TYPES[key]) for key in atype_names]
        assert isinstance(atom_id_labels, np.ndarray)
        # One boolean row per atom type, one column per atom in protein_ag.
        mask: NDArray[np.bool_] = np.array(
            [np.isin(atom_id_labels, prot_atom_types) for prot_atom_types in prot_atom_types_array]
        )

        # Rows of true_indices are (atom type position, atom position) pairs.
        true_indices = np.argwhere(mask)

        # Map positions within protein_ag back to the atoms' original indices.
        original_indices = self.protein_ag.indices[true_indices[:, 1]]
        atypes_array[original_indices, true_indices[:, 0]] = 1

        return atypes_array

compute

compute(atypes_array)

Compute atom types in a vectorized manner.

Uses NumPy array manipulations for efficient assignment of atom types.

Parameters:

atypes_array (dok_matrix): Sparse array of atom types. Required.

Returns:

dok_matrix: Sparse array of assigned atom types. This is the input array, modified in place.

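
The label-matching steps of the vectorized approach can be tried on toy data. The sketch below is not lahuta code: the residue names, atom names, atom indices, and the two-entry type_table (with made-up type names) are assumptions, and a dense NumPy array stands in for the sparse dok_matrix so that the example depends only on NumPy.

```python
import numpy as np

# Toy stand-ins (assumptions): three atoms and a two-entry type table.
resnames = np.array(["ALA ", "SER", "SER"])
names = np.array(["N", " OG", "CB"])
atom_indices = np.array([10, 11, 12])  # original indices of the selected atoms
type_table = {"hbond_donor": {"ALAN", "SEROG"}, "hydrophobe": {"SERCB"}}

# Build one "RESNAME+ATOMNAME" label per atom, ignoring padding whitespace.
labels = np.char.add(np.char.strip(resnames), np.char.strip(names))

# One boolean row per atom type, one column per atom.
mask = np.array([np.isin(labels, sorted(ids)) for ids in type_table.values()])

# Each row of `hits` is (type position, atom position) for one match.
hits = np.argwhere(mask)

# A dense array stands in for the sparse dok_matrix here.
atypes = np.zeros((13, len(type_table)), dtype=np.int8)
atypes[atom_indices[hits[:, 1]], hits[:, 0]] = 1
```

The final assignment uses paired fancy indexing: the atom position from each match selects the row (through atom_indices) and the type position selects the column, so all matches are written in a single statement.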

LegacyProteinTypeAssigner

Bases: ProteinTypeAssignerBase

Assign atom types to proteins using a loop-based method.

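Child class of ProteinTypeAssignerBase that iterates over every residue and atom, clears the atom's type flags, and then sets the flag of each atom type whose PROT_ATOM_TYPES entry lists the atom's label (residue name followed by atom name).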

Source code in lahuta/core/assigners.py
class LegacyProteinTypeAssigner(ProteinTypeAssignerBase):
    """Assign atom types to proteins using a loop-based method.

    Child class of ProteinTypeAssignerBase that uses a traditional, loop-based approach for assignment of atom types.

    """

    def compute(self, atypes_array: dok_matrix) -> dok_matrix:
        """Compute atom types using a loop-based method.

        Uses a traditional, loop-based approach for assignment of atom types.

        Args:
            atypes_array (dok_matrix): Sparse array of atom types.

        Returns:
            (dok_matrix): Sparse array of assigned atom types.
        """
        for residue in self.protein_ag.residues:
            for atom in residue.atoms:
                # Clear every atom-type flag for this atom before assigning.
                for atom_type in PROT_ATOM_TYPES:
                    atypes_array[atom.index, ATypes[atom_type.upper()]] = 0

                # The label is the same for every atom type, so build it once.
                atom_id = residue.resname.strip() + atom.name.strip()
                for atom_type, atom_ids in PROT_ATOM_TYPES.items():
                    if atom_id in atom_ids:
                        atypes_array[atom.index, ATypes[atom_type.upper()]] = 1

        return atypes_array

compute

compute(atypes_array)

Compute atom types using a loop-based method.

Uses a traditional, loop-based approach for assignment of atom types.

Parameters:

atypes_array (dok_matrix): Sparse array of atom types. Required.

Returns:

dok_matrix: Sparse array of assigned atom types. This is the input array, modified in place.

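
The reset-then-set behaviour of the loop-based approach can be sketched without lahuta. Everything below is an assumption made for illustration: ToyATypes is a hypothetical enum of column indices (the real ATypes may differ), TOY_PROT_ATOM_TYPES is a made-up type table, tuples of (atom index, residue name, atom name) stand in for the atom group, and a dict keyed by (row, column) stands in for the sparse dok_matrix.

```python
from enum import IntEnum


class ToyATypes(IntEnum):
    """Hypothetical column indices; the real ATypes enum may differ."""

    HBOND_DONOR = 0
    HYDROPHOBE = 1


TOY_PROT_ATOM_TYPES = {"hbond_donor": {"ALAN", "SEROG"}, "hydrophobe": {"SERCB"}}

# (atom index, residue name, atom name) triples stand in for the atom group.
atoms = [(10, "ALA ", "N"), (11, "SER", " OG"), (12, "SER", "CB")]

# A dict keyed by (row, column) stands in for the sparse dok_matrix.
# The stale entry shows what the reset step is for.
atypes = {(10, ToyATypes.HYDROPHOBE): 1}

for index, resname, name in atoms:
    # Clear every atom-type flag for this atom.
    for atom_type in TOY_PROT_ATOM_TYPES:
        atypes[(index, ToyATypes[atom_type.upper()])] = 0

    # Set the flag of each type whose entry lists this atom's label.
    atom_id = resname.strip() + name.strip()
    for atom_type, atom_ids in TOY_PROT_ATOM_TYPES.items():
        if atom_id in atom_ids:
            atypes[(index, ToyATypes[atom_type.upper()])] = 1

assigned = sorted(key for key, value in atypes.items() if value == 1)
```

Unlike the vectorized sketch, this version explicitly writes zeros, so a flag left over from an earlier assignment (here the stale hydrophobe entry for atom 10) is cleared rather than kept.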