Matchers

lahuta.core.matchers

The classes in this module match SMARTS patterns against the atoms of a molecule. The resulting matches are used to assign atom types to those atoms.

Classes:

Name | Description
SmartsMatcherBase | Abstract base class for SMARTS pattern matching.
SmartsMatcher | Sequential SMARTS pattern matching.
ParallelSmartsMatcher | Parallel SMARTS pattern matching.

SmartsMatcherBase

Bases: ABC

A base class for different implementations of SMARTS pattern matching on molecules.

This abstract class needs to be inherited by any class that implements SMARTS pattern matching. The subclass must implement the compute method.

Source code in lahuta/core/matchers.py
class SmartsMatcherBase(ABC):
    """A base class for different implementations of SMARTS pattern matching on molecules.

    This abstract class needs to be inherited by any class that implements SMARTS pattern matching.
    The subclass must implement the compute method.
    """

    def __init__(self, n_atoms: int) -> None:
        self.n_atoms = n_atoms

    @abstractmethod
    def compute(self, mol: MolType) -> dok_matrix:
        """Abstract method for SMARTS pattern matching.

        Args:
            mol (MolType): A molecule object to match patterns on.

        Raises:
            NotImplementedError: This is an abstract method that needs to be implemented in the subclass.

        Returns:
            (dok_matrix): A sparse matrix of atom types that match the SMARTS patterns in the given molecule.
        """
        raise NotImplementedError("Subclasses must implement this method")

compute abstractmethod

compute(mol)

Abstract method for SMARTS pattern matching.

Parameters:

Name | Type | Description | Default
mol | MolType | A molecule object to match patterns on. | required

Raises:

Type | Description
NotImplementedError | This is an abstract method that needs to be implemented in the subclass.

Returns:

Type | Description
dok_matrix | A sparse matrix of atom types that match the SMARTS patterns in the given molecule.

Source code in lahuta/core/matchers.py
@abstractmethod
def compute(self, mol: MolType) -> dok_matrix:
    """Abstract method for SMARTS pattern matching.

    Args:
        mol (MolType): A molecule object to match patterns on.

    Raises:
        NotImplementedError: This is an abstract method that needs to be implemented in the subclass.

    Returns:
        (dok_matrix): A sparse matrix of atom types that match the SMARTS patterns in the given molecule.
    """
    raise NotImplementedError("Subclasses must implement this method")
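
The subclassing contract described above can be sketched without lahuta or Open Babel. This is a minimal illustration under stated assumptions: `MatcherBase` and `CarbonMatcher` are hypothetical stand-ins, a list of element symbols stands in for `MolType`, and a plain dictionary keyed by `(row, column)` stands in for `dok_matrix`.

```python
from abc import ABC, abstractmethod

# Stand-in for a DOK sparse matrix: maps (row, column) -> value.
SparseDok = dict[tuple[int, int], int]


class MatcherBase(ABC):
    """Hypothetical stand-in mirroring the shape of SmartsMatcherBase."""

    def __init__(self, n_atoms: int) -> None:
        self.n_atoms = n_atoms

    @abstractmethod
    def compute(self, mol: list[str]) -> SparseDok:
        """Return the (atom index, type index) entries that matched."""


class CarbonMatcher(MatcherBase):
    """Toy subclass: flags every carbon atom as type 0."""

    def compute(self, mol: list[str]) -> SparseDok:
        return {(i, 0): 1 for i, element in enumerate(mol) if element == "C"}


def try_instantiate_base() -> bool:
    """Return True if the abstract base can be instantiated (it should not be)."""
    try:
        MatcherBase(n_atoms=3)  # type: ignore[abstract]
    except TypeError:
        return False
    return True


matcher = CarbonMatcher(n_atoms=3)
result = matcher.compute(["C", "O", "C"])
```

Because `compute` is decorated with `abstractmethod`, Python refuses to instantiate the base class directly; only subclasses that override `compute` can be created.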

SmartsMatcher

Bases: SmartsMatcherBase

Matches SMARTS patterns to atoms in a molecule.

This class performs sequential SMARTS pattern matching on atoms in a molecule. It inherits from the SmartsMatcherBase abstract base class.

Source code in lahuta/core/matchers.py
class SmartsMatcher(SmartsMatcherBase):
    """Matches SMARTS patterns to atoms in a molecule.

    This class performs sequential SMARTS pattern matching on atoms in a molecule.
    It inherits from the SmartsMatcherBase abstract base class.
    """

    def compute(self, mol: MolType) -> dok_matrix:
        """Perform SMARTS pattern matching on a molecule.

        Args:
            mol (MolType): A molecule object to match patterns on.

        Returns:
            (dok_matrix): A sparse matrix of atom types that match the SMARTS patterns in the given molecule.
        """
        atom_types = dok_matrix((self.n_atoms, len(ATypes)), dtype=np.int8)

        for atom_type in SmartsPatternRegistry:
            smartsdict = SmartsPatternRegistry[atom_type.name].value
            for smarts in smartsdict.values():
                ob_smart: ObSmartPatternType = OBSmartsPatternWrapper(ob.OBSmartsPattern())
                ob_smart.Init(str(smarts))
                ob_smart.Match(mol)

                matches = [x[0] for x in ob_smart.GetMapList()]
                for match in matches:
                    atom = mol.GetAtom(match)

                    if atom.GetResidue().GetName() not in STANDARD_AMINO_ACIDS:
                        atom_types[atom.GetId(), ATypes[atom_type.name]] = 1

        return atom_types

compute

compute(mol)

Perform SMARTS pattern matching on a molecule.

Parameters:

Name | Type | Description | Default
mol | MolType | A molecule object to match patterns on. | required

Returns:

Type | Description
dok_matrix | A sparse matrix of atom types that match the SMARTS patterns in the given molecule.

Source code in lahuta/core/matchers.py
def compute(self, mol: MolType) -> dok_matrix:
    """Perform SMARTS pattern matching on a molecule.

    Args:
        mol (MolType): A molecule object to match patterns on.

    Returns:
        (dok_matrix): A sparse matrix of atom types that match the SMARTS patterns in the given molecule.
    """
    atom_types = dok_matrix((self.n_atoms, len(ATypes)), dtype=np.int8)

    for atom_type in SmartsPatternRegistry:
        smartsdict = SmartsPatternRegistry[atom_type.name].value
        for smarts in smartsdict.values():
            ob_smart: ObSmartPatternType = OBSmartsPatternWrapper(ob.OBSmartsPattern())
            ob_smart.Init(str(smarts))
            ob_smart.Match(mol)

            matches = [x[0] for x in ob_smart.GetMapList()]
            for match in matches:
                atom = mol.GetAtom(match)

                if atom.GetResidue().GetName() not in STANDARD_AMINO_ACIDS:
                    atom_types[atom.GetId(), ATypes[atom_type.name]] = 1

    return atom_types
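
The control flow of the sequential matcher can be illustrated with a self-contained sketch. Everything named here is an assumption made for illustration: `PATTERNS`, `ATYPE_COLUMNS`, and `STANDARD_RESIDUES` are hypothetical stand-ins for `SmartsPatternRegistry`, `ATypes`, and `STANDARD_AMINO_ACIDS`, a regular expression on the element symbol stands in for Open Babel SMARTS matching, and a dictionary stands in for `dok_matrix`.

```python
import re

# Hypothetical stand-ins for the registry, the atom type columns, and the
# set of standard residue names used by the real module.
ATYPE_COLUMNS = {"polar": 0, "carbon": 1}
PATTERNS = {"polar": ["N", "O"], "carbon": ["C"]}
STANDARD_RESIDUES = {"ALA", "GLY"}


def assign_atom_types(atoms: list[tuple[str, str]]) -> dict[tuple[int, int], int]:
    """Sequentially match every pattern and record hits as (row, column) -> 1.

    `atoms` is a list of (element, residue name) pairs.
    """
    atom_types: dict[tuple[int, int], int] = {}
    for type_name, patterns in PATTERNS.items():
        for pattern in patterns:
            compiled = re.compile(pattern)
            for index, (element, residue) in enumerate(atoms):
                if not compiled.fullmatch(element):
                    continue
                # Mirror the source: atoms in standard amino acids are skipped.
                if residue not in STANDARD_RESIDUES:
                    atom_types[index, ATYPE_COLUMNS[type_name]] = 1
    return atom_types


atoms = [("N", "ALA"), ("C", "LIG"), ("O", "LIG"), ("C", "GLY")]
typed = assign_atom_types(atoms)
```

Only the two atoms in the non-standard residue `LIG` receive entries; the nitrogen in `ALA` and the carbon in `GLY` match a pattern but are filtered out by the residue check.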

ParallelSmartsMatcher

Bases: SmartsMatcherBase

Matches SMARTS patterns to atoms in a molecule using multiple threads.

This class performs SMARTS pattern matching on atoms in a molecule using multiple threads for improved performance. It inherits from the SmartsMatcherBase abstract base class.

Source code in lahuta/core/matchers.py
class ParallelSmartsMatcher(SmartsMatcherBase):
    """Matches SMARTS patterns to atoms in a molecule using multiple threads.

    This class performs SMARTS pattern matching on atoms in a molecule using multiple threads for
    improved performance. It inherits from the SmartsMatcherBase abstract base class.
    """

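    def __init__(self, n_atoms: int) -> None:
        # Store n_atoms via the base class; compute() reads self.n_atoms.
        super().__init__(n_atoms)
        self.precomputed_ob_smarts = self.precompute_ob_smarts()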

    def precompute_ob_smarts(self) -> dict[str, list[ObSmartPatternType]]:
        """Precompute and store the Open Babel SMARTS patterns for all atom types.

        Returns:
            (dict[str, list[ObSmartPatternType]]): A dictionary with atom type names as keys and lists of
                                                precomputed Open Babel SMARTS patterns as values.
        """
        precomputed_ob_smarts: dict[str, list[ObSmartPatternType]] = {}
        for atom_type in SmartsPatternRegistry:
            smartsdict = SmartsPatternRegistry[atom_type.name].value
            precomputed_ob_smarts[atom_type.name] = []
            for smarts in smartsdict.values():
                ob_smart: ObSmartPatternType = OBSmartsPatternWrapper(ob.OBSmartsPattern())
                ob_smart.Init(str(smarts))
                precomputed_ob_smarts[atom_type.name].append(ob_smart)
        return precomputed_ob_smarts

    def match_ob_smarts(
        self,
        ob_smart: ObSmartPatternType,
        mol: MolType,
        atypes: dict[str, int],
        atom_type: str,
    ) -> list[tuple[Any, int]]:
        """Match an Open Babel SMARTS pattern to a molecule.

        Args:
            ob_smart (ObSmartPatternType): An Open Babel SMARTS pattern.
            mol (MolType): A molecule object to match the pattern on.
            atypes (dict[str, int]): A dictionary of atom types.
            atom_type (str): The name of the atom type that the SMARTS pattern represents.

        Returns:
            list[tuple[Any, int]]: A list of tuples, where each tuple contains the matched atom's
                                    index and the corresponding atom type.
        """
        ob_smart.Match(mol)
        matches = [x[0] for x in ob_smart.GetMapList()]
        return [(match, atypes[atom_type]) for match in matches]

    def compute(self, mol: MolType) -> dok_matrix:
        """Perform SMARTS pattern matching on a molecule using multiple threads.

        Args:
            mol (MolType): A molecule object to match patterns on.

        Returns:
            (dok_matrix): A sparse matrix of atom types that match the SMARTS patterns in the given molecule.
        """
        atom_types = dok_matrix((self.n_atoms, len(ATypes)), dtype=np.int8)

        num_threads = os.cpu_count()

        with ThreadPoolExecutor(max_workers=num_threads) as executor:
            for atom_type, ob_smarts_list in self.precomputed_ob_smarts.items():
                future_matches = [
                    executor.submit(self.match_ob_smarts, ob_smart, mol, ATypes, atom_type)
                    for ob_smart in ob_smarts_list
                ]

                for future in future_matches:
                    matches = future.result()
                    for match, atype in matches:
                        atom = mol.GetAtom(match)
                        if atom.GetResidue().GetName() not in STANDARD_AMINO_ACIDS:
                            atom_types[atom.GetIdx() - 1, atype] = 1

        return atom_types

precompute_ob_smarts

precompute_ob_smarts()

Precompute and store the Open Babel SMARTS patterns for all atom types.

Returns:

Type | Description
dict[str, list[ObSmartPatternType]] | A dictionary with atom type names as keys and lists of precomputed Open Babel SMARTS patterns as values.

Source code in lahuta/core/matchers.py
def precompute_ob_smarts(self) -> dict[str, list[ObSmartPatternType]]:
    """Precompute and store the Open Babel SMARTS patterns for all atom types.

    Returns:
        (dict[str, list[ObSmartPatternType]]): A dictionary with atom type names as keys and lists of
                                            precomputed Open Babel SMARTS patterns as values.
    """
    precomputed_ob_smarts: dict[str, list[ObSmartPatternType]] = {}
    for atom_type in SmartsPatternRegistry:
        smartsdict = SmartsPatternRegistry[atom_type.name].value
        precomputed_ob_smarts[atom_type.name] = []
        for smarts in smartsdict.values():
            ob_smart: ObSmartPatternType = OBSmartsPatternWrapper(ob.OBSmartsPattern())
            ob_smart.Init(str(smarts))
            precomputed_ob_smarts[atom_type.name].append(ob_smart)
    return precomputed_ob_smarts
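
The idea behind precomputation is to initialize each pattern once and reuse it on every call, instead of rebuilding it inside the matching loop. The sketch below shows that pattern under stated assumptions: `REGISTRY` is a hypothetical stand-in for `SmartsPatternRegistry`, and `re.compile` stands in for creating and initializing an Open Babel SMARTS pattern.

```python
from __future__ import annotations

import re

# Hypothetical registry: atom type name -> {label: pattern string}.
REGISTRY = {
    "polar": {"nitrogen": "N", "oxygen": "O"},
    "carbon": {"any_carbon": "C"},
}


def precompute_patterns(
    registry: dict[str, dict[str, str]],
) -> dict[str, list[re.Pattern[str]]]:
    """Compile every pattern once, grouped by atom type name."""
    compiled: dict[str, list[re.Pattern[str]]] = {}
    for type_name, patterns in registry.items():
        compiled[type_name] = [re.compile(p) for p in patterns.values()]
    return compiled


compiled = precompute_patterns(REGISTRY)
```

The returned dictionary keeps the registry's grouping, so a later matching step can iterate over `compiled.items()` and know which atom type each compiled pattern belongs to.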

match_ob_smarts

match_ob_smarts(ob_smart, mol, atypes, atom_type)

Match an Open Babel SMARTS pattern to a molecule.

Parameters:

Name | Type | Description | Default
ob_smart | ObSmartPatternType | An Open Babel SMARTS pattern. | required
mol | MolType | A molecule object to match the pattern on. | required
atypes | dict[str, int] | A dictionary of atom types. | required
atom_type | str | The name of the atom type that the SMARTS pattern represents. | required

Returns:

Type | Description
list[tuple[Any, int]] | A list of tuples, where each tuple contains the matched atom's index and the corresponding atom type.

Source code in lahuta/core/matchers.py
def match_ob_smarts(
    self,
    ob_smart: ObSmartPatternType,
    mol: MolType,
    atypes: dict[str, int],
    atom_type: str,
) -> list[tuple[Any, int]]:
    """Match an Open Babel SMARTS pattern to a molecule.

    Args:
        ob_smart (ObSmartPatternType): An Open Babel SMARTS pattern.
        mol (MolType): A molecule object to match the pattern on.
        atypes (dict[str, int]): A dictionary of atom types.
        atom_type (str): The name of the atom type that the SMARTS pattern represents.

    Returns:
        list[tuple[Any, int]]: A list of tuples, where each tuple contains the matched atom's
                                index and the corresponding atom type.
    """
    ob_smart.Match(mol)
    matches = [x[0] for x in ob_smart.GetMapList()]
    return [(match, atypes[atom_type]) for match in matches]
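
The last two lines of `match_ob_smarts` reduce each match to its first atom index and pair it with the integer value of the atom type. A minimal sketch of that step, using made-up data in place of the result of `GetMapList()` (the function name, the type name, and the mapping are hypothetical):

```python
def pair_matches_with_type(
    map_list: list[tuple[int, ...]],
    atypes: dict[str, int],
    atom_type: str,
) -> list[tuple[int, int]]:
    """Keep the first atom index of each match and pair it with the type's value."""
    return [(match[0], atypes[atom_type]) for match in map_list]


# Hypothetical data: two matches of a two-atom pattern.
pairs = pair_matches_with_type([(4, 5), (9, 10)], {"example_type": 2}, "example_type")
```

Only the first index of each match is kept, so for multi-atom patterns the remaining matched atoms are not typed by this step.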

compute

compute(mol)

Perform SMARTS pattern matching on a molecule using multiple threads.

Parameters:

Name | Type | Description | Default
mol | MolType | A molecule object to match patterns on. | required

Returns:

Type | Description
dok_matrix | A sparse matrix of atom types that match the SMARTS patterns in the given molecule.

Source code in lahuta/core/matchers.py
def compute(self, mol: MolType) -> dok_matrix:
    """Perform SMARTS pattern matching on a molecule using multiple threads.

    Args:
        mol (MolType): A molecule object to match patterns on.

    Returns:
        (dok_matrix): A sparse matrix of atom types that match the SMARTS patterns in the given molecule.
    """
    atom_types = dok_matrix((self.n_atoms, len(ATypes)), dtype=np.int8)

    num_threads = os.cpu_count()

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        for atom_type, ob_smarts_list in self.precomputed_ob_smarts.items():
            future_matches = [
                executor.submit(self.match_ob_smarts, ob_smart, mol, ATypes, atom_type)
                for ob_smart in ob_smarts_list
            ]

            for future in future_matches:
                matches = future.result()
                for match, atype in matches:
                    atom = mol.GetAtom(match)
                    if atom.GetResidue().GetName() not in STANDARD_AMINO_ACIDS:
                        atom_types[atom.GetIdx() - 1, atype] = 1

    return atom_types
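
The submit-then-collect structure of the threaded `compute` can be sketched with the standard library alone. The names below are assumptions for illustration: `PATTERNS` and `COLUMNS` are hypothetical stand-ins for the precomputed patterns and `ATypes`, `match_pattern` stands in for `match_ob_smarts`, and a dictionary stands in for `dok_matrix`.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the precomputed patterns and the type columns.
PATTERNS = {"polar": ["N", "O"], "carbon": ["C"]}
COLUMNS = {"polar": 0, "carbon": 1}


def match_pattern(pattern: str, elements: list[str], column: int) -> list[tuple[int, int]]:
    """Return (atom index, column) pairs for atoms whose element equals the pattern."""
    return [(i, column) for i, element in enumerate(elements) if element == pattern]


def compute_parallel(elements: list[str]) -> dict[tuple[int, int], int]:
    """Submit one task per pattern, then collect results in the calling thread."""
    atom_types: dict[tuple[int, int], int] = {}
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
        for type_name, patterns in PATTERNS.items():
            futures = [
                executor.submit(match_pattern, pattern, elements, COLUMNS[type_name])
                for pattern in patterns
            ]
            # Worker threads only return lists; the shared result is written here.
            for future in futures:
                for index, column in future.result():
                    atom_types[index, column] = 1
    return atom_types


parallel_types = compute_parallel(["N", "C", "O", "C"])
```

As in the source, worker threads never write to the shared result. Each task returns its own list, and the calling thread fills the matrix after `future.result()` returns, so no lock is needed around the writes.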