Skip to content

Luni

lahuta.Luni

The main class of the Lahuta package. It represents a universe of atoms and provides methods for computing various properties of the universe.

The Luni class is the entry point for all computations. It provides an interface for loading files, or for initializing the Luni from an existing MDAnalysis.AtomGroup instance. This way we can support both file-based loading and indicrectly support all MDAnalysis formats, as well as provide support for reading MD trajectories.

Parameters:

Name Type Description Default
*args LuniInputType

Either an MDAnalysis.AtomGroup instance or a list of file names.

()

Attributes:

Name Type Description
atom_types csc_array

A sparse array containing the atom types of the universe.

arc ARC

The ARC instance used to load the files.

sequence str

The sequence of the universe.

Raises:

Type Description
ValueError

If no input is provided or if invalid types of inputs are provided.

Source code in lahuta/core/luni.py
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
class Luni:
    """The main class of the Lahuta package. It represents a universe of atoms and provides
    methods for computing various properties of the universe.

    The Luni class is the entry point for all computations. It provides an interface for loading files,
    or for initializing the Luni from an existing MDAnalysis.AtomGroup instance. This way we can support
    both file-based loading and indicrectly support all MDAnalysis formats, as well as provide support
    for reading MD trajectories.

    Args:
        *args (LuniInputType): Either an MDAnalysis.AtomGroup instance or a list of file names.

    Attributes:
        atom_types (csc_array): A sparse array containing the atom types of the universe.
        arc (ARC): The ARC instance used to load the files.
        sequence (str): The sequence of the universe.

    Raises:
        ValueError: If no input is provided or if invalid types of inputs are provided.
    """

    def __init__(self, *args: LuniInputType) -> None:
        self._mol: Optional[MolType] = None
        self._ready = False
        self._topattr_handler = AtomAttrClassHandler()
        self.atom_types: csc_array = csc_array((0, 0), dtype=np.int32)

        initializer = self._validate_input(*args)
        self._file_loader, self._mda = initializer(*args)

        assert self._mda is not None
        assert self._file_loader is not None

    def _validate_input(self, *args: LuniInputType) -> Callable[..., tuple[BaseLoader, AtomGroupType]]:
        """Validate the input files for Luni initialization.

        Args:
        ----
        *args (LuniInputType): Either an MDAnalysis.AtomGroup instance or a list of file names.

        Raises:
            ValueError: If no input is provided or if invalid types of inputs are provided.

        Returns:
            func: A function to initialize the Luni either from an existing Luni or from files.
        """
        if not args:
            raise ValueError("No input provided")

        if isinstance(args[0], mda.AtomGroup):
            if len(args) != 1:
                raise ValueError("When passing an MDAnalysis.AtomGroup instance, no other arguments are allowed")
            return self._initialize_from_universe

        if not all(isinstance(arg, str) for arg in args):
            raise ValueError("Input must be either an MDAnalysis.AtomGroup instance or a list of file names")
        return self._initialize_from_files

    def _initialize_from_universe(self, *args: AtomGroupType) -> tuple[BaseLoader, AtomGroupType]:
        """Initialize the universe from an existing Luni.

        Args:
            *args (AtomGroupType): An MDAnalysis.AtomGroup instance.

        Returns:
            tuple: A tuple of the file loader and the AtomGroup instance.
        """
        _file_loader = TopologyLoader.from_mda(args[0])
        _mda = _file_loader.to("mda")
        return _file_loader, _mda

    def _initialize_from_files(self, files: str) -> tuple[BaseLoader, AtomGroupType]:
        """Initialize the universe from provided files.

        Args:
            files (str): The file name(s).

        Returns:
            tuple: A tuple of the file loader and the AtomGroup instance.
        """
        _file_loader = self._get_file_loader(files)
        _mda = _file_loader.to("mda")
        return _file_loader, _mda

    def _get_file_loader(self, *files: str) -> BaseLoader:
        """Retrieve the appropriate file loader.

        Args:
            *files (str): The file name(s).

        Raises:
            ValueError: If no file name is provided or if multiple files are provided.

        Returns:
            BaseLoader: The appropriate file loader.
        """
        # GemmiLoader can only handle one file and its format should be supported
        if len(files) == 1:
            file_name = files[0]
            file_format, is_pdb = Luni.get_format(file_name)
            if file_format:
                return GemmiLoader(file_name, is_pdb=is_pdb)

        # If there are multiple files or the single file is not supported by GemmiLoader,
        # then use TopologyLoader
        return TopologyLoader(*files)

    def _extend_topology(self, attrname: str, values: NDArray[Any]) -> None:
        """Add new topology attributes to the Luni.

        Args:
            attrname (str): The name of the attribute.
            values (NDArray[Any]): The values of the attribute.
        """
        self._topattr_handler.init_topattr(attrname, attrname)
        self._mda.universe.add_TopologyAttr(attrname, values)

    def ready(self) -> None:
        """Prepare instance for computations by transforming the molecule and assigning atom types."""
        assert self._file_loader is not None
        self._mol = self._file_loader.to("mol")

        assert self.arc is not None
        atomtype_assigner = AtomTypeAssigner(self._mda, self._mol, legacy=False)
        ag_types = atomtype_assigner.assign_atom_types()
        og_atoms = self._mda.universe.atoms
        self.atom_types = ag_types

        self._mda.universe.add_TopologyAttr("vdw_radii", v_radii_assignment(og_atoms.elements))

        self._ready = True

    # TODO @bisejdiu: rename to
    # https://github.com/bisejdiu/lahuta/issues/52
    def compute_neighbors(
        self,
        radius: float = 5.0,
        res_dif: int = 1,
    ) -> NeighborPairs:
        """Compute the neighbors of each atom in the Luni.

        This method calculates the neighbors for each atom based on the given radius and residue difference parameters.
        It returns an object of type `NeighborPairs` where each row in the underlying NumPy array contains the indices
        of the neighbors for the atom corresponding to that row.

        Ensures that the Luni instance is ready for computations by calling the `ready` method if needed.

        Args:
            radius (float, optional): The cutoff radius for considering two atoms as neighbors. Default is 5.0.
            res_dif (int, optional): The minimum difference in residue numbers for two atoms to be \
            considered neighbors. Default is 1.

        Returns:
            NeighborPairs: An object containing a 2D NumPy array with shape (n_atoms, n_neighbors). \
            Each row in the array contains the indices of the neighbors for the atom corresponding to that row.
        """
        if not self._ready:
            self.ready()

        neighbors = NeighborSearch(self.to("mda"))
        pairs, distances = neighbors.compute(
            radius=radius,
            res_dif=res_dif,
        )

        return NeighborPairs(self.to("mda"), self.to("mol"), self.atom_types, pairs, distances)

    @property
    def sequence(self) -> str:
        """Retrieve the sequence of the Luni.

        This method retrieves the sequence of the Luni from the underlying MDA AtomGroup instance.
        It returns a NumPy array of shape (n_atoms,) containing the one-letter amino acid codes.

        Returns:
            NDArray[np.str_]: A NumPy array containing the one-letter amino acid codes of the Luni.
        """
        assert self.arc is not None
        three_letter_codes = self.to("mda").select_atoms("protein").residues.resnames
        conversion_dict: dict[str, str] = {}
        for key, synonyms in RESIDUE_SYNONYMS.items():
            for synonym in synonyms:
                conversion_dict[synonym] = BASE_AA_CONVERSION[key]

        single_letter_codes = np.vectorize(lambda x: conversion_dict.get(x, "X"))(three_letter_codes)

        return "".join(single_letter_codes)

    @staticmethod
    def get_format(file_name: str) -> tuple[str | None, bool]:
        """Retrieve the file format from a file name.

        This static method checks the file extension of the provided file name against the list of formats
        supported by GEMMI (stored in `GEMMI_SUPPRTED_FORMATS`). If the extension matches a supported format,
        it returns the format and a boolean indicating whether the format is 'pdb' or 'pdb.gz'. If the file
        extension doesn't match any supported formats, it returns None and False.

        Args:
            file_name (str): The name of the file.

        Returns:
            tuple: A tuple containing the file format (str or None) and a boolean indicating if it is 'pdb' or 'pdb.gz'.
        """
        file_name_lower = file_name.lower()
        for fmt in GEMMI_SUPPRTED_FORMATS:
            if file_name_lower.endswith("." + fmt):
                is_pdb = fmt in {"pdb", "pdb.gz"}
                return fmt, is_pdb
        return None, False

    @overload
    def to(self, fmt: Literal["mda"]) -> AtomGroupType:
        ...

    @overload
    def to(self, fmt: Literal["mol"]) -> MolType:
        ...

    def to(self, fmt: Literal["mda", "mol"]) -> MolType | AtomGroupType:
        """Convert the Luni to a different format.

        This method converts the internal representation of the Luni to the specified format.
        Currently, the supported formats are "mda" and "mol". If the desired format is "mol" and
        the Luni already has a "mol" representation, it returns the existing representation.
        Otherwise, it uses the `to` method of the file loader to perform the conversion.

        Args:
            fmt (str): The format to convert to. Currently supported formats are "mda" and "mol".

        Returns:
            MolType | AtomGroupType: A new Luni instance in the specified format.
        """
        if fmt not in {"mda", "mol"}:
            raise ValueError(f"Invalid format: {fmt}, must be one of 'mda' or 'mol'")

        if fmt == "mol" and self._mol is not None:
            return self._mol
        if fmt == "mol":
            self._mol = self._file_loader.to(fmt)
        return getattr(self, f"_{fmt}")  # type: ignore

    @property
    def arc(self) -> None | ARC:
        """Retrieve the ARC instance used to load the files.

        Returns:
            None | ARC: The ARC instance used to load the files.
        """
        return self._file_loader.arc

    def __repr__(self) -> str:
        return f"<Luni with {self._mda.n_atoms} atoms>"

    def __str__(self) -> str:
        return self.__repr__()

sequence property

sequence

Retrieve the sequence of the Luni.

This method retrieves the sequence of the Luni from the underlying MDA AtomGroup instance. It returns a NumPy array of shape (n_atoms,) containing the one-letter amino acid codes.

Returns:

Type Description
str

NDArray[np.str_]: A NumPy array containing the one-letter amino acid codes of the Luni.

arc property

arc

Retrieve the ARC instance used to load the files.

Returns:

Type Description
None | ARC

None | ARC: The ARC instance used to load the files.

ready

ready()

Prepare instance for computations by transforming the molecule and assigning atom types.

Source code in lahuta/core/luni.py
158
159
160
161
162
163
164
165
166
167
168
169
170
171
def ready(self) -> None:
    """Prepare instance for computations by transforming the molecule and assigning atom types."""
    assert self._file_loader is not None
    self._mol = self._file_loader.to("mol")

    assert self.arc is not None
    atomtype_assigner = AtomTypeAssigner(self._mda, self._mol, legacy=False)
    ag_types = atomtype_assigner.assign_atom_types()
    og_atoms = self._mda.universe.atoms
    self.atom_types = ag_types

    self._mda.universe.add_TopologyAttr("vdw_radii", v_radii_assignment(og_atoms.elements))

    self._ready = True

compute_neighbors

compute_neighbors(radius=5.0, res_dif=1)

Compute the neighbors of each atom in the Luni.

This method calculates the neighbors for each atom based on the given radius and residue difference parameters. It returns an object of type NeighborPairs where each row in the underlying NumPy array contains the indices of the neighbors for the atom corresponding to that row.

Ensures that the Luni instance is ready for computations by calling the ready method if needed.

Parameters:

Name Type Description Default
radius float

The cutoff radius for considering two atoms as neighbors. Default is 5.0.

5.0
res_dif int

The minimum difference in residue numbers for two atoms to be considered neighbors. Default is 1.

1

Returns:

Name Type Description
NeighborPairs NeighborPairs

An object containing a 2D NumPy array with shape (n_atoms, n_neighbors). Each row in the array contains the indices of the neighbors for the atom corresponding to that row.

Source code in lahuta/core/luni.py
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
def compute_neighbors(
    self,
    radius: float = 5.0,
    res_dif: int = 1,
) -> NeighborPairs:
    """Compute the neighbors of each atom in the Luni.

    This method calculates the neighbors for each atom based on the given radius and residue difference parameters.
    It returns an object of type `NeighborPairs` where each row in the underlying NumPy array contains the indices
    of the neighbors for the atom corresponding to that row.

    Ensures that the Luni instance is ready for computations by calling the `ready` method if needed.

    Args:
        radius (float, optional): The cutoff radius for considering two atoms as neighbors. Default is 5.0.
        res_dif (int, optional): The minimum difference in residue numbers for two atoms to be \
        considered neighbors. Default is 1.

    Returns:
        NeighborPairs: An object containing a 2D NumPy array with shape (n_atoms, n_neighbors). \
        Each row in the array contains the indices of the neighbors for the atom corresponding to that row.
    """
    if not self._ready:
        self.ready()

    neighbors = NeighborSearch(self.to("mda"))
    pairs, distances = neighbors.compute(
        radius=radius,
        res_dif=res_dif,
    )

    return NeighborPairs(self.to("mda"), self.to("mol"), self.atom_types, pairs, distances)

get_format staticmethod

get_format(file_name)

Retrieve the file format from a file name.

This static method checks the file extension of the provided file name against the list of formats supported by GEMMI (stored in GEMMI_SUPPRTED_FORMATS). If the extension matches a supported format, it returns the format and a boolean indicating whether the format is 'pdb' or 'pdb.gz'. If the file extension doesn't match any supported formats, it returns None and False.

Parameters:

Name Type Description Default
file_name str

The name of the file.

required

Returns:

Name Type Description
tuple tuple[str | None, bool]

A tuple containing the file format (str or None) and a boolean indicating if it is 'pdb' or 'pdb.gz'.

Source code in lahuta/core/luni.py
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
@staticmethod
def get_format(file_name: str) -> tuple[str | None, bool]:
    """Retrieve the file format from a file name.

    This static method checks the file extension of the provided file name against the list of formats
    supported by GEMMI (stored in `GEMMI_SUPPRTED_FORMATS`). If the extension matches a supported format,
    it returns the format and a boolean indicating whether the format is 'pdb' or 'pdb.gz'. If the file
    extension doesn't match any supported formats, it returns None and False.

    Args:
        file_name (str): The name of the file.

    Returns:
        tuple: A tuple containing the file format (str or None) and a boolean indicating if it is 'pdb' or 'pdb.gz'.
    """
    file_name_lower = file_name.lower()
    for fmt in GEMMI_SUPPRTED_FORMATS:
        if file_name_lower.endswith("." + fmt):
            is_pdb = fmt in {"pdb", "pdb.gz"}
            return fmt, is_pdb
    return None, False

to

to(fmt)

Convert the Luni to a different format.

This method converts the internal representation of the Luni to the specified format. Currently, the supported formats are "mda" and "mol". If the desired format is "mol" and the Luni already has a "mol" representation, it returns the existing representation. Otherwise, it uses the to method of the file loader to perform the conversion.

Parameters:

Name Type Description Default
fmt str

The format to convert to. Currently supported formats are "mda" and "mol".

required

Returns:

Type Description
MolType | AtomGroupType

MolType | AtomGroupType: A new Luni instance in the specified format.

Source code in lahuta/core/luni.py
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
def to(self, fmt: Literal["mda", "mol"]) -> MolType | AtomGroupType:
    """Convert the Luni to a different format.

    This method converts the internal representation of the Luni to the specified format.
    Currently, the supported formats are "mda" and "mol". If the desired format is "mol" and
    the Luni already has a "mol" representation, it returns the existing representation.
    Otherwise, it uses the `to` method of the file loader to perform the conversion.

    Args:
        fmt (str): The format to convert to. Currently supported formats are "mda" and "mol".

    Returns:
        MolType | AtomGroupType: A new Luni instance in the specified format.
    """
    if fmt not in {"mda", "mol"}:
        raise ValueError(f"Invalid format: {fmt}, must be one of 'mda' or 'mol'")

    if fmt == "mol" and self._mol is not None:
        return self._mol
    if fmt == "mol":
        self._mol = self._file_loader.to(fmt)
    return getattr(self, f"_{fmt}")  # type: ignore