Loading Data
The entry point to Lahuta is provided via its Luni class from its core module. Think of this class as the main interface to Lahuta, similar to MDAnalysis.Universe. In fact the name Luni stands for Lahuta Universe.
There are two ways to initialize the Luni class. The first way is to pass a path to a PDB or PDBx/mmCIF file to it. The second way is to pass an MDAnalysis.AtomGroup object to it.
Example - Initializing the Luni Class Using File Paths
from lahuta import Luni
luni_pdb = Luni("path/to/file.pdb") # (1)!
luni_cif = Luni("path/to/file.cif") # (2)!
luni_pdb_gz = Luni("path/to/file.pdb.gz") # (3)!
luni_cif_gz = Luni("path/to/file.cif.gz") # (4)!
luni_gro = Luni("path/to/file.gro") # (5)!
- The
Luniclass is initialized by passing a path to a PDB file to it. - The
Luniclass is initialized by passing a path to a PDBx/mmCIF file to it. - The
Luniclass is initialized by passing a path to a compressed PDB or PDBx/mmCIF file to it. - The
Luniclass is initialized by passing a path to a compressed PDB or PDBx/mmCIF file to it. - The
Luniclass is initialized by passing a path to a GRO file to it.
Example - Initializing the Luni Class Using MDAnalysis.AtomGroup Objects
import MDAnalysis as mda
from lahuta import Luni
u = mda.Universe("path/to/file.pdb") # (1)!
u_traj = mda.Universe("path/to/file.pdb", "path/to/file.xtc") # (2)!
luni = Luni(u.atoms) # (3)!
luni_traj = Luni(u_traj.atoms) # (4)!
- The
MDAnalysis.Universeclass is used to load the PDB file. - The
MDAnalysis.Universeclass is used to load the PDB file and the XTC trajectory. - The
Luniclass is initialized by passing anMDAnalysis.AtomGroupobject to it. - The
Luniclass is initialized by passing anMDAnalysis.AtomGroupobject to it.
When using Lahuta with MDAnalysis, the primary adjustment is to supply the Luni class with an MDAnalysis.AtomGroup object rather than a file path. When an MDAnalysis.Universe object has an associated trajectory, Lahuta recognizes and utilizes it. In fact, as we will discuss later, Lahuta provides specialized features for extracting contacts from MD trajectories.
In essence, PDB files as well as PDBx/mmCIF files are supported. Compressed files are also supported. Further, by supporting MDAnalysis, Lahuta supports all file formats supported by MDAnalysis. This includes PDB, GRO, XTC, TRR, DCD, and many more. See the MDAnalysis documentation for more information.
Accessing Information¶
Upon initializing the Luni class, the contained information is accessible through the arc attribute, specifically its atoms, residues, and chains sub-attributes. These sub-attributes are instantiated from the Atoms, Residues, and Chains classes, respectively, that adeptly encapsulate the pertinent data. Integral to these classes is their foundation on NumPy's Structured Arrays. Structured Arrays in NumPy are efficient because they allow for the flexible indexing of complex data types within a contiguous memory block, facilitating rapid access and operations on the data. This choice of data structure not only offers streamlined storage but also optimizes the retrieval and manipulation of the loaded biomolecular details.
Example - Accessing Loaded Information
from lahuta import Luni
luni = Luni("path/to/file.pdb")
# Accessing Atoms
atoms = luni.arc.atoms # (1)!
ids = atoms.ids
elements = atoms.elements
names = atoms.names
coords = atoms.coordinates
# Accessing Residues
residues = luni.arc.residues # (2)!
resids = residues.resids
resnames = residues.resnames
# Accessing Chains
chains = luni.arc.chains # (3)!
ids = chains.ids
auths = chains.auths
labels = chains.labels
print (names)
print (resids)
print (labels)
#> array(['N', 'C', 'C', ..., 'H', 'H', 'H'], dtype='<U10')
#> array([-3, -3, -3, ..., 90, 90, 90])
#> array(['A', 'A', 'A', ..., 'B', 'B', 'B'], dtype='<U10')
- The
atomsattribute contains anAtomsobject that encapsulates the atom information. - The
residuesattribute contains aResiduesobject that encapsulates the residue information. - The
chainsattribute contains aChainsobject that encapsulates the chain information.
Example - Accessing Information using Indexing
from lahuta import Luni
luni = Luni("path/to/file.pdb")
print (luni.arc.atoms.names[:3]) # (1)!
#> array(['N', 'CA', 'C'], dtype='<U10')
atom_ix0 = luni.arc[0] # (2)!
print (atom_ix0)
#> Atom(name=N, id=0, element=N, type=N, resname=ALA, resid=-3, chain_label=A, chain_id=1)
print (atom_ix0.resname)
#> ALA
atoms_ix20_ix22 = luni.arc[20:22] # (3)!
#> [
#> Atom(name=H, id=20, element=H, type=H, resname=ASP, resid=-2, chain_label=A, chain_id=1),
#> Atom(name=HA, id=21, element=H, type=H, resname=ASP, resid=-2, chain_label=A, chain_id=1)
#> ]
- The
Atomsobject can be indexed using NumPy's indexing syntax. - Indexing the
ARCobject returns anAtomobject that encapsulates that atom's information. - Indexing the
ARCobject with a slice returns a list ofAtomobjects that encapsulate the atoms' information.
Third-Party Libraries¶
Lahuta relies specifically on three libraries: MDAnalysis, Gemmi, and OpenBabel. MDAnalysis is used for loading data from MD simulations. Gemmi is used for parsing PDBx/mmCIF files. OpenBabel is used for SMARTS pattern matching and perception of chemical properties (bonds, aromaticity, etc.). Lahuta has internal API that allows it to use these libraries very efficiently and initialize the required objects depending on the data source. For example, if the data is loaded from a PDB file, then the Gemmi.Structure object is used for parsing the file. The MDAnalysis.Universe, and OpenBabel.OBMol objects are created from the Gemmi.Structure object.
Example - Accessing the Underlying Objects
from lahuta import Luni
luni = Luni("path/to/file.pdb")
# Accessing Underlying Objects
mda_ag = luni.to("mda") # (1)!
ob_obmol = luni.to("mol") # (2)!
- The
tomethod is used to convert theLuniobject to anMDAnalysis.AtomGroupobject. - The
tomethod is used to convert theLuniobject to anOpenBabel.OBMolobject.
Warning
Calling the to method with "mol" as an argument will trigger the perception of chemical properties. This is done for performance reasons, because OpenBabel does not support vectorized OBMol creation and we have to iterate over all atoms instead. To ensure we only do this once, all required chemical perceptions and SMARTS pattern matching are done the first time the to method is called with "mol" as an argument. The resulting OpenBabel.OBMol object is then cached and returned on subsequent calls.
Chemical Perception and SMARTS Pattern Matching¶
Lahuta relies on OpenBabel for chemical perception and SMARTS pattern matching. Depending on the system and the type of hardware, this process may be the most time consuming step in the analysis pipeline. We have made a few attempts to speed this process up, but challenges using OpenBabel's API and inherent difficulties in parallelizing unpicklable objects have limited our success. Regardless, for the vast majority of structures and systems, this process is very fast and should not be a bottleneck.
For MD simulations, which can get very large, and if the user wishes to compute contacts for all atoms without any pre-processing or filtering, and if this analyis is done on old hardware, then this process may take a few seconds. Even then, this process is only done once and not for all frames in the trajectory.
Example - Getting the system ready
from lahuta import Luni
luni = Luni("path/to/file.pdb")
# Getting the system ready (compute perception and SMARTS pattern matching)
luni.ready() # (1)!
# Computing Neighbors
ns = luni.compute_neighbors()
- This method is auto-invoked by
compute_neighborsor when thetomethod uses"mol"as an argument. While it's usually automatic, you can call it directly if desired to make the computation explicit.