
Extracting Contacts Into a Tabular Format

Lahuta can export contacts in a tabular format, which is convenient for further analysis and visualization. The NeighborPairs class provides a to_frame method that returns the contacts as a pandas DataFrame. The following example extracts the contacts into a DataFrame and saves them to a CSV file.

Example - Extracting Contacts Into a Tabular Format
# Extract the contacts into a DataFrame (ns is an existing NeighborPairs object)
df = ns.to_frame() # (1)!

# Saving the DataFrame to a CSV file
df.to_csv("contacts.csv", index=False) # (2)!
  1. to_frame returns the contacts as a pandas DataFrame (expanded format by default).
  2. to_csv writes the DataFrame to a CSV file; index=False leaves out the DataFrame's row index.

The following table shows the first 20 rows of the CSV file generated by the above code:

partner1_resids partner1_resnames partner1_names partner1_indices partner2_resids partner2_resnames partner2_names partner2_indices distances
7 TYR CD2 133 73 PHE CG 1076 3.96418
7 TYR CD2 133 73 PHE CD1 1077 3.6598
7 TYR CE2 135 73 PHE CG 1076 3.87264
7 TYR CE2 135 73 PHE CD1 1077 3.23808
7 TYR CE2 135 73 PHE CE1 1079 3.24311
7 TYR CE2 135 73 PHE CZ 1081 3.82791
7 TYR CZ 136 73 PHE CD1 1077 3.94162
7 TYR CZ 136 73 PHE CE1 1079 3.70295
15 HIS CD2 250 90 HEC NB 1190 3.49992
15 HIS CD2 250 90 HEC C4B 1194 3.78276
15 HIS CD2 250 90 HEC ND 1206 3.87808
15 HIS CE1 251 90 HEC NB 1190 3.75112
15 HIS CE1 251 90 HEC ND 1206 3.389
15 HIS CE1 251 90 HEC C4D 1210 3.88592
15 HIS NE2 252 90 HEC NB 1190 2.84992
15 HIS NE2 252 90 HEC C1B 1191 3.58843
15 HIS NE2 252 90 HEC C4B 1194 3.61449
15 HIS NE2 252 90 HEC ND 1206 2.85429
15 HIS NE2 252 90 HEC C1D 1207 3.55377
15 HIS NE2 252 90 HEC C4D 1210 3.77193
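Because the result is an ordinary pandas DataFrame, further analysis uses standard pandas operations. As a minimal sketch, the snippet below rebuilds five rows of the table above by hand (standing in for ns.to_frame() or pd.read_csv("contacts.csv")) and collapses the atom-atom contacts into residue-residue contacts, counting the contacts and keeping the shortest distance for each residue pair. The column names follow the expanded format shown above; the aggregation itself is just one possible analysis.

```python
import pandas as pd

# Five rows copied from the table above; in practice use ns.to_frame()
# or pd.read_csv("contacts.csv").
df = pd.DataFrame(
    {
        "partner1_resids": [7, 7, 15, 15, 15],
        "partner1_resnames": ["TYR", "TYR", "HIS", "HIS", "HIS"],
        "partner1_names": ["CD2", "CE2", "CD2", "NE2", "NE2"],
        "partner1_indices": [133, 135, 250, 252, 252],
        "partner2_resids": [73, 73, 90, 90, 90],
        "partner2_resnames": ["PHE", "PHE", "HEC", "HEC", "HEC"],
        "partner2_names": ["CG", "CD1", "NB", "NB", "ND"],
        "partner2_indices": [1076, 1077, 1190, 1190, 1206],
        "distances": [3.96418, 3.23808, 3.49992, 2.84992, 2.85429],
    }
)

# Collapse atom-atom contacts into residue-residue contacts:
# number of atom pairs and shortest distance per residue pair.
residue_keys = ["partner1_resids", "partner1_resnames", "partner2_resids", "partner2_resnames"]
per_residue = (
    df.groupby(residue_keys)
    .agg(n_contacts=("distances", "size"), min_distance=("distances", "min"))
    .reset_index()
)
print(per_residue)
```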

Note

to_frame does not add a column that labels the type of contact. This is intentional.
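If you analyze several sets of contacts together, you can record the contact type yourself. A minimal sketch, assuming two DataFrames produced separately by to_frame (here built by hand from a subset of the expanded columns, using rows from the table above): tag each one with a contact_type column via DataFrame.assign and stack them with pd.concat. The labels type_a and type_b are placeholders, not names used by Lahuta.

```python
import pandas as pd

# Hypothetical stand-ins for two separate to_frame() results
# (only a subset of the expanded columns is shown).
contacts_a = pd.DataFrame(
    {"partner1_indices": [135], "partner2_indices": [1077], "distances": [3.23808]}
)
contacts_b = pd.DataFrame(
    {"partner1_indices": [252], "partner2_indices": [1190], "distances": [2.84992]}
)

# Tag each set with a user-chosen label, then stack them into one DataFrame.
combined = pd.concat(
    [
        contacts_a.assign(contact_type="type_a"),  # placeholder label
        contacts_b.assign(contact_type="type_b"),  # placeholder label
    ],
    ignore_index=True,
)
print(combined)
```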

Compact DataFrame

Example - Compact DataFrame
from lahuta import Luni

# Extract the contacts into a compact DataFrame (ns is an existing NeighborPairs object)
df = ns.to_frame(df_format="compact") # (1)!
  1. df_format can be either "compact" or "expanded". The latter is the default.

The following table shows the first 20 rows of the compact DataFrame generated by the above code:

partner1 partner2 distances
7-TYR-CD2-133 73-PHE-CG-1076 3.96418
7-TYR-CD2-133 73-PHE-CD1-1077 3.6598
7-TYR-CE2-135 73-PHE-CG-1076 3.87264
7-TYR-CE2-135 73-PHE-CD1-1077 3.23808
7-TYR-CE2-135 73-PHE-CE1-1079 3.24311
7-TYR-CE2-135 73-PHE-CZ-1081 3.82791
7-TYR-CZ-136 73-PHE-CD1-1077 3.94162
7-TYR-CZ-136 73-PHE-CE1-1079 3.70295
15-HIS-CD2-250 90-HEC-NB-1190 3.49992
15-HIS-CD2-250 90-HEC-C4B-1194 3.78276
15-HIS-CD2-250 90-HEC-ND-1206 3.87808
15-HIS-CE1-251 90-HEC-NB-1190 3.75112
15-HIS-CE1-251 90-HEC-ND-1206 3.389
15-HIS-CE1-251 90-HEC-C4D-1210 3.88592
15-HIS-NE2-252 90-HEC-NB-1190 2.84992
15-HIS-NE2-252 90-HEC-C1B-1191 3.58843
15-HIS-NE2-252 90-HEC-C4B-1194 3.61449
15-HIS-NE2-252 90-HEC-ND-1206 2.85429
15-HIS-NE2-252 90-HEC-C1D-1207 3.55377
15-HIS-NE2-252 90-HEC-C4D-1210 3.77193
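If you need the individual fields back from a compact DataFrame, you can split the strings with pandas. The sketch below assumes that residue names and atom names never contain a hyphen and that residue numbers are plain integers (no insertion codes). It splits from the right at most three times, so a negative residue number keeps its minus sign.

```python
import pandas as pd

# Two rows from the compact table above.
compact = pd.DataFrame(
    {
        "partner1": ["7-TYR-CD2-133", "15-HIS-NE2-252"],
        "partner2": ["73-PHE-CG-1076", "90-HEC-NB-1190"],
        "distances": [3.96418, 2.84992],
    }
)

fields = ["resids", "resnames", "names", "indices"]
for partner in ("partner1", "partner2"):
    # Split from the right, at most 3 times, so "-1-ALA-CA-5" keeps "-1" intact.
    parts = compact[partner].str.rsplit("-", n=3, expand=True)
    parts.columns = [f"{partner}_{field}" for field in fields]
    # Residue numbers and atom indices come back as strings; convert them to integers.
    parts[f"{partner}_resids"] = parts[f"{partner}_resids"].astype(int)
    parts[f"{partner}_indices"] = parts[f"{partner}_indices"].astype(int)
    compact = compact.join(parts)

print(compact.drop(columns=["partner1", "partner2"]))
```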

Adding Annotations

Some contact types, mainly plane-plane contacts, carry additional information. Lahuta computes it internally, but to include it in the DataFrame you need to pass annotations=True. The following example shows how to add these annotations to the DataFrame.

Example - Adding Annotations
from lahuta import Luni

# Extract the contacts, including annotation columns (ns is an existing NeighborPairs object)
df = ns.to_frame(annotations=True) # (1)!
  1. annotations is set to True to add the annotation columns to the DataFrame.

The following table shows the DataFrame generated by the above code:

partner1_resids partner1_resnames partner1_names partner1_indices partner2_resids partner2_resnames partner2_names partner2_indices distances theta_angles normal_angles ring1_atoms ring2_atoms contact_labels
7 TYR CD1 132 73 PHE CD1 1077 4.74552 61.2032 27.9044 [132, 133, 134, 135, 136, 137] [1077, 1078, 1079, 1080, 1081, 1082] EE
15 HIS ND1 249 90 HEC C1B 1191 4.43278 41.4592 83.7813 [1191, 1192, 1193, 1194, 1195] [249, 250, 251, 252, 253] OE
15 HIS ND1 249 90 HEC C1D 1207 4.54436 49.4147 89.0591 [1207, 1208, 1209, 1210, 1211] [249, 250, 251, 252, 253] OE
31 TRP CD1 458 31 TRP NE1 460 2.18248 89.4365 4.1208 [458, 459, 460, 461, 462] [460, 462, 463, 464, 465, 466] EE

The DataFrame now contains five additional columns: theta_angles, normal_angles, ring1_atoms, ring2_atoms, and contact_labels. Together they give the angles between the two planes, the indices of the atoms forming each of the two rings, and a label for the contact type.
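The annotation columns can be used like any other column, but ring1_atoms and ring2_atoms hold Python lists, which a CSV file can only store as text. A minimal sketch, using two rows of the annotated table above (a subset of its columns): select contacts by label, then round-trip the DataFrame through CSV and parse the list columns back with ast.literal_eval. If you do not need a text format, DataFrame.to_pickle keeps the lists as Python objects.

```python
import ast
import io

import pandas as pd

# Two rows from the annotated table above (a subset of its columns).
df = pd.DataFrame(
    {
        "partner1_resids": [7, 15],
        "partner2_resids": [73, 90],
        "theta_angles": [61.2032, 41.4592],
        "normal_angles": [27.9044, 83.7813],
        "ring1_atoms": [[132, 133, 134, 135, 136, 137], [1191, 1192, 1193, 1194, 1195]],
        "ring2_atoms": [[1077, 1078, 1079, 1080, 1081, 1082], [249, 250, 251, 252, 253]],
        "contact_labels": ["EE", "OE"],
    }
)

# Select contacts by their label.
ee_contacts = df[df["contact_labels"] == "EE"]

# Write to CSV and read back: list columns are stored as text such as "[132, 133, ...]".
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Convert the text back into Python lists of atom indices.
for column in ("ring1_atoms", "ring2_atoms"):
    restored[column] = restored[column].apply(ast.literal_eval)

print(ee_contacts[["partner1_resids", "partner2_resids"]])
print(restored.loc[0, "ring1_atoms"])
```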

Note

See the API for plane-plane contacts for more information.

to_frame API

Convert the NeighborPairs object to a pandas DataFrame.

The method provides two formatting options. The 'compact' format contains one column per partner, each combining the residue number, residue name, atom name, and atom index into a single string, plus one column for distances. The 'expanded' format contains four columns per partner (residue numbers, residue names, atom names, and atom indices) plus one column for distances. If annotations is True, the resulting DataFrame also includes the annotation columns.

Parameters:

df_format (str, default 'expanded'): The format of the DataFrame, either "compact" or "expanded".
annotations (bool, default False): Whether to include annotation columns in the DataFrame.

Returns:

DataFrame: A pandas DataFrame containing the atom pairs and their distances.

Source code in lahuta/core/neighbors.py
def to_frame(
    self,
    df_format: Literal["compact", "expanded"] = "expanded",
    annotations: bool = False,
) -> pd.DataFrame:
    """Convert the NeighborPairs object to a pandas DataFrame.

    The method provides two formatting options. The 'compact' format contains two columns
    for atom indices and one column for distances. The 'expanded' format contains four columns
    for atom indices (two columns for each atom pair) and one column for distances.
    If `annotations` is True, the resulting DataFrame will also include annotation columns.

    Args:
        df_format (str, optional): The format of the DataFrame. It can be either "compact" or "expanded".
                                    Defaults to "expanded".
        annotations (bool, optional): Whether to include annotations in the DataFrame. Defaults to False.

    Returns:
        A pandas DataFrame containing the atom pairs and their distances.
    """
    if annotations:
        return self._create_df(df_format, self.annotations)

    return self._create_df(df_format)
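The compact strings shown earlier on this page join the four expanded fields of each partner (residue number, residue name, atom name, and atom index) with hyphens. The sketch below rebuilds them from an expanded DataFrame with plain pandas to make that relationship concrete; to_compact is a hypothetical helper for illustration, not how to_frame builds the compact format internally.

```python
import pandas as pd

# Two rows in the expanded format (values from the tables above).
expanded = pd.DataFrame(
    {
        "partner1_resids": [7, 15],
        "partner1_resnames": ["TYR", "HIS"],
        "partner1_names": ["CD2", "NE2"],
        "partner1_indices": [133, 252],
        "partner2_resids": [73, 90],
        "partner2_resnames": ["PHE", "HEC"],
        "partner2_names": ["CG", "NB"],
        "partner2_indices": [1076, 1190],
        "distances": [3.96418, 2.84992],
    }
)


def to_compact(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: join each partner's four columns into "resid-resname-name-index"."""
    out = {}
    for partner in ("partner1", "partner2"):
        cols = [f"{partner}_{field}" for field in ("resids", "resnames", "names", "indices")]
        # Convert every field to text, then join each row's values with hyphens.
        out[partner] = df[cols].astype(str).apply("-".join, axis=1)
    out["distances"] = df["distances"]
    return pd.DataFrame(out)


print(to_compact(expanded))
```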

add_annotations API

Add annotations to the existing NeighborPairs object.

Parameters:

annotations (dict[str, NDArray[Any]], required): A dictionary containing the annotations to be added. Each array must have one entry per atom pair.

Source code in lahuta/core/neighbors.py
def add_annotations(self, annotations: dict[str, NDArray[Any]]) -> None:
    """Add annotations to the existing NeighborPairs object.

    Args:
        annotations (dict[str, NDArray[Any]]): A dictionary containing the annotations to be added.
    """
    for value in annotations.values():
        assert len(value) == self.pairs.shape[0]

    self._annotations.update(annotations)
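add_annotations requires every annotation array to have exactly one entry per atom pair (one per row of self.pairs) and merges the arrays into the existing annotations, so a key that is already present is overwritten. The stand-alone sketch below mimics that behavior with a hypothetical AnnotatedPairs class that is not part of Lahuta. It raises ValueError instead of using assert, because assert statements are skipped when Python runs with the -O flag; that is a design choice of this sketch, not of the library.

```python
from typing import Any

import numpy as np
from numpy.typing import NDArray


class AnnotatedPairs:
    """Hypothetical minimal stand-in for NeighborPairs; not part of Lahuta."""

    def __init__(self, pairs: NDArray[Any]) -> None:
        self.pairs = pairs  # shape (n_pairs, 2): one row per atom pair
        self._annotations: dict[str, NDArray[Any]] = {}

    def add_annotations(self, annotations: dict[str, NDArray[Any]]) -> None:
        # Every annotation must provide exactly one value per atom pair.
        n_pairs = self.pairs.shape[0]
        for key, value in annotations.items():
            if len(value) != n_pairs:
                raise ValueError(f"annotation {key!r} has {len(value)} values, expected {n_pairs}")
        # Merge into the existing annotations; keys already present are overwritten.
        self._annotations.update(annotations)


# Atom index pairs taken from the tables above.
pairs = np.array([[133, 1076], [135, 1077], [252, 1190]])
ap = AnnotatedPairs(pairs)

# Illustrative annotation derived from the pairs themselves: the atom index gap.
ap.add_annotations({"index_gap": pairs[:, 1] - pairs[:, 0]})
print(sorted(ap._annotations))  # ['index_gap']
```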