Differential Conservation of GPCR Contacts: A Case Study
A great and inspiring example is the following paper: https://www.nature.com/articles/nature19107
In it, the authors study the shared and differential conservation of atomic contacts between GPCRs in the active and inactive states. The paper is a great example of the usefulness of native contacts in proteins, and of the value of being able to study and compare contacts across multiple structures, states, and proteins. In this notebook, we reproduce some of the paper's results using Lahuta and, in the process, highlight how easily one can study contacts in proteins with Lahuta.
1. First, let's download the structures
from lahuta.api import download_structures
inactive_gpcrs = [
"1GZM", "2Z73", "2VT4", "2RH1", "3PBL", "3RZE", "3UON",
"4DAJ", "3ODU", "4MBS", "4DJH", "4DKL", "4EA3", "4EJ4",
"4S0V", "4YAY", "3VW7", "3V2Y", "4Z36", "3EML", "4XNV"
]
# "4ZWJ" has issues, so we're not using it
active_gpcrs = [
"3PQR", "2YDV", "3SN6", "4MQS", "5C1M", "4XT1"
]
# `data` maps each PDB ID to its downloaded file: {pdb_id: path/to/pdb_file}
data = download_structures([*inactive_gpcrs, *active_gpcrs], dir_loc='data')
2. Let's get the sequences of the downloaded structures
We define a worker function that extracts the sequence of a structure and pass it to the file processor, which applies it to every downloaded file.
from lahuta import Luni
from lahuta.api import CachedFileProcessor
def process_sequence(file_path: str) -> str:
    """Extracts the sequence from a PDB file."""
    luni = Luni(file_path)
    return luni.sequence
sequence_processor = CachedFileProcessor(
    file_list=list(data.values()),
    worker=process_sequence,
)
sequence_processor.process()
# view the sequences
sequence_processor.results
{'1gzm.pdb': 'XMNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNDDEXMNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNDDE', '2z73.pdb': 'ETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMAASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKISIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIPAGEETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMAASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKISIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIP', '2vt4.pdb': 
'WEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAIDRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREAKEQIREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSPDFRKAFKRLLAQWEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAIDRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREAKEQIREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSPDFRKAFKRLLAFQWEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAIDRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREAKEQIREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAIDRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREAKEQIREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSPDFRKAFKRLL', '2rh1.pdb': 'DEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQEAINCYAEETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCL', '3pbl.pdb': 
'YALSYCALILAIVFGNGLVCMAVLKERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGVWNFSRICCDVFVTLDVMMCTASIWNLCAISIDRYTAVVMPVHYQHGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFNTTGDPTVCSISNPDFVIYSSVVSFYLPFGVTVLVYARIYVVLKQRRRKNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYGVPLREKKATQMVAIVLGAFIVCWLPFFLTHVLNTHCQTCHVSPELYSATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSCYALSYCALILAIVFGNGLVCMAVLKERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGVWNFSRICCDVFVTLDVMMCTASIWNLCAISIDRYTAVVMPSSCRRVALMITAVWVLAFAVSCPLLFGFNTTGDPTVCSISNPDFVIYSSVVSFYLPFGVTVLVYARIYVVLKQRRRKNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYGVPLREKKATQMVAIVLGAFIVCWLPFFLTHVLNTHCQTCHVSPELYSATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC', '3rze.pdb': 'MPLVVVLSTICLVTVGLNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGAVVMPMNILYLLMSKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLRYLKYRTKTRASATILGAWFLSFLWVIPILGWNHRREDKCETDFYDVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHCNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYLHMNRERKAAKQLGFIMAAFILCWIPYFIFFMVIAFCKNCCNEHLHMFTIWLGYINSTLNPLIYPLCNENFKKTFKRILHI', '3uon.pdb': 'TFEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLTYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIVGVRTVEDGECYIQFFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRINIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPCIPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLM', '4daj.pdb': 
'IWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLTYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFVGKRTVPPGECFIQFLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKMNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYLIKEAQTLSAILLAFIITWTPYNIMVLVNTFCDSCIPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKTTIWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLTYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFVGKRTVPPGECFIQFLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKMNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYLIKEAQTLSAILLAFIITWTPYNIMVLVNTFCDSCIPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKTTIWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLTYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFVGKRTVPPGECFIQFLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKMNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYLIKEAQTLSAILLAFIITWTPYNIMVLVNTFCDSCIPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKTWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLTYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFVGKRTVPPGECFIQFLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKMNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYLIKEAAQTLSAILLAFIITWTPYNIMVLVNTFCDSCIPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKTLLL', '3odu.pdb': 
'PCFREENANFNKIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVANWYFGNFLCKAVHVIYTVNLYSSVWILAFISLDRYLAIVHATNSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVSEADDRYICDRFYPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKLSHSGSNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYGSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLEIIKQGCEFENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSGRPLEVLFQCFREENANFNKIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVANWYFGNFLCKAVHVIYTVNLYSSVWILAFISLDRYLAIVHATNSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVSEADDRYICDRFYPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKLSHSGSNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYGSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLEIIKQGCEFENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTS', '4mbs.pdb': 'PCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINYKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPNIIFTRSQKEGLHYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTLLRMKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEEEKKRHRDVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFGLNNCSSSNRLDQAMQVTETLGMTHCCINPIIYAFVGEEFRNYLLVFFQPCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINYKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPNIIFTRSQKEGLHYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTLLRMKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEEEKKRHRDVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFGLNNCSSSNRLDQAMQVTETLGMTHCCINPIIYAFVGEEFRNYLLVFFQ', '4djh.pdb': 
'SPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNSWPFGDVLCKIVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVREDVDVIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLLSGNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPSPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNSWPFGDVLCKIVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVREDVDVIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLLSGNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFP', '4dkl.pdb': 'MVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGTWPFGNILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVKALDFRTPRNAKIVNVCNWILSSAIGLPVMFMATTKYRQGSIDCTLTFSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRNIFEMLRIDEGLRLKIYKNTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPETTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCI', '4ea3.pdb': 
'PLGLKVTIVGLYLAVCVGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGFWPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEDEEIECLVEIPTPQDYWGPVFAICIFLFSFIVPVLVISVCYSLMIRRLRGVRLLSGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLAQGLGVQPSSETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRLEDNWETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLEDKSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRNAYIQKYLPLGLKVTIVGLYLAVCVGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGFWPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIRALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAEIECLVEIPTPQDYWGPVFAICIFLFSFIVPVLVISVCYSLMIRRLEKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLAQGLGVQPSSETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKF', '4ej4.pdb': 'RSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKLKTATNIYIFNLALADALATSTLPFQSAKYLMETWPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTQPRDGAVVCMLQFPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRNIFEMLRIDEGLRLKIYKNTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDINRRDPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRC', '4s0v.pdb': 'PKEYEWVLIAGYIIVFVVALIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVSVSVLTLSCIALDRWYAICHPSTAKRARNSIVIIWIVSCIIMIPQAIVMECSTVFKTTLFTVCDERWGGEIYPKMYHICFFLVTYMAPLCLMVLAYLQIFRKLWCRQGIDCSFWNESYLTGSRDERKKSLLSKFGMDEGVTFMFIGRFDRGQKGVDVLLKAIEILSSKKEFQEMRFIIIGKGDPELEGWARSLEEKHGNVKVITEMLSREFVRELYGSVDFVIIPSYFEPFGLVALEAMCLGAIPIASAVGGLRDIITNETGILVKAGDPGELANAILKALELSRSDLSKFRENCKKRAMSFSKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFAHDRETVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSC', '4yay.pdb': 
'DLEDNWETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLEDKSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRNAYIQKYLILNSSDCPKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMKSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFFIITVCAFHYETLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKNDDIFKIIMAIVLFFFFSWIPHQIFTFLDVLIQLGIIRDCRIADIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKRYFLQLL', '3vw7.pdb': 'DASGYLTSSWLTLFVPSVYTGVFVVSLPLNIMAIVVFILKMKVKKPAVVYMLHLATADVLFVSVLPFKISYYFSGSDWQFGSELCRFVTAAFYCNMYASILLMTVISIDRFLAVVYPMRTLGRASFTCLAIWALAIAGVVPLLLKEQTIQVPGLGITTCHDVLSETLLEGYYAYYFSAFSAVFFFVPLIISTVCYVSIIRCLSSSANIFEMLRIDEGLRLKIYKNTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYANRSKKSRALFLSAAVFCIFIICFGPTNVLLIAHYSFLSHTSTTEAAYFAYLLCVCVSSISCCIDPLIYYYASSEC', '3v2y.pdb': 'VSDYVNYDIIVRHYNYTGKLNISADKENSIKLTSVVFILICCFIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAGVAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKNNFRLFLLISACWVISLILGGLPIMGWNCISALSSCSTVLPLYHKHYILFCTTVFTLLLLSIVILYCRIYSLVRTRNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYASRSSENVALLKTVIIVLSVFIACWAPLFILLLLDVGCKVKTCDILFRAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIMGRPL', '4z36.pdb': 'QCFYNESIAFFYNRSGKHLATEWNTVSKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAGLAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCICDIENCSNMAPLYSCSYLVFWAIFNLVTFVVMVVLYAHIFGYVADLEDNWETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKGMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRNAYIQKYLRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDCCCPQCDVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILG', '3eml.pdb': 
'IMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTGFCAACHGCLFIACFVLVLTQSSIFSLLAIAIDRYIAIRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQSQGCGEGQVACLFEDVVPMNYMVYFNFFACVLVPLLLMLGVYLRIFLAARRQLNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYRSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFRKIIRSHVLRQ', '4xnv.pdb': 'SFKCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPKSLGRLKKKNAICISVLVWLIVVVAISPILFYSGTGVRKNKTITCYDTTSDEYLRSYFIYSMCTTVAMFCVPLVLILGCYGLIVRALIYKMKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIVCPLCGVGKDQFEEVEEPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDFQTPAMCAFNDRVYATYQVTRGLASLNSCVNPILYFLAGDTFRRR', '3pqr.pdb': 'MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNILENLKDVGLF', '2ydv.pdb': 'IMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSAAAADILVGVLAIPFAIAISTGFCAACHGCLFIACFVLVLTASSIFSLLAIAIDRYIAIRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKEGKAHSQGCGEGQVACLFEDVVPMNYMVYFNFFACVLVPLLLMLGVYLRIFLAARRQLKQMESQSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFRKIIRSHVLRQQEPFKAAAAENLYFQ', '3sn6.pdb': 
'TEDQRNEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGAGESGKSTIVKQKATKVQDIKNNLKEAIETIVAAMSNLVPPVELANPENQFRVDYILSVMNVPDFDFPPEFYEHAKALWEDEGVRACYERSNEYQLIDCAQYFLDKIDVIKQDDYVPSDQDLLRCRVSGIFETKFQVDKVNFHMFDVGGQRDERRKWIQCFNDVTAIIFVVASSSYNMTNRLQEALNLFKSIWNNRWLRTISVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCAVDTENIRRVFNDCRDIIQRMHLRQYELLQSELDQLRQEAEQLKNQIRDARKACADATLSQITNNIDPVGRIQMRTRRTLRGHLAKIYAMHWGTDSRLLVSASQDGKLIIWDSYTTNKVHAIPLRSSWVMTCAYAPSGNYVACGGLDNICSIYNLKTREGNVRVSRELAGHTGYLSCCRFLDDNQIVTSSGDTTCALWDIETGQQTTTFTGHTGDVMSLSLAPDTRLFVSGACDASAKLWDVREGMCRQTFTGHESDINAICFFPNGNAFATGSDDATCRLFDLRADQELMTYSHDNIICGITSVSFSKSGRLLLAGYDDFNCNVWDALKADRAGVLAGHDNRVSCLGVTDDGMAVATGSWDSFLKIWNNTASIAQARKLVEQLKMEANIDRIKVSKAAADLMAYCEAHAKEDPLLTPVPASENPFRNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILTKTWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRQEAINCYAEETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLQKIDKSEGRCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCQVQLQESGGGLVQPGGSLRLSCAASGFTFSNYKMNWVRQAPGKGLEWVSDISQSGASISYTGSVKGRFTISRDNAKNTLYLQMNSLKPEDTAVYYCARCPAPFTRDCFDVTSTTYAYRGQGTQVTVSS', '4mqs.pdb': 'KTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLTYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIVGVRTVEDGECYIQFFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPCIPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMQVQLQESGGGLVQAGDSLRLSCAASGFDFDNFDDYAIGWFRQAPGQEREGVSCIDPSDGSTIYADSAKGRFTISSDNAENTVYLQMNSLKPEDTAVYVCSAWTLFHSDEYWGQGTQVTVSS', '5c1m.pdb': 
'GSHSLPQTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGTWPFGNILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVKALDFRTPRNAKIVNVCNWILSSAIGLPVMFMATTKYRQGSIDCTLTFSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRMLSGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPETTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFQVQLVESGGGLVRPGGSLRLSCVDSERTSYPMGWFRRAPGKEREFVASITWSGIDPTYADSVADRFTTSRDVANNTLYLQMNSLKHEDTAVYYCAARAPVDYDYWGQGTQVTVSSAAA', '4xt1.pdb': 'DYDEDATPCVFTDVLNQSKPVTLFLYGVVFLFGSIGNFLVIFTITWRRRIQCSGDVYFINLAAADLLFVCTLPLWMQYLLDSVPCTLLTACFYVAMFASLCFITEIALDRYYAIVYMRYRPVKQACLFSIFWWIFAVIIAIPHFMVVTKKDNQCMTDYDYLEVSYPIILNVELMLGAFVIPLSVISYCYYRISRIVAVSQSRHKGRIVRVLIAVVLVFIIFWLPYHLTLFVDTLKLLKWISSSCEFERSLKRALILTESLAFCHCCLNPLLYVFVGTKFRQELHCLLAEFRHHGVTKCAITCSKMTSKIPVALLIHYQQNQASCGKRAIILETRQHRLFCADPKEQWVKDAMQHLDRQQVQLVESGGGLVRPGGSLRLSCAASGSIFTIYAMGWYRQAPGKQRELVARITFGGDTNYADSVKGRFTISRDNAKNAVYLQMNSLKPEDTAVYYCNAEETIVEEADYWGQGTQVTV'}
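CachedFileProcessor takes care of applying a worker to every file and collecting the results keyed by file name, as the output above shows. The sketch below mimics only that visible behavior with the standard library (a thread pool plus a dictionary cache). The helper name `process_files` is hypothetical, and nothing here describes how Lahuta actually implements caching or parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable, Dict, Iterable

# Results cached per (worker, file path), so repeated calls skip finished work.
_cache: Dict[tuple, Any] = {}

def process_files(file_list: Iterable[str], worker: Callable[[str], Any], n_jobs: int = 1) -> Dict[str, Any]:
    """Apply `worker` to every file and return {file basename: result} (hypothetical helper)."""
    paths = list(file_list)
    todo = [p for p in paths if (worker, p) not in _cache]
    # Run only the uncached files; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        for path, result in zip(todo, pool.map(worker, todo)):
            _cache[(worker, path)] = result
    return {Path(p).name: _cache[(worker, p)] for p in paths}
```

Threads keep the sketch simple (no pickling of the worker); for CPU-bound workers such as neighbor searches, a process pool would usually be the better choice.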
3. Let's align these sequences
As explained in previous sections, we could align the sequences among themselves, but the alignment is more accurate if we align them against a reference MSA curated specifically for GPCRs.
We therefore use the alignment provided by GPCRdb.
from lahuta.msa import MSAParser
parser = MSAParser(sequences=sequence_processor.results)
ref_parser = MSAParser('data/mini_gpcr_alig.fasta')
aligned_seqs = parser.align(n_jobs=4, ref_alignment=ref_parser.sequences)
aligned_seqs = aligned_seqs - ref_parser # remove sequences from the reference alignment (not strictly necessary)
INFO:root:Alignment completed. Output file located at: /tmp/tmpddf5h5q_
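Conceptually, the alignment is what makes contacts comparable across receptors: each residue is assigned the alignment column it occupies, so equivalent residues in different receptors share a column index even when their residue numbers differ. A minimal sketch with plain strings; the function name, the '-' gap character, and 0-based indexing are our assumptions, not Lahuta's conventions.

```python
def residue_to_column(aligned_seq: str, gap: str = "-") -> dict:
    """Map 0-based positions in the ungapped sequence to 0-based alignment columns."""
    mapping = {}
    residue_index = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:  # gaps occupy a column but are not residues
            mapping[residue_index] = column
            residue_index += 1
    return mapping

# Toy aligned sequences: the G is residue 3 in the first and residue 1 in the second,
# but both sit in alignment column 3, so they are treated as equivalent.
print(residue_to_column("MNAGTEG")[3])  # 3
print(residue_to_column("--KGTE-")[1])  # 3
```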
4. Let's process all files and create mapped NeighborPairs objects
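Now we compute the neighbors of each structure.
As before, we define a worker function and pass it to the file processor. The worker maps each structure's neighbors onto its aligned sequence, so that residue indices refer to positions in the alignment rather than to each structure's own numbering; this is what lets us compare neighbors across structures. It also removes atom names (names) and residue names (resnames), keeping only the aligned residue indices.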
from pathlib import Path
from lahuta.core.labeled_neighbors import LabeledNeighborPairs
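RADIUS = 5.0  # neighbor distance cutoff, in Å
RES_DIF = 4   # presumably the minimum residue-index separation for a pair to count as neighbors
def process_neighbors(file_path: str) -> LabeledNeighborPairs:
    """Computes the neighbors of a PDB file and maps them onto its aligned sequence."""
    luni = Luni(file_path)
    ns = luni.compute_neighbors(radius=RADIUS, res_dif=RES_DIF)
    basename = Path(file_path).name
    # replace residue indices with positions in the GPCRdb-based alignment
    mapped_ns = ns.map(aligned_seqs.sequences[basename])
    # drop atom and residue names so only aligned residue positions are compared
    mapped_ns = mapped_ns.remove('resnames').remove('names')
    return mapped_ns
processor = CachedFileProcessor(
    file_list=list(data.values()),
    worker=process_neighbors,
)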
processor.process(n_jobs=4)
processor.results
{'1gzm.pdb': <Lahuta LabeledNeighborPairs class containing 2104 pairs>, '2z73.pdb': <Lahuta LabeledNeighborPairs class containing 1992 pairs>, '2vt4.pdb': <Lahuta LabeledNeighborPairs class containing 2363 pairs>, '2rh1.pdb': <Lahuta LabeledNeighborPairs class containing 1259 pairs>, '3pbl.pdb': <Lahuta LabeledNeighborPairs class containing 1733 pairs>, '3rze.pdb': <Lahuta LabeledNeighborPairs class containing 809 pairs>, '3uon.pdb': <Lahuta LabeledNeighborPairs class containing 1044 pairs>, '4daj.pdb': <Lahuta LabeledNeighborPairs class containing 3389 pairs>, '3odu.pdb': <Lahuta LabeledNeighborPairs class containing 2515 pairs>, '4mbs.pdb': <Lahuta LabeledNeighborPairs class containing 1634 pairs>, '4djh.pdb': <Lahuta LabeledNeighborPairs class containing 1923 pairs>, '4dkl.pdb': <Lahuta LabeledNeighborPairs class containing 1122 pairs>, '4ea3.pdb': <Lahuta LabeledNeighborPairs class containing 1270 pairs>, '4ej4.pdb': <Lahuta LabeledNeighborPairs class containing 787 pairs>, '4s0v.pdb': <Lahuta LabeledNeighborPairs class containing 1214 pairs>, '4yay.pdb': <Lahuta LabeledNeighborPairs class containing 656 pairs>, '3vw7.pdb': <Lahuta LabeledNeighborPairs class containing 1469 pairs>, '3v2y.pdb': <Lahuta LabeledNeighborPairs class containing 842 pairs>, '4z36.pdb': <Lahuta LabeledNeighborPairs class containing 671 pairs>, '3eml.pdb': <Lahuta LabeledNeighborPairs class containing 1261 pairs>, '4xnv.pdb': <Lahuta LabeledNeighborPairs class containing 1023 pairs>, '3pqr.pdb': <Lahuta LabeledNeighborPairs class containing 928 pairs>, '2ydv.pdb': <Lahuta LabeledNeighborPairs class containing 761 pairs>, '3sn6.pdb': <Lahuta LabeledNeighborPairs class containing 3242 pairs>, '4mqs.pdb': <Lahuta LabeledNeighborPairs class containing 864 pairs>, '5c1m.pdb': <Lahuta LabeledNeighborPairs class containing 1669 pairs>, '4xt1.pdb': <Lahuta LabeledNeighborPairs class containing 1335 pairs>}
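The worker's core call is compute_neighbors(radius=RADIUS, res_dif=RES_DIF): a distance cutoff and, judging by its name, a minimum separation in residue index that discards trivially close neighbors along the chain. The brute-force NumPy sketch below implements that reading. The inclusive comparisons (`<=` for distance, `>=` for separation) and the atom-level output are assumptions; a practical implementation would use a spatial index instead of an all-pairs distance matrix.

```python
import numpy as np

def neighbor_pairs(coords: np.ndarray, resids: np.ndarray, radius: float = 5.0, res_dif: int = 4) -> np.ndarray:
    """Atom index pairs (i, j), i < j, within `radius` and at least `res_dif` residues apart."""
    # All-pairs Euclidean distances, shape (n_atoms, n_atoms).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Residue-index separation for every atom pair.
    sep = np.abs(resids[:, None] - resids[None, :])
    mask = (dist <= radius) & (sep >= res_dif)
    # Keep the upper triangle only, so each pair is reported once.
    i, j = np.nonzero(np.triu(mask, k=1))
    return np.column_stack([i, j])

coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [4.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
resids = np.array([10, 11, 20, 30])
print(neighbor_pairs(coords, resids))  # pairs (0, 2) and (1, 2); (0, 1) is too close in sequence
```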
5. Shared contacts among active GPCRs
As a simple exercise, we compute the contacts shared by the active-state GPCRs, as done in the paper. Specifically, we will:
- Compute the contacts shared by all active-state GPCRs.
- Test that we find the important 3x46-7x53 contact discussed in the paper.
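3x46 and 7x53 are GPCRdb generic residue numbers: the digit before the "x" is the helix, and the digits after it give the position relative to the most conserved residue of that helix, which is numbered 50. Assuming the helix has no bulge or constriction between position 50 and the target (worth checking against GPCRdb for each receptor), the sequence number follows from simple offset arithmetic. For the β2 adrenergic receptor, R131 of the DRY motif is 3x50 and P323 of the NPxxY motif is 7x50; the helper name is hypothetical.

```python
def generic_to_seq(generic: str, x50_resnum: int) -> int:
    """Convert a generic number like '3x46' to a sequence number, given the helix's x50 residue.

    Assumes no bulge or constriction between x50 and the target position.
    """
    _helix, position = generic.split("x")
    return x50_resnum + int(position) - 50

# Beta-2 adrenergic receptor: R131 is 3x50, P323 is 7x50.
print(generic_to_seq("3x46", 131), generic_to_seq("7x53", 323))  # 127 326
```

These values match the 3sn6.pdb entry in active_results.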
In terms of the actual implementation, we:
- Select only the active-state GPCRs from the processor object created in the previous step.
- Compute the intersection of contacts across all active-state GPCRs.
- Backmap the values in the intersection to the original residue indices of each structure.
- Test that the 3x46-7x53 contact is present.
5.1 Get only the active GPCRs
active_lnps = [processor.results[f'{x.lower()}.pdb'] for x in active_gpcrs]
active_lnps
[<Lahuta LabeledNeighborPairs class containing 928 pairs>, <Lahuta LabeledNeighborPairs class containing 761 pairs>, <Lahuta LabeledNeighborPairs class containing 3242 pairs>, <Lahuta LabeledNeighborPairs class containing 864 pairs>, <Lahuta LabeledNeighborPairs class containing 1669 pairs>, <Lahuta LabeledNeighborPairs class containing 1335 pairs>]
5.2 Compute the intersection
from lahuta.api import intersection
isect = intersection(*active_lnps)
isect.pairs[:3]
array([[('', '1198', ''), ('', '1274', '')], [('', '1198', ''), ('', '5364', '')], [('', '1199', ''), ('', '1274', '')]], dtype=[('names', '<U25'), ('resids', '<U25'), ('resnames', '<U25')])
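Once every structure's contacts are expressed as pairs of aligned positions (with atom and residue names removed), the intersection is conceptually a set intersection over structures. A sketch with Python sets on made-up positions; normalizing each pair so that (a, b) and (b, a) count as the same contact is our assumption about pair order, not documented Lahuta behavior.

```python
def shared_contacts(*contact_lists) -> set:
    """Contacts present in every input list; each contact is an unordered pair of positions."""
    # Sort each pair so (a, b) and (b, a) are the same contact.
    normalized = [{tuple(sorted(pair)) for pair in contacts} for contacts in contact_lists]
    return set.intersection(*normalized)

structure_a = [(10, 25), (10, 40), (11, 25), (3, 7)]
structure_b = [(25, 10), (10, 40), (11, 25)]
structure_c = [(10, 25), (40, 10), (11, 25), (50, 60)]
print(sorted(shared_contacts(structure_a, structure_b, structure_c)))
# [(10, 25), (10, 40), (11, 25)]
```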
5.3 & 5.4 Backmap and test for the presence of the 3x46-7x53 contact
# residue numbers of positions 3x46 and 7x53 in each active structure
active_results = {
'3pqr.pdb': [131, 306],
'2ydv.pdb': [98, 288],
'3sn6.pdb': [127, 326],
'4mqs.pdb': [117, 440],
'5c1m.pdb': [161, 336],
'4xt1.pdb': [125, 291]
}
import numpy as np
from lahuta.utils.array_utils import issubset
active_gpcrs_data = {Path(data[key]).name: data[key] for key in active_gpcrs}
for pdb_id, file_path in active_gpcrs_data.items():
    print(f'Working on {pdb_id}. Status: ', end=' ')
    luni = Luni(file_path)
    ns = luni.compute_neighbors(radius=RADIUS, res_dif=RES_DIF)
    # map the shared aligned-position pairs back to this structure's residue numbers
    ns = ns.backmap(aligned_seqs.sequences[pdb_id], isect.pairs)
    # the expected (3x46, 7x53) residue pair must be among the backmapped pairs
    assert issubset(np.array([active_results[pdb_id]]), ns.resids)
    print('Passed!')
Working on 3pqr.pdb. Status: Passed!
Working on 2ydv.pdb. Status: Passed!
Working on 3sn6.pdb. Status: Passed!
Working on 4mqs.pdb. Status: Passed!
Working on 5c1m.pdb. Status: Passed!
Working on 4xt1.pdb. Status: Passed!
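Backmapping is the inverse of the mapping step: each aligned position in the shared contacts is translated back to a residue number of one particular structure, and the known 3x46-7x53 residue pair is then looked up in the result. A minimal sketch, assuming a gapped aligned sequence, a known number for the first residue, and consecutive numbering (real PDB files often have numbering gaps and insertion codes, which this sketch ignores). All names and values are hypothetical.

```python
def backmap_pairs(column_pairs, aligned_seq: str, first_resnum: int = 1, gap: str = "-") -> set:
    """Translate (column, column) contacts into (resnum, resnum) pairs for one structure."""
    # Assumes residues are numbered consecutively from first_resnum.
    column_to_resnum = {}
    resnum = first_resnum
    for column, char in enumerate(aligned_seq):
        if char != gap:
            column_to_resnum[column] = resnum
            resnum += 1
    # Drop contacts involving a column this structure does not occupy (a gap).
    return {
        (column_to_resnum[a], column_to_resnum[b])
        for a, b in column_pairs
        if a in column_to_resnum and b in column_to_resnum
    }

shared = {(2, 10), (1, 4)}   # toy shared contacts, as column pairs
aligned = "-DRY--NPIIY"      # toy aligned sequence starting at residue 130
pairs = backmap_pairs(shared, aligned, first_resnum=130)
print(pairs)                 # {(131, 137)}
print((131, 137) in pairs)   # True
```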
Playground
# helper functions
from lahuta.api import count_unique_pairs_across_keys, map_unique_pairs_to_keys
# get only the inactive GPCRs
inactive_lnps = [processor.results[f'{x.lower()}.pdb'] for x in inactive_gpcrs]
inactive_lnps
[<Lahuta LabeledNeighborPairs class containing 2104 pairs>, <Lahuta LabeledNeighborPairs class containing 1992 pairs>, <Lahuta LabeledNeighborPairs class containing 2363 pairs>, <Lahuta LabeledNeighborPairs class containing 1259 pairs>, <Lahuta LabeledNeighborPairs class containing 1733 pairs>, <Lahuta LabeledNeighborPairs class containing 809 pairs>, <Lahuta LabeledNeighborPairs class containing 1044 pairs>, <Lahuta LabeledNeighborPairs class containing 3389 pairs>, <Lahuta LabeledNeighborPairs class containing 2515 pairs>, <Lahuta LabeledNeighborPairs class containing 1634 pairs>, <Lahuta LabeledNeighborPairs class containing 1923 pairs>, <Lahuta LabeledNeighborPairs class containing 1122 pairs>, <Lahuta LabeledNeighborPairs class containing 1270 pairs>, <Lahuta LabeledNeighborPairs class containing 787 pairs>, <Lahuta LabeledNeighborPairs class containing 1214 pairs>, <Lahuta LabeledNeighborPairs class containing 656 pairs>, <Lahuta LabeledNeighborPairs class containing 1469 pairs>, <Lahuta LabeledNeighborPairs class containing 842 pairs>, <Lahuta LabeledNeighborPairs class containing 671 pairs>, <Lahuta LabeledNeighborPairs class containing 1261 pairs>, <Lahuta LabeledNeighborPairs class containing 1023 pairs>]
# compute the intersection
inactive_isect = intersection(*inactive_lnps)
import numpy as np
results = {}
inactive_gpcrs_data = {Path(data[key]).name: data[key] for key in inactive_gpcrs}
for pdb_id, file_path in inactive_gpcrs_data.items():
    luni = Luni(file_path)
    ns = luni.compute_neighbors(radius=RADIUS, res_dif=RES_DIF)
    # backmap the contacts shared by the inactive structures (not the active-state `isect`)
    ns = ns.backmap(aligned_seqs.sequences[pdb_id], inactive_isect.pairs)
    results[pdb_id] = ns.resids
# view the results
# results
# count_unique_pairs_across_keys(results)
# map_unique_pairs_to_keys(results)
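The paper's central idea, differential conservation, compares how often each contact occurs in active versus inactive structures. One simple way to score this from per-structure contact sets is sketched below: compute the fraction of structures in each state that contain a contact, then keep contacts that are frequent in one state and rare in the other. The thresholds and function names are illustrative choices; they are neither the paper's exact criteria nor what count_unique_pairs_across_keys or map_unique_pairs_to_keys compute.

```python
from collections import Counter

def contact_frequency(structures: dict) -> dict:
    """Fraction of structures in which each contact (a hashable pair) occurs."""
    counts = Counter(pair for contacts in structures.values() for pair in set(contacts))
    return {pair: n / len(structures) for pair, n in counts.items()}

def differential_contacts(active: dict, inactive: dict, high: float = 1.0, low: float = 0.0) -> set:
    """Contacts with frequency >= high in active structures and <= low in inactive ones."""
    freq_active = contact_frequency(active)
    freq_inactive = contact_frequency(inactive)
    return {
        pair for pair, f in freq_active.items()
        if f >= high and freq_inactive.get(pair, 0.0) <= low
    }

active = {"s1": [(1, 2), (3, 4)], "s2": [(1, 2), (3, 4), (5, 6)]}
inactive = {"s3": [(3, 4)], "s4": [(5, 6)]}
print(differential_contacts(active, inactive))  # {(1, 2)}
```

Swapping the two arguments gives the contacts specific to the inactive state.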
Important Notes
A few things to note:
- There is a lot of room for performance improvement. Above, we compute NeighbPairs twice (once for mapping and once for backmapping). We could avoid this by computing NeighbPairs once and then using NeighbPairs.map and NeighbPairs.backmap on the same object.
- We remove both names and resnames before computing the intersection. This is necessary because we want to compare contacts between "topologically equivalent" residues, and equivalent positions in different receptors are often occupied by different residues, with different atom names. In our case, this topological equivalence is provided by the MSA.
- We have barely scratched the surface of what we can do. I hope this notebook provides a good starting point for you to use Lahuta to explore your own questions.
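Toy labels make it clear why stripping names and resnames matters: when a contact is labeled with atom and residue names, it only matches across receptors if the residue identities happen to agree. The tuple layout (name, position, resname) loosely mirrors the fields in the intersection output of step 5.2, but the values here are invented for illustration.

```python
# Each atom label is (atom name, alignment position, residue name); values are invented.
receptor_1 = {(("CD1", "100", "LEU"), ("OH", "200", "TYR"))}
receptor_2 = {(("CG2", "100", "ILE"), ("OH", "200", "TYR"))}

def strip_labels(pairs):
    """Blank out atom and residue names, keeping only alignment positions."""
    return {tuple(("", resid, "") for _name, resid, _resname in pair) for pair in pairs}

print(receptor_1 & receptor_2)                              # set()
print(strip_labels(receptor_1) & strip_labels(receptor_2))  # {(('', '100', ''), ('', '200', ''))}
```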