# conkit.core.contactmap module

ContactMap container used throughout ConKit

class ContactMap(id)

A contact map object representing a single prediction

The ContactMap class represents a data structure to hold a single contact map prediction in one place. It contains functions to store, manipulate and organise Contact instances.

Examples

>>> from conkit.core import Contact, ContactMap
>>> contact_map = ContactMap("example")
>>> contact_map.add(Contact(1, 10, 0.333))
>>> contact_map.add(Contact(5, 30, 0.667))
>>> print(contact_map)
ContactMap(id="example" ncontacts=2)

coverage

float – The sequence coverage score

id

str – A unique identifier

ncontacts

int – The number of Contact instances in the ContactMap

precision

float – The precision (Positive Predictive Value) score

repr_sequence

Sequence – The representative Sequence associated with the ContactMap

repr_sequence_altloc

Sequence – The representative altloc Sequence associated with the ContactMap

sequence

Sequence – The Sequence associated with the ContactMap

top_contact

Contact – The first Contact entry

as_list(altloc=False)

The ContactMap as a 2D-list containing contact-pair residue indexes

Parameters: altloc (bool) – Use the res_altloc positions [default: False]
assign_sequence_register(*args, **kwargs)
calculate_jaccard_index(*args, **kwargs)
calculate_kernel_density(*args, **kwargs)
calculate_scalar_score(*args, **kwargs)
coverage

The sequence coverage score

The coverage score is calculated by dividing the number of residues covered by the predicted contact pairs $$x_{cov}$$ by the number of residues in the sequence $$L$$.

$Coverage=\frac{x_{cov}}{L}$
Returns: The calculated coverage score

Return type: float
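The coverage formula above can be sketched in plain Python. This is only an illustration of the arithmetic, not ConKit's implementation; the function name `coverage` and the representation of contacts as `(res1, res2)` tuples are assumptions made for the example.

```python
def coverage(contact_pairs, sequence_length):
    """Fraction of sequence positions that appear in at least one contact pair.

    contact_pairs is assumed to be an iterable of (res1, res2) index tuples.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    # x_cov: the set of distinct residues covered by the predicted contacts
    covered = {residue for pair in contact_pairs for residue in pair}
    return len(covered) / sequence_length


pairs = [(1, 10), (5, 30), (10, 42)]
# Residues 1, 5, 10, 30 and 42 are covered, so x_cov = 5 and L = 50
print(coverage(pairs, 50))  # 0.1
```

Residue 10 occurs in two pairs but is counted once, which is why a set is used rather than a running total.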
empty

Empty contact map

find(register, altloc=False, strict=False)

Find all contacts with one or both residues in register

Parameters:
- register (int, list, tuple) – A residue register, or a list of residue registers, to find
- altloc (bool) – Use the res_altloc positions [default: False]
- strict (bool) – Require both residues of a Contact to be in register [default: False]

Returns: A modified version of the ContactMap containing the found contacts

Return type: ContactMap
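The difference between the default and the strict lookup can be sketched as follows. This is a hedged illustration on plain `(res1, res2)` tuples rather than ConKit objects; `find_pairs` is a hypothetical helper name and does not handle the altloc option.

```python
def find_pairs(contact_pairs, register, strict=False):
    """Select contact pairs with one (default) or both (strict) residues in register."""
    # Accept a single index as well as a list or tuple of indexes
    wanted = {register} if isinstance(register, int) else set(register)
    if strict:
        return [(a, b) for a, b in contact_pairs if a in wanted and b in wanted]
    return [(a, b) for a, b in contact_pairs if a in wanted or b in wanted]


pairs = [(1, 10), (5, 30), (10, 30)]
print(find_pairs(pairs, [10, 30]))               # [(1, 10), (5, 30), (10, 30)]
print(find_pairs(pairs, [10, 30], strict=True))  # [(10, 30)]
```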
get_contact_density(bw_method='amise')

Calculate the contact density in the contact map using Gaussian kernels

Various algorithms can be used to estimate the bandwidth. To calculate the bandwidth for a 1D data array X with n data points and d dimensions, the listed algorithms have been implemented. Please note that in rules 2 and 3, the value of $$\sigma$$ is the smaller of the standard deviation of X and the normalized interquartile range.

Parameters: bw_method (str, optional) – The bandwidth estimator to use [default: amise]

Returns: The list of per-residue density estimates

Return type: list

Raises:
- ImportError – Cannot find scikit-learn package
- ValueError – Undefined bandwidth method
- ValueError – ContactMap is empty
get_jaccard_index(other)

Calculate the Jaccard index between two ContactMap instances

This score analyzes the difference of the predicted contacts from two maps,

$J_{x,y}=\frac{\left|x \cap y\right|}{\left|x \cup y\right|}$

where $$x$$ and $$y$$ are the sets of predicted contacts from two different predictors, $$\left|x \cap y\right|$$ is the number of elements in the intersection of $$x$$ and $$y$$, and the $$\left|x \cup y\right|$$ represents the number of elements in the union of $$x$$ and $$y$$.

The J-score has values in the range of $$[0, 1]$$, with a value of $$1$$ corresponding to identical contact maps and $$0$$ to contact maps that share no contacts.

Parameters: other (ContactMap) – A ConKit ContactMap

Returns: The Jaccard index

Return type: float

Warning

The Jaccard index ranges from $$0$$ to $$1$$, where $$1$$ means the maps contain identical contact pairs.

Note

The Jaccard index is different from the Jaccard distance mentioned in [1]. The Jaccard distance corresponds to $$1-Jaccard_{index}$$.

[1] Q. Wuyun, W. Zheng, Z. Peng, J. Yang (2016). A large-scale comparative assessment of methods for residue-residue contact prediction. Briefings in Bioinformatics, doi: 10.1093/bib/bbw106.
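As a rough sketch of the Jaccard index formula, the calculation can be written with Python sets. The function name `jaccard_index`, the tuple representation of contacts, and the choice to treat `(i, j)` and `(j, i)` as the same contact are assumptions for this example and are not taken from ConKit's source.

```python
def jaccard_index(pairs_x, pairs_y):
    """Jaccard index |x ∩ y| / |x ∪ y| of two collections of contact pairs."""
    # Sort each pair so that (10, 1) and (1, 10) count as the same contact
    x = {tuple(sorted(pair)) for pair in pairs_x}
    y = {tuple(sorted(pair)) for pair in pairs_y}
    union = x | y
    if not union:
        # Assumption: two empty maps are treated as identical
        return 1.0
    return len(x & y) / len(union)


map_x = [(1, 10), (5, 30), (7, 40)]
map_y = [(10, 1), (5, 30), (8, 50)]
index = jaccard_index(map_x, map_y)
print(index)      # 2 shared contacts out of 4 distinct ones -> 0.5
print(1 - index)  # the corresponding Jaccard distance -> 0.5
```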
long_range

The long-range contacts found in the ContactMap

Long-range contacts are contact pairs whose residues are x >= 24 positions apart in the sequence

Returns: A copy of the ContactMap with long-range contacts only

Return type: ContactMap
long_range_contacts
match(other, add_false_negatives=False, match_other=False, remove_unmatched=False, renumber=False, inplace=False)

Modify both hierarchies so residue numbers match one another.

This function is key when plotting contact maps or visualising them in 3-dimensional space, particularly when residue numbers in the structure do not start at count 0 or when peptide chain breaks are present.

Parameters:
- add_false_negatives (bool) – Add false negatives to self, i.e. contacts that are in other but not in self. Required for recall(); can be undone with remove_false_negatives()
- other (ContactMap) – A ConKit ContactMap
- match_other (bool, optional) – Match other to self [default: False]
- remove_unmatched (bool, optional) – Remove all unmatched contacts [default: False]
- renumber (bool, optional) – Renumber the res_seq entries [default: False]. If True, res1_seq and res2_seq change but id remains the same
- inplace (bool, optional) – Replace the saved order of contacts [default: False]

Returns: ContactMap instance, regardless of inplace

Return type: ContactMap

Raises: ValueError – Error creating reliable keymap matching the sequence in ContactMap
medium_range

The medium-range contacts found in the ContactMap

Medium-range contacts are contact pairs whose residues are 12 <= x <= 23 positions apart in the sequence

Returns: A copy of the ContactMap with medium-range contacts only

Return type: ContactMap
medium_range_contacts
ncontacts

The number of Contact instances

Returns: The number of contacts in the ContactMap

Return type: int
precision

The precision (Positive Predictive Value) score

The precision value is calculated by analysing the true and false positive contacts.

$Precision=\frac{TruePositives}{TruePositives + FalsePositives}$

The status of each contact, i.e. true or false positive, can be determined by running the match() function and providing a reference structure.

Returns: The calculated precision score

Return type: float
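The precision formula can be illustrated with a small set-based sketch. It assumes that predicted and reference contacts are available as `(res1, res2)` tuples; in ConKit the true and false positive status is assigned by match(), so the function `precision` below is a hypothetical stand-in for the arithmetic only.

```python
def precision(predicted, reference):
    """TP / (TP + FP) for predicted contacts judged against reference contacts."""
    predicted, reference = set(predicted), set(reference)
    if not predicted:
        # Assumption: an empty prediction is scored as 0.0 instead of raising
        return 0.0
    true_positives = len(predicted & reference)   # predicted and in the reference
    false_positives = len(predicted - reference)  # predicted but not in the reference
    return true_positives / (true_positives + false_positives)


predicted = [(1, 10), (5, 30), (7, 40), (9, 50)]
reference = [(1, 10), (5, 30), (12, 60)]
print(precision(predicted, reference))  # 2 of 4 predictions are correct -> 0.5
```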

recall

The Recall (Sensitivity) score

The recall value is calculated by analysing the true positive and false negative contacts.

$Recall=\frac{TruePositives}{TruePositives + FalseNegatives}$

The status of each contact, i.e. true positive or false negative, can be determined by running the match() function and providing a reference structure.

Note

To determine and save the false negatives, please use the add_false_negatives keyword when running the match() function.

You may wish to run remove_false_negatives() afterwards.

Returns: The calculated recall score

Return type: float
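A brief sketch of the recall arithmetic shows why false negatives have to be recorded first: they are contacts that exist only in the reference, so they cannot be counted from the prediction alone. The function name `recall` and the tuple representation are assumptions for illustration and do not reflect ConKit's internals.

```python
def recall(predicted, reference):
    """TP / (TP + FN) for predicted contacts judged against reference contacts."""
    predicted, reference = set(predicted), set(reference)
    if not reference:
        # Assumption: without reference contacts the score is reported as 0.0
        return 0.0
    true_positives = len(predicted & reference)
    # False negatives are reference contacts that the prediction missed
    false_negatives = len(reference - predicted)
    return true_positives / (true_positives + false_negatives)


predicted = [(1, 10), (5, 30), (7, 40), (9, 50)]
reference = [(1, 10), (5, 30), (12, 60)]
# Two of the three reference contacts were recovered
print(recall(predicted, reference))
```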

reindex(index, altloc=False, inplace=False)

Re-index the ContactMap

Parameters:
- index (int) – The new starting index [assigned to the lowest existing index in the contact map]
- altloc (bool) – Use the res_altloc positions [default: False]
- inplace (bool) – Replace the saved order of contacts [default: False]

Returns: The reference to the ContactMap, regardless of inplace

Return type: ContactMap

Raises: ValueError – Index must be positive
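Re-indexing amounts to shifting every residue number by a constant offset so that the lowest existing index becomes the requested one. The sketch below works on plain tuples; `reindex_pairs` is a hypothetical name, and rejecting only negative values (while accepting 0) is an assumption about what "must be positive" means here.

```python
def reindex_pairs(contact_pairs, index):
    """Shift all residue indexes so the lowest existing index becomes `index`."""
    if index < 0:
        # Assumption: 0 is accepted and only negative values are rejected
        raise ValueError("Index must be positive")
    if not contact_pairs:
        return []
    lowest = min(min(pair) for pair in contact_pairs)
    offset = index - lowest
    return [(a + offset, b + offset) for a, b in contact_pairs]


# The lowest index is 5, so every residue number is shifted by -4
print(reindex_pairs([(5, 20), (8, 31)], 1))  # [(1, 16), (4, 27)]
```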
remove_false_negatives(inplace=False)

Remove false negatives from the contact map

Parameters: inplace (bool, optional) – Replace the saved order of contacts [default: False]

Returns: The reference to the ContactMap, regardless of inplace

Return type: ContactMap
remove_neighbors(min_distance=5, max_distance=9223372036854775807, inplace=False)

Remove contacts between neighboring residues

The algorithm works by keeping contact pairs whose sequence separation x satisfies

min_distance <= x <= max_distance
Parameters:
- min_distance (int, optional) – The minimum number of residues between contacts [default: 5]
- max_distance (int, optional) – The maximum number of residues between contacts [default: sys.maxsize]
- inplace (bool, optional) – Replace the saved order of contacts [default: False]

Returns: The reference to the ContactMap, regardless of inplace

Return type: ContactMap
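The filter condition can be sketched as a list comprehension. This example assumes that x is the absolute difference between the two residue indexes of a pair and that contacts are plain tuples; `keep_by_separation` is a hypothetical helper, not a ConKit function.

```python
import sys


def keep_by_separation(contact_pairs, min_distance=5, max_distance=sys.maxsize):
    """Keep pairs whose sequence separation lies in [min_distance, max_distance]."""
    return [
        (a, b)
        for a, b in contact_pairs
        if min_distance <= abs(a - b) <= max_distance
    ]


pairs = [(1, 3), (1, 6), (2, 40)]  # separations 2, 5 and 38
print(keep_by_separation(pairs))                   # [(1, 6), (2, 40)]
print(keep_by_separation(pairs, max_distance=20))  # [(1, 6)]
```

Both bounds are inclusive, so a pair exactly `min_distance` residues apart is kept.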
repr_sequence

The representative Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the normal res_seq positions

Return type: Sequence

Raises: TypeError – Sequence undefined
repr_sequence_altloc

The representative altloc Sequence associated with the ContactMap

The peptide sequence constructed from the available contacts using the res_altseq positions

Return type: Sequence

Raises: TypeError – Sequence undefined
rescale(inplace=False)

Rescale the raw scores in ContactMap

Parameters: inplace (bool, optional) – Replace the saved order of contacts [default: False]

Returns: The reference to the ContactMap, regardless of inplace

Return type: ContactMap
sequence

The Sequence associated with the ContactMap

Returns: A Sequence object

Return type: Sequence
set_scalar_score()

Calculate and set the scalar_score for the ContactMap

This score is a scaled score for all raw scores in a contact map. It is defined by the formula

${x}'=\frac{x}{\overline{d}}$

where $$x$$ corresponds to the raw score of each predicted contact and $$\overline{d}$$ to the mean of all raw scores.

This score is described in more detail in [2].

 [2] S. Ovchinnikov, L. Kinch, H. Park, Y. Liao, J. Pei, D.E. Kim, H. Kamisetty, N.V. Grishin, D. Baker (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
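The scalar score formula divides each raw score by the mean of all raw scores. A minimal sketch of that arithmetic follows, assuming the raw scores are given as a list of floats; `scalar_scores` is a hypothetical helper, and raising an error for a zero mean is a choice made for the example.

```python
def scalar_scores(raw_scores):
    """Scale each raw score x by the mean of all raw scores: x' = x / mean."""
    if not raw_scores:
        return []
    mean = sum(raw_scores) / len(raw_scores)
    if mean == 0:
        # Assumption: a zero mean is treated as an error in this sketch
        raise ValueError("mean raw score is zero")
    return [score / mean for score in raw_scores]


# The mean raw score is 0.5, so contacts scoring above the mean end up above 1.0
print(scalar_scores([0.25, 0.5, 0.75]))  # [0.5, 1.0, 1.5]
```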
set_sequence_register(altloc=False)

Assign the amino acids from Sequence to all Contact instances

Parameters: altloc (bool) – Use the res_altloc positions [default: False]
short_range

The short-range contacts found in the ContactMap

Short-range contacts are contact pairs whose residues are 6 <= x <= 11 positions apart in the sequence

Returns: A copy of the ContactMap with short-range contacts only

Return type: ContactMap
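The short, medium and long range definitions documented for this class can be summarised in one small classifier. It assumes that x is the absolute difference between the two residue indexes; the function name `separation_class` is hypothetical and serves only to illustrate the thresholds.

```python
def separation_class(res1, res2):
    """Classify a contact by sequence separation using the documented thresholds."""
    separation = abs(res1 - res2)
    if separation >= 24:
        return "long"
    if separation >= 12:
        return "medium"
    if separation >= 6:
        return "short"
    return None  # closer than 6 residues: not in any of the three ranges


for pair in [(1, 6), (1, 7), (1, 13), (1, 25)]:
    print(pair, separation_class(*pair))
# (1, 6) None
# (1, 7) short
# (1, 13) medium
# (1, 25) long
```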
short_range_contacts
singletons

Singleton contact pairs in the current ContactMap

Contacts are identified by a distance-based grouping analysis. A Contact is classified as a singleton if no other contacts are found within 2 residues.

Returns: ContactMap
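One possible reading of the singleton rule is sketched below. It assumes that two contacts count as neighbours when both of their residue indexes differ by at most 2; ConKit's actual grouping analysis may define proximity differently, and `singleton_pairs` is a hypothetical helper name.

```python
def singleton_pairs(contact_pairs, max_offset=2):
    """Return pairs that have no other pair within max_offset residues.

    Assumption: (a, b) and (c, d) are neighbours when |a - c| <= max_offset
    and |b - d| <= max_offset.
    """
    result = []
    for i, (a, b) in enumerate(contact_pairs):
        has_neighbour = any(
            abs(a - c) <= max_offset and abs(b - d) <= max_offset
            for j, (c, d) in enumerate(contact_pairs)
            if j != i
        )
        if not has_neighbour:
            result.append((a, b))
    return result


# (1, 10) and (2, 11) are neighbours of each other; (20, 40) stands alone
print(singleton_pairs([(1, 10), (2, 11), (20, 40)]))  # [(20, 40)]
```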
sort(kword, reverse=False, inplace=False)

Sort the ContactMap

Parameters:
- kword (str) – The dictionary key to sort contacts by
- reverse (bool, optional) – Sort the contact pairs in descending order [default: False]
- inplace (bool, optional) – Replace the saved order of contacts [default: False]

Returns: The reference to the ContactMap, regardless of inplace

Return type: ContactMap

Raises: ValueError – kword not in ContactMap
to_string()

Return the ContactMap as str

top_contact

The first Contact entry

Returns: The first Contact entry in the ContactMap

Return type: Contact