mgkit.snps.classes module
Manage SNP data.
-
class mgkit.snps.classes.GeneSNP(gene_id='', taxon_id=0, exp_syn=0, exp_nonsyn=0, coverage=None, snps=None, uid=None, json_data=None)
Bases: mgkit.snps.classes.RatioMixIn
New in version 0.1.13.
Class defining a gene and its synonymous/non-synonymous SNPs.
It defines the background (expected) synonymous/non-synonymous attributes and, for now, has only one method, which calculates the pN/pS ratio. The method is added through a mixin class (RatioMixIn), so the ratio calculation can be customised and shared with the old implementation.
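The mixin approach can be sketched as follows. This is a minimal illustration, not mgkit's code: `RatioSketchMixIn`, `CountsGene` and `ListGene` are hypothetical names, and the special-case handling is simplified. It shows how one ratio method can serve both a class that stores syn/nonsyn counts directly and one that derives them from a list of SNPs, as long as both expose the same attribute names.

```python
import math


class RatioSketchMixIn:
    """Hypothetical mixin: computes pN/pS from attributes the host class
    must provide (syn, nonsyn, exp_syn, exp_nonsyn)."""

    def calc_ratio(self):
        # NaN whenever one of the observed counts is 0 (simplified here)
        if self.syn == 0 or self.nonsyn == 0:
            return math.nan
        # pN/pS = (oN / eN) / (oS / eS)
        return (self.nonsyn / self.exp_nonsyn) / (self.syn / self.exp_syn)


class CountsGene(RatioSketchMixIn):
    """Old-style layout: syn/nonsyn stored as plain counts."""

    def __init__(self, syn, nonsyn, exp_syn, exp_nonsyn):
        self.syn = syn
        self.nonsyn = nonsyn
        self.exp_syn = exp_syn
        self.exp_nonsyn = exp_nonsyn


class ListGene(RatioSketchMixIn):
    """New-style layout: every SNP is kept, counts are derived on demand."""

    def __init__(self, snp_types, exp_syn, exp_nonsyn):
        self.snp_types = list(snp_types)  # e.g. ["syn", "nonsyn", ...]
        self.exp_syn = exp_syn
        self.exp_nonsyn = exp_nonsyn

    @property
    def syn(self):
        return self.snp_types.count("syn")

    @property
    def nonsyn(self):
        return self.snp_types.count("nonsyn")


old = CountsGene(syn=2, nonsyn=4, exp_syn=100, exp_nonsyn=200)
new = ListGene(["syn"] * 2 + ["nonsyn"] * 4, exp_syn=100, exp_nonsyn=200)
print(old.calc_ratio(), new.calc_ratio())  # 1.0 1.0
```

Because the mixin only reads attributes, either storage layout can change without touching the ratio code.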
-
snps
list of SNPs associated with the gene; each element is a tuple: the first item is the position (relative to the gene start), the second is the nucleotide change and the third is the amino-acid SNP type as defined by SNPType.
Type: list
Note
The main difference from GeneSyn is that all SNPs are kept, and syn and nonsyn are not attributes but properties that return the count of synonymous and non-synonymous SNPs in the snps list.
Warning
This class uses more memory than GeneSyn because it doesn't use __slots__; this may change in later versions.
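A minimal sketch of the layout described above, under stated assumptions: `SNPTypeSketch` is a hypothetical stand-in for SNPType (only `unknown = 0` is confirmed by the add_snp signature; the `syn` and `nonsyn` values are assumed), and the format of the nucleotide change is also assumed. It shows the (position, change, type) tuples, syn/nonsyn computed as properties, and how a `__slots__` class drops the per-instance `__dict__`.

```python
import enum


class SNPTypeSketch(enum.Enum):
    unknown = 0  # confirmed by the add_snp signature
    syn = 1      # assumed value
    nonsyn = 2   # assumed value


class GeneSNPSketch:
    """Keeps every SNP; syn/nonsyn are counted on demand (no __slots__)."""

    def __init__(self, gene_id=""):
        self.gene_id = gene_id
        self.snps = []  # (position, change, SNPTypeSketch) tuples

    def add_snp(self, position, change, snp_type=SNPTypeSketch.unknown):
        self.snps.append((position, change, snp_type))

    @property
    def syn(self):
        return sum(1 for snp in self.snps if snp[2] is SNPTypeSketch.syn)

    @property
    def nonsyn(self):
        return sum(1 for snp in self.snps if snp[2] is SNPTypeSketch.nonsyn)


class SlottedCounts:
    """Stores only fixed attributes, so instances have no __dict__."""

    __slots__ = ("syn", "nonsyn")

    def __init__(self, syn=0, nonsyn=0):
        self.syn = syn
        self.nonsyn = nonsyn


gene = GeneSNPSketch("gene1")
gene.add_snp(10, "A", SNPTypeSketch.syn)
gene.add_snp(25, "T", SNPTypeSketch.nonsyn)
gene.add_snp(40, "G", SNPTypeSketch.nonsyn)
gene.add_snp(51, "C")  # unknown type: counted in neither property
print(gene.syn, gene.nonsyn)  # 1 2
print(hasattr(gene, "__dict__"), hasattr(SlottedCounts(), "__dict__"))  # True False
```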
add(other)
In-place addition of another instance's values. There is no check that the two instances refer to the same gene/taxon; it's up to the user to check that they can be added together.
Parameters: other – instance of GeneSyn to add
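Since add() leaves the gene/taxon check to the caller, a sketch of that check might look like this. It is only an illustration: which fields mgkit actually sums is not stated here, so the sketch assumes expected counts are summed and SNP lists concatenated, and both `GeneSketch` and `add_checked` are hypothetical names.

```python
class GeneSketch:
    """Hypothetical minimal gene record used to illustrate in-place add."""

    def __init__(self, gene_id, taxon_id, exp_syn=0, exp_nonsyn=0, snps=None):
        self.gene_id = gene_id
        self.taxon_id = taxon_id
        self.exp_syn = exp_syn
        self.exp_nonsyn = exp_nonsyn
        self.snps = list(snps) if snps else []

    def add(self, other):
        # Mirrors the documented behaviour: no gene_id/taxon_id check here.
        # Assumption: expected counts are summed and SNP lists concatenated.
        self.exp_syn += other.exp_syn
        self.exp_nonsyn += other.exp_nonsyn
        self.snps.extend(other.snps)


def add_checked(target, other):
    """Hypothetical helper performing the check add() leaves to the user."""
    if (target.gene_id, target.taxon_id) != (other.gene_id, other.taxon_id):
        raise ValueError("instances refer to different gene/taxon pairs")
    target.add(other)


a = GeneSketch("g1", 1, exp_syn=10, exp_nonsyn=30, snps=[(5, "A", "syn")])
b = GeneSketch("g1", 1, exp_syn=12, exp_nonsyn=28, snps=[(9, "T", "nonsyn")])
add_checked(a, b)
print(a.exp_syn, a.exp_nonsyn, len(a.snps))  # 22 58 2
```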
-
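add_snp(position, change, snp_type=<SNPType.unknown: 0>)
Adds a SNP to the snps list.
Parameters: - position – position of the SNP, relative to the gene start
- change – nucleotide change
- snp_type – amino-acid SNP type, as defined by SNPType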
-
coverage = None
-
exp_nonsyn = None
-
exp_syn = None
-
from_json(data)
Initialises the instance with values from a JSON definition.
Parameters: data (str) – JSON representation, as returned by GeneSNP.to_json()
-
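gene_id = None
-
nonsyn
Returns the number of non-synonymous SNPs in the snps list (the observed count, as opposed to exp_nonsyn)
-
snps = None
-
syn
Returns the number of synonymous SNPs in the snps list (the observed count, as opposed to exp_syn)
-
taxon_id = None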
-
to_json()
Returns a JSON definition of the instance.
Returns: JSON representation of the instance
Return type: str
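A JSON round trip of this kind can be sketched with the standard json module. The field names below are taken from the constructor signature, but the actual JSON layout mgkit writes is not documented here, so treat it as an assumption; `SNPTypeSketch`, `to_json_sketch` and `from_json_sketch` are hypothetical. The main practical point is that enum members are not JSON-serialisable, so the SNP type is stored as its integer value and rebuilt on load.

```python
import enum
import json


class SNPTypeSketch(enum.Enum):
    unknown = 0  # confirmed by the add_snp signature
    syn = 1      # assumed value
    nonsyn = 2   # assumed value


FIELDS = ("gene_id", "taxon_id", "exp_syn", "exp_nonsyn", "coverage", "uid")


def to_json_sketch(gene):
    """Serialise a dict-based gene record; layout is an assumption."""
    data = {field: gene[field] for field in FIELDS}
    # Store each enum member as its integer value
    data["snps"] = [[pos, change, kind.value] for pos, change, kind in gene["snps"]]
    return json.dumps(data)


def from_json_sketch(text):
    """Rebuild the record, turning integer values back into enum members."""
    data = json.loads(text)
    data["snps"] = [(pos, change, SNPTypeSketch(value))
                    for pos, change, value in data["snps"]]
    return data


gene = {
    "gene_id": "g1", "taxon_id": 1, "exp_syn": 10.5, "exp_nonsyn": 30.0,
    "coverage": 12, "uid": None,
    "snps": [(5, "A", SNPTypeSketch.syn), (12, "T", SNPTypeSketch.unknown)],
}
restored = from_json_sketch(to_json_sketch(gene))
print(restored == gene)  # True
```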
-
uid = None
-
-
class mgkit.snps.classes.RatioMixIn
Bases: object
-
calc_ratio(haplotypes=False)
Changed in version 0.2.2: split the function to handle flag_value in another method.
Calculate \(\frac{pN}{pS}\) for the gene.
\[\frac{pN}{pS} = \frac{oN / eN}{oS / eS} \tag{1}\]
Where:
- oN: observed number of non-synonymous SNPs (nonsyn)
- eN: expected number of non-synonymous SNPs (exp_nonsyn)
- oS: observed number of synonymous SNPs (syn)
- eS: expected number of synonymous SNPs (exp_syn)
Parameters: - flag_value (bool) – when it's not possible to calculate the ratio, the possible cases are flagged with a negative number, which allows these values to be substituted later
- haplotypes (bool) – if True, coverage information is not used, because the SNPs are assumed to come from an alignment of haplotype sequences
Returns: the \(\frac {pN}{pS}\) for the gene.
Note
Because pN or pS can be 0 (and pS = 0 makes the ratio undefined), some special cases are taken into account. The default return value in these cases is numpy.nan.
Both synonymous and non-synonymous values are 0:
- if both the syn and nonsyn attributes are 0 but there's coverage for this gene, we return 0, as there's no evolution in this gene. Previously, this method checked the coverage against the min_cov parameter, which was equal to MIN_COV. Now it is up to the user to check the coverage, and the functions in mgkit.snps.conv_func do that. If enough coverage was achieved, the haplotypes parameter can be used to return 0.
All other cases return a NaN value
Return type:
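One reading of the special cases above, as a standalone sketch (`calc_ratio_sketch` is a hypothetical function; `math.nan` stands in for numpy.nan so the block needs only the standard library). It returns the ratio when both counts are positive, 0 when both are 0 and haplotypes is True (the caller having already checked coverage), and NaN otherwise.

```python
import math


def calc_ratio_sketch(syn, nonsyn, exp_syn, exp_nonsyn, haplotypes=False):
    """Hypothetical reading of calc_ratio's special cases."""
    if syn > 0 and nonsyn > 0:
        # Regular case: equation (1)
        return (nonsyn / exp_nonsyn) / (syn / exp_syn)
    if syn == 0 and nonsyn == 0 and haplotypes:
        # No variation in a gene whose coverage the caller already checked
        return 0.0
    # All other cases
    return math.nan


print(calc_ratio_sketch(0, 0, 100, 300))                   # nan
print(calc_ratio_sketch(0, 0, 100, 300, haplotypes=True))  # 0.0
```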
-
calc_ratio_flag()
New in version 0.2.2.
Handles cases where it's important to flag the returned value, as explained in GeneSNP.calc_ratio(); when both the number of synonymous and the number of non-synonymous SNPs are greater than 0, the pN/pS value is returned.
- The number of non-synonymous SNPs is greater than 0 but the number of synonymous SNPs is 0:
  - if flag_value is True, the returned value is -1
- The number of synonymous SNPs is greater than 0 but the number of non-synonymous SNPs is 0:
  - if flag_value is True, the returned value is -2

\(oS\)   \(oN\)   return value
>0       >0       pN/pS
0        0        -3
>0       0        -1
0        >0       -2
-