mgkit.align module
Module dealing with BAM/SAM files
class mgkit.align.SamtoolsDepth(file_handle, num_seqs=10000, max_size=1000000, max_size_dict=None)
Bases: object
Changed in version 0.4.0: uses pandas.SparseArray now. It should use less memory, but needs pandas version > 0.24
New in version 0.3.0.
A class used to cache the results of read_samtools_depth(), while reading only the necessary data from a `samtools depth -aa` file.
data = None
file_handle = None
max_size = None
max_size_dict = None
region_coverage(seq_id, start, end)
Returns the mean coverage of a region. The start and end parameters are expected to be 1-based coordinates, like the corresponding attributes in mgkit.io.gff.Annotation or mgkit.io.gff.GenomicRange.
If the sequence for which the coverage is requested is not found, the depth file is read (and cached) until it is found.
Returns: mean coverage of the requested region
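The read-until-found caching described for region_coverage can be sketched in plain Python. This is a minimal illustration, not mgkit's implementation: the class name LazyDepthCache, the use of lists instead of sparse arrays, the inclusive end coordinate (as in GFF) and the KeyError raised for a missing sequence are all assumptions.

```python
from statistics import mean


class LazyDepthCache:
    """Hypothetical stand-in for the caching idea; not mgkit's code."""

    def __init__(self, depth_iter):
        # depth_iter yields (seq_id, per-base coverage list), one per sequence
        self._depth_iter = iter(depth_iter)
        self._cache = {}

    def region_coverage(self, seq_id, start, end):
        # Consume the source only until the requested sequence shows up,
        # caching every sequence read along the way
        while seq_id not in self._cache:
            try:
                found_id, coverage = next(self._depth_iter)
            except StopIteration:
                # Assumption: a sequence absent from the file is an error
                raise KeyError(seq_id) from None
            self._cache[found_id] = coverage
        # start and end are 1-based and assumed inclusive: shift start by one
        return mean(self._cache[seq_id][start - 1:end])


records = [("contig1", [0, 2, 4, 4]), ("contig2", [1, 1, 3]), ("contig3", [5, 5])]
cache = LazyDepthCache(records)
print(cache.region_coverage("contig2", 2, 3))  # mean of [1, 3]
print(sorted(cache._cache))  # contig3 has not been read yet
```

Asking for contig2 reads and caches contig1 on the way, so a later request for contig1 is served from memory without touching the source again.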
mgkit.align.add_coverage_info(annotations, bam_files, samples, attr_suff='_cov')
Changed in version 0.3.4: the coverage now is returned as floats instead of int
Adds coverage information to annotations, using BAM files.
The coverage information is added for each sample as a ‘sample_cov’ attribute and the total coverage as a ‘cov’ attribute in the annotations.
Note
The bam_files and samples variables must have the same order
Parameters:
- annotations (iterable) – iterable of annotations
- bam_files (iterable) – iterable of pysam.Samfile instances
- samples (iterable) – names of the samples for the BAM files
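The attribute naming and the ordering requirement can be illustrated without BAM files. In this sketch annotations are plain dictionaries and each sample's coverage is a precomputed {uid: value} mapping, both hypothetical stand-ins. The function name is made up, and summing the per-sample values to get the total is an assumption, since the text above does not say how the total is computed.

```python
def add_coverage_sketch(annotations, coverages, samples, attr_suff="_cov"):
    # annotations: list of dicts standing in for annotation attributes
    # coverages: one {uid: coverage} dict per sample, in the same order
    # as samples (zip pairs them by position, hence the ordering note)
    for annotation in annotations:
        total = 0.0
        for sample, sample_cov in zip(samples, coverages):
            value = float(sample_cov[annotation["uid"]])
            # Per-sample attribute: sample name followed by the suffix
            annotation[sample + attr_suff] = value
            total += value
        # Assumption: the total coverage is the sum over all samples
        annotation["cov"] = total
    return annotations


annotations = [{"uid": "a1"}, {"uid": "a2"}]
coverages = [{"a1": 2, "a2": 0}, {"a1": 3.5, "a2": 1}]
result = add_coverage_sketch(annotations, coverages, ["S1", "S2"])
print(result[0])
```

Swapping the order of only one of the two lists would silently attach each coverage to the wrong sample name, which is why the two iterables have to be aligned.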
mgkit.align.covered_annotation_bp(files, annotations, min_cov=1, progress=False)
New in version 0.1.14.
Returns the number of covered base pairs of annotations over multiple samples.
Returns: a dictionary whose keys are the uid and the values the number of bases that are covered by reads among all samples
Return type: dict
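One way to read "covered by reads among all samples" is that a base counts once if at least one sample reaches min_cov at that position. The sketch below follows that reading, which is an assumption, and works on in-memory coverage lists rather than alignment files. The function name and the (uid, seq_id, start, end) tuple layout are hypothetical.

```python
def covered_bp_sketch(sample_coverages, annotations, min_cov=1):
    # sample_coverages: one {seq_id: per-base coverage list} dict per sample
    # annotations: (uid, seq_id, start, end) tuples, 1-based and inclusive
    covered = {}
    for uid, seq_id, start, end in annotations:
        is_covered = [False] * (end - start + 1)
        for coverage in sample_coverages:
            region = coverage.get(seq_id, [])[start - 1:end]
            for index, depth in enumerate(region):
                # Assumption: "covered" means a depth of at least min_cov
                if depth >= min_cov:
                    is_covered[index] = True
        # Each base is counted once, however many samples cover it
        covered[uid] = sum(is_covered)
    return covered


sample1 = {"c1": [0, 1, 0, 0, 2]}
sample2 = {"c1": [0, 0, 3, 0, 0]}
annotations = [("a1", "c1", 2, 5)]
print(covered_bp_sketch([sample1, sample2], annotations))
print(covered_bp_sketch([sample1, sample2], annotations, min_cov=2))
```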
mgkit.align.get_region_coverage(bam_file, seq_id, feat_from, feat_to)
Return coverage for an annotation.
Note
feat_from and feat_to are 1-based indexes
Return int: coverage array for the annotation
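Because feat_from and feat_to are 1-based, they need shifting before they can index a 0-based structure such as a Python sequence (pysam itself uses 0-based, half-open coordinates). The helper below is a hypothetical sketch of that conversion. It assumes feat_to is inclusive, as in GFF, and it does not reproduce what get_region_coverage does internally.

```python
def region_slice(per_base_coverage, feat_from, feat_to):
    # 1-based inclusive [feat_from, feat_to] maps to the
    # 0-based half-open interval [feat_from - 1, feat_to)
    if feat_from < 1 or feat_to < feat_from:
        raise ValueError("expected 1 <= feat_from <= feat_to")
    return per_base_coverage[feat_from - 1:feat_to]


coverage = [5, 6, 7, 8, 9]
region = region_slice(coverage, 2, 4)
print(region)  # bases 2, 3 and 4 of the sequence
print(len(region) == 4 - 2 + 1)
```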
mgkit.align.read_samtools_depth(file_handle, num_seqs=10000, seq_ids=None)
Changed in version 0.4.0: now yields a tuple of 3 elements, instead of 2. Also added seq_ids to skip lines
Changed in version 0.3.4: num_seqs can be None to avoid a log message
New in version 0.3.0.
Reads a samtools depth file, returning a generator that yields the array of each base coverage on a per-sequence basis.
Note
The information on position is not used, to use numpy and save memory. samtools depth should be called with the -aa option:
`samtools depth -aa bamfile`
This option will output both base positions with 0 coverage and sequences with no aligned reads
Yields: tuple – the first element is the sequence identifier, the second one is the numpy array with the positions, the third element is the numpy array with the coverages
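The per-sequence grouping can be sketched with the standard library alone. `samtools depth` writes tab-separated lines holding the sequence name, the 1-based position and the depth, and all lines for one sequence are contiguous. The function below is a hypothetical illustration, not mgkit's reader: it yields plain lists instead of numpy arrays, and it assumes seq_ids names the sequences to keep.

```python
import io
from itertools import groupby


def read_depth_sketch(file_handle, seq_ids=None):
    # Each line: sequence name, 1-based position, depth (tab separated)
    rows = (line.rstrip("\n").split("\t") for line in file_handle if line.strip())
    # Lines of one sequence are contiguous, so groupby can split on the name
    for seq_id, group in groupby(rows, key=lambda row: row[0]):
        if seq_ids is not None and seq_id not in seq_ids:
            # groupby skips the unread lines of this group on its own
            continue
        positions, coverage = [], []
        for _, position, depth in group:
            positions.append(int(position))
            coverage.append(int(depth))
        yield seq_id, positions, coverage


text = "c1\t1\t0\nc1\t2\t3\nc2\t1\t0\nc2\t2\t0\n"
for seq_id, positions, coverage in read_depth_sketch(io.StringIO(text)):
    print(seq_id, positions, coverage)
```

With the -aa option, a sequence without aligned reads (c2 here) still appears with a depth of 0 at every position, so every sequence is yielded with a full-length coverage list.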