[Bioc-sig-seq] mappable length of a genome

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Jul 7 15:50:34 CEST 2010


On Wed, Jul 7, 2010 at 9:36 AM, Cei Abreu-Goodger <cei at ebi.ac.uk> wrote:
> After short-read alignment, one post-processing step might be to normalize
> by the length (e.g. of an individual exon, of all genes, etc). This should
> actually be the mappable length of these portions of the genome, not the
> real length. Mappable length could be defined as the number of distinct
> k-mers that uniquely align in a given portion of the genome.

Of course you want to use mappable length, as we did a long time ago:

http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000299

You can also find the Broad "mappability" track on UCSC.

In my opinion, mappable length ought to be computed under the same
alignment strategy and aligner as you are using to map the reads.  Of
course, one could claim that mappability under one strategy ought to
be pretty similar to mappability under another strategy, but I have
never seen any real investigation into these claims.

It is pretty easy to compute for small genomes, and it is computable
for larger genomes, although it does involve a lot of scripting and
postprocessing.  You can cut down your time if you're only interested
in, say, mappability for all Ensembl genes (which is about 100x faster
than mappability for the entire human genome).
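
For a small genome, the definition quoted above (number of k-mers in a
region that align uniquely) can be sketched roughly as follows.  This is
an illustration only, not the custom scripts mentioned in this thread.
It assumes exact matching on the forward strand with no mismatches,
which is a much simpler strategy than any real aligner uses, so the
numbers will differ from what you get under your actual alignment
settings.  The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def unique_kmer_starts(genome: str, k: int) -> list[bool]:
    """Flag each start position whose k-mer occurs exactly once.

    Assumption: exact, forward-strand matches only (no mismatches,
    no reverse complement), unlike a real short-read aligner.
    """
    n = len(genome) - k + 1
    if n <= 0:
        return []
    counts = Counter(genome[i:i + k] for i in range(n))
    return [counts[genome[i:i + k]] == 1 for i in range(n)]


def mappable_length(flags: list[bool], start: int, end: int, k: int) -> int:
    """Count uniquely mappable k-mers lying fully inside [start, end)."""
    last = min(end - k + 1, len(flags))  # last valid start, exclusive
    return sum(flags[max(start, 0):max(last, 0)])


# Toy example: "ACGT" occurs twice, so its two start positions
# are not uniquely mappable.
genome = "ACGTACGTTT"
flags = unique_kmer_starts(genome, k=4)
whole = mappable_length(flags, 0, len(genome), k=4)
region = mappable_length(flags, 0, 6, k=4)
print(whole, region)
```

Restricting the second step to a set of regions (for example gene
coordinates) is what keeps the work manageable; for a large genome the
k-mer counting step would have to be replaced by running the aligner
itself on all k-mers, since holding every k-mer in memory is not
practical.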

I have always used custom scripts for this.

Kasper

> In a previous thread, Simon Andrews mentioned a Bowtie perl wrapper:
>
> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-May/000315.html
>
> I seem to recall another post suggesting using the BSgenome packages for a
> similar purpose...
>
> Perhaps I'm missing something obvious and this functionality is already
> included in one of the many sequencing-related packages out there.
>
> Any thoughts?
>
> Cheers,
>
> Cei


