[Bioc-sig-seq] mappable length of a genome

Cei Abreu-Goodger cei at ebi.ac.uk
Wed Jul 7 17:16:08 CEST 2010


Thanks Kasper for pointing out the Mapability tracks on UCSC.

Fishing around a bit more, I found that the Bowtie distribution comes 
with a "mapability.pl" script. It makes use of an undocumented (?) -F 
flag that fragments the input sequences into windows of a given size, 
taken at a given step (-F win,step).
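
To illustrate what that kind of fragmenting amounts to, here is a rough 
Python sketch. It is not Bowtie's implementation: the function name 
"fragment" is made up, and dropping a trailing piece shorter than the 
window is my assumption.

```python
def fragment(seq, win, step):
    """Yield (offset, window) for each full-length window of size `win`,
    advancing `step` bases at a time. Hypothetical helper, not part of Bowtie."""
    if win <= 0 or step <= 0:
        raise ValueError("win and step must be positive")
    # Assumption: a final piece shorter than `win` is simply skipped.
    for start in range(0, len(seq) - win + 1, step):
        yield start, seq[start:start + win]


# Toy example: 4-base windows taken every 2 bases.
fragments = list(fragment("ACGTACGTAC", win=4, step=2))
```

With step=1 every position of the input becomes the start of one 
read-sized fragment, which is what a per-base mappability estimate needs.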

For those using Bowtie, this would allow using the same alignment 
strategy for mappable length and actual mapping, as Kasper suggests.
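
For a toy genome, the definition of mappable length quoted below (the 
number of k-mers in a region that align uniquely) can be sketched in 
Python without an aligner. This only approximates what a real aligner 
would report: it assumes exact matches on the forward strand only, and 
the function name "mappable_length" is hypothetical.

```python
from collections import Counter


def mappable_length(genome, start, end, k):
    """Count k-mer start positions in [start, end) whose k-mer occurs
    exactly once in `genome`.

    Assumptions: exact matching, forward strand only, no mismatches.
    """
    # Tally every k-mer in the whole genome, not just in the region.
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    # A k-mer must fit entirely inside the genome.
    last = min(end, len(genome) - k + 1)
    return sum(1 for i in range(start, last) if counts[genome[i:i + k]] == 1)


# Toy genome with a repeated "AAA" at both ends.
toy = "AAAACGTAAAA"
whole = mappable_length(toy, 0, len(toy), k=3)
repeat_only = mappable_length(toy, 0, 2, k=3)
```

Here the positions starting inside the "AAAA" repeats do not count, so 
the mappable length is shorter than the real length.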

Cheers,

Cei

Kasper Daniel Hansen wrote:
> On Wed, Jul 7, 2010 at 9:36 AM, Cei Abreu-Goodger <cei at ebi.ac.uk> wrote:
>> After short-read alignment, one post-processing step might be to normalize
>> by the length (e.g. of an individual exon, of all genes, etc). This should
>> actually be the mappable length of these portions of the genome, not the
>> real length. Mappable length could be defined as the number of distinct
>> k-mers that uniquely align in a given portion of the genome.
> 
> Of course you want to use mappable length, as we did a long time ago:
> 
> http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000299
> 
> You can also find the Broad "mappability" track on UCSC.
> 
> In my opinion, mappable length ought to be computed under the same
> alignment strategy and aligner as you are using to map the reads.  Of
> course, one could claim that mappability under one strategy ought to
> be pretty similar to mappability under another strategy, but I have
> never seen any real investigation into these claims.
> 
> It is pretty easy to compute for small genomes, and it is computable
> for larger genomes, although it does involve a lot of scripting and
> postprocessing.  You can cut down your time if you're only interested
> in, say, mappability for all Ensembl genes (which is about 100x faster
> than mappability for the entire human genome).
> 
> I have always used custom scripts for this.
> 
> Kasper
> 
>> In a previous thread, Simon Andrews mentioned a Bowtie perl wrapper:
>>
>> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-May/000315.html
>>
>> I seem to recall another post suggesting using the BSgenome packages for a
>> similar purpose...
>>
>> Perhaps I'm missing something obvious and this functionality is already
>> included in one of the many sequencing-related packages out there.
>>
>> Any thoughts?
>>
>> Cheers,
>>
>> Cei
