[Bioc-devel] From Biostring matching to short read mapping

Thu Nov 7 11:11:10 CET 2019

Dear bioc-devel,

multicrispr <https://gitlab.gwdg.de/loosolab/software/multicrispr> provides functions for CRISPR/Cas9 gRNA design (and is being prepared for submission to Bioconductor). One task involves finding all genomic (mis)matches of a 23-bp candidate Cas9 target sequence. Currently this is done with `Biostrings::vcountPDict`, which works but is slow. An alternative would be to switch from (Bio)string matching to short read mapping: this requires a one-time genome indexing step, but each subsequent alignment is fast.
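
To make the matching task concrete: for each candidate, counting (mis)matches amounts to sliding the 23-mer and its reverse complement along the genome and counting windows with at most k substitutions, so every candidate costs on the order of (genome length x pattern length) comparisons. An aligner avoids repeating that scan by paying once for an index. The brute-force sketch below is an illustration only, in Python rather than R; it is not how Biostrings implements vcountPDict, and the function names (`revcomp`, `count_matches`, `count_both_strands`) are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_matches(pattern: str, genome: str, max_mismatch: int = 0) -> int:
    """Count start positions where `pattern` matches `genome` with at most
    `max_mismatch` substitutions (no indels), forward strand only."""
    k = len(pattern)
    hits = 0
    for start in range(len(genome) - k + 1):
        mismatches = 0
        for a, b in zip(pattern, genome[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatch:
                    break  # early exit: this window can no longer match
        if mismatches <= max_mismatch:
            hits += 1
    return hits


def count_both_strands(pattern: str, genome: str, max_mismatch: int = 0) -> int:
    """Forward hits plus reverse-complement hits.
    Note: a reverse-palindromic pattern would be counted twice per site."""
    return (count_matches(pattern, genome, max_mismatch)
            + count_matches(revcomp(pattern), genome, max_mismatch))


if __name__ == "__main__":
    toy_genome = "ACGTTGCAACGTAGCAACGA"  # hypothetical toy sequence
    print(count_both_strands("AACG", toy_genome))                  # → 3
    print(count_both_strands("AACG", toy_genome, max_mismatch=1))  # → 4
```

The nested loop is exactly the work that grows with genome size for every candidate; splitting the genome into chunks (e.g. per chromosome) would parallelize it, whereas an index removes most of it.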

`Rsubread::align` seems to be limited to a maximum of 16 reported locations (`nBestLocations`), whereas I know from vcountPDict that some Cas9 candidates have hundreds of genomic matches.

`QuasR::qAlign` (which calls Bowtie) does not document an upper limit for `maxHits`.

Feedback requested:

Michael, would QuasR/(R)bowtie be a good approach to do this?
Wei, did I overlook a way to do this with Rsubread?
Herve, is there an elegant way to speed up vcountPDict (e.g. by parallelizing)?

Thank you :)

Aditya
