[BioC] Finding my un-counted reads
Steve Lianoglou
lianoglou.steve at gene.com
Fri Aug 30 19:16:02 CEST 2013
Hi,
On Fri, Aug 30, 2013 at 9:38 AM, Sam McInturf <smcinturf at gmail.com> wrote:
> Valarie,
>
> By calling:
> olap <- findOverlaps(gff0, reads)
> foo <- countSubjectHits(olap)
>
> `countSubjectHits(x): Counts the number of hits for each subject, returning
> an integer vector.`
> foo is a vector with a length equal to the number of reads mapped? so:
>
> length(foo[foo == 0])
>
> would tell me how many reads did not map to my annotated features.
>
> can you recommend a method that would allow me to map to the regions
> outside of my current features? I have been trying to devise a way to make
> a new set of features that are the spaces between my current features, but
> I have been unable to implement it successfully.
>
> Maybe a more genomics oriented approach and look at the coverage over each
> chromosome to try and identify regions that have a higher 'background' that
> could be then looked at in a genome browser like IGV to see the reads that
> did not map to a transcript?
I had previously offered a suggestion, like so:
tx <- transcripts(txdb)
strand(tx) <- '*';
txs <- reduce(tx)
o <- countOverlaps(txs, reads, ignore.strand=TRUE)
But I was missing a crucial step, which was to use the `gaps` function, eg:
tx <- transcripts(txdb)
strand(tx) <- '*';
txs <- reduce(tx)
nontx <- gaps(txs)
o <- countOverlaps(nontx, reads, ignore.strand=TRUE)
`nontx` should be a GenomicRanges object with ranges that are created
from the spaces between the transcripts (the "gaps") -- which should
be a good approximation to the intergenic regions you are looking for.
-steve
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
More information about the Bioconductor
mailing list