[BioC] ties & strandedness in distanceToNearest GRanges
Valerie Obenchain
vobencha at fhcrc.org
Wed Feb 27 18:52:54 CET 2013
Hi Tom,
Thanks for reporting the bug. A fix was checked into GenomicRanges
1.11.32 (devel) and 1.10.7 (release). Both versions will be available
through biocLite() on Thursday. The problem was in nearest(). The
distance computation for potential nearest ranges was correct for "+"
but not for "-".
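To check whether a given installation already includes the fix, the version numbers above can be compared in base R. This is only a sketch: `has_nearest_fix` is a hypothetical helper, not part of GenomicRanges, and it assumes the fix is carried forward into every version later than the two named above.

```r
# Hypothetical helper: TRUE if a GenomicRanges version string is at or past
# the versions named above (1.10.7 on the release branch, 1.11.32 on devel).
# Assumption: every later version also contains the fix.
has_nearest_fix <- function(v) {
  v <- package_version(v)
  if (v >= "1.11.0") v >= "1.11.32" else v >= "1.10.7"
}

has_nearest_fix("1.10.6")   # release version before the fix
has_nearest_fix("1.10.7")   # release version with the fix
has_nearest_fix("1.11.31")  # devel version before the fix

# With the package installed, one might call:
# has_nearest_fix(as.character(packageVersion("GenomicRanges")))
```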
Using the first negative range in your example below as the query,

    query <- GRanges("10", IRanges(96723746, width=2), strand="-")

and a small collection of transcripts in 'rat' that surround this range
as the subject,

    subject <- GRanges("10",
                       IRanges(c(95919265, 97203491),
                               c(96311060, 97204143)),
                       strand=c("-", "-"))

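Previously nearest() gave the correct answer for "+" ranges but not for
"-" ranges. For the "-" strand query above, the strand-aware call returned
subject 2 even though subject 1 is closer: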
> nearest(query, subject, ignore.strand=TRUE)
[1] 1
> nearest(query, subject, ignore.strand=FALSE)
[1] 2
Now strand is handled correctly.
> nearest(query, subject, ignore.strand=TRUE)
[1] 1
> nearest(query, subject, ignore.strand=FALSE)
[1] 1
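As a sanity check that subject 1 really is the closer range, the gaps can be worked out by hand in base R. This is a minimal sketch: `gap_between` is a hypothetical helper, not a GenomicRanges function, and it counts the bases strictly between two ranges. The absolute numbers may differ by one from what a given GenomicRanges version reports, depending on its distance convention; what matters here is the ordering.

```r
# Hypothetical helper (not part of GenomicRanges): number of bases strictly
# between two closed intervals, 0 when they overlap or are adjacent.
gap_between <- function(q_start, q_end, s_start, s_end) {
  pmax(s_start - q_end, q_start - s_end, 1) - 1
}

# Query: IRanges(96723746, width=2)
q_start <- 96723746
q_end   <- q_start + 2 - 1

# Subject: the two "-" strand transcripts from the example above
s_start <- c(95919265, 97203491)
s_end   <- c(96311060, 97204143)

gaps <- gap_between(q_start, q_end, s_start, s_end)
gaps             # 412685 479743
which.min(gaps)  # 1, i.e. the first subject range is nearest
```

Both subject ranges are on the same strand as the query, so taking strand into account should not change which one is nearest.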
Valerie
On 02/23/13 11:11, Tom Oates wrote:
> Hi
> I am using distanceToNearest on a dataset of CpG dinucleotides and the rat
> transcripts from the latest ensembl build.
> Datasets as below:
>
> CpGs
> GRanges with 6 ranges and 4 metadata columns:
>       seqnames                 ranges strand |
>          <Rle>              <IRanges>  <Rle> |
>   [1]       10 [ 96723746,  96723747]      - |
>   [2]        7 [ 13641170,  13641171]      + |
>   [3]       16 [ 17772801,  17772802]      - |
>   [4]        3 [ 88173502,  88173503]      - |
>   [5]       13 [106979682, 106979683]      + |
>   [6]        9 [104393139, 104393140]      + |
>
> rat <- makeTranscriptDbFromBiomart(
> biomart="ENSEMBL_MART_ENSEMBL",
> dataset='rnorvegicus_gene_ensembl',
> host="ensembl.org")
> rat_tx<-transcripts(rat)
>
> distances<-distanceToNearest(diff.cpgs.gr, rat.transcripts, ignore.strand=F)
>
> distances
> DataFrame with 1133 rows and 3 columns
>   queryHits subjectHits  distance
>   <integer>   <integer> <integer>
> 1         1        5962    479744
> 2         2       23710     65549
> 3         3       11077    199011
> 4         4       18109    101821
> 5         5        8159    664239
> 6         6       27327    457961
> 7         7       25795         0
> 8         8       25108     26868
> 9         9       14471    202908
>
> When I manually look through the object "distances" I have found that some
> negative strand CpGs have been assigned nearest transcripts which aren't
> the nearest.
> For example,
>
> ===========B==============B==CG========A=======A===
>
> The object distances contains a subjectHit reference to transcript A even
> though the CG is nearer to transcript B (and the transcript is on the
> negative strand so it would make more sense anyway to go to transcript B).
> The problem is not solved by:
> distanceToNearest(diff.cpgs.gr, rat.transcripts, ignore.strand=F,
> select=all)
>
> Any help would be appreciated
> Thanks