On Fri, Jan 22, 2010 at 11:41 AM, Robert Castelo <robert.castelo@upf.edu> wrote:

> dear list, and particularly, the IRanges developers,
>
> i'm using the function findOverlaps from the IRanges package because i
> need to find what stranded genomic intervals from one set (as a
> RangedData object) overlap with what stranded genomic intervals from
> another set (as another RangedData object). the problem is that i don't
> want to consider overlaps between genomic intervals from different
> strands.
>
> i've been looking at the help page of findOverlaps (devel version, see
> my sessionInfo() below) and searched through the BioC mailinglist and my
> preliminary conclusion is that such an operation is not yet supported.
>
> i've been thinking of using rdapply to break down the RangedData objects
> into spaces and then again by the two strands but the problem is that
> the query and subject indexes resulting from findOverlaps will not match
> the dimension of the original RangedData objects.
>
> so, i'd like to suggest that some option is added to this useful
> function to restrict the overlapping search by strand. of course, if
> this is somehow already implemented and i just missed it, then i'll be
> very grateful if you let me know what function/parameter i should be
> using.
>
>
Well, IRanges knows nothing about Biology, so a 'strand' option would be out
of place, in my opinion. That said, I can think of at least two approaches.

1) Simply filter the results for matches that are on the same strand. This
can be as simple as:
result <- findOverlaps(a, b)
mat <- as.matrix(result)
## keep only same-strand hits; drop = FALSE keeps a matrix even for one hit
mat <- mat[a$strand[mat[,1L]] == b$strand[mat[,2L]], , drop = FALSE]
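
To make the filtering step concrete, here is a toy sketch in base R only.
The strand vectors and the hit matrix are made up for illustration; the
matrix merely stands in for what as.matrix(findOverlaps(a, b)) would
return, and the column names are my own choice, not something the package
guarantees.

```r
# Made-up strands for four query and three subject intervals.
a_strand <- c("+", "-", "+", "-")
b_strand <- c("+", "+", "-")

# Stand-in for as.matrix(findOverlaps(a, b)): one row per (query, subject) hit.
mat <- cbind(query = c(1L, 2L, 3L, 4L), subject = c(1L, 2L, 2L, 3L))

# TRUE where the query and the subject of a hit lie on the same strand.
same <- a_strand[mat[, "query"]] == b_strand[mat[, "subject"]]

# drop = FALSE keeps the result a two-column matrix even with a single hit.
mat_same <- mat[same, , drop = FALSE]
print(mat_same)
```

The indices that survive still refer to rows of the original objects, so
nothing has to be mapped back afterwards.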

2) Out of recognition that we are really treating the two strands as
separate spaces, break down the RangedData into chrom*strand spaces, as in:
rd <- RangedData(...)
rd <- do.call(c, split(rd, rd$strand))
result <- findOverlaps(rd, ...)
## then maybe eventually go back to chromosome spaces
## (chromNames is assumed to hold the original chromosome names)
rds <- split(rd, rd$strand)
names(rds[[1]]) <- chromNames
names(rds[[2]]) <- chromNames
rd <- do.call(rbind, rds)

The second approach would be very convenient if you always want to treat the
strands separately. The separation could be specified at construction time,
e.g.:
RangedData(ranges, strand, space = interaction(chrom, strand))
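
As a rough sketch of what such a combined space looks like, the following
uses base R only, with made-up chrom and strand vectors. It shows the
labels that interaction() produces, how to keep the original row indices
per subspace (so per-subspace results can be mapped back), and how to
recover the chromosome from a label. The regular expression assumes
strands are coded as "+" or "-".

```r
# Made-up chromosome and strand annotation for four intervals.
chrom  <- c("chr1", "chr1", "chr2", "chr2")
strand <- c("+", "-", "+", "+")

# One space per chromosome/strand combination; drop = TRUE removes
# combinations that do not occur in the data.
space <- interaction(chrom, strand, drop = TRUE)
print(as.character(space))

# Original row indices grouped by subspace, useful for mapping
# per-subspace hits back to rows of the full object.
rows_by_space <- split(seq_along(space), space)
print(rows_by_space)

# Recover the chromosome by stripping the trailing ".+" or ".-".
chrom_back <- sub("\\.[+-]$", "", as.character(space))
```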

But in general neither of these is awfully convenient, and I've always had
the suspicion that we'd eventually need multiple space variables. Yes, we
could add some argument to the findOverlaps method for RangedData that takes
a vector of variable names for splitting into subspaces, but I think we
would want a more general solution, where the RangedData itself has the
notion of subspaces. This would be a non-trivial change. Would it behave
like a nested list in some ways?

Hopefully others have better ideas...

Michael



>
> thanks a lot!!
> robert.
>
> sessionInfo()
> R version 2.11.0 Under development (unstable) (2009-10-06 r49948)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] IRanges_1.5.16
>


