[BioC] reducing RangedData

Patrick Aboyoun paboyoun at fhcrc.org
Thu Oct 22 09:58:36 CEST 2009


Robert,
I wrote you some code that you can use below. I've seen other reduction 
operations where the user wanted to reduce a RangedData type object by 
more than one column (e.g. strand and score), so perhaps an appropriate 
signature for a reduce method for RangedData would be

reduce(x, by, ...)

If this method where written, the remaining question would be should it 
support reducing transformations of the remaining values columns or 
should non "by" columns be dropped from the output. I'm inclined towards 
the latter, since a reduce(x, by, valuesFUN) method wouldn't be much 
simpler than the rdapply method given below. Would a straight-forward 
reduce(x = rd, by = "strand") approach for RangedData meet your needs?


 > suppressMessages(library(IRanges))
 > sta <- c(1, 2, 5, 6, 2, 3, 4, 5)
 > end <- c(3, 4, 5, 7, 2, 4, 6, 7)
 > str <- rep(c("+", "+", "-", "-"), 2)
 > chr <- rep(c("chr1", "chr2"), each = 4)
 > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr)
 > rd
RangedData: 8 ranges by 1 columns
columns(1): strand
sequences(2): chr1 chr2
 >
 > # Interesting bit starts here
 > reduceStranded <- function(x) {
+     strand <- x$strand
+     rngs <- ranges(x)[[1]]
+     ans <- IRangesList("+" = reduce(rngs[strand == "+"]),
+                        "-" = reduce(rngs[strand == "-"]))
+     RangedData(unlist(ans, use.names = FALSE),
+                strand = rep(c("+", "-"), elementLengths(ans)),
                 space = names(x))
+ } 
 > params <- RDApplyParams(rd, reduceStranded, reducerFun = function(x) 
do.call(c, unname(x)))
 > rdapply(params)
RangedData: 4 ranges by 1 columns
columns(1): strand
sequences(2): chr1 chr2
 > as.data.frame(rdapply(params))
  space start end width strand
1  chr1     1   4     4      +
2  chr1     5   7     3      -
3  chr2     2   4     3      +
4  chr2     4   7     4      -
 > sessionInfo()
R version 2.9.2 Patched (2009-08-24 r49420)
i386-apple-darwin9.8.0

locale:
C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] IRanges_1.2.3




Robert Castelo wrote:
> dear list,
>
> i'd like to project a set of exons on genomic space but keeping
> the strand information. let's assume for simplicity that no exons
> overlap in opposite strands so no conflicts need to be resolved
> regarding that case. in principle this is easy without keeping the
> strand using a RangeList and the reduce method. however i've been
> struggling for a while without success to figure out how can i
> "reduce" my coordinates without loosing the strand information.
>
> my guess is that to carry out the strand information i need to
> use a RangedData object:
>
> sta <- c(1, 1, 1)
> end <- c(3, 4, 5)
> str <- c("+", "+", "-")
> rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr)
> rd
> RangedData: 3 ranges by 1 columns
> columns(1): strand
> sequences(2): chr1 chr2
>
> and then use rdapply to reduce on each of the chromosomes but i
> either don't know how to directly apply reduce:
>
> params <- RDApplyParams(rd, reduce)
> rdapply(params)
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "reduce", for 
> signature "RangedData"
>
>
> or i loose the strand information:
>
> params <- RDApplyParams(rd, function(rd) reduce(ranges(rd)))
> rdapply(params)
> $chr1
> RangesList: 1 range
> 1: 1:4
>
> $chr2
> RangesList: 1 range
> 1: 1:5
>
> thanks for your help!
> robert.
>
>
>
> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-apple-darwin8.11.1
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] rtracklayer_1.4.1                     RCurl_0.98-1
>  [3] bitops_1.0-4.1                        GOstats_2.10.0
>  [5] graph_1.22.2                          Category_2.10.1
>  [7] geneplotter_1.22.0                    lattice_0.17-25
>  [9] limma_2.18.2                          bgSpJncArrayDm1.db_1.0.1
> [11] org.Dm.eg.db_2.2.11                   RSQLite_0.7-1
> [13] DBI_0.2-4                             
> BSgenome.Dmelanogaster.UCSC.dm1_1.0.0
> [15] BSgenome_1.12.3                       RColorBrewer_1.0-2
> [17] Biostrings_2.12.8                     IRanges_1.2.3
> [19] annotate_1.22.0                       AnnotationDbi_1.6.1
> [21] Biobase_2.4.1                         xtable_1.5-5
>
> loaded via a namespace (and not attached):
> [1] genefilter_1.24.2 GO.db_2.2.11      grid_2.9.1        GSEABase_1.6.1
> [5] RBGL_1.20.0       splines_2.9.1     survival_2.35-4   tools_2.9.1
> [9] XML_2.5-3
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list