[Bioc-sig-seq] 'aggregate' error message

Patrick Aboyoun paboyoun at fhcrc.org
Mon Jan 4 19:27:17 CET 2010


P.,
The error message from aggregate isn't very informative and I'll clean 
it up.

The aggregate function threw an error for the cov.y object because the 
ranges in allPeaks reference indices outside the bounds of cov.y: 
cov.y is an Rle of length 11, while allPeaks includes the 
interval [17, 19]. If you know the length of the underlying sequence, you 
can pass it as the width argument to the coverage function. For 
example, if the underlying sequence is of length 19, then the coverage 
from the y ranges would be calculated as shown below. (I also added code 
for more efficient summation within the specified ranges.)

 > cov.y <- coverage(y, width = 19)
 > cov.y
'integer' Rle of length 19 with 5 runs
  Lengths:  3 2 4 2 8
  Values :  0 3 0 3 0
 > y.counts <- aggregate(cov.y, allPeaks, sum)
 > y.counts
[1] 6 0
 > y.counts.efficient <- viewSums(Views(cov.y, allPeaks))
 > y.counts.efficient
[1] 6 0
 > sessionInfo()
R version 2.10.1 Patched (2009-12-14 r50738)
i386-apple-darwin9.8.0

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] IRanges_1.4.9

loaded via a namespace (and not attached):
[1] tools_2.10.1
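
To make the bounds issue concrete, here is a minimal base-R sketch of the same
arithmetic that does not use IRanges at all. The helpers coverage_vec() and
range_sums() are hypothetical names written only for this illustration; they
are not part of any package, and they use plain integer vectors rather than
Rle objects. The peak boundaries are copied from the allPeaks output in the
original message.

```r
# Hypothetical helper: per-position coverage as a plain integer vector.
# If total_width is NULL the vector stops at the last covered position,
# which is what happens when no width is supplied.
coverage_vec <- function(start, width, total_width = NULL) {
  end <- start + width - 1L
  n <- if (is.null(total_width)) max(end) else total_width
  stopifnot(n >= max(end))
  cov <- integer(n)
  for (i in seq_along(start)) {
    idx <- start[i]:end[i]
    cov[idx] <- cov[idx] + 1L
  }
  cov
}

# Hypothetical helper: sum the coverage inside each [start, end] range,
# failing with an explicit message when a range runs past the vector.
range_sums <- function(cov, start, end) {
  if (max(end) > length(cov))
    stop("range end ", max(end), " exceeds coverage length ", length(cov))
  vapply(seq_along(start),
         function(i) sum(cov[start[i]:end[i]]),
         integer(1))
}

# The y ranges and the two peaks from the thread.
y_start <- c(4L, 4L, 4L, 10L, 10L, 10L)
y_width <- rep(2L, 6)
peak_start <- c(4L, 17L)
peak_end <- c(5L, 19L)

short_cov <- coverage_vec(y_start, y_width)       # length 11
full_cov <- coverage_vec(y_start, y_width, 19L)   # length 19, zero-padded

y_counts <- range_sums(full_cov, peak_start, peak_end)
y_counts                                          # 6 0
```

Calling range_sums(short_cov, peak_start, peak_end) stops with an error,
because the peak [17, 19] lies beyond position 11. Padding the coverage out to
the full sequence length removes the problem and gives the same counts as
above.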


Cheers,
Patrick


pterry at huskers.unl.edu wrote:
> Dear bioc-sig-sequencing,
>
> I am working with a toy example to learn the material covered in part 3 (Differential expression, pages 10-11) of the 'A ChIP-Seq Data Analysis' handout for an 11/19/09 session at the 'High throughput sequence analysis tools and approaches with Bioconductor' workshop in Seattle.
>
> I generated an error message in the following output.  Can you comment?
> (I note that when I use the sample data & code from the handout, ctcf.rda & gfp.rda, no errors are generated)
>
> > x <- IRanges(start=c(1L, 9L, 4L, 1L, 5L, 10L, 15L, 17L, 17L),
> +              width=c(5L, 6L, 3L, 4L, 3L, 3L, 5L, 3L, 3L))
> > y <- IRanges(start=c(4L, 4L, 4L, 10L, 10L, 10L),
> +              width=c(2L, 2L, 2L, 2L, 2L, 2L))
> > cov.x <- coverage(x)
> > cov.y <- coverage(y)
> > allPeaks <- slice(cov.x, lower = 3)
> > allPeaks
> Views on a 19-length Rle subject
>
> views:
>     start end width
> [1]     4   5     2 [3 3]
> [2]    17  19     3 [3 3 3]
> > x.counts <- aggregate(cov.x, allPeaks, sum)
> > x.counts
> [1] 6 9
> > y.counts <- aggregate(cov.y, allPeaks, sum)
> Error in findIntervalAndStartFromWidth(start, runLength(x)) :
>   'x' must be less than 'sum(width)'
>
> > sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ChIPseqTutorial_0.0.1              BSgenome.Mmusculus.UCSC.mm9_1.3.16
> [3] chipseq_0.2.0                      ShortRead_1.4.0
> [5] lattice_0.17-26                    BSgenome_1.14.0
> [7] Biostrings_2.14.1                  IRanges_1.4.2
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.0 grid_2.10.1   hwriter_1.1
>   
>
> Thanks,
> P. Terry
> huskers.unl.edu
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>


