[Bioc-sig-seq] Rle, RleViews

Simon Anders anders at ebi.ac.uk
Wed Aug 19 14:11:52 CEST 2009


Hi Florence

I CC this mail to the bioc-sig-sequencing mailing list, as this is there 
such stuff is discussed, and your problem might be of interest for a few 
  more people.

Florence Cavalli wrote:
 > How are you? Do you enjoy being in Heidelberg so far? Hope so!

Thanks, I am slowly settling in. I still dont' have a flat and live in 
the Boxberg guest house but that's fine for the moment.

> Can I ask you a question? It may be quite simple but I am a bit
> confuse with different classes
> 
> I detect some peaks and would like to create a wig file to look at
> them in a browser.
> 
>> cov_polII_wt
> A GenomeData instance
> chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet
>> cov_polII_wt[["X"]]
>   'integer' Rle instance of length 22422250 with 1066959 runs
>   Lengths:  27 60 31 32 27 13 47 8 23 72 ...
>   Values :  1 2 3 4 3 2 3 2 3 2 ...
>> polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8)
>> polII_wt_peaks_X
>   Views on a 22422250-length Rle subject
> 
> views:
>            start      end width
>     [1]     2679     2927   249 [ 8  8  8  9  9  9  9  9  9  9  9  9  9 ...]
>     [2]     4300     4381    82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 ...]
>     [3]     5581     5837   257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 ...]
>     [4]     5885     6033   149 [ 9  9  9  9  9 10 10 10 10 10 10 10 10 ...]
>     [5]     6878     7020   143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 ...]
>     [6]     7025     7070    46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8 ...]
>     [7]     7555     7704   150 [ 8  8  8  8  8  8  8  8  9  9  9  9  9 ...]
>     [8]     7795     8022   228 [ 8  8  8  8  8  8  8  8  9  9 10 10 10 ...]
>     [9]     8332     8338     7 [8 8 8 8 8 8 8]
>     ...      ...      ...   ... ...
> [18252] 22414407 22414570   164 [ 8  8  8  8  9  9  9  9  9  9  9  9  9 ...]
> [18253] 22414629 22414639    11 [8 8 8 8 8 8 8 8 8 8 8]
> [18254] 22414655 22414657     3 [8 8 8]
> [18255] 22414678 22414778   101 [ 8  8  8  8  8  8  9  9  9  9  9 10 10 ...]
> [18256] 22414792 22415612   821 [ 8  8  8  8  8  8  8  8  9  9  9  9  9 ...]
> [18257] 22415635 22415650    16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8]
> [18258] 22415728 22415737    10 [9 9 8 8 8 8 8 8 8 8]
> [18259] 22415749 22415784    36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 ...]
> [18260] 22415830 22415871    42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 ...]
> 
> 
>>From your workshop I did before:
> cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData")
> names(cov_polII_wt_X_rd)="X"
> export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig")
> 
> How could I do the same for the peaks I have in polII_wt_peaks_X (or
> the peaks on all chromosomes) ?
> I don't know how to convert a RleViews to a Rle.

An RleViews object is a list of "views" into an Rle object. "View" means 
that it is still the same dat but you just look a a small portion of the 
whole vector.

To visualize the peaks, I see three possibilities:

(a) You assemble the RleViews object with all the peaks into one big Rle 
object. This is essentially the same as taking the original Rle object 
and setting all those places to 0 that are below the threshold.

My first attempt would have been to write

    rleobj <- cov_polII_wt[["X"]]
    rleobj[ rleobj < 8 ] <- 0

but this does not work as the Rle class does not implement `[<-`.

In this specific case, the boundaries of the runs will not change, so we 
can use the `value<-` method which is provided:

    rleobj <- cov_polII_wt[["X"]]
    values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) )


(b) Maybe setting the background to zero is not the best option anyway. 
It might be nicer to have a BED file that indicates the peaks alongside 
the wiggle file with the coverage values and display both in adjacent 
tracks in the genome browser. A BED file is essentially a tab-seperated 
values file with at least the columns chromosome, start and end. You 
could make a data frame out of the 'start' and 'end' values from the Rle 
object and write it out with write.table. You might want to add the peak 
area as another column. (See here for the correct order of the columns:
http://genome.ucsc.edu/FAQ/FAQformat )

I am not sure on how to use rtracklayer's 'export' function for this. 
Michael, do you have any comment?


(c) Finally, for your problem, there is a very easy solution. If you use 
the Integrated Genome Browser (IGB) to view yor data, you will see that 
it has a thresholding function. If you set it to a value, it marks all 
stretches of a track where the value exceeds the threshold.

> The other thing is: if I use the R-2.10.1 version it does not
> recognise the GenomeData object:
>> cov_polII_wt
> A GenomeData instanceError in .local(x, ...) :
>   no slot of name "metadata" for this object of class "GenomeData"
> 
> Do you know why?
> I would like to use the library chipseq, so maybe I should recreate
> the cov_polII_wt under R-2.10.0????
> 

There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you 
can have problems with objects created with R-2.9 and saved from there 
if you try to load them in R-2.10. Strictly speaking, the version of R 
does not matter but if you use R-2.9, biocLite takes the packages from 
Bioconductor 2.4 and, in R-2.10, it takes them from Bioc 2.5, which is 
still under development and keeps changing, so that you have to expect 
that objects become incompatible.

If you want to use the chipseq package you should do everything with R-2.10.

>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] rtracklayer_1.4.1 RCurl_0.98-1      bitops_1.0-4.1    ShortRead_1.2.1
> [5] lattice_0.17-25   BSgenome_1.12.3   Biostrings_2.12.7 IRanges_1.2.3
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.4.1 grid_2.9.1    hwriter_1.1   tools_2.9.1   XML_2.5-3

Cheers
   Simon


+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de



More information about the Bioc-sig-sequencing mailing list