[Bioc-sig-seq] Rle, RleViews
Simon Anders
anders at ebi.ac.uk
Wed Aug 19 14:11:52 CEST 2009
Hi Florence
I CC this mail to the bioc-sig-sequencing mailing list, as this is there
such stuff is discussed, and your problem might be of interest for a few
more people.
Florence Cavalli wrote:
> How are you? Do you enjoy being in Heidelberg so far? Hope so!
Thanks, I am slowly settling in. I still dont' have a flat and live in
the Boxberg guest house but that's fine for the moment.
> Can I ask you a question? It may be quite simple but I am a bit
> confuse with different classes
>
> I detect some peaks and would like to create a wig file to look at
> them in a browser.
>
>> cov_polII_wt
> A GenomeData instance
> chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet
>> cov_polII_wt[["X"]]
> 'integer' Rle instance of length 22422250 with 1066959 runs
> Lengths: 27 60 31 32 27 13 47 8 23 72 ...
> Values : 1 2 3 4 3 2 3 2 3 2 ...
>> polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8)
>> polII_wt_peaks_X
> Views on a 22422250-length Rle subject
>
> views:
> start end width
> [1] 2679 2927 249 [ 8 8 8 9 9 9 9 9 9 9 9 9 9 ...]
> [2] 4300 4381 82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 ...]
> [3] 5581 5837 257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 ...]
> [4] 5885 6033 149 [ 9 9 9 9 9 10 10 10 10 10 10 10 10 ...]
> [5] 6878 7020 143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 ...]
> [6] 7025 7070 46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8 ...]
> [7] 7555 7704 150 [ 8 8 8 8 8 8 8 8 9 9 9 9 9 ...]
> [8] 7795 8022 228 [ 8 8 8 8 8 8 8 8 9 9 10 10 10 ...]
> [9] 8332 8338 7 [8 8 8 8 8 8 8]
> ... ... ... ... ...
> [18252] 22414407 22414570 164 [ 8 8 8 8 9 9 9 9 9 9 9 9 9 ...]
> [18253] 22414629 22414639 11 [8 8 8 8 8 8 8 8 8 8 8]
> [18254] 22414655 22414657 3 [8 8 8]
> [18255] 22414678 22414778 101 [ 8 8 8 8 8 8 9 9 9 9 9 10 10 ...]
> [18256] 22414792 22415612 821 [ 8 8 8 8 8 8 8 8 9 9 9 9 9 ...]
> [18257] 22415635 22415650 16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8]
> [18258] 22415728 22415737 10 [9 9 8 8 8 8 8 8 8 8]
> [18259] 22415749 22415784 36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 ...]
> [18260] 22415830 22415871 42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 ...]
>
>
>>From your workshop I did before:
> cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData")
> names(cov_polII_wt_X_rd)="X"
> export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig")
>
> How could I do the same for the peaks I have in polII_wt_peaks_X (or
> the peaks on all chromosomes) ?
> I don't know how to convert a RleViews to a Rle.
An RleViews object is a list of "views" into an Rle object. "View" means
that it is still the same dat but you just look a a small portion of the
whole vector.
To visualize the peaks, I see three possibilities:
(a) You assemble the RleViews object with all the peaks into one big Rle
object. This is essentially the same as taking the original Rle object
and setting all those places to 0 that are below the threshold.
My first attempt would have been to write
rleobj <- cov_polII_wt[["X"]]
rleobj[ rleobj < 8 ] <- 0
but this does not work as the Rle class does not implement `[<-`.
In this specific case, the boundaries of the runs will not change, so we
can use the `value<-` method which is provided:
rleobj <- cov_polII_wt[["X"]]
values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) )
(b) Maybe setting the background to zero is not the best option anyway.
It might be nicer to have a BED file that indicates the peaks alongside
the wiggle file with the coverage values and display both in adjacent
tracks in the genome browser. A BED file is essentially a tab-seperated
values file with at least the columns chromosome, start and end. You
could make a data frame out of the 'start' and 'end' values from the Rle
object and write it out with write.table. You might want to add the peak
area as another column. (See here for the correct order of the columns:
http://genome.ucsc.edu/FAQ/FAQformat )
I am not sure on how to use rtracklayer's 'export' function for this.
Michael, do you have any comment?
(c) Finally, for your problem, there is a very easy solution. If you use
the Integrated Genome Browser (IGB) to view yor data, you will see that
it has a thresholding function. If you set it to a value, it marks all
stretches of a track where the value exceeds the threshold.
> The other thing is: if I use the R-2.10.1 version it does not
> recognise the GenomeData object:
>> cov_polII_wt
> A GenomeData instanceError in .local(x, ...) :
> no slot of name "metadata" for this object of class "GenomeData"
>
> Do you know why?
> I would like to use the library chipseq, so maybe I should recreate
> the cov_polII_wt under R-2.10.0????
>
There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you
can have problems with objects created with R-2.9 and saved from there
if you try to load them in R-2.10. Strictly speaking, the version of R
does not matter but if you use R-2.9, biocLite takes the packages from
Bioconductor 2.4 and, in R-2.10, it takes them from Bioc 2.5, which is
still under development and keeps changing, so that you have to expect
that objects become incompatible.
If you want to use the chipseq package you should do everything with R-2.10.
>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.4.1 RCurl_0.98-1 bitops_1.0-4.1 ShortRead_1.2.1
> [5] lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.7 IRanges_1.2.3
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1 tools_2.9.1 XML_2.5-3
Cheers
Simon
+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de
More information about the Bioc-sig-sequencing
mailing list