[Bioc-sig-seq] Rle, RleViews

Thu Aug 20 00:40:12 CEST 2009

Florence,
I just made three changes to the BioC 2.5 sequence infrastructure that should make this process much more straight-forward.

Although you didn't state this in your e-mail, I am guessing that your cov_polII_wt object was created by using ShortRead's coverage method on an AlignedRead object (also created by ShortRead). The GenomeData class was created at a time when most of the sequence infrastructure was in flux and like scaffolding used in bricks and mortar work should probably be dismantled.

The first change I made was to have the coverage method for AlignedRead to return a SimpleRleList object. This object class is more functional than GenomeData with methods for functions such as slice. The second change I made was to make rtracklayer's export.* functions coerce object to class RangedData when there is no other available method for that object type. The third change I made was add another coercion RleList to RangedData coercion method to support this workflow.

Here is what the new workflow looks like:

> # load packages
> suppressMessages(library(ShortRead))
> suppressMessages(library(rtracklayer))

> # read alignments
> dirPath <- system.file('extdata', 'maq', package='ShortRead')
> aln <- readAligned(dirPath, 'out.aln.1.txt', type="MAQMapview")

> # create coverage across chromosomes/contigs
> cov_polII_wt <- coverage(aln)
> names(cov_polII_wt) <- tolower(names(cov_polII_wt))
> cov_polII_wt
SimpleRleList: 1 element
names(1): chra

> # find peaks
> peaks_polII_wt <- slice(cov_polII_wt,8)
> peaks_polII_wt
SimpleRleViewsList: 1 element
names(1): chra

> # export to UCSC file formats
> export(cov_polII_wt,"cov_polII_wt.wig")
> export(peaks_polII_wt,"peaks_polII_wt.bed")

> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-05 r49073) 
i386-apple-darwin9.7.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rtracklayer_1.5.12 RCurl_0.98-1       bitops_1.0-4.1     ShortRead_1.3.26  
[5] lattice_0.17-25    BSgenome_1.13.10   Biostrings_2.13.32 IRanges_1.3.57    

loaded via a namespace (and not attached):
[1] Biobase_2.5.5 grid_2.10.0   hwriter_1.1   XML_2.6-0    

Michael Lawrence wrote:
> On Wed, Aug 19, 2009 at 5:11 AM, Simon Anders <anders at ebi.ac.uk> wrote:
>
>   
>> Hi Florence
>>
>> I CC this mail to the bioc-sig-sequencing mailing list, as this is there
>> such stuff is discussed, and your problem might be of interest for a few
>>  more people.
>>
>> Florence Cavalli wrote:
>>     
>>> How are you? Do you enjoy being in Heidelberg so far? Hope so!
>>>       
>> Thanks, I am slowly settling in. I still dont' have a flat and live in the
>> Boxberg guest house but that's fine for the moment.
>>
>>  Can I ask you a question? It may be quite simple but I am a bit
>>     
>>> confuse with different classes
>>>
>>> I detect some peaks and would like to create a wig file to look at
>>> them in a browser.
>>>
>>>  cov_polII_wt
>>>       
>>> A GenomeData instance
>>> chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet
>>>
>>>       
>>>> cov_polII_wt[["X"]]
>>>>
>>>>         
>>>  'integer' Rle instance of length 22422250 with 1066959 runs
>>>  Lengths:  27 60 31 32 27 13 47 8 23 72 ...
>>>  Values :  1 2 3 4 3 2 3 2 3 2 ...
>>>
>>>       
>>>> polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8)
>>>> polII_wt_peaks_X
>>>>
>>>>         
>>>  Views on a 22422250-length Rle subject
>>>
>>> views:
>>>           start      end width
>>>    [1]     2679     2927   249 [ 8  8  8  9  9  9  9  9  9  9  9  9  9
>>> ...]
>>>    [2]     4300     4381    82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
>>> ...]
>>>    [3]     5581     5837   257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9
>>> ...]
>>>    [4]     5885     6033   149 [ 9  9  9  9  9 10 10 10 10 10 10 10 10
>>> ...]
>>>    [5]     6878     7020   143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9
>>> ...]
>>>    [6]     7025     7070    46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8
>>> ...]
>>>    [7]     7555     7704   150 [ 8  8  8  8  8  8  8  8  9  9  9  9  9
>>> ...]
>>>    [8]     7795     8022   228 [ 8  8  8  8  8  8  8  8  9  9 10 10 10
>>> ...]
>>>    [9]     8332     8338     7 [8 8 8 8 8 8 8]
>>>    ...      ...      ...   ... ...
>>> [18252] 22414407 22414570   164 [ 8  8  8  8  9  9  9  9  9  9  9  9  9
>>> ...]
>>> [18253] 22414629 22414639    11 [8 8 8 8 8 8 8 8 8 8 8]
>>> [18254] 22414655 22414657     3 [8 8 8]
>>> [18255] 22414678 22414778   101 [ 8  8  8  8  8  8  9  9  9  9  9 10 10
>>> ...]
>>> [18256] 22414792 22415612   821 [ 8  8  8  8  8  8  8  8  9  9  9  9  9
>>> ...]
>>> [18257] 22415635 22415650    16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8]
>>> [18258] 22415728 22415737    10 [9 9 8 8 8 8 8 8 8 8]
>>> [18259] 22415749 22415784    36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
>>> ...]
>>> [18260] 22415830 22415871    42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9
>>> ...]
>>>
>>>
>>>  From your workshop I did before:
>>>       
>>> cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData")
>>> names(cov_polII_wt_X_rd)="X"
>>> export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig")
>>>
>>> How could I do the same for the peaks I have in polII_wt_peaks_X (or
>>> the peaks on all chromosomes) ?
>>> I don't know how to convert a RleViews to a Rle.
>>>
>>>       
>> An RleViews object is a list of "views" into an Rle object. "View" means
>> that it is still the same dat but you just look a a small portion of the
>> whole vector.
>>
>> To visualize the peaks, I see three possibilities:
>>
>> (a) You assemble the RleViews object with all the peaks into one big Rle
>> object. This is essentially the same as taking the original Rle object and
>> setting all those places to 0 that are below the threshold.
>>
>> My first attempt would have been to write
>>
>>   rleobj <- cov_polII_wt[["X"]]
>>   rleobj[ rleobj < 8 ] <- 0
>>
>> but this does not work as the Rle class does not implement `[<-`.
>>
>> In this specific case, the boundaries of the runs will not change, so we
>> can use the `value<-` method which is provided:
>>
>>   rleobj <- cov_polII_wt[["X"]]
>>   values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) )
>>
>>
>> (b) Maybe setting the background to zero is not the best option anyway. It
>> might be nicer to have a BED file that indicates the peaks alongside the
>> wiggle file with the coverage values and display both in adjacent tracks in
>> the genome browser. A BED file is essentially a tab-seperated values file
>> with at least the columns chromosome, start and end. You could make a data
>> frame out of the 'start' and 'end' values from the Rle object and write it
>> out with write.table. You might want to add the peak area as another column.
>> (See here for the correct order of the columns:
>> http://genome.ucsc.edu/FAQ/FAQformat )
>>
>> I am not sure on how to use rtracklayer's 'export' function for this.
>> Michael, do you have any comment?
>>
>>     
>
> The RleViews object is a Ranges object, so if you want a BED file of those
> ranges, it's:
> gd <- GenomicData(polII_wt_peaks_X)
> export(gd, "peaks.bed")
>
> Or if you want the peak max to be the score (translated to gray scale at
> UCSC):
> gd <- GenomicData(polII_wt_peaks_X, score = viewMaxs(poIII_wt_peaks_X))
> export(gd, "peaks.bed")
>
> Note that you can do this for all chromosomes fairly easily:
>
> polII_wt_peaks <- gdapply(cov_polII_wt, slice) # another GenomeData
> export(as(polll_wt_peaks, "RangedData"), "peaks.bed")
>
> I thought there might be a slice method for "GenomeData" that returned an
> RleViewsList, but I could not find one. The same export line would work in
> that case.
>
>
>   
>> (c) Finally, for your problem, there is a very easy solution. If you use
>> the Integrated Genome Browser (IGB) to view yor data, you will see that it
>> has a thresholding function. If you set it to a value, it marks all
>> stretches of a track where the value exceeds the threshold.
>>
>>  The other thing is: if I use the R-2.10.1 version it does not
>>     
>>> recognise the GenomeData object:
>>>
>>>       
>>>> cov_polII_wt
>>>>
>>>>         
>>> A GenomeData instanceError in .local(x, ...) :
>>>  no slot of name "metadata" for this object of class "GenomeData"
>>>
>>> Do you know why?
>>> I would like to use the library chipseq, so maybe I should recreate
>>> the cov_polII_wt under R-2.10.0????
>>>
>>>
>>>       
>> There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you
>> can have problems with objects created with R-2.9 and saved from there if
>> you try to load them in R-2.10. Strictly speaking, the version of R does not
>> matter but if you use R-2.9, biocLite takes the packages from Bioconductor
>> 2.4 and, in R-2.10, it takes them from Bioc 2.5, which is still under
>> development and keeps changing, so that you have to expect that objects
>> become incompatible.
>>
>> If you want to use the chipseq package you should do everything with
>> R-2.10.
>>
>>  sessionInfo()
>>     
>>> R version 2.9.1 (2009-06-26)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>>
>>> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] rtracklayer_1.4.1 RCurl_0.98-1      bitops_1.0-4.1    ShortRead_1.2.1
>>> [5] lattice_0.17-25   BSgenome_1.12.3   Biostrings_2.12.7 IRanges_1.2.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.4.1 grid_2.9.1    hwriter_1.1   tools_2.9.1   XML_2.5-3
>>>
>>>       
>> Cheers
>>  Simon
>>
>>
>> +---
>> | Dr. Simon Anders, Dipl.-Phys.
>> | European Molecular Biology Laboratory (EMBL), Heidelberg
>> | office phone +49-6221-387-8632
>> | preferred (permanent) e-mail: sanders at fs.tum.de
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>     
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>