[BioC] interoperability between IRanges/ShortRead and rtracklayer
Patrick Aboyoun
paboyoun at fhcrc.org
Fri Jun 5 01:30:25 CEST 2009
Simon,
Below is code that achieves what I think you are after. I'll have to ask
Michael if there is a cleaner way, and if there isn't we should create one.
Also I know the classes can be a bit bewildering and here is some
information that may help you out. Both a RangedData object and a
GenomeData object represent a list across spaces that (may or may not in
the case of RangedData) represent chromosomes. A RangedData object is
more rigid and requires you to have ranges within each of the spaces
with possibly an associated rectangular data set. A GenomeData object is
essentially a list where the elements represents unstructured
information on the chromosomes (i.e. the elements can contain whatever
you want them too). They relate to Rle objects in the following manner:
an Rle object can produce a RangedData object of length 1 and a
GenomeData object of length k containing Rle objects as elements can
produce a RangedData object of length k.
I hoped that helped.
Patrick
#####################################################
suppressMessages(library(IRanges))
suppressMessages(library(rtracklayer))
suppressMessages(library(BSgenome))
myGenomeData <- GenomeData(list(chr10 = Rle(1:1000, 1:1000), chr11 =
Rle(1:100, 100:1)))
myRangedData <- do.call(c, lapply(as.list(myGenomeData), as, "RangedData"))
export(myRangedData, "mytrack.wig", format = "wig")
> myRangedData
RangedData: 1100 ranges by 1 columns
columns(1): score
sequences(2): chr10 chr11
> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome_1.12.2 Biostrings_2.12.5 rtracklayer_1.4.0 RCurl_0.97-3
[5] bitops_1.0-4.1 IRanges_1.2.2
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 XML_2.5-0
Simon Anders wrote:
> Dear Patrick, Hervé, Michael or Martin
>
> I'm unclear on how to get from an IRanges Rle object to a RangedData
> object to use with rtracklayer's export function.
>
> The use case is as follows:
>
> Let's say I caclulate a coverage vector from some ChIP-Seq data, e.g.:
>
> library(IRanges)
> library(ShortRead)
> library(rtracklayer)
> library(BSgenome.Mmusculus.UCSC.mm9)
>
> # Load the example data from the ChIP-Seq tutorial
> load("ctcf.rda")
>
> # Calculate the coverage
> cvg = coverage( ctcf, width = seqlengths( Mmusculus ) )
>
> Then, I get a GenomeData object that contains Rle objects:
>
> > cvg
> A GenomeData instance
> chromosomes(3): chr10 chr11 chr12
> > cvg$chr10
> 'integer' Rle instance of length 129987043 with 308238 runs
> Lengths: 24 968 5 3 11 5 3 2 2 2 ...
> Values : 1 0 1 2 3 4 3 4 3 4 ...
>
> Given that coverage data is a very typical case of something that one
> would like to display as a track in a genome browser, it would be nice
> if the rtracklayer object could deal with it. For example I might want
> to export the 'cvg' object as a wiggle file with rtracklayer's
> 'export' function or add it to a browser session with the 'track<-'
> function.
>
> However, 'export' cannot deal with it
>
> > export( cvg, "wig" )
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "export.ucsc", for
> signature "GenomeData"
> > export( cvg$chr10, "wig" )
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "export.ucsc", for
> signature "Rle"
>
> It seems that it expects a RangedData object, or, alternatively, a
> RangedDataList.
>
> It seems to me that there should be a one-to-one correspondance from
> Rle to RangedData and from GenomeData to RangedDataList. How can I
> convert from one to the other? And why are there both kinds of classes?
>
> Thanks in advance.
>
> Simon
>
>
> +---
> | Dr. Simon Anders, Dipl. Phys.
> | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
> | office phone +44-1223-492680, mobile phone +44-7505-841692
> | preferred (permanent) e-mail: sanders at fs.tum.de
More information about the Bioconductor
mailing list