[Bioc-sig-seq] as.data.frame on GRanges object with DNAStringSet in values

Hervé Pagès hpages at fhcrc.org
Thu Jun 16 00:05:46 CEST 2011


Hi Michael, Janet,

I just added an "as.vector" method for XStringSet objects to
Biostrings 2.21.6:

   > library(Biostrings)
   > x <- DNAStringSet(c("aaatg", "gt"))
   > as.vector(x)
   [1] "AAATG" "GT"
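
For anyone curious about the pattern, this is the kind of delegation Michael suggested: as.vector just hands off to as.character. Below is a rough sketch using a made-up toy class. ToySeqSet and its seqs slot are hypothetical stand-ins for illustration only; this is not the actual Biostrings source.

```r
library(methods)

# Hypothetical toy class standing in for XStringSet (illustration only).
setClass("ToySeqSet", representation(seqs = "character"))

# as.character() returns the underlying strings.
setMethod("as.character", "ToySeqSet", function(x, ...) x@seqs)

# as.vector() simply delegates to as.character().
setMethod("as.vector", "ToySeqSet",
          function(x, mode = "any") as.character(x))

toy <- new("ToySeqSet", seqs = c("AAATG", "GT"))
as.vector(toy)
```

The point of delegating is that any code calling as.vector on the object gets the same character representation that as.character already provides.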

But that doesn't solve Janet's problem:

   > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
   > df
   DataFrame with 2 rows and 2 columns
              id           seqs
     <character> <DNAStringSet>
   1         ID1          AAATG
   2         ID2             GT
   > as.data.frame(df)
   Error in as.data.frame.default(y, optional = TRUE, ...) :
     cannot coerce class 'structure("DNAStringSet", package = "Biostrings")' into a data.frame

Michael?
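
Janet, in the meantime, one possible workaround is to convert the sequences to a plain character vector yourself before building the data frame. This is only a sketch under that assumption (the name `plain` is just for illustration), and it does not fix the coercion of the DataFrame itself:

```r
library(Biostrings)

x <- DNAStringSet(c("aaatg", "gt"))

# Convert the sequences to a plain character vector up front, so every
# column is a type that base R's data.frame() already understands.
plain <- data.frame(id = c("ID1", "ID2"),
                    seqs = as.character(x),
                    stringsAsFactors = FALSE)
```

The same idea should carry over to a GRanges object: replace the DNAStringSet column in its values with as.character() of that column before calling as.data.frame.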

Thanks,
H.


 > sessionInfo()
R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.21.6 IRanges_1.11.10


On 11-06-15 12:49 PM, Janet Young wrote:
> yes - as.character seems a good choice, I think
>
> thanks,
>
> Janet
>
> On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:
>
>> So you would expect that the DNAStringSet is converted to a character vector? DNAStringSet (technically XStringSet) then just needs an as.vector method that delegates to as.character.
>>
>> Michael
>>
>>
>> On Wed, Jun 15, 2011 at 12:37 PM, Janet Young <jayoung at fhcrc.org> wrote:
>> Hi there,
>>
>> I'm trying to use as.data.frame on a GRanges object. On regular GRanges objects it works fine, but I have some objects that contain a DNAStringSet in the values column, which the as.data.frame method doesn't handle. Would it be possible to add the ability to coerce the DNAStringSet too, please?
>>
>> Here's some code that demonstrates the issue:
>>
>> ################
>> library(GenomicRanges)
>> library(Biostrings)
>>
>> gr1 <- GRanges(seqnames=rep("chr1", 3),
>>                ranges=IRanges(start=c(1, 101, 201), width=50),
>>                strand=c("+", "-", "+"),
>>                genenames=c("seq1", "seq2", "seq3"))
>>
>> as.data.frame(gr1)
>> # works
>>
>> gr2 <- gr1
>> values(gr2)[, "myseqs"] <- DNAStringSet(c("AACGTG", "ACGGTGGTGTT", "GAGGCTG"))
>>
>> as.data.frame(gr2)
>> # Error in as.data.frame.default(y, optional = TRUE, ...) :
>> #   cannot coerce class 'structure("DNAStringSet", package = "Biostrings")' into a data.frame
>> ################
>>
>> and here's the sessionInfo() output:
>>
>> R version 2.13.0 (2011-04-13)
>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4
>>
>> ################
>>
>>
>> You might wonder why I'm storing sequences in the GRanges values. In my real data they're sequencing reads that have mapped back to that region, and for the moment I still want to keep the read sequence itself, because it's not always identical to the underlying genomic sequence of that region (I'm investigating mapping issues).
>>
>> (And my desire to use as.data.frame relates to a suggestion from Herve to let me work around some issues with the identical function.)
>>
>> thanks,
>>
>> Janet
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


