[Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges

Patrick Aboyoun paboyoun at fhcrc.org
Thu Apr 1 21:01:05 CEST 2010


I just checked in a patch to the GenomicRanges package in which the
GRanges constructor now converts NA values in strand to the
both/either strand indicator "*" and issues a warning to the end user
informing them of the change. The updated GenomicRanges package
should be available from bioconductor.org within the next 36 hours. Here
is an example:


 > RangedData(IRanges(1,2))
RangedData with 1 row and 0 value columns across 1 space
        space    ranges |
  <character> <IRanges> |
1           1    [1, 2] |

 > as(RangedData(IRanges(1,2)), "GRanges")
GRanges with 1 range and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        1    [1, 2]      * |

seqlengths
  1
NA
Warning message:
In GRanges(seqnames = space(from), ranges = ranges, strand = Rle(strand(from)),  :
   missing values in strand converted to "*"

 > sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-22 r51355)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_0.1.3 IRanges_1.5.74
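
For anyone who wants to see the rule in isolation, here is a minimal base-R sketch of the normalization described above. It is not the GenomicRanges source: `normalize_strand` is a hypothetical helper name, and it works on a plain character vector rather than an Rle, so treat it as an illustration of the behavior only.

```r
# Hypothetical helper (not part of GenomicRanges): mimic the rule that
# missing strand values become "*" and the user is warned about it.
normalize_strand <- function(strand) {
  strand <- as.character(strand)
  missing <- is.na(strand)
  if (any(missing)) {
    warning("missing values in strand converted to \"*\"")
    strand[missing] <- "*"
  }
  # Anything other than "+", "-" or "*" is rejected rather than guessed at.
  bad <- !(strand %in% c("+", "-", "*"))
  if (any(bad)) {
    stop("invalid strand values: ", paste(unique(strand[bad]), collapse = ", "))
  }
  factor(strand, levels = c("+", "-", "*"))
}

# Usage: the NA is replaced by "*"; the warning is caught and reported.
res <- withCallingHandlers(
  normalize_strand(c("+", NA, "-")),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Warning rather than failing keeps the coercion usable on objects that carry no strand column, while still telling the user that "no information" was turned into "either strand".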




On 4/1/10 8:04 AM, Michael Lawrence wrote:
> Thinking about this some more, it's somewhat analogous to the coercion to
> factor in R, i.e. as.factor(c("male", "female")) returns something
> reasonable, despite missing level information.
>
> as.factor("male") would probably not be what I wanted, but we live with it,
> since the alternative (requiring the levels argument) would probably be
> worse.
>
> On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence <michafla at gene.com> wrote:
>
>> On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>
>>> On 04/01/2010 07:12 AM, Michael Lawrence wrote:
>>>
>>>> On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>>
>>>>> On 03/31/2010 07:11 PM, pterry at huskers.unl.edu wrote:
>>>>>
>>>>>> Dear bioc-sig-sequencing,
>>>>>>
>>>>>> I would like to annotate ChIP-seq peaks for the Arabidopsis genome. In
>>>>>> trying to work through the GenomicFeatures vignette dated 03/27/10, I
>>>>>> need to convert my ChIP-seq peaks from a RangedData object to a GRanges
>>>>>> object. In a recent, but previous, Bioconductor development version, the
>>>>>> conversion with this particular RangedData object worked fine.
>>>>>>
>>>>>> In this more recent Bioconductor development version, I get the
>>>>>> following error message:
>>>>>>
>>>>>>> gr_ChSeqPks <- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")
>>>>>> Error in validObject(.Object) :
>>>>>>    invalid class "GRanges" object: slot 'strand' contains missing values
>>>>>>
>>>>>>> rd0_chr1_s_8_trt_vs_INPctl
>>>>>> RangedData with 57 rows and 2 value columns across 1 space
>>>>>>            space               ranges   |     ARAB8 ARAB7INPCTL
>>>>>>      <character>             <IRanges>    |<integer>    <integer>
>>>>>> 1          chr1   [ 617092,  617094]   |        24           0
>>>>>> 2          chr1   [1808262, 1808262]   |         8           0
>>>>>> 3          chr1   [3889445, 3889452]   |        64           0
>>>>>> 4          chr1   [4404410, 4404410]   |         8           0
>>>>>> 5          chr1   [7081127, 7081127]   |         8           0
>>>>>> 6          chr1   [7128574, 7128581]   |        64           0
>>>>>> 7          chr1   [7128592, 7128649]   |       464           0
>>>>>> 8          chr1   [7530777, 7530781]   |        40           0
>>>>>> 9          chr1   [7530784, 7530786]   |        24           0
>>>>>> ...         ...                  ... ...       ...         ...
>>>>>
>>>>> Hi,
>>>>>
>>>>>> rd = RangedData(IRanges(1, 10))
>>>>>> as(rd, "GRanges")
>>>>> Error in validObject(.Object) :
>>>>>   invalid class "GRanges" object: slot 'strand' contains missing values
>>>>>> rd[["strand"]] = "*"
>>>>>> as(rd, "GRanges")
>>>>> GRanges with 1 range and 0 elementMetadata values
>>>>>     seqnames    ranges strand |
>>>>>        <Rle> <IRanges>  <Rle> |
>>>>> [1]        1   [1, 10]      * |
>>>>>
>>>>> seqlengths
>>>>>   1
>>>>> NA
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>> Shouldn't the coerce function just do this automatically?
>>>>
>>> Currently GRanges thinks of strand as '+', '-', '*', whereas IRanges
>>> allows NA as well (hence the error), so coercing NA to * represents a
>>> decision on the part of the investigator that '*' (strand irrelevant) is
>>> synonymous with NA (no information about strand available). Part of the
>>> motivation for this current state of affairs is that the use cases for
>>> both NA and * were unclear, but course corrections are welcome.
>>>
>>>
>>>        
>> Ok. I guess one could think of the coercion of a RangedData missing a
>> 'strand' column to a GRanges as an equivalent statement, since GRanges
>> requires strand information. If that doesn't sound reasonable, a better
>> error message will help avoid questions like this in the future.
>>
>> Michael
>>
>>
>>
>>
>>      
>>> Martin
>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
>>>>>> x86_64-unknown-linux-gnu
>>>>>>
>>>>>> locale:
>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] biomaRt_2.3.5         GenomicFeatures_0.5.0 GenomicRanges_0.1.0
>>>>>> [4] IRanges_1.5.73
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] Biobase_2.7.5      Biostrings_2.15.26 BSgenome_1.15.20   DBI_0.2-5
>>>>>> [5] RCurl_1.3-1        RSQLite_0.8-4      rtracklayer_1.7.11 tools_2.12.0
>>>>>> [9] XML_2.8-1
>>>>>>
>>>>>> Thanks,
>>>>>> P. Terry
>>>>>> pterry at huskers.unl.edu
>>>>>>
>>>>>>        [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-sig-sequencing mailing list
>>>>>> Bioc-sig-sequencing at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>>              
>>>>>
>>>>> --
>>>>> Martin Morgan
>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> Location: Arnold Building M1 B861
>>>>> Phone: (206) 667-2793
>>>>>
>>>
>>
>>      
>


