[BioC] HTqPCR setCategories problem

Mon May 10 23:32:43 CEST 2010

On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote:

> Hello Mike,
>
> On 7 May 2010, at 17:11, Michael Muratet wrote:
>
>> Greetings
>>
>> I have encountered a small problem when using the setCategory
>> method:
>>
>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9,  
>> groups=sampleNames(s2010004660))
>> Categories after Ct.max and Ct.min filtering:
>>             wt_4wks mt_4wks wt_17wks mt_17wks
>> OK               582     571      566      578
>> Undetermined     186     197      202      190
>> Categories after standard deviation filtering:
>>             wt_4wks mt_4wks wt_17wks mt_17wks
>> OK               581     569      565      577
>> Undetermined     186     197      202      190
>> There were 50 or more warnings (use warnings() to see the first 50)
>> > warnings()
>> Warning messages:
>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>  invalid factor level, NAs generated
>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>  invalid factor level, NAs generated
>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>  invalid factor level, NAs generated
>>
> Hm, I can't reproduce this with the standard data sets in HTqPCR,  
> although this also only contains "OK" and "Undetermined" initially:
>
> > data(qPCRraw)
> > apply(featureCategory(qPCRraw), 2, table)
>             sample1 sample2 sample3 sample4 sample5 sample6
> OK               353     312     360     339     335     336
> Undetermined      31      72      24      45      49      48
> > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3))
> Categories after Ct.max and Ct.min filtering:
>             sample1 sample2 sample3 sample4 sample5 sample6
> OK               313     264     327     295     296     286
> Undetermined      68     119      56      86      86      96
> Unreliable         3       1       1       3       2       2
> Categories after standard deviation filtering:
>             sample1 sample2 sample3 sample4 sample5 sample6
> OK               309     258     323     291     292     281
> Undetermined      68     119      56      86      86      96
> Unreliable         7       7       5       7       6       7
>
> How does featureCategory() of your new object look after you run  
> setCategory? Also, it seems like all your Ct values fall within the  
> default range given by Ct.max and Ct.min in setCategory(), hence  
> none of the categories are adjusted during the "first round", before  
> the standard deviation filtering. What happens if you set one of  
> Ct.max or Ct.min so that some values are called as "Unreliable"  
> based on this, e.g. by saying Ct.max=25? Do you still get the same  
> warning?

Heidi

I have traced the problem to this line in setCategory by setting the  
options so that warnings are errors:

featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable"

I have never seen dual indices like this on a data frame before. Can  
you tell me what this does?
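
For reference, the chained indexing asked about here can be sketched in base R on toy data. Everything below (cats, split.by, groups, index and their values) is a hypothetical stand-in, not the real contents of setCategory, and index is assumed to be a logical matrix with the same shape as the selected subset:

```r
# Toy stand-in for featureCategory(out); character columns keep the focus on indexing.
cats <- data.frame(
  s1 = c("OK", "OK", "Undetermined", "OK"),
  s2 = c("OK", "Undetermined", "OK", "OK"),
  s3 = c("OK", "OK", "OK", "Undetermined"),
  stringsAsFactors = FALSE
)
original <- cats
split.by <- c("geneA", "geneA", "geneB", "geneB")  # hypothetical: one label per row
groups   <- c("wt", "wt", "mt")                    # hypothetical: one label per column
rows <- split.by == "geneA"
cols <- groups == "wt"

# First bracket pair: ordinary [rows, columns] subsetting, here a 2 x 2 data frame.
sub <- cats[rows, cols]

# Second bracket pair: picks cells inside that subset (assumed logical matrix).
index <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)

# The chained assignment from the thread, applied to the toy data.
cats[rows, cols][index] <- "Unreliable"

# What R does for a chained replacement: modify a copy of the subset,
# then write the modified subset back into the same rows and columns.
sub[index] <- "Unreliable"
expanded <- original
expanded[rows, cols] <- sub

# When only one column is selected, the first subset drops to a plain vector,
# so the second index is then an ordinary vector index.
one_col <- original[rows, groups == "mt"]
```

Note the last case: if every sample is its own group, groups == g selects a single column and the subset is a vector rather than a data frame. If that column were a factor, the replacement would go through `[<-.factor`, which would be consistent with the warnings quoted above, though I have not confirmed this against the package source.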

If I force the data to be called "Unreliable" by setting Ct.min I can  
get the same failure here:

featureCategory(out)[data < Ct.min] <- "Unreliable"

If you index a data frame with a single subscript like this, don't  
you get or set just a column? I thought you had to use something like  
sapply in a case like this.
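
On the single-subscript question, a small base R sketch may help. A comparison such as data < Ct.min on a data frame yields a logical matrix, and a logical matrix used as a single subscript addresses individual cells rather than whole columns. The data frames ct and cats and the threshold value are made up for illustration; only the factor behaviour is what the warnings above point to:

```r
# Hypothetical Ct values and matching categories (factor columns without "Unreliable").
ct <- data.frame(s1 = c(12.1, 30.5, 8.2), s2 = c(9.7, 25.0, 31.4))
cats <- data.frame(
  s1 = factor(c("OK", "OK", "OK"), levels = c("OK", "Undetermined")),
  s2 = factor(c("OK", "OK", "Undetermined"), levels = c("OK", "Undetermined"))
)
Ct.min <- 10

# Comparing a data frame with a number gives a logical matrix of the same shape.
mask <- ct < Ct.min

# Assigning a value that is not an existing level produces NA plus a warning.
bad <- cats
msgs <- character(0)
withCallingHandlers(
  bad[mask] <- "Unreliable",
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))  # record the warning text
    invokeRestart("muffleWarning")
  }
)

# Adding the level to every column first lets the same assignment succeed.
fixed <- cats
fixed[] <- lapply(fixed, function(col) {
  factor(col, levels = c(levels(col), "Unreliable"))
})
fixed[mask] <- "Unreliable"
```

This only shows the base R mechanics. It does not explain the later "subscript out of bounds" error, which comes from code inside setCategory that is not reproduced here.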

I note that any of the 'primitive' data sets, i.e., the ones that I  
read with readCtData, can be used in setCategory without a problem.  
Once I rbind the two data sets together, I get the problem. I have  
looked at the rbind.qPCRset code and I don't see anything that would  
cause this to be the case, but I suspect my problem is somehow tied up  
with some artifact of data assembly.
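
One way to test the data-assembly suspicion is to compare the column classes of the category table before and after combining. The sketch below uses two made-up data frames in place of real featureCategory() output; that the combined object holds factor columns is an untested assumption, not something confirmed from the rbind.qPCRset code. It also shows the warnings-as-errors setting used for tracing:

```r
# Hypothetical stand-ins: character columns versus factor columns.
primitive <- data.frame(s1 = c("OK", "Undetermined"), s2 = c("OK", "OK"),
                        stringsAsFactors = FALSE)
combined  <- data.frame(s1 = c("OK", "Undetermined"), s2 = c("OK", "OK"),
                        stringsAsFactors = TRUE)

# Report the first class of every column.
col_classes <- function(df) vapply(df, function(col) class(col)[1], character(1))
classes_primitive <- col_classes(primitive)
classes_combined  <- col_classes(combined)

# Promote warnings to errors, try the same assignment on both, then restore options.
old <- options(warn = 2)
res_primitive <- try({ primitive[1, 1] <- "Unreliable"; TRUE }, silent = TRUE)
res_combined  <- try({ combined[1, 1] <- "Unreliable"; TRUE }, silent = TRUE)
options(old)

# Possible workaround in base R: convert factor columns back to character.
as_char <- combined
as_char[] <- lapply(as_char, as.character)
as_char[1, 1] <- "Unreliable"
```

If the real objects show the same split (character columns from readCtData, factor columns after rbind), that would point to the combining step as the source of the missing level.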

Thanks

Mike

>
> Cheers
> \Heidi
>
>
>> other 47 warnings are the same
>>
>> The categories that you get in the SDS data are "OK" and  
>> "Undetermined" and it seems to be unwilling to add the new level  
>> "Unreliable".  I tried to manually add the levels:
>>
>> featureCategory(s2010004660)$wt_4wks <-  
>> factor(featureCategory(s2010004660)$wt_4wks,  
>> levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable"))
>> featureCategory(s2010004660)$mt_4wks <-  
>> factor(featureCategory(s2010004660)$mt_4wks,  
>> levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable"))
>> featureCategory(s2010004660)$wt_17wks <-  
>> factor(featureCategory(s2010004660)$wt_17wks,  
>> levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable"))
>> featureCategory(s2010004660)$mt_17wks <-  
>> factor(featureCategory(s2010004660)$mt_17wks,  
>> levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable"))
>>
>> and get another error
>>
>> Error in count[names(tab), i] <- tab : subscript out of bounds
>>
>> Is this a bug or operator error?
>>
>> Thanks
>>
>> Mike
>>
>> > sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] HTqPCR_1.2.0       limma_3.4.0        RColorBrewer_1.0-2  
>> Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affy_1.26.0           affyio_1.16.0          
>> gdata_2.7.1           gplots_2.7.4          gtools_2.6.1           
>> preprocessCore_1.10.0
>> [7] tools_2.11.0
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>>
>>
>
