[BioC] HTqPCR setCategories problem

Tue May 11 00:11:37 CEST 2010

Hi Mike,

you're absolutely right that the issue turned out to be with rbind of
qPCRset objects! The feature categories were intended to be stored as
characters, not factors, so that users can easily add their own categories
if they want. However, in rbind I omitted the argument
"stringsAsFactors=FALSE" in a call to as.data.frame(), so I get:

> data(qPCRraw)
> class(featureCategory(qPCRraw))
[1] "data.frame"
> class(featureCategory(qPCRraw)[,1])
[1] "character"
> test <- rbind(qPCRraw[1:4,], qPCRraw[1:40,])
> class(featureCategory(test))
[1] "data.frame"
> class(featureCategory(test)[,1])
[1] "factor"
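
The mechanism can be shown outside HTqPCR with a minimal base-R sketch.
The matrix "cats" and its column names are made up for illustration and are
not taken from the package. stringsAsFactors=TRUE is passed explicitly
because that was the default at the time of writing (R 2.11), while
R >= 4.0 defaults to FALSE:

```r
# Toy stand-in for a feature-category table; names are hypothetical.
cats <- cbind(sample1 = c("OK", "Undetermined"),
              sample2 = c("OK", "OK"))

# Spell out both settings so the result does not depend on the R version.
as_factor <- as.data.frame(cats, stringsAsFactors = TRUE)
as_char   <- as.data.frame(cats, stringsAsFactors = FALSE)

class(as_factor[, 1])   # factor
class(as_char[, 1])     # character

# Assigning a value that is not an existing level yields NA and a warning
# about an invalid factor level. The warning is suppressed here only to
# keep the script output quiet.
f <- as_factor$sample1
suppressWarnings(f[1] <- "Unreliable")
is.na(f[1])             # TRUE

# The same assignment on a character column just works.
as_char$sample1[1] <- "Unreliable"
```

This is the same kind of warning as in the original report: a new
category such as "Unreliable" cannot be written into a factor column that
only has the levels "OK" and "Undetermined".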

I'll sanitise the rbind (and probably also the cbind function), but I'm
afraid it'll take a while for these changes to get into BioC devel.

Unfortunately, in the meantime, you'll have to manually convert your
featureCategory back to a data frame of characters instead of factors:

> featureCategory(test) <- data.frame(apply(featureCategory(test), 2,
+     as.character), stringsAsFactors=FALSE)
> class(featureCategory(test)[,1])
[1] "character"
> setCategory(test, groups=rep(c("A", "B"),3), Ct.min=20)
Categories after Ct.max and Ct.min filtering:
             sample1 sample2 sample3 sample4 sample5 sample6
OK                29      32      34      32      32      33
Undetermined       6       7       5       4       7       5
Unreliable         9       5       5       8       5       6
Categories after standard deviation filtering:
             sample1 sample2 sample3 sample4 sample5 sample6
OK                29      32      34      32      32      33
Undetermined       6       7       5       4       7       5
Unreliable         9       5       5       8       5       6
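
As a possible alternative to the apply() call, the conversion can be
sketched column by column with lapply(), which keeps the data frame shape
even when there is only one row (apply() can collapse a single-row table
to a plain vector). The helper name factors_to_character is hypothetical
and not part of HTqPCR:

```r
# Hypothetical helper (not part of HTqPCR): convert every factor column of
# a data frame to character and leave all other columns untouched.
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (any(is_fac)) {
    df[is_fac] <- lapply(df[is_fac], as.character)
  }
  df
}

# Toy one-row example; the column names are made up.
one_row <- data.frame(sample1 = "OK", sample2 = "Undetermined",
                      stringsAsFactors = TRUE)
fixed <- factors_to_character(one_row)

class(fixed$sample1)
dim(fixed)
```

Assuming the featureCategory replacement method accepts a data frame as in
the workaround above, this would be used as
featureCategory(test) <- factors_to_character(featureCategory(test)).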

Yet another item on the ever-growing HTqPCR to-do list...
\Heidi

>
> On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote:
>
>> Hello Mike,
>>
>> On 7 May 2010, at 17:11, Michael Muratet wrote:
>>
>>> Greetings
>>>
>>> I have encountered a small problem when using the setCategory
>>> method:
>>>
>>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9,
>>> groups=sampleNames(s2010004660))
>>> Categories after Ct.max and Ct.min filtering:
>>>             wt_4wks mt_4wks wt_17wks mt_17wks
>>> OK               582     571      566      578
>>> Undetermined     186     197      202      190
>>> Categories after standard deviation filtering:
>>>             wt_4wks mt_4wks wt_17wks mt_17wks
>>> OK               581     569      565      577
>>> Undetermined     186     197      202      190
>>> There were 50 or more warnings (use warnings() to see the first 50)
>>> > warnings()
>>> Warning messages:
>>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>>  invalid factor level, NAs generated
>>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>>  invalid factor level, NAs generated
>>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>>  invalid factor level, NAs generated
>>>
>> Hm, I can't reproduce this with the standard data sets in HTqPCR,
>> although this also only contains "OK" and "Undetermined" initially:
>>
>> > data(qPCRraw)
>> > apply(featureCategory(qPCRraw), 2, table)
>>             sample1 sample2 sample3 sample4 sample5 sample6
>> OK               353     312     360     339     335     336
>> Undetermined      31      72      24      45      49      48
>> > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3))
>> Categories after Ct.max and Ct.min filtering:
>>             sample1 sample2 sample3 sample4 sample5 sample6
>> OK               313     264     327     295     296     286
>> Undetermined      68     119      56      86      86      96
>> Unreliable         3       1       1       3       2       2
>> Categories after standard deviation filtering:
>>             sample1 sample2 sample3 sample4 sample5 sample6
>> OK               309     258     323     291     292     281
>> Undetermined      68     119      56      86      86      96
>> Unreliable         7       7       5       7       6       7
>>
>> How does featureCategory() of your new object look after you run
>> setCategory? Also, it seems like all your Ct values fall within the
>> default range given by Ct.max and Ct.min in setCategory(), hence
>> none of the categories are adjusted during the "first round", before
>> the standard deviation filtering. What happens if you set one of
>> Ct.max or Ct.min so that some values are called as "Unreliable"
>> based on this, e.g. by saying Ct.max=25? Do you still get the same
>> warning?
>
> Heidi
>
> I have traced the problem to this line in setCategory by setting the
> options so that warnings are errors:
>
> featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable"
>
> I have never seen dual indices like this on a data frame before, can
> you tell me what this does?
>
> If I force the data to be called "Unreliable" by setting Ct.min I can
> get the same failure here:
>
> featureCategory(out)[data < Ct.min] <- "Unreliable"
>
> If you reference a single dimension from a data frame like this, don't
> you set/get just a column? I thought you had to use something like
> sapply in a case like this.
>
> I note that any of the 'primitive' data sets, i.e., the ones that I
> read with readCtData can be used in setCategory without a problem.
> Once I rbind the two data types together, I get the problem. I have
> looked at the rbind.qPCRset code and I don't see anything that would
> cause this to be the case, but I suspect my problem is somehow tied up
> with some artifact of data assembly.
>
> Thanks
>
> Mike
>
>>
>> Cheers
>> \Heidi
>>
>>
>>> other 47 warnings are the same
>>>
>>> The categories that you get in the SDS data are "OK" and
>>> "Undetermined" and it seems to be unwilling to add the new level
>>> "Unreliable".  I tried to manually add the levels:
>>>
>>> featureCategory(s2010004660)$wt_4wks <-
>>> factor(featureCategory(s2010004660)$wt_4wks,
>>> levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable"))
>>> featureCategory(s2010004660)$mt_4wks <-
>>> factor(featureCategory(s2010004660)$mt_4wks,
>>> levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable"))
>>> featureCategory(s2010004660)$wt_17wks <-
>>> factor(featureCategory(s2010004660)$wt_17wks,
>>> levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable"))
>>> featureCategory(s2010004660)$mt_17wks <-
>>> factor(featureCategory(s2010004660)$mt_17wks,
>>> levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable"))
>>>
>>> and get another error
>>>
>>> Error in count[names(tab), i] <- tab : subscript out of bounds
>>>
>>> Is this a bug or operator error?
>>>
>>> Thanks
>>>
>>> Mike
>>>
>>> > sessionInfo()
>>> R version 2.11.0 (2010-04-22)
>>> i386-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] HTqPCR_1.2.0       limma_3.4.0        RColorBrewer_1.0-2
>>> Biobase_2.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.26.0           affyio_1.16.0
>>> gdata_2.7.1           gplots_2.7.4          gtools_2.6.1
>>> preprocessCore_1.10.0
>>> [7] tools_2.11.0
>>>
>>> Michael Muratet, Ph.D.
>>> Senior Scientist
>>> HudsonAlpha Institute for Biotechnology
>>> mmuratet at hudsonalpha.org
>>> (256) 327-0473 (p)
>>> (256) 327-0966 (f)
>>>
>>> Room 4005
>>> 601 Genome Way
>>> Huntsville, Alabama 35806