[BioC] HTqPCR setCategories problem

Tue May 11 00:15:00 CEST 2010

On May 10, 2010, at 5:11 PM, Heidi Dvinge wrote:

> Hi Mike,
>
> you're absolutely right that the issue turned out to be with rbind of
> qPCRset! The feature categories were intended to be stored as characters,
> not factors, so that users can easily add their own categories if they
> want. However, in rbind I've omitted the parameter
> "stringsAsFactors=FALSE" in a call to as.data.frame(), so I get:

Thanks Heidi!

A manual fix is OK, as long as there's a path that lets me keep moving forward.

Mike
>
>> data(qPCRraw)
>> class(featureCategory(qPCRraw))
> [1] "data.frame"
>> class(featureCategory(qPCRraw)[,1])
> [1] "character"
>> test <- rbind(qPCRraw[1:4,], qPCRraw[1:40,])
>> class(featureCategory(test))
> [1] "data.frame"
>> class(featureCategory(test)[,1])
> [1] "factor"
>
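The factor conversion described above can be reproduced in base R without HTqPCR. This is a minimal sketch, assuming a small made-up character matrix (`cats`) in place of the real feature categories; how rbind for qPCRset builds its input is not shown in this thread. `stringsAsFactors = TRUE` is written out explicitly because it was the default in R 2.11 but is no longer the default since R 4.0.0.

```r
# Made-up stand-in for feature categories (not HTqPCR data).
cats <- matrix(c("OK", "Undetermined", "OK", "OK"), ncol = 2,
               dimnames = list(NULL, c("sample1", "sample2")))

# Spelled out to mimic the old default behaviour of as.data.frame().
as_factors <- as.data.frame(cats, stringsAsFactors = TRUE)

# What the categories were intended to be: plain character columns.
as_chars <- as.data.frame(cats, stringsAsFactors = FALSE)

class(as_factors[, 1])  # "factor"
class(as_chars[, 1])    # "character"
```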
> I'll sanitise the rbind (and probably also the cbind function), but I'm
> afraid it'll take a while for these changes to get into BioC devel.
>
> Unfortunately, in the meantime, you'll have to manually transfer your
> featureCategory back to a data frame of characters instead of factors:
>
>> featureCategory(test) <- data.frame(apply(featureCategory(test), 2,
> as.character), stringsAsFactors=FALSE)
>> class(featureCategory(test)[,1])
> [1] "character"
>> setCategory(test, groups=rep(c("A", "B"),3), Ct.min=20)
> Categories after Ct.max and Ct.min filtering:
>             sample1 sample2 sample3 sample4 sample5 sample6
> OK                29      32      34      32      32      33
> Undetermined       6       7       5       4       7       5
> Unreliable         9       5       5       8       5       6
> Categories after standard deviation filtering:
>             sample1 sample2 sample3 sample4 sample5 sample6
> OK                29      32      34      32      32      33
> Undetermined       6       7       5       4       7       5
> Unreliable         9       5       5       8       5       6
>
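The manual conversion above could also be wrapped in a small helper. This is only a sketch: `factors_to_character` is a hypothetical name, not an HTqPCR function, and the example data frame is made up. It converts column by column with lapply() rather than apply(), so the result stays a data frame with the same dimensions and column names.

```r
# Hypothetical helper (not part of HTqPCR): turn factor columns into
# character columns and leave every other column untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

# Made-up example with factor columns, as produced by the rbind problem.
cats <- data.frame(sample1 = factor(c("OK", "Undetermined")),
                   sample2 = factor(c("OK", "OK")))
fixed <- factors_to_character(cats)
vapply(fixed, class, character(1))
```

With a qPCRset this would presumably be used as `featureCategory(test) <- factors_to_character(featureCategory(test))`, mirroring the assignment shown above.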
> Yet another item on the ever-growing HTqPCR to-do list...
> \Heidi
>
>>
>> On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote:
>>
>>> Hello Mike,
>>>
>>> On 7 May 2010, at 17:11, Michael Muratet wrote:
>>>
>>>> Greetings
>>>>
>>>> I have encountered a small problem when using the setCategory
>>>> method:
>>>>
>>>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9,
>>>> groups=sampleNames(s2010004660))
>>>> Categories after Ct.max and Ct.min filtering:
>>>>            wt_4wks mt_4wks wt_17wks mt_17wks
>>>> OK               582     571      566      578
>>>> Undetermined     186     197      202      190
>>>> Categories after standard deviation filtering:
>>>>            wt_4wks mt_4wks wt_17wks mt_17wks
>>>> OK               581     569      565      577
>>>> Undetermined     186     197      202      190
>>>> There were 50 or more warnings (use warnings() to see the first 50)
>>>>> warnings()
>>>> Warning messages:
>>>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>>> invalid factor level, NAs generated
>>>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>>> invalid factor level, NAs generated
>>>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>>> invalid factor level, NAs generated
>>>>
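The warning above is standard base R behaviour when a value that is not an existing level is assigned into a factor. A minimal sketch with a made-up vector (`status`), independent of HTqPCR:

```r
# Made-up factor with the two levels seen in the SDS data.
status <- factor(c("OK", "Undetermined", "OK"))

# Assigning an unknown level stores NA and raises a warning;
# the warning text is captured here instead of being printed.
msg <- NULL
as_factor <- withCallingHandlers(
  {
    tmp <- status
    tmp[1] <- "Unreliable"
    tmp
  },
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
is.na(as_factor[1])  # TRUE

# The same assignment on a character vector simply stores the new value.
as_char <- as.character(status)
as_char[1] <- "Unreliable"
```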
>>> Hm, I can't reproduce this with the standard data sets in HTqPCR,
>>> although these also contain only "OK" and "Undetermined" initially:
>>>
>>>> data(qPCRraw)
>>>> apply(featureCategory(qPCRraw), 2, table)
>>>            sample1 sample2 sample3 sample4 sample5 sample6
>>> OK               353     312     360     339     335     336
>>> Undetermined      31      72      24      45      49      48
>>>> setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3))
>>> Categories after Ct.max and Ct.min filtering:
>>>            sample1 sample2 sample3 sample4 sample5 sample6
>>> OK               313     264     327     295     296     286
>>> Undetermined      68     119      56      86      86      96
>>> Unreliable         3       1       1       3       2       2
>>> Categories after standard deviation filtering:
>>>            sample1 sample2 sample3 sample4 sample5 sample6
>>> OK               309     258     323     291     292     281
>>> Undetermined      68     119      56      86      86      96
>>> Unreliable         7       7       5       7       6       7
>>>
>>> How does featureCategory() of your new object look after you run
>>> setCategory? Also, it seems like all your Ct values fall within the
>>> default range given by Ct.max and Ct.min in setCategory(), hence
>>> none of the categories are adjusted during the "first round", before
>>> the standard deviation filtering. What happens if you set one of
>>> Ct.max or Ct.min so that some values are called as "Unreliable"
>>> based on this, e.g. by saying Ct.max=25? Do you still get the same
>>> warning?
>>
>> Heidi
>>
>> I have traced the problem to this line in setCategory by setting the
>> options so that warnings are errors:
>>
>> featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable"
>>
>> I have never seen dual indices like this on a data frame before, can
>> you tell me what this does?
>>
>> If I force the data to be called "Unreliable" by setting Ct.min I can
>> get the same failure here:
>>
>> featureCategory(out)[data < Ct.min] <- "Unreliable"
>>
>> If you reference a single dimension from a data frame like this, don't
>> you set/get just a column? I thought you had to use something like
>> sapply in a case like this.
>>
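On the two indexing questions above: a single vector index on a data frame does select columns, but a logical matrix with the same dimensions selects individual cells. The chained form `x[rows, cols][index] <- value` is evaluated by R as: extract the block, assign into the block, write the block back. The sketch below uses made-up data; the real `index` inside setCategory is not shown in this thread, so a logical matrix is assumed here.

```r
# Made-up categories (character columns) and matching Ct values.
cats <- data.frame(s1 = c("OK", "OK", "OK"),
                   s2 = c("OK", "OK", "OK"),
                   s3 = c("OK", "OK", "OK"),
                   stringsAsFactors = FALSE)
ct <- matrix(c(25, 12, 30,
               28, 31, 10,
               27, 26, 29), nrow = 3, byrow = TRUE,
             dimnames = list(NULL, names(cats)))

# 1. A logical matrix of the same shape addresses single cells,
#    here rows/columns (1, s2) and (2, s3).
cats[ct < 15] <- "Unreliable"

# 2. Chained indexing: take the block of rows 1 and 3, columns s1 and s2,
#    then flag one cell inside that 2 x 2 block (its second row, first
#    column, which is row 3, column s1 of the full data frame).
rows  <- c(TRUE, FALSE, TRUE)
cols  <- c(TRUE, TRUE, FALSE)
index <- matrix(c(FALSE, TRUE, FALSE, FALSE), nrow = 2)
cats[rows, cols][index] <- "Unreliable"

cats
```

With character columns both assignments store the new label. With factor columns the same assignments go through `[<-.factor`, which is where an unknown level turns into NA.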
>> I note that any of the 'primitive' data sets, i.e., the ones that I
>> read with readCtData can be used in setCategory without a problem.
>> Once I rbind the two data types together, I get the problem. I have
>> looked at the rbind.qPCRset code and I don't see anything that would
>> cause this to be the case but I suspect my problem is somehow tied up
>> with some artifact of data assembly.
>>
>> Thanks
>>
>> Mike
>>
>>>
>>> Cheers
>>> \Heidi
>>>
>>>
>>>> other 47 warnings are the same
>>>>
>>>> The categories that you get in the SDS data are "OK" and
>>>> "Undetermined" and it seems to be unwilling to add the new level
>>>> "Unreliable".  I tried to manually add the levels:
>>>>
>>>> featureCategory(s2010004660)$wt_4wks <-
>>>>   factor(featureCategory(s2010004660)$wt_4wks,
>>>>     levels=c(levels(featureCategory(s2010004660)$wt_4wks), "Unreliable"))
>>>> featureCategory(s2010004660)$mt_4wks <-
>>>>   factor(featureCategory(s2010004660)$mt_4wks,
>>>>     levels=c(levels(featureCategory(s2010004660)$mt_4wks), "Unreliable"))
>>>> featureCategory(s2010004660)$wt_17wks <-
>>>>   factor(featureCategory(s2010004660)$wt_17wks,
>>>>     levels=c(levels(featureCategory(s2010004660)$wt_17wks), "Unreliable"))
>>>> featureCategory(s2010004660)$mt_17wks <-
>>>>   factor(featureCategory(s2010004660)$mt_17wks,
>>>>     levels=c(levels(featureCategory(s2010004660)$mt_17wks), "Unreliable"))
>>>>
>>>> and get another error
>>>>
>>>> Error in count[names(tab), i] <- tab : subscript out of bounds
>>>>
>>>> Is this a bug or operator error?
>>>>
>>>> Thanks
>>>>
>>>> Mike
>>>>
>>>>> sessionInfo()
>>>> R version 2.11.0 (2010-04-22)
>>>> i386-apple-darwin9.8.0
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] HTqPCR_1.2.0       limma_3.4.0        RColorBrewer_1.0-2 Biobase_2.8.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affy_1.26.0           affyio_1.16.0         gdata_2.7.1           gplots_2.7.4          gtools_2.6.1          preprocessCore_1.10.0
>>>> [7] tools_2.11.0
>>>>
>>>> Michael Muratet, Ph.D.
>>>> Senior Scientist
>>>> HudsonAlpha Institute for Biotechnology
>>>> mmuratet at hudsonalpha.org
>>>> (256) 327-0473 (p)
>>>> (256) 327-0966 (f)
>>>>
>>>> Room 4005
>>>> 601 Genome Way
>>>> Huntsville, Alabama 35806
>>>>
>>>>
>>>>
>>>>
>>>
>>

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806