[BioC] HTqPCR setCategories problem
Michael Muratet
mmuratet at hudsonalpha.org
Mon May 10 23:32:43 CEST 2010
On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote:
> Hello Mike,
>
> On 7 May 2010, at 17:11, Michael Muratet wrote:
>
>> Greetings
>>
>> I have encountered a small problem when using the setCategory
>> method:
>>
>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9,
>> groups=sampleNames(s2010004660))
>> Categories after Ct.max and Ct.min filtering:
>>              wt_4wks mt_4wks wt_17wks mt_17wks
>> OK               582     571      566      578
>> Undetermined     186     197      202      190
>> Categories after standard deviation filtering:
>>              wt_4wks mt_4wks wt_17wks mt_17wks
>> OK               581     569      565      577
>> Undetermined     186     197      202      190
>> There were 50 or more warnings (use warnings() to see the first 50)
>> > warnings()
>> Warning messages:
>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>> invalid factor level, NAs generated
>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>> invalid factor level, NAs generated
>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>> invalid factor level, NAs generated
>>
> Hm, I can't reproduce this with the standard data sets in HTqPCR,
> although this also only contains "OK" and "Undetermined" initially:
>
> > data(qPCRraw)
> > apply(featureCategory(qPCRraw), 2, table)
>              sample1 sample2 sample3 sample4 sample5 sample6
> OK               353     312     360     339     335     336
> Undetermined      31      72      24      45      49      48
> > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3))
> Categories after Ct.max and Ct.min filtering:
>              sample1 sample2 sample3 sample4 sample5 sample6
> OK               313     264     327     295     296     286
> Undetermined      68     119      56      86      86      96
> Unreliable         3       1       1       3       2       2
> Categories after standard deviation filtering:
>              sample1 sample2 sample3 sample4 sample5 sample6
> OK               309     258     323     291     292     281
> Undetermined      68     119      56      86      86      96
> Unreliable         7       7       5       7       6       7
>
> How does featureCategory() of your new object look after you run
> setCategory? Also, it seems like all your Ct values fall within the
> default range given by Ct.max and Ct.min in setCategory(), hence
> none of the categories are adjusted during the "first round", before
> the standard deviation filtering. What happens if you set one of
> Ct.max or Ct.min so that some values are called as "Unreliable"
> based on this, e.g. by saying Ct.max=25? Do you still get the same
> warning?
Heidi
I have traced the problem to this line in setCategory by setting the
options so that warnings are errors:
featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable"
I have never seen dual indices like this on a data frame before. Can
you tell me what this does?
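For reference, here is a minimal sketch of how R appears to treat a chained replacement of this shape, run on a toy data frame rather than on HTqPCR data. The names cats, gene_rows, grp_cols and expanded are hypothetical stand-ins, and the assumption is that featureCategory(out) behaves like an ordinary data frame:

```r
# Toy stand-in for featureCategory(out); all names here are hypothetical.
cats <- data.frame(s1 = c("OK", "OK", "OK", "OK"),
                   s2 = c("OK", "OK", "OK", "OK"),
                   stringsAsFactors = FALSE)
gene_rows <- c(TRUE, TRUE, FALSE, FALSE)  # plays the role of split.by == gene
grp_cols  <- c(TRUE, FALSE)               # plays the role of groups == g
index     <- c(FALSE, TRUE)               # which of the selected cells to flag

# The chained form can be written out as three explicit steps:
tmp <- cats[gene_rows, grp_cols]      # 1. extract the block (a plain vector here: one column)
tmp[index] <- "Unreliable"            # 2. replace inside the extracted block
expanded <- cats
expanded[gene_rows, grp_cols] <- tmp  # 3. write the block back

# The one-line chained form should give the same result.
cats[gene_rows, grp_cols][index] <- "Unreliable"
identical(cats, expanded)
```

If the selected column is a factor, step 2 runs on a factor, which would be consistent with the `[<-.factor` call named in the warnings.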
If I force the data to be called "Unreliable" by setting Ct.min I can
get the same failure here:
featureCategory(out)[data < Ct.min] <- "Unreliable"
If you index a data frame with a single subscript like this, don't
you get or set just a column? I thought you had to use something like
sapply in a case like this.
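A small check on toy data (not HTqPCR objects; ct, cats_chr, cats_fac and low are made-up names) suggests that a logical matrix used as a single subscript addresses individual cells, whereas a plain vector subscript selects columns. The sketch also assumes that the failing category columns are factors without an "Unreliable" level:

```r
# Toy Ct values and category tables; not HTqPCR data.
ct <- data.frame(s1 = c(10, 30, 25), s2 = c(28, 8, 9))
cats_chr <- data.frame(s1 = rep("OK", 3), s2 = rep("OK", 3),
                       stringsAsFactors = FALSE)
cats_fac <- data.frame(s1 = rep("OK", 3), s2 = rep("OK", 3),
                       stringsAsFactors = TRUE)

Ct.min <- 15
low <- ct < Ct.min   # logical matrix with the same shape as ct

# A logical matrix subscript replaces individual cells.
cats_chr[low] <- "Unreliable"

# A single vector subscript, by contrast, selects whole columns.
names(cats_chr[1])

# With factor columns the same assignment raises the warning. tryCatch
# abandons the assignment at the first warning; without it the affected
# cells would become NA.
warn_msg <- tryCatch({ cats_fac[low] <- "Unreliable"; NA_character_ },
                     warning = function(w) conditionMessage(w))
warn_msg
```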
I note that any of the 'primitive' data sets, i.e., the ones that I
read with readCtData, can be used in setCategory without a problem.
Once I rbind the two data types together, I get the problem. I have
looked at the rbind.qPCRset code and I don't see anything that would
cause this to be the case, but I suspect my problem is somehow tied up
with some artifact of data assembly.
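One way to test that suspicion is to compare how the category columns are stored in a working object and in the combined one. The sketch below uses two toy data frames (prim and comb are hypothetical names); whether the real objects differ in this way is an assumption, not something confirmed from the HTqPCR code:

```r
# Two toy category tables; which one resembles a real qPCRset is an assumption.
prim <- data.frame(s1 = c("OK", "Undetermined"), stringsAsFactors = FALSE)
comb <- data.frame(s1 = c("OK", "Undetermined"), stringsAsFactors = TRUE)

# Compare the storage class of each category column.
sapply(prim, class)
sapply(comb, class)
lapply(comb, levels)   # a factor only accepts values among these levels

# One possible workaround: store every category column as character.
fixed <- comb
fixed[] <- lapply(fixed, as.character)   # fixed[] keeps the data frame shape
fixed[1, "s1"] <- "Unreliable"           # no factor levels left to violate
fixed
```

On the real data the same comparison would be run on featureCategory() of a set read with readCtData and of the rbind result.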
Thanks
Mike
>
> Cheers
> \Heidi
>
>
>> other 47 warnings are the same
>>
>> The categories that you get in the SDS data are "OK" and
>> "Undetermined" and it seems to be unwilling to add the new level
>> "Unreliable". I tried to manually add the levels:
>>
>> featureCategory(s2010004660)$wt_4wks <-
>> factor(featureCategory(s2010004660)$wt_4wks,
>> levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable"))
>> featureCategory(s2010004660)$mt_4wks <-
>> factor(featureCategory(s2010004660)$mt_4wks,
>> levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable"))
>> featureCategory(s2010004660)$wt_17wks <-
>> factor(featureCategory(s2010004660)$wt_17wks,
>> levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable"))
>> featureCategory(s2010004660)$mt_17wks <-
>> factor(featureCategory(s2010004660)$mt_17wks,
>> levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable"))
>>
>> and get another error
>>
>> Error in count[names(tab), i] <- tab : subscript out of bounds
>>
>> Is this a bug or operator error?
>>
>> Thanks
>>
>> Mike
>>
>> > sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2
>> Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affy_1.26.0 affyio_1.16.0
>> gdata_2.7.1 gplots_2.7.4 gtools_2.6.1
>> preprocessCore_1.10.0
>> [7] tools_2.11.0
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>>
>>
>