[BioC] HTqPCR setCategories problem
Heidi Dvinge
heidi at ebi.ac.uk
Tue May 11 00:11:37 CEST 2010
Hi Mike,
you're absolutely right: the problem turned out to be in the rbind method
for qPCRset objects! The feature categories are meant to be stored as
characters, not factors, so that users can easily add their own categories
if they want. However, in rbind I omitted the argument
"stringsAsFactors=FALSE" in a call to as.data.frame(), so I get:
> data(qPCRraw)
> class(featureCategory(qPCRraw))
[1] "data.frame"
> class(featureCategory(qPCRraw)[,1])
[1] "character"
> test <- rbind(qPCRraw[1:4,], qPCRraw[1:40,])
> class(featureCategory(test))
[1] "data.frame"
> class(featureCategory(test)[,1])
[1] "factor"
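
The same effect can be seen without HTqPCR. Below is a minimal base R sketch, assuming a small made-up character matrix (`cats`) as a stand-in for the feature categories; it is not the actual rbind code. stringsAsFactors is passed explicitly in both calls so that the result does not depend on the default of the R version in use.

```r
# Hypothetical stand-in for feature categories: a character matrix
cats <- matrix(c("OK", "Undetermined", "OK", "OK"), nrow = 2,
               dimnames = list(NULL, c("sample1", "sample2")))

# With stringsAsFactors=TRUE the character columns are turned into factors
as_factor <- as.data.frame(cats, stringsAsFactors = TRUE)

# With stringsAsFactors=FALSE they stay character
as_char <- as.data.frame(cats, stringsAsFactors = FALSE)

class(as_factor[, 1])   # "factor"
class(as_char[, 1])     # "character"
```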
I'll sanitise rbind (and probably cbind as well), but I'm afraid it'll
take a while for these changes to reach BioC devel. In the meantime,
you'll unfortunately have to convert your featureCategory back to a data
frame of characters instead of factors by hand:
> featureCategory(test) <- data.frame(apply(featureCategory(test), 2,
as.character), stringsAsFactors=FALSE)
> class(featureCategory(test)[,1])
[1] "character"
> setCategory(test, groups=rep(c("A", "B"),3), Ct.min=20)
Categories after Ct.max and Ct.min filtering:
             sample1 sample2 sample3 sample4 sample5 sample6
OK                29      32      34      32      32      33
Undetermined       6       7       5       4       7       5
Unreliable         9       5       5       8       5       6
Categories after standard deviation filtering:
             sample1 sample2 sample3 sample4 sample5 sample6
OK                29      32      34      32      32      33
Undetermined       6       7       5       4       7       5
Unreliable         9       5       5       8       5       6
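
A possible variation on the conversion above, sketched in base R on a made-up data frame (`cats`, with hypothetical row names): apply() goes through a matrix and as.character() drops names, so the row names of the original data frame are not carried over. Assigning the result of lapply() into `[]` converts each column in place and keeps them. Whether the row names of featureCategory matter for your workflow is an assumption on my part.

```r
# Hypothetical factor-valued categories with feature names as row names
cats <- data.frame(sample1 = factor(c("OK", "Undetermined")),
                   sample2 = factor(c("OK", "OK")),
                   row.names = c("geneA", "geneB"))

# Convert every column to character; `[]` keeps dimensions and row names
cats_chr <- cats
cats_chr[] <- lapply(cats, as.character)

sapply(cats_chr, class)
rownames(cats_chr)
```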
Yet another item on the ever-growing HTqPCR to-do list...
\Heidi
>
> On May 10, 2010, at 9:05 AM, Heidi Dvinge wrote:
>
>> Hello Mike,
>>
>> On 7 May 2010, at 17:11, Michael Muratet wrote:
>>
>>> Greetings
>>>
>>> I have encountered a small problem when using the setCategory
>>> method:
>>>
>>> s2010004660.cat <- setCategory(s2010004660, quantile=0.9,
>>> groups=sampleNames(s2010004660))
>>> Categories after Ct.max and Ct.min filtering:
>>>              wt_4wks mt_4wks wt_17wks mt_17wks
>>> OK               582     571      566      578
>>> Undetermined     186     197      202      190
>>> Categories after standard deviation filtering:
>>>              wt_4wks mt_4wks wt_17wks mt_17wks
>>> OK               581     569      565      577
>>> Undetermined     186     197      202      190
>>> There were 50 or more warnings (use warnings() to see the first 50)
>>> > warnings()
>>> Warning messages:
>>> 1: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>> invalid factor level, NAs generated
>>> 2: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>> invalid factor level, NAs generated
>>> 3: In `[<-.factor`(`*tmp*`, index, value = "Unreliable") :
>>> invalid factor level, NAs generated
>>>
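
The warning itself is ordinary factor behaviour and can be reproduced in base R. The sketch below uses a made-up factor `f`, not the HTqPCR object: assigning a value that is not among the existing levels yields NA plus the "invalid factor level" warning, while a character vector accepts any new value.

```r
# A factor that only knows the levels "OK" and "Undetermined"
f <- factor(c("OK", "Undetermined", "OK"))

# Assigning an unknown level gives NA (the warning is muffled here)
f_bad <- f
suppressWarnings(f_bad[2] <- "Unreliable")
is.na(f_bad[2])   # TRUE

# A character vector has no fixed set of levels, so the value is stored
ch <- as.character(f)
ch[2] <- "Unreliable"
ch
```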
>> Hm, I can't reproduce this with the standard data sets in HTqPCR,
>> although this also only contains "OK" and "Undetermined" initially:
>>
>> > data(qPCRraw)
>> > apply(featureCategory(qPCRraw), 2, table)
>>              sample1 sample2 sample3 sample4 sample5 sample6
>> OK               353     312     360     339     335     336
>> Undetermined      31      72      24      45      49      48
>> > setCategory(qPCRraw, quantile=0.9, groups=rep(c("A", "B"), 3))
>> Categories after Ct.max and Ct.min filtering:
>>              sample1 sample2 sample3 sample4 sample5 sample6
>> OK               313     264     327     295     296     286
>> Undetermined      68     119      56      86      86      96
>> Unreliable         3       1       1       3       2       2
>> Categories after standard deviation filtering:
>>              sample1 sample2 sample3 sample4 sample5 sample6
>> OK               309     258     323     291     292     281
>> Undetermined      68     119      56      86      86      96
>> Unreliable         7       7       5       7       6       7
>>
>> How does featureCategory() of your new object look after you run
>> setCategory? Also, it seems like all your Ct values fall within the
>> default range given by Ct.max and Ct.min in setCategory(), hence
>> none of the categories are adjusted during the "first round", before
>> the standard deviation filtering. What happens if you set one of
>> Ct.max or Ct.min so that some values are called as "Unreliable"
>> based on this, e.g. by saying Ct.max=25? Do you still get the same
>> warning?
>
> Heidi
>
> I have traced the problem to this line in setCategory by setting the
> options so that warnings are errors:
>
> featureCategory(out)[split.by==gene,groups==g][index] <- "Unreliable"
>
> I have never seen dual indices like this on a data frame before, can
> you tell me what this does?
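
On the "dual indices" question: in R an assignment of the form `x[rows, cols][index] <- value` is a nested replacement. R extracts the sub-block `x[rows, cols]`, modifies the elements of that block selected by `index`, and writes the block back into `x`. A minimal base R sketch on a made-up data frame follows; the names `rows`, `cols` and `index` are hypothetical, and treating `index` as a logical matrix over the sub-block is an assumption, since its type is not shown in the thread.

```r
# Hypothetical category table: 3 features x 3 samples, character columns
df <- data.frame(s1 = c("OK", "OK", "Undetermined"),
                 s2 = c("OK", "OK", "OK"),
                 s3 = c("OK", "OK", "OK"),
                 stringsAsFactors = FALSE)

rows <- c(TRUE, TRUE, FALSE)   # stands in for split.by == gene
cols <- c(TRUE, FALSE, TRUE)   # stands in for groups == g

# Assumed shape of `index`: logical matrix over the 2 x 2 sub-block
index <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)

# Extract the block, replace the flagged cells, write the block back
df[rows, cols][index] <- "Unreliable"
df
```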
>
> If I force the data to be called "Unreliable" by setting Ct.min I can
> get the same failure here:
>
> featureCategory(out)[data < Ct.min] <- "Unreliable"
>
> If you reference a single dimension from a data frame like this, don't
> you set/get just a column? I thought you had to use something like
> sapply in a case like this.
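
On single-index access to a data frame: a vector index does select columns, but a logical matrix with the same dimensions as the data frame is applied element by element, so no sapply is needed. A minimal base R sketch with made-up values (`cats`, `ct`; Ct.min chosen arbitrarily):

```r
# Hypothetical categories and matching Ct values (2 features x 2 samples)
cats <- data.frame(s1 = c("OK", "OK"), s2 = c("OK", "OK"),
                   stringsAsFactors = FALSE)
ct <- matrix(c(18, 25, 30, 12), nrow = 2)
Ct.min <- 20

# A single vector index picks whole columns
cats["s1"]

# A logical matrix of the same shape picks individual cells
cats[ct < Ct.min] <- "Unreliable"
cats
```

With factor columns the same assignment runs into the "invalid factor level" warning, because each flagged cell is assigned into its column one column at a time.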
>
> I note that any of the 'primitive' data sets, i.e., the ones that I
> read with readCtData can be used in setCategory without a problem.
> Once I rbind the two data types together, I get the problem. I have
> looked at the rbind.qPCRset code and I don't see anything that would
> cause this, but I suspect my problem is somehow tied up with
> some artifact of data assembly.
>
> Thanks
>
> Mike
>
>>
>> Cheers
>> \Heidi
>>
>>
>>> other 47 warnings are the same
>>>
>>> The categories that you get in the SDS data are "OK" and
>>> "Undetermined" and it seems to be unwilling to add the new level
>>> "Unreliable". I tried to manually add the levels:
>>>
>>> featureCategory(s2010004660)$wt_4wks <-
>>> factor(featureCategory(s2010004660)$wt_4wks,
>>> levels=c(levels(featureCategory(s2010004660)$wt_4wks),"Unreliable"))
>>> featureCategory(s2010004660)$mt_4wks <-
>>> factor(featureCategory(s2010004660)$mt_4wks,
>>> levels=c(levels(featureCategory(s2010004660)$mt_4wks),"Unreliable"))
>>> featureCategory(s2010004660)$wt_17wks <-
>>> factor(featureCategory(s2010004660)$wt_17wks,
>>> levels=c(levels(featureCategory(s2010004660)$wt_17wks),"Unreliable"))
>>> featureCategory(s2010004660)$mt_17wks <-
>>> factor(featureCategory(s2010004660)$mt_17wks,
>>> levels=c(levels(featureCategory(s2010004660)$mt_17wks),"Unreliable"))
>>>
>>> and get another error
>>>
>>> Error in count[names(tab), i] <- tab : subscript out of bounds
>>>
>>> Is this a bug or operator error?
>>>
>>> Thanks
>>>
>>> Mike
>>>
>>> > sessionInfo()
>>> R version 2.11.0 (2010-04-22)
>>> i386-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] HTqPCR_1.2.0 limma_3.4.0 RColorBrewer_1.0-2 Biobase_2.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.26.0 affyio_1.16.0 gdata_2.7.1 gplots_2.7.4 gtools_2.6.1 preprocessCore_1.10.0
>>> [7] tools_2.11.0
>>>
>>> Michael Muratet, Ph.D.
>>> Senior Scientist
>>> HudsonAlpha Institute for Biotechnology
>>> mmuratet at hudsonalpha.org
>>> (256) 327-0473 (p)
>>> (256) 327-0966 (f)
>>>
>>> Room 4005
>>> 601 Genome Way
>>> Huntsville, Alabama 35806
>>>
>>>
>>>
>>>
>>
>