[BioC] Error in calculating P-values with Genefilter function
Bradley Cattrysse
bcattrys at uoguelph.ca
Tue Jun 11 16:21:45 CEST 2013
Hi Jim,
I see what you mean; I thought it was reporting the number of observations in X. I will poke around some more. Thanks again for the help!
Brad
----- Original Message -----
From: "James W. MacDonald" <jmacdon at uw.edu>
To: "Bradley Cattrysse" <bcattrys at uoguelph.ca>
Cc: Bioconductor at r-project.org
Sent: Tuesday, June 11, 2013 10:13:37 AM
Subject: Re: [BioC] Error in calculating P-values with Genefilter function
Hi Brad,
On 6/10/2013 10:34 AM, Bradley Cattrysse wrote:
> Hi Jim,
> Thanks for the additional help in trying to solve this problem. I used the options(error=recover) command and poked around as you suggested, and found that probe 56 was giving the function a problem (like the NA in row 432 in your example). I removed that row from the data set and re-ran the p-value calculation to see whether that would solve the problem. Although I think it did, I am now getting a different error from the function. The problem is in the apply(expr, 1, flist) frame of genefilter:
>
>> Anova7_P0.01<-genefilter(check,Func7P0.01)
> Error in apply(expr, 1, flist) : dim(X) must have a positive length
>
> Enter a frame number, or 0 to exit
>
> 1: genefilter(check, Func7P0.01)
> 2: apply(expr, 1, flist)
>
> Selection: 2
> Called from: genefilter(check, Func7P0.01)
> Browse[1]> ls()
> [1] "dl" "FUN" "MARGIN" "X"
> Browse[1]> X
> [1] 35555 7
> Browse[1]> dim(X)
> NULL
It doesn't say that the dimensions of X are 35555 x 7. It says that X is
a vector with two numbers in it, (35555 and 7) and that the dimensions
of X are NULL, which stands to reason as it is a vector, which has no
dimensional attributes.
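
A minimal sketch of what the browser output above is showing, using a small made-up matrix rather than your data: the result of dim() is itself a plain two-element vector, and handing such a vector to apply() reproduces the same error. How the object passed to genefilter ended up holding the dimensions instead of the matrix is not visible from here, so the assignment in the sketch is only an assumption.

```r
# Hypothetical stand-in for the expression matrix (5 rows, 4 columns).
m <- matrix(rnorm(20), ncol = 4)

# Assumption: something like this replaced the matrix with its dimensions.
d <- dim(m)

print(d)             # two numbers: 5 and 4
print(dim(d))        # NULL, because d is a plain vector
print(is.matrix(d))  # FALSE

# apply() needs an object with a dim attribute, so this call fails;
# the error message is captured instead of stopping the script.
msg <- tryCatch(apply(d, 1, sum), error = function(e) conditionMessage(e))
print(msg)

# A cheap guard to run on the real object before calling genefilter().
stopifnot(is.matrix(m), length(dim(m)) == 2)
```

Running the same guard on the object you pass to genefilter should show quickly whether it is still a matrix.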
You might try poking around in frame 1. Usually I get better results
when I look one frame higher than I think I should.
Best,
Jim
>
> It says that dim(X) must have a positive length. When I browse X it says it has 35555 rows and 7 columns, which is correct for the data set. But when I browse the dimensions of X it says NULL. I'm not sure why this is. Do you have any idea what I should do to troubleshoot this?
>
> Thanks again I really appreciate the help troubleshooting!
> Brad
>
>
>
> ----- Original Message -----
> From: "James W. MacDonald"<jmacdon at uw.edu>
> To: "Bradley Cattrysse"<bcattrys at uoguelph.ca>
> Cc: Bioconductor at r-project.org
> Sent: Tuesday, June 4, 2013 12:21:35 PM
> Subject: Re: [BioC] Error in calculating P-values with Genefilter function
>
> Hi Brad,
>
> Please don't take things off-list (e.g., in future, use reply-all). We
> like to think of the list archives as a searchable repository of
> knowledge, and if we go off-list, that aspect is lost.
>
> On 6/4/2013 11:53 AM, Bradley Cattrysse wrote:
>> Hi Jim,
>>
>> Thank you for the help. When I run options(error=recover) it does show where the error is occurring, specifying that it is happening in fun(x), just as traceback() does. I'm not sure how to diagnose from there. We are analyzing an 8-array set, but we have decided that one array may be problematic. The code works perfectly on the 8-array set, but when I drop one array I get the error. If you have any additional ideas that may help in diagnosing this problem, the help would be greatly appreciated!
> Ideally what will happen is that when you error out, you will be able to
> figure out what the problem is by looking at the various frames that are
> available to you. As an example (which indicates that my original idea
> is not correct):
>
> dat <- matrix(rnorm(10000), ncol=10)
> dat[432,1:5] <- NA ## make sure it will break
> library(genefilter)
> fact <- factor(rep(1:2, each=5))
> f <- filterfun(Anova(fact, p=0.01))
> options(error=recover)
> genefilter(dat, f)
>
> Enter a frame number, or 0 to exit
>
> 1: genefilter(dat, f)
> 2: apply(expr, 1, flist)
> 3: FUN(newX[, i], ...)
> 4: fun(x)
> 5: lm(x ~ cov)
> 6: model.matrix(mt, mf, contrasts)
> 7: model.matrix.default(mt, mf, contrasts)
> 8: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>
> Selection: 3    <------------ I chose to enter frame #3
> Called from: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
> Browse[1]> ls()    <------------ What's in here?
> [1] "fun" "x"
> Browse[1]> x    <------------ What is x?
> [1] NA NA NA NA NA 0.2737152
> [7] 0.4907177 -0.1716024 0.2109492 1.0631105
>
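
The failing row can also be located without stepping through frames. The sketch below uses a different technique from recover: it wraps a per-row test in tryCatch() and records which rows raise an error. The helper row_p() is hypothetical and only imitates the two lm() fits quoted from Anova(); it is not the genefilter code itself, and the matrix is smaller than the one above to keep the run quick.

```r
set.seed(1)

# Small made-up data set: 100 rows, 10 arrays, two groups of five.
dat <- matrix(rnorm(1000), ncol = 10)
dat[43, 1:5] <- NA                      # wipe out group 1 in one row
fact <- factor(rep(1:2, each = 5))

# Hypothetical per-row test: compare the group model with the null model.
row_p <- function(x, grp) {
  m1 <- lm(x ~ grp)
  m2 <- lm(x ~ 1)
  anova(m2, m1)[["Pr(>F)"]][2]
}

# TRUE for every row whose test raises an error instead of returning a value.
failed <- vapply(seq_len(nrow(dat)), function(i) {
  res <- tryCatch(row_p(dat[i, ], fact), error = function(e) e)
  inherits(res, "error")
}, logical(1))

bad_rows <- which(failed)
print(bad_rows)
print(dat[bad_rows, , drop = FALSE])    # look at the offending data
```

This only reports rows that throw an error; rows that return an NA p-value would need a separate is.na() check on the result.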
> You can then hit enter and look at other frames. This isn't an exact
> science. For example, frame 2 is hard to figure out:
>
> Enter a frame number, or 0 to exit
>
> 1: genefilter(dat, f)
> 2: apply(expr, 1, flist)
> 3: FUN(newX[, i], ...)
> 4: fun(x)
> 5: lm(x ~ cov)
> 6: model.matrix(mt, mf, contrasts)
> 7: model.matrix.default(mt, mf, contrasts)
> 8: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>
> Selection: 2
> Called from: model.matrix.default(mt, mf, contrasts)
> Browse[1]> ls()
> [1] "ans" "d" "d2" "d.ans" "d.call" "dl" "dn"
> [8] "dn.ans" "dn.call" "ds" "FUN" "i" "MARGIN" "newX"
> [15] "s.ans" "s.call" "tmp" "X"
>
> That's a lot of stuff, and fairly cryptic. But we can get some info here:
>
> Browse[1]> i
> [1] 432
>
> So we know this is row 432, where we put the NAs. You just need to poke
> around in the various frames to try to figure out what is wrong with
> your data, and why you get the errors. It is always safest to do
> something like
>
> Browse[1]> class(X)
> [1] "matrix"
> Browse[1]> dim(X)
> [1] 1000 10
>
> rather than just typing X to see what it is, as sometimes these things
> are really big and you might get stuck with lots of data being output to
> your screen.
>
> Best,
>
> Jim
>
>
>
>
>
>> Thanks again,
>> Brad
>>
>>
>>
>> ----- Original Message -----
>> From: "James W. MacDonald"<jmacdon at uw.edu>
>> To: "Brad Cattrysse [guest]"<guest at bioconductor.org>
>> Cc: bioconductor at r-project.org, bcattrys at uoguelph.ca, "genefilter Maintainer"<maintainer at bioconductor.org>
>> Sent: Monday, June 3, 2013 2:27:19 PM
>> Subject: Re: [BioC] Error in calculating P-values with Genefilter function
>>
>> Hi Brad,
>>
>> On 6/3/2013 2:12 PM, Brad Cattrysse [guest] wrote:
>>> To whom it may concern,
>>>
>>> I am having trouble with the genefilter function in R. I am attempting to extract genes from 7 arrays using a p-value of 0.01 using the following code:
>>>
>>> Func7P0.01<-filterfun(Anova(class7,p=0.01))
>>> Func7P0.01
>>> Anova7_P0.01<-genefilter(SCDexprs7,Func7P0.01)
>>> Anova7_P0.01
>>>
>>> Creating Func7P0.01 works fine, but when I run genefilter using my data matrix and Func7P0.01 I get the following error:
>>>
>>>
>>>> Anova7_P0.01<-genefilter(SCDexprs7,Func7P0.01)
>>> Error in if (fstat < p) return(TRUE) :
>>> missing value where TRUE/FALSE needed
>>>
>>>
>>> and when I run traceback(), I get:
>>>
>>>> traceback()
>>> 4: fun(x)
>>> 3: FUN(newX[, i], ...)
>>> 2: apply(expr, 1, flist)
>>> 1: genefilter(SCDexprs7, Func7P0.01)
>>>
>>>
>>> I'm not entirely sure what is going on, but when I extract genes from the same 7 arrays, plus another array (8 arrays total) using the same code structure (below) it works fine.
>> My best guess would be that you have some missing data for a particular
>> gene, and when you only have seven arrays you get to a point where you
>> don't have enough data of one type to fit a linear model, so the code here
>>
>> m1<- lm(x ~ cov)
>> m2<- lm(x ~ 1)
>> av<- anova(m2, m1)
>>
>> from Anova() breaks.
>>
>> Try doing
>>
>> options(error = recover)
>>
>> and then run genefilter. You will error out at the point where things
>> are breaking, and can look at the variables being analyzed at that point
>> to see what the problem is.
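
Before reaching for the debugger, a quick count of missing values per group can flag the rows most likely to cause trouble. This is a sketch on invented data: `mat` and `grp` are placeholders for an expression matrix and the class factor, and the threshold of two observations per group is an assumption, not a genefilter rule.

```r
set.seed(1)

# Placeholder data: 10 genes on 7 arrays, groups of four and three.
mat <- matrix(rnorm(70), ncol = 7)
grp <- factor(c(1, 1, 1, 1, 2, 2, 2))
mat[3, 5:7] <- NA                       # one gene loses all of group 2

# Rows containing any missing value.
na_rows <- which(rowSums(is.na(mat)) > 0)

# Number of non-missing observations per group, one row per gene.
n_per_group <- t(apply(mat, 1, function(x) tapply(!is.na(x), grp, sum)))

# Rows where some group has fewer than two usable observations.
too_few <- which(rowSums(n_per_group < 2) > 0)

print(na_rows)
print(n_per_group[too_few, , drop = FALSE])
```

Rows flagged in `too_few` are worth inspecting or dropping before the filter is run.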
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>> Func8P0.01<-filterfun(Anova(class8,p=0.01))
>>> Func8P0.01
>>> Anova8_P0.01<-genefilter(SCDexprs8,Func8P0.01)
>>> Anova8_P0.01
>>>
>>>
>>> Any help with this matter would be greatly appreciated as I am not sure what else to try.
>>>
>>> Thanks in advance!
>>> Brad Cattrysse
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>>> sessionInfo()
>>> R version 3.0.0 (2013-04-03)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] pd.mogene.1.1.st.v1_3.8.0 RSQLite_0.11.3
>>> [3] DBI_0.2-6 ggplot2_0.9.3.1
>>> [5] e1071_1.6-1 class_7.3-7
>>> [7] pvac_1.8.0 pgmm_1.0
>>> [9] mclust_4.1 cluster_1.14.4
>>> [11] genefilter_1.42.0 oligoData_1.8.0
>>> [13] oligo_1.24.0 Biobase_2.20.0
>>> [15] oligoClasses_1.22.0 BiocGenerics_0.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affxparser_1.32.0 affy_1.38.1 affyio_1.28.0
>>> [4] annotate_1.38.0 AnnotationDbi_1.22.5 BiocInstaller_1.10.1
>>> [7] Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
>>> [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3
>>> [13] ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.2
>>> [16] grid_3.0.0 gtable_0.1.2 IRanges_1.18.0
>>> [19] iterators_1.0.6 labeling_0.1 MASS_7.3-26
>>> [22] munsell_0.4 plyr_1.8 preprocessCore_1.22.0
>>> [25] proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2
>>> [28] scales_0.2.3 splines_3.0.0 stats4_3.0.0
>>> [31] stringr_0.6.2 survival_2.37-4 tools_3.0.0
>>> [34] XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.6.0
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099