[BioC] Gene filtering
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Sat Feb 12 02:04:36 CET 2005
[[Ignore the previous mail. I hit the send button too soon]]
I never used genefilter and filterfun so I would not be able to advice
on this and hope the suggestions below solves your problem. On a
personal note, I just calculate and store the p-values/statistics
directly. This may be more efficient for the following reasons
1) to generate various lists of interesting genes at different p-value
cutoffs. This is often required by the biologists who might want a high
confidence subset for biological validation and maybe a broader subset
for computation validation (e.g. pathway analysis)
2) to rank genes by p-values
3) to adjust p-values for multiple hypothesis
Here is one way how you can do this :
mat <- matrix( rnorm(100000, sd=5), nc=100 )
rownames(mat) <- paste("g", 1:1000, sep="")
# replace 'mat' with your exprs(data)
g <- rep(1:2, each=50)
# Class information e.g. 50 normal and 50 tumour
# again replace this with your own groups
stats <- t( apply( mat, 1, function(z) {
x <- z[ which( g==1 ) ]
y <- z[ which( g==2 ) ]
t.p <- t.test(x, y)$p.value
w.p <- wilcox.test(x, y)$p.value
fc <- mean(x, na.rm=T) - mean(y, na.rm=T)
return( c(t.pval=t.p, wilcox.pval=w.p, fold.change=fc) )
}))
# You can modify the above to include further tests etc.
Hopefully you can get something compact like the following (Note : your
results will vary due to random number generation).
t.pval wilcox.pval fold.change
g1 0.2376890 0.2655573 1.0214440
g2 0.1513874 0.2931174 -1.2895703
g3 0.4788188 0.5014898 -0.7349789
g4 0.2021780 0.1302305 1.3201382
g5 0.2537569 0.2655573 -1.1256882
g6 0.5881588 0.7020112 -0.5907285
.. ......... ......... ..........
Now, you can generate various lists such as
list1 <- names( which( stats[ , "t.pval"] < 0.05 ) )
list2 <- names( which( stats[ , "fold.change"] > 1 ) )
intersect( list1, list2 )
I guess this is probably a matter of taste. Hope this helps.
Regards, Adai
On Fri, 2005-02-11 at 10:08 -0500, James W. MacDonald wrote:
> Heike Pospisil wrote:
> > Hello Adaikalavan
> >
> >> I think justRMA() uses nearly all the memory you have access to, so it
> >> it only able to handle small computations afterwards. What I would
> >> suggest is try saving the exprSet and exit. Then start from a fresh R
> >> session and do your analysis from that. See below.
> >>
> >>
> >
> > Thanks for your suggestion. Saving and loading the exprSet work and
> > help. But, unfortunately, my filter function do not work.
> >
> > ff1<-ttest(data,.001,na.rm=TRUE)
> > ff2<-filterfun(ff1)
> > wh2<-genefilter(exprs(data), ff2)
> >
> > No idea :-(
> >
> > Best wishes.
> > Heike
> >
> I think you are setting up ff1 incorrectly. As an example, let's say
> that your exprSet contains 10 samples, the first 5 are e.g.,
> experimental, and the second 5 are control. Then you would set up ff1
> like this:
>
> ff1 <- ttest(5, 0.001, na.rm = TRUE)
>
> -or-
>
> cl <- c(rep(1,5), rep(2,5))
> ff1 <- ttest(cl, 0.001, na.rm = TRUE)
>
> The second method can be used if the samples are not contiguous (e.g.,
> they are ordered exp, cont, exp, cont, etc).
>
> cl <- c(rep(c(1,2), 5)
> ff1 <- ttest(cl, 0.001, na.rm = TRUE)
>
> HTH,
>
> Jim
>
>
>
More information about the Bioconductor
mailing list