[BioC] kOverA(k, A=100, na.rm=TRUE)
Robert Gentleman
rgentlem at fhcrc.org
Tue Jul 1 09:29:49 CEST 2008
Hi,
Roberts, Raymond wrote:
> sessionInfo()
>> sessionInfo()
> R version 2.7.0 (2008-04-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid splines stats graphics grDevices datasets
> [7] tools utils methods base
>
> other attached packages:
> [1] RColorBrewer_1.0-2 goProfiles_1.2.0
> [3] aroma.affymetrix_0.9.3 aroma.apd_0.1.3
> [5] R.huge_0.1.5 affxparser_1.12.2
> [7] aroma.core_0.9.3 sfit_0.1.5
> [9] aroma.light_1.8.1 digest_0.3.1
> [11] matrixStats_0.1.2 R.rsp_0.3.4
> [13] R.cache_0.1.7 farms_1.3
> [15] MASS_7.2-41 RMySQL_0.6-0
> [17] gplots_2.6.0 gdata_2.4.2
> [19] gtools_2.5.0 affycoretools_1.12.0
> [21] annaffy_1.12.1 KEGG.db_2.2.0
> [23] gcrma_2.12.1 matchprobes_1.12.0
> [25] biomaRt_1.14.0 RCurl_0.9-3
> [27] GOstats_2.6.0 Category_2.6.0
> [29] genefilter_1.20.0 survival_2.34-1
> [31] RBGL_1.16.0 annotate_1.18.0
> [33] xtable_1.5-2 GO.db_2.2.0
> [35] AnnotationDbi_1.2.1 RSQLite_0.6-8
> [37] DBI_0.2-4 graph_1.18.1
> [39] limma_2.14.5 affy_1.18.2
> [41] preprocessCore_1.2.0 affyio_1.8.0
> [43] rJava_0.5-1 Biobase_2.0.1
> [45] R.utils_1.0.2 R.oo_1.4.3
> [47] R.methodsS3_1.0.1
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10 XML_1.94-0.1
>
>
Please show the code and the output - clearly there is a problem where
there should not be one, and by not giving us the whole story you do
make it rather hard to diagnose.
>
> There are only 78 shared between the two outputs. kOverA(2,7) produces
> 361 genes, kOverA(2,6) produces 121 genes. This seems like a major
> problem that I encountered in a completely different script, which was
> not mine so I just scrapped using it.
So all kOverA is doing is asking whether at least k of the arrays are
larger than A. In your case k is 2 and once it is 7, another time 6.
The code for kOverA is so simple as to suggest there is no problem there
function (k, A = 100, na.rm = TRUE)
{
function(x) {
if (na.rm)
x <- x[!is.na(x)]
sum(x > A) >= k
}
}
this is applied row-wise to your expression matrix.
Suppose your ExpressionSet is called myExpr
then you can mimic kOverA as follows
kO6 = apply(exprs(myExpr), 1, function(x) sum(x > 6) >= 2)
kO7 = apply(exprs(myExpr), 1, function(x) sum(x > 7) >= 2)
so perhaps you could run both of those and tell us what happens?
and how they compare with the genes you selected - and please do show
enough of the code so that someone other than yourself might be able to
find the source of the error
Robert
>
> There are no apparent similarities between the genes being included in
> both output sets. Is there another function I can use that will do the
> same filtering but may be more reliable?
>
>
> Ray
>
> Lovelace Respiratory Research Institute
>
>
>
>
>
>
>
> -----Original Message-----
> From: Robert Gentleman [mailto:rgentlem at fhcrc.org]
> Sent: Monday, June 30, 2008 2:27 PM
> To: Roberts, Raymond
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] kOverA(k, A=100, na.rm=TRUE)
>
> Hi,
> That behavior seems somewhat unusual.
> Could you perhaps see which genes are selected for
> kOverA(2,7) and which for kOverA(2,6)? And then look at the expression
> values to see what might be going on..
>
> Also please include the output of sessionInfo, as requested in the
> posting guide.
>
> Robert
>
>
> Roberts, Raymond wrote:
>> Dear List,
>>
>>
>>
>> I'm using genefilter and Limma to analyze the human exon array. I did
>> my preprocessing with aroma.affymetrix and then loaded it into limma
> and
>> used genefilter to isolate the significant genes. At least that is
> what
>> I think I am doing. Here is my script starting at the function I have
>> the question about
>>
>>
>>
>>
>>
>> ##Filter for differentialy expressed genes
>>
>> f1<-kOverA(2,7)
>>
>> ffun<-filterfun(f1)
>>
>> which<-genefilter(exprs(RMASet), ffun)
>>
>> rma_no<-summary(which)
>>
>> rmafiltset<-RMASet[which,]
>>
>>
>>
>>
>>
>>
>>
>> grps <- paste(pData(rmafiltset)$hours)
>>
>> lev <- c("time0", "time3", "time6", "time12", "time24", "time48")
>>
>> f <- factor(grps, levels = lev)
>>
>> design <- model.matrix(~ 0 + f)
>>
>> colnames(design) <- lev
>>
>> rmafit <- lmFit(rmafiltset, design)
>>
>>
>>
>>
>>
>>
>>
>> contrast.matrix <- makeContrasts("time3-time0", "time6-time3",
>> "time12-time6", "time24-time12", "time48-time24", "time6-time0",
>> "time12-time0", "time24-time0", "time48-time0", levels=design)
>>
>> rmafit2 <- contrasts.fit(rmafit, contrast.matrix)
>>
>> rmafit3 <- eBayes(rmafit2)
>>
>>
>>
>>
>>
>> pval<-0.05
>>
>> l2foldch<- 0.58496250072
>>
>>
>>
>> rmaresults<-decideTests(rmafit3, p.value=pval, lfc=l2foldch)
>>
>>
>>
>>
>>
>> rmalist1 <- rmaresults[,1]!=0
>>
>> rmalist2 <- rmaresults[,2]!=0
>>
>> rmalist3 <- rmaresults[,3]!=0
>>
>> rmalist4 <- rmaresults[,4]!=0
>>
>> rmalist5 <- rmaresults[,5]!=0
>>
>> rmalist6 <- rmaresults[,6]!=0
>>
>> rmalist7 <- rmaresults[,7]!=0
>>
>> rmalist8 <- rmaresults[,8]!=0
>>
>> rmalist9 <- rmaresults[,9]!=0
>>
>>
>>
>> rmagroup1 <- rmafiltset[rmalist1,]
>>
>> rmagroup2 <- rmafiltset[rmalist2,]
>>
>> rmagroup3 <- rmafiltset[rmalist3,]
>>
>> rmagroup4 <- rmafiltset[rmalist4,]
>>
>> rmagroup5 <- rmafiltset[rmalist5,]
>>
>> rmagroup6 <- rmafiltset[rmalist6,]
>>
>> rmagroup7 <- rmafiltset[rmalist7,]
>>
>> rmagroup8 <- rmafiltset[rmalist8,]
>>
>> rmagroup9 <- rmafiltset[rmalist9,]
>>
>>
>>
>> rmaglist1 <- featureNames(rmagroup1)
>>
>> rmaglist2 <- featureNames(rmagroup2)
>>
>> rmaglist3 <- featureNames(rmagroup3)
>>
>> rmaglist4 <- featureNames(rmagroup4)
>>
>> rmaglist5 <- featureNames(rmagroup5)
>>
>> rmaglist6 <- featureNames(rmagroup6)
>>
>> rmaglist7 <- featureNames(rmagroup7)
>>
>> rmaglist8 <- featureNames(rmagroup8)
>>
>> rmaglist9 <- featureNames(rmagroup9)
>>
>>
>>
>> rmaunion <- union(rmaglist1, union(rmaglist2, union(rmaglist3,
>> union(rmaglist4, union(rmaglist5, union(rmaglist6, union(rmaglist7,
>> union(rmaglist8, rmaglist9))))))))
>>
>>
>>
>> rmaunionset <- rmafiltset[rmaunion,]
>>
>>
>>
>> plotdata2 <- exprs(rmaunionset)
>>
>> colnames(plotdata2) <- paste(pData(RMASet)$label)
>>
>>
>>
>>
>>
>> I go on to create a heatmap and html output of significant genes. My
>> problem is that when I change kOverA(2, 7) to, say, kOverA(2, 6), I
> get
>> fewer genes, I thought this should increase the number of genes I'm
>> seeing in the final output. If I increase from 7 to 8 I still get a
>> shorter list. It seems like 7 is the value that generates the maximum
>> output. If anyone could explain why this is happening I would really
>> appreciate it. If I should post more information please just let me
>> know.
>>
>>
>>
>>
>>
>> Ray Roberts
>>
>>
>>
>> Lovelace Respiratory Research Institute
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
More information about the Bioconductor
mailing list