[BioC] Creating a customized cut flow with the affy/limma packages
J.M.Jensen
jacob.malte.jensen at hum-gen.au.dk
Tue Feb 14 12:00:32 CET 2012
James W. MacDonald <jmacdon at ...> writes:
>
> Hi J.M.,
>
> On 2/13/2012 9:35 AM, J.M.Jensen [guest] wrote:
> > Dear all,
> > I am trying to create a customized cut flow with the affy/limma packages. In
order to assign different
> cut-off values between different arrays would be to first make one cut-off and
then use the remaining
> expression set in further analysis. When working with the affy/limma packages,
I read and normalize my
> data with the justRMA() function, fit a linear model with the limma package
and use the decideTests() from
> the limma package to select probes with a specified cut-off (e.g. lfc). The
problem is that I cannot assign
> multiple criteria for the cut flow. For example, if I want to remove all
probes from my data set that differ
> more than a lfc of 1.2 between two controls, regardless of the expression
values in the rest of the samples.
> After this cut, I would then be able to use another cut o
> ff value, e.g. lfc=1.8 to select probes from different contrasts of the
remaining data. I tried looking
> into the vennSelect() (affycoretools), because I can use it to select various
contras!
> ts
> > and their intersection, however, I cannot figure how to assign different
cut off criteria.
> > Any help or thoughts on this would be most appreciated.
>
> It seems like you want to do two very different things here, but correct
> me if I misunderstand. First, it appears that you want to remove all
> probes that vary between controls. Here I am assuming that you have one
> set of controls, and you want to exclude any probeset that varies
> between any two of the controls by 2^1.2 or greater. Second, you want to
> select all probes that vary between various contrasts.
>
> These are very different things, and I don't think there is a
> one-size-fits-all solution. The way I understand it, you won't be able
> to do anything with the fitted object to filter the first set of
> probesets. Instead, you want to use the raw data.
>
> You could set up a function, say filterControls() to do step 1.
>
> filterControls <- function(controlvector, eset, filt){
> ## control vector is a numeric vector
> ## eset is an ExpressionSet (e.g., from justRMA())
> ## filt is a numeric log fold change to filter on
> require(gtools)
> comb <- combinations(length(controlvector), 2, controlvector)
> indlst <- lapply(1:nrow(comb), function(x) abs(eset[,comb[x,1]] -
> eset[,comb[x,2]]) > filt)
> ind <- apply(do.call("cbind", indlst), 1, any)
> eset <- eset[!ind,]
> eset
> }
>
> Then you would just go forward with the subsetted ExpressionSet object,
> using decideTests() with whatever lfc you like, and vennSelect() to output.
>
> Best,
>
> Jim
Hi Jim.
Thank you for your quick response. You understand me correct I believe, however,
I have some trouble deciphering the solution you suggested. I believe I get the
basic idea of probing combinations of the controls and then excluding the
combinations which differ with more than 'filt' - if that is indeed the idea.
But I do not quite grasp how to determine the controlvector - I have my two
control samples loaded in my ExpressionSet (loaded from .CEL files along with my
other samples using justRMA()). I have tried to determine the controlvector as
the two columns containing my control samples using the exprs() function (e.g.
controlvector <- exprs(eset)[,3:4] (column 3 and 4 being the control samples) -
but I believe I am missing something here.
If you would care to elaborate, I would be very grateful.
Best,
Jacob
> > Best regards,
> > J.M. Jensen
> >
> > -- output of sessionInfo():
> >
> > R version 2.13.0 (2011-04-13)
> > Platform: i386-pc-mingw32/i386 (32-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] grid stats graphics grDevices utils datasets methods
> > [8] base
> >
> > other attached packages:
> > [1] venneuler_1.1-0 rJava_0.9-3 affycoretools_1.24.0
> > [4] KEGG.db_2.5.0 gplots_2.10.1 KernSmooth_2.23-4
> > [7] caTools_1.12 bitops_1.0-4.1 gdata_2.8.2
> > [10] gtools_2.6.2 GO.db_2.5.0 hgu133plus2.db_2.5.0
> > [13] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5
> > [16] hgu133plus2cdf_2.8.0 annotate_1.30.1 AnnotationDbi_1.14.1
> > [19] limma_3.8.3 affy_1.30.0 Biobase_2.12.2
> >
> > loaded via a namespace (and not attached):
> > [1] affyio_1.20.0 annaffy_1.24.0 biomaRt_2.8.1
> > [4] Biostrings_2.20.4 Category_2.18.0 gcrma_2.24.1
> > [7] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0
> > [10] GSEABase_1.14.0 IRanges_1.10.6 preprocessCore_1.14.0
> > [13] RBGL_1.28.0 RCurl_1.6-10.1 splines_2.13.0
> > [16] survival_2.36-5 tools_2.13.0 XML_3.4-2.2
> > [19] xtable_1.6-0
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at ...
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list