[BioC] Creating a customized cut flow with the affy/limma packages

J.M.Jensen jacob.malte.jensen at hum-gen.au.dk
Tue Feb 14 12:43:12 CET 2012


James W. MacDonald <jmacdon at ...> writes:

> 
> Hi J.M.,
> 
> On 2/13/2012 9:35 AM, J.M.Jensen [guest] wrote:
> > Dear all,
> > I am trying to create a customized cut flow with the affy/limma packages. In 
order to assign different
> cut-off values between different arrays would be to first make one cut-off and 
then use the remaining
> expression set in further analysis. When working with the affy/limma packages, 
I read and normalize my
> data with the justRMA() function, fit a linear model with the limma package 
and use the decideTests() from
> the limma package to select probes with a specified cut-off (e.g. lfc). The 
problem is that I cannot assign
> multiple criteria for the cut flow. For example, if I want to remove all 
probes from my data set that differ
> more than a lfc of 1.2 between two controls, regardless of the expression 
values in the rest of the samples.
> After this cut, I would then be able to use another cut o
>  ff value, e.g. lfc=1.8 to select probes from different contrasts of the 
remaining data. I tried looking
> into the vennSelect() (affycoretools), because I can use it to select various 
contras!
>  ts
> >   and their intersection, however, I cannot figure how to assign different 
cut off criteria.
> > Any help or thoughts on this would be most appreciated.
> 
> It seems like you want to do two very different things here, but correct 
> me if I misunderstand. First, it appears that you want to remove all 
> probes that vary between controls.  Here I am assuming that you have one 
> set of controls, and you want to exclude any probeset that varies 
> between any two of the controls by 2^1.2 or greater. Second, you want to 
> select all probes that vary between various contrasts.
> 
> These are very different things, and I don't think there is a 
> one-size-fits-all solution. The way I understand it, you won't be able 
> to do anything with the fitted object to filter the first set of 
> probesets. Instead, you want to use the raw data.
> 
> You could set up a function, say filterControls() to do step 1.
> 
> filterControls <- function(controlvector, eset, filt){
> ## control vector is a numeric vector
> ## eset is an ExpressionSet (e.g., from justRMA())
> ## filt is a numeric log fold change to filter on
> require(gtools)
> comb <- combinations(length(controlvector), 2, controlvector)
> indlst <- lapply(1:nrow(comb), function(x) abs(eset[,comb[x,1]] - 
> eset[,comb[x,2]]) > filt)
> ind <- apply(do.call("cbind", indlst), 1, any)
> eset <- eset[!ind,]
> eset
> }
> 
> Then you would just go forward with the subsetted ExpressionSet object, 
> using decideTests() with whatever lfc you like, and vennSelect() to output.
> 
> Best,
> 
> Jim
> 

Hi Jim,
I believe I might have misunderstood your interpretation after all - It just 
struck me that this might actually not what I wanted exactly, although it is 
rather close - What I want in step 1 is to exclude probes that vary with more 
than e.g. 2^1.2 between two probes - however not just any two control probes - 
it should still be the same probes compared. Let me give an example:

Below are shown the two control samples with the RMA normalized expression 
values of five probes.

          A1319_03.CEL A1319_04.CEL
1007_s_at    10.623985    10.823174
1053_at       8.259183     8.720877
117_at        5.854332     6.126637
121_at        7.732116     7.797098
1255_g_at     2.456035     6.298101 

The only one probeset that should be excluded is the '1255_g_at', it does not 
matter that '1007_s_at' in the 'A1319_03.CEL' sample differs from '121_at' in 
either of the samples.

Again, I appreciate your help.

Best,
Jacob 


> > Best regards,
> > J.M. Jensen
> >
> >   -- output of sessionInfo():
> >
> > R version 2.13.0 (2011-04-13)
> > Platform: i386-pc-mingw32/i386 (32-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] grid      stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> >   [1] venneuler_1.1-0      rJava_0.9-3          affycoretools_1.24.0
> >   [4] KEGG.db_2.5.0        gplots_2.10.1        KernSmooth_2.23-4
> >   [7] caTools_1.12         bitops_1.0-4.1       gdata_2.8.2
> > [10] gtools_2.6.2         GO.db_2.5.0          hgu133plus2.db_2.5.0
> > [13] org.Hs.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5
> > [16] hgu133plus2cdf_2.8.0 annotate_1.30.1      AnnotationDbi_1.14.1
> > [19] limma_3.8.3          affy_1.30.0          Biobase_2.12.2
> >
> > loaded via a namespace (and not attached):
> >   [1] affyio_1.20.0         annaffy_1.24.0        biomaRt_2.8.1
> >   [4] Biostrings_2.20.4     Category_2.18.0       gcrma_2.24.1
> >   [7] genefilter_1.34.0     GOstats_2.18.0        graph_1.30.0
> > [10] GSEABase_1.14.0       IRanges_1.10.6        preprocessCore_1.14.0
> > [13] RBGL_1.28.0           RCurl_1.6-10.1        splines_2.13.0
> > [16] survival_2.36-5       tools_2.13.0          XML_3.4-2.2
> > [19] xtable_1.6-0
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at ...
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: 
http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list