[BioC] Creating a customized cut flow with the affy/limma packages

James W. MacDonald jmacdon at uw.edu
Tue Feb 14 15:16:41 CET 2012


Hi JM,

On 2/14/2012 6:00 AM, J.M.Jensen wrote:
> James W. MacDonald<jmacdon at ...>  writes:
>
>> Hi J.M.,
>>
>> On 2/13/2012 9:35 AM, J.M.Jensen [guest] wrote:
>>> Dear all,
>>> I am trying to create a customized cut flow with the affy/limma packages. In
> order to assign different
>> cut-off values between different arrays would be to first make one cut-off and
> then use the remaining
>> expression set in further analysis. When working with the affy/limma packages,
> I read and normalize my
>> data with the justRMA() function, fit a linear model with the limma package
> and use the decideTests() from
>> the limma package to select probes with a specified cut-off (e.g. lfc). The
> problem is that I cannot assign
>> multiple criteria for the cut flow. For example, if I want to remove all
> probes from my data set that differ
>> more than a lfc of 1.2 between two controls, regardless of the expression
> values in the rest of the samples.
>> After this cut, I would then be able to use another cut o
>>   ff value, e.g. lfc=1.8 to select probes from different contrasts of the
> remaining data. I tried looking
>> into the vennSelect() (affycoretools), because I can use it to select various
> contras!
>>   ts
>>>    and their intersection, however, I cannot figure how to assign different
> cut off criteria.
>>> Any help or thoughts on this would be most appreciated.
>> It seems like you want to do two very different things here, but correct
>> me if I misunderstand. First, it appears that you want to remove all
>> probes that vary between controls.  Here I am assuming that you have one
>> set of controls, and you want to exclude any probeset that varies
>> between any two of the controls by 2^1.2 or greater. Second, you want to
>> select all probes that vary between various contrasts.
>>
>> These are very different things, and I don't think there is a
>> one-size-fits-all solution. The way I understand it, you won't be able
>> to do anything with the fitted object to filter the first set of
>> probesets. Instead, you want to use the raw data.
>>
>> You could set up a function, say filterControls() to do step 1.
>>
>> filterControls<- function(controlvector, eset, filt){
>> ## control vector is a numeric vector
>> ## eset is an ExpressionSet (e.g., from justRMA())
>> ## filt is a numeric log fold change to filter on
>> require(gtools)
>> comb<- combinations(length(controlvector), 2, controlvector)
>> indlst<- lapply(1:nrow(comb), function(x) abs(eset[,comb[x,1]] -
>> eset[,comb[x,2]])>  filt)
>> ind<- apply(do.call("cbind", indlst), 1, any)
>> eset<- eset[!ind,]
>> eset
>> }
>>
>> Then you would just go forward with the subsetted ExpressionSet object,
>> using decideTests() with whatever lfc you like, and vennSelect() to output.
>>
>> Best,
>>
>> Jim
> Hi Jim.
>
> Thank you for your quick response. You understand me correct I believe, however,
> I have some trouble deciphering the solution you suggested. I believe I get the
> basic idea of probing combinations of the controls and then excluding the
> combinations which differ with more than 'filt' - if that is indeed the idea.
> But I do not quite grasp how to determine the controlvector - I have my two
> control samples loaded in my ExpressionSet (loaded from .CEL files along with my
> other samples using justRMA()). I have tried to determine the controlvector as
> the two columns containing my control samples using the exprs() function (e.g.
> controlvector<- exprs(eset)[,3:4] (column 3 and 4 being the control samples) -
> but I believe I am missing something here.

The controlvector just gives the columns of the exprs slot that contain 
controls. We then use these columns to get all combinations of 
comparisons (in your case it will just be 3 v 4). So something like

filterControls(3:4, eset, 1.2)

will do what you requested originally.

Best,

Jim


>
>
> If you would care to elaborate, I would be very grateful.
>
> Best,
> Jacob
>
>>> Best regards,
>>> J.M. Jensen
>>>
>>>    -- output of sessionInfo():
>>>
>>> R version 2.13.0 (2011-04-13)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>>    [1] venneuler_1.1-0      rJava_0.9-3          affycoretools_1.24.0
>>>    [4] KEGG.db_2.5.0        gplots_2.10.1        KernSmooth_2.23-4
>>>    [7] caTools_1.12         bitops_1.0-4.1       gdata_2.8.2
>>> [10] gtools_2.6.2         GO.db_2.5.0          hgu133plus2.db_2.5.0
>>> [13] org.Hs.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5
>>> [16] hgu133plus2cdf_2.8.0 annotate_1.30.1      AnnotationDbi_1.14.1
>>> [19] limma_3.8.3          affy_1.30.0          Biobase_2.12.2
>>>
>>> loaded via a namespace (and not attached):
>>>    [1] affyio_1.20.0         annaffy_1.24.0        biomaRt_2.8.1
>>>    [4] Biostrings_2.20.4     Category_2.18.0       gcrma_2.24.1
>>>    [7] genefilter_1.34.0     GOstats_2.18.0        graph_1.30.0
>>> [10] GSEABase_1.14.0       IRanges_1.10.6        preprocessCore_1.14.0
>>> [13] RBGL_1.28.0           RCurl_1.6-10.1        splines_2.13.0
>>> [16] survival_2.36-5       tools_2.13.0          XML_3.4-2.2
>>> [19] xtable_1.6-0
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at ...
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list