[BioC] Is GCRMA influenced by lexicographical order?

Mon Nov 8 17:29:44 CET 2010

Hi Henrik
   It's not your bad memory. We used to set a seed when estimating the
background (bg) parameters, but later avoided random sampling altogether
by sampling fixed quantiles from the affinities.
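
For illustration, the difference between a seeded random subsample and
sampling at fixed quantiles can be sketched in base R as below. This is
only a minimal sketch and not the gcrma code: the `affinities` vector is
made-up data, and `pick_fixed_quantiles` is a hypothetical helper
introduced for this example.

```r
# Made-up stand-in for probe affinities (assumption; not real gcrma data).
affinities <- c(0.2, 1.5, 0.7, 2.1, 0.9, 1.1, 0.4, 1.8, 1.3, 0.6)

# Random subsample: reproducible only because the seed is fixed first,
# and the values picked still depend on the order of the input.
set.seed(1)
random_subset <- sample(affinities, 4)

# Fixed-quantile subsample (hypothetical helper): sort the values and take
# those at evenly spaced ranks, so no random number generator is involved.
pick_fixed_quantiles <- function(x, n) {
  x <- sort(x[!is.na(x)])
  idx <- round(seq(1, length(x), length.out = n))
  x[idx]
}

fixed_a <- pick_fixed_quantiles(affinities, 4)
fixed_b <- pick_fixed_quantiles(rev(affinities), 4)  # same values, other order

# The fixed-quantile result does not depend on the input order.
print(identical(fixed_a, fixed_b))
```

The point of the second approach is that the chosen subset is a function
of the values alone, so neither a seed nor the ordering of the input can
change it.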

On 11/8/2010 11:15 AM, Henrik Bengtsson wrote:
> Hi,
>
> It seems more correct to say that it depends on the order in which
> the arrays are loaded, which in turn is lexicographic by default.
>
> On Mon, Nov 8, 2010 at 7:50 AM, Wolfgang Huber<whuber at embl.de>  wrote:
>> Hi Markus,
>>
>> Jean or Rafa will be able to give a more competent response, but as far as I
>> can see from the code, there are some places where the first array is
>> treated specially:
>>
>> ~/madman/Rpacks/gcrma/R$ grep ",1]" *
>> gcrma.R:      index2=which(!is.na(anc[,1]))
>> gcrma.engine2.R:    index.affinities<- which(!is.na(pm.affinities[,1]))
>> justGCRMA.R:       mm<- read.probematrix(filenames=filenames[i], which="mm")$mm[,1]
>> justGCRMA.R:    mm<- read.probematrix(filenames=filenames[i], which="mm")$mm[,1]
>
> The latter two lines each read only one array, so there the [,1] is
> just to pull out the signals, I think.
>
>
> Note also that the GCRMA algorithm uses a random subset to estimate
> some of the parameters, which could affect this too.  However,
> looking at the devel code for GCRMA I see that there are
> "set.seed(1)" calls before each sample() call, i.e. they are no longer
> "random subsets".
>
> I thought this was discussed a long time ago and all fixing of the
> seeds was dropped, but now it's back again.  Is my bad memory playing
> tricks on me?
>
> /H
>
>>
>> I agree that this is not the most desirable feature.
>>
>> Best wishes
>>         Wolfgang
>>
>>
>> On Nov/8/10 10:19 AM, Markus Boenn wrote:
>>>
>>> Dear all,
>>>
>>> I've run some experiments with GCRMA, using bg.adjust.gcrma() from
>>> the gcrma package.
>>>
>>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of
>>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I
>>> rename both files in a way that changes the lexicographical order:
>>> I rename A.cel to BA.cel and B.cel to AB.cel (CASE II).
>>>
>>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for
>>> some probe sets I get different values for A.cel (in CASE I) than for
>>> BA.cel (in CASE II), although both files contain the same data. Of
>>> course, the same happens for B.cel and AB.cel.
>>>
>>> How can this be possible? To me, it seems as if GCRMA normalizes the
>>> data with respect to the lexicographical order, but as far as I know,
>>> GCRMA treats each array independently.
>>>
>>> This effect is not reproducible using call.exprs() with argument
>>> algorithm="rma" or "mas5", but it is reproducible with GCRMA for other
>>> arrays such as HGU95A.
>>>
>>> Please, can anybody explain to me whether this effect is a bug or a
>>> feature (and why)?
>>>
>>> Best wishes,
>>> Markus
>>>
>>>
>>>
>>> #################################################################
>>>
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0
>>> [4] simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0
>>> [7] affy_1.28.0 Biobase_2.10.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2
>>> [4] RSQLite_0.9-2 affyio_1.18.0 annotate_1.28.0
>>> [7] preprocessCore_1.12.0 splines_2.12.0 survival_2.35-8
>>> [10] tools_2.12.0 xtable_1.5-6
>>>
>>> I also tried this on another machine with similar packages:
>>>
>>> sessionInfo()
>>> R version 2.11.1 Patched (2010-09-16 r52943)
>>> Platform: i686-pc-linux-gnu (32-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
>>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1
>>> [5] Biobase_2.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2
>>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17
>>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1
>>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6
>>>
>>>
>>> #################################################################
>>>
>>> Code example
>>>
>>> library(affy)   # ReadAffy(), exprs()
>>> library(gcrma)  # bg.adjust.gcrma()
>>>
>>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel
>>> CEL<- ReadAffy()             # default file order: A.cel, B.cel
>>> bag.1<- bg.adjust.gcrma(CEL)
>>> ebag.1<- exprs(bag.1)
>>>
>>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel
>>> CEL<- ReadAffy()              # default file order: AB.cel, BA.cel
>>> bag.2<- bg.adjust.gcrma(CEL)
>>> ebag.2<- exprs(bag.2)
>>>
>>> par(mfrow=c(2,1))
>>> # differences should be zero
>>> plot(ebag.1[,1]-ebag.2[,2])  # A.cel vs. BA.cel (same data)
>>> plot(ebag.1[,2]-ebag.2[,1])  # B.cel vs. AB.cel (same data)
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
------------------------------------
Zhijin (Jean) Wu
Assistant Professor of Biostatistics
Brown University, Box G-S121
Providence, RI  02912

Tel: 401 863 1230
Fax: 401 863 9182
http://www.stat.brown.edu/zwu