[BioC] Discrepancy on results from gcrma function and justGCRMA
Henrik Bengtsson
hb at stat.berkeley.edu
Fri Feb 5 03:50:15 CET 2010
FYI,
the aroma.affymetrix package can reproduce the gcrma() estimates with
great precision, cf.
http://www.aroma-project.org/replication/gcRMA
It does this by utilizing the gcrma functions while running in bounded
memory (~1 GB RAM) regardless of the number of arrays.
/Henrik
On Thu, Feb 4, 2010 at 2:56 PM, James W. MacDonald
<jmacdon at med.umich.edu> wrote:
> Hi Jenny,
>
> It wasn't posted to the list because Jin got tired of waiting for a response
> from the list and emailed me directly, so the correspondence was just
> between the two of us.
>
> It was my oversight to keep the conversation off-list. Below is the
> transcript of our conversation, for the archives (and apologies for keeping
> this 'secret').
>
> Hi Jin,
>
> I just checked in a modification to justGCRMA that results in identical
> output as compared to gcrma. There have been quite a few modifications to
> gcrma since I originally wrote justGCRMA, and it is no longer possible for
> justGCRMA to do everything that one can do with gcrma.
>
> Specifically, in the past gcrma used the MM probes on a chip to estimate
> background, but this has been changed so the default is to use internal
> estimates that come with the package. The functionality to use MM probes (or
> a subset thereof) still exists, but it is too much work to modify justGCRMA
> to use the MM probes, not to mention the headache of trying to maintain the
> code. So justGCRMA will only use the internal NSB estimates that come with
> the package, and will give identical results to gcrma as long as you use the
> internal NSB estimates for both functions.
>
> The new version of gcrma is 2.18.1, and should be available for installation
> within the next couple of days.
>
> Best,
>
> Jim
>
>
>>>> >>> Jin <jinxing1986 at gmail.com> wrote:
>> > Hello Dr. MacDonald,
>> >
>> > I was wondering if you could help me with a problem. I've ran my
>> > dataset through GCRMA and justGCRMA but I've gotten different
>> > numerical results from both. I was under the impression that these two
>> > functions were identical except justGCRMA was more memory efficient
>> > because it did not have to create an AffyBatch object. I used the
>> > default parameters for both meaning I did not include any parameters
>> > at all when calling gcrma() and justgcrma().
>> >
>> > Are they suppose to give identical numerical results?
>> >
>> > For example, below are some sample results. The first row is GCRMA and
>> > the second row is justGCRMA. The columns are my 8 arrays. JustGCRMA
>> > values are consistently higher than GCRMA values whereas I would have
>> > expected similar numbers matching the top and bottom rows.
>> >
>> > 3.58 3.80 3.83 4.52 3.91 5.17 3.78 6.45
>> > 4.79 4.73 5.21 5.68 5.44 5.88 5.11 7.76
>> >
>> >
>> >
>> > Thanks,
>> > Jin
>> > University of California, San Diego
>
>
> Jenny Drnevich wrote:
>>
>> Thanks Jim,
>>
>> Thanks for the info. I found the question that was posted last last year
>> (and responded to it to start the thread again), but I still can't find your
>> response on BioC. Perhaps it accidently didn't get sent back to the list?
>>
>> :)
>> Jenny
>>
>> At 04:31 PM 2/3/2010, James W. MacDonald wrote:
>>>
>>> Hi Jenny,
>>>
>>> Jenny Drnevich wrote:
>>>>
>>>> Hi everyone,
>>>> I just spent several hours tracking down this problem, as I noticed this
>>>> same discrepancy between the results of justGCRMA() and gcrma() called on
>>>> the same data. There was also another post about this on Nov 4, 2008, but I
>>>> couldn't find where either of them had ever been answered.
>>>
>>> I think I missed that one, but the question was asked again and answered
>>> by me late last year.
>>>
>>>> HOWEVER, it appears that the discrepancy has been fixed in R 2.10.1
>>>> (gcrma 2.18.1), but it was in R 2.10.0 (gcrma 2.18.0) (examples and
>>>> sessionInfos below).
>>>> While I'm glad it has been "fixed", where would this have been
>>>> documented? I didn't think this type of change would have been included in a
>>>> minor version upgrade. What was the explanation for the original
>>>> discrepancy?
>>>
>>> The explanation is that justGCRMA() is a function that I wrote several
>>> years ago. In the intervening period the defaults for gcrma() were changed
>>> without making the corresponding changes to justGCRMA().
>>>
>>> I made the (unfounded) assumption that the maintainer of the gcrma
>>> package (Jean Wu) was keeping justGCRMA() consistent with gcrma(), which as
>>> you have seen is not true. Since this was an oversight by both me and Jean,
>>> it wasn't documented anywhere.
>>>
>>> I have since made the two functions consistent when running under the
>>> default arguments. However, there are some arguments for gcrma() that are
>>> not available for justGCRMA(), so they are not consistent in all cases.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>> Thanks,
>>>> Jenny
>>>> #The examples below are not reproducible because you need .CEL files -
>>>> run them on any that you have.
>>>> > library(gcrma)
>>>> Loading required package: affy
>>>> Loading required package: Biobase
>>>> Welcome to Bioconductor
>>>> Vignettes contain introductory material. To view, type
>>>> 'openVignette()'. To cite Bioconductor, see
>>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>> > setwd("K:/Bulla/CELfiles")
>>>> > raw <- ReadAffy()
>>>> > raw
>>>> AffyBatch object
>>>> size of arrays=834x834 features (10 kb)
>>>> cdf=Rat230_2 (31099 affyids)
>>>> number of samples=6
>>>> number of genes=31099
>>>> annotation=rat2302
>>>> notes=
>>>> >
>>>> > gcrma.1 <- gcrma(raw)
>>>> Adjusting for optical effect......Done.
>>>> Computing affinitiesLoading required package: AnnotationDbi
>>>> .Done.
>>>> Adjusting for non-specific binding......Done.
>>>> Normalizing
>>>> Calculating Expression
>>>> >
>>>> > gcrma.2 <- justGCRMA()
>>>> Computing affinities.Done.
>>>> Adjusting for optical effect.......Done.
>>>> Adjusting for non-specific binding......Done.
>>>> Normalizing
>>>> Calculating Expression
>>>> >
>>>> > all.equal(exprs(gcrma.1),exprs(gcrma.2))
>>>> [1] "Mean relative difference: 0.03514035"
>>>> >
>>>> > sessionInfo()
>>>> R version 2.10.0 (2009-10-26)
>>>> i386-pc-mingw32
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252
>>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>> other attached packages:
>>>> [1] rat2302probe_2.5.0 AnnotationDbi_1.8.1 rat2302cdf_2.5.0
>>>> gcrma_2.18.0
>>>> [5] affy_1.24.2 Biobase_2.6.0
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.14.0 Biostrings_2.14.3 DBI_0.2-4
>>>> IRanges_1.4.4
>>>> [5] preprocessCore_1.8.0 RSQLite_0.7-3 splines_2.10.0
>>>> tools_2.10.0
>>>> >
>>>> #Now switch R versions but run same code....
>>>> > library(gcrma)
>>>> Loading required package: affy
>>>> Loading required package: Biobase
>>>> Welcome to Bioconductor
>>>> Vignettes contain introductory material. To view, type
>>>> 'openVignette()'. To cite Bioconductor, see
>>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>> > setwd("K:/Bulla/CELfiles")
>>>> > raw <- ReadAffy()
>>>> > raw
>>>> AffyBatch object
>>>> size of arrays=834x834 features (10 kb)
>>>> cdf=Rat230_2 (31099 affyids)
>>>> number of samples=6
>>>> number of genes=31099
>>>> annotation=rat2302
>>>> notes=
>>>> >
>>>> > gcrma.1 <- gcrma(raw)
>>>> Adjusting for optical effect......Done.
>>>> Computing affinitiesLoading required package: AnnotationDbi
>>>> .Done.
>>>> Adjusting for non-specific binding......Done.
>>>> Normalizing
>>>> Calculating Expression
>>>> >
>>>> > gcrma.2 <- justGCRMA()
>>>> Computing affinities.Done.
>>>> Adjusting for optical effect.......Done.
>>>> Adjusting for non-specific binding......Done.
>>>> Normalizing
>>>> Calculating Expression
>>>> >
>>>> > all.equal(exprs(gcrma.1),exprs(gcrma.2))
>>>> [1] TRUE
>>>> >
>>>> > sessionInfo()
>>>> R version 2.10.1 (2009-12-14)
>>>> i386-pc-mingw32
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252
>>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>> other attached packages:
>>>> [1] rat2302probe_2.5.0 AnnotationDbi_1.8.1 rat2302cdf_2.5.0
>>>> gcrma_2.18.1
>>>> [5] affy_1.24.2 Biobase_2.6.1
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.14.0 Biostrings_2.14.10 DBI_0.2-5
>>>> IRanges_1.4.9
>>>> [5] preprocessCore_1.8.0 RSQLite_0.8-0 splines_2.10.1
>>>> tools_2.10.1
>>>> >
>>>>
>>>>
>>>> At 04:31 PM 10/28/2009, Jin wrote:
>>>>>
>>>>> Jerry <norn2k at ...> writes: > > Hi, > Â > I'm currently using gcrma
>>>>> package in R to summarize probeset intensities from CEL files. Surprisingly,
>>>>> > IÂ found that the results generated from gcrma function and justGCRMA
>>>>> function are quite different. In > general, expression values from gcrma
>>>>> function are lower and the boxplots from gcrma are quite > asymmetric (no
>>>>> tails on the bottom). I attached some plots below for your information.
>>>>> This confused > me as I thought that the two functions implemented similar
>>>>> algorithms. >  > Thank you so much for your help! >  > Best, > Jianjun
>>>>> > > PS: The package I used is gcrma 2.12.1. I also observed similar results
>>>>> on 2.14 version. > > Hello, I'm running into the same problem with
>>>>> gcrma and justGCRMA. They are not giving the same numeric results. Is this
>>>>> an intended feature? I was under the impression that these two methods were
>>>>> identical with the exception that justGCRMA was more memory efficient
>>>>> because it didn't have to create an AffyBatch object. Can anyone shed some
>>>>> light on this subject? Thanks, Jin
>>>>> _______________________________________________ Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>> Jenny Drnevich, Ph.D.
>>>> Functional Genomics Bioinformatics Specialist
>>>> W.M. Keck Center for Comparative and Functional Genomics
>>>> Roy J. Carver Biotechnology Center
>>>> University of Illinois, Urbana-Champaign
>>>> 330 ERML
>>>> 1201 W. Gregory Dr.
>>>> Urbana, IL 61801
>>>> USA
>>>> ph: 217-244-7355
>>>> fax: 217-265-5066
>>>> e-mail: drnevich at illinois.edu
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Douglas Lab
>>> University of Michigan
>>> Department of Human Genetics
>>> 5912 Buhl
>>> 1241 E. Catherine St.
>>> Ann Arbor MI 48109-5618
>>> 734-615-7826
>>> **********************************************************
>>> Electronic Mail is not secure, may not be read every day, and should not
>>> be used for urgent or sensitive issues
>>
>> Jenny Drnevich, Ph.D.
>>
>> Functional Genomics Bioinformatics Specialist
>> W.M. Keck Center for Comparative and Functional Genomics
>> Roy J. Carver Biotechnology Center
>> University of Illinois, Urbana-Champaign
>>
>> 330 ERML
>> 1201 W. Gregory Dr.
>> Urbana, IL 61801
>> USA
>>
>> ph: 217-244-7355
>> fax: 217-265-5066
>> e-mail: drnevich at illinois.edu
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should not be
> used for urgent or sensitive issues
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list