[BioC] Xps package : errors and RMA difference with Partek GS

Mon Apr 6 23:30:36 CEST 2009

Dear Arnaud,

As I said already in an earlier mail, when comparing xps and APT I get 
slightly different expression levels, which may be due to different 
probes used for background correction and quantile normalization since 
the results obtained for median-polish only are identical (see vignette 
APTvsXPS.pdf). For this reason the means are also slightly different, 
and thus also the p-values.

Furthermore, in order to compare xps and APT I needed to create a 
metaprobeset file using function "metaProbesets()" which I used with 
APT. This file assures that the same metacore probesets are used by both 
programs.

Please note that cannot comment on differences between xps and any 
commercial software, I can only compare xps to other open-source 
programs such as affy and APT. However, if you compare your software to 
APT and the results are identical, then this would answer your question.

Regarding fdr adjustment you will see that "validData(rma.ufr)" lists 
only the probesets which satisfy the condition pval<0.05. In order to 
see all probesets you need to do "validData(rma.ufr,"UnitName")" or 
simply "rma.ufr at data". (I must admit that I need to document this 
feature in the help file.) Then you should see also differences between 
p-value and p-adjusted. But I need to investigate further.

Best regards
Christian

arnaud Le Cavorzin wrote:
> Dear Christian,
>
> Thank you very much for your answer and for your reactivity.
>
> We have performed RMA normalization using exonlevel=c(16316,8192,8192) 
> that is to say using all probes for the background correction and only 
> metacore probes for the quantile normalization and summarization, so 
> the same options using by Partek ("core" in Partek corresponding to 
> "metacore" in xps, like you have suggested it).
>
> We get the same number of probeset with Partek and xps package, but 
> the results are differents for the two softwares.
> Even if it was better, we found 3539 genes with a p-value<0.05 with 
> xps, and 3337 genes with p-value<0.05 with Partek, and the results 
> remain different for the two softwares. We don't obtain the same 
> p-value, in particular because we don't obtain the same means.
>
> I have also imported the data.rma from xps to Partek, and performed 
> the t test with Partek : the results are the same than performing 
> unifilter with xps, we obtain the same p value than with xps and the 
> same means. So they are still different than the results using Partek 
> only.
> (Confirm that there is a difference with the probes used in Partek for 
> the RMA normalization)
>
> Another question : when I use fdr correction or no correction with 
> xps, the results are still the same. Only when I use bonferroni 
> correction the p-adjusted change. I don't understand why FDR 
> correction have no effect.
>
> /> 
> unifltr=UniFilter(unitest=c("t.test","two.sided","none",0,0.0,FALSE,0.95,TRUE),
> + foldchange=c(1,"both"),unifilter=c(0.05,"pval"))
> > 
> rma.ufr=unifilter(data.test2.rma,"tmpdt_HuextestUnifilterallmetmet",getwd(),logbase="log2",
> + unifltr,group=c("075","075","TEM","TEM"),verbose=FALSE)
> /
> /> 
> unifltr=UniFilter(unitest=c("t.test","two.sided","fdr",0,0.0,FALSE,0.95,TRUE),
> + foldchange=c(1,"both"),unifilter=c(0.05,"pval"))
> > 
> rma.ufr=unifilter(data.test2.rma,"tmpdt_HuextestUnifilterallmetmet",getwd(),logbase="log2",
> + unifltr,group=c("075","075","TEM","TEM"),verbose=FALSE)/
>
>
> Thanks
> Best regards,
>
> Arnaud
>
>
> > Date: Fri, 3 Apr 2009 20:25:15 +0200
> > From: cstrato at aon.at
> > To: arnaudlc at msn.com
> > CC: bioconductor at stat.math.ethz.ch; delphine.rossille at chu-rennes.fr
> > Subject: Re: [BioC] Xps package : errors and RMA difference with 
> Partek GS
> >
> > Dear Arnaud,
> >
> > Let me answer your questions individually:
> >
> > 1. Differential expression:
> > As you see, package xps is quite flexible and allows to use different
> > probes for background correction, quantile normalization and
> > summarization. This may have effects on the results you get with the
> > different settings. Thus, the most important question to determine 
> which
> > setting to choose, is:
> > Do you expect to find differentially expressed genes in your experiment
> > or not?
> > Since the different settings give different results the best way would
> > be to confirm one result or the other experimentally, e.g. by doing PCR
> > with some of the genes in question.
> >
> > 2. Number of "core" genes:
> > Please note that the number of "core" transcripts is defined in the
> > Affymetrix probeset annotation file. Only probesets with "level=Core"
> > are combined as "core" transcripts, and only the subset with
> > "crosshyb_type=unique" is used as "metacore" transcripts. Thus the
> > number of "core" transcripts depends on the annotation used. For the
> > current Affymetrix probeset annotation file version
> > "HuEx-1_0-st-v2.na28.hg18.probeset.csv" you get 18708 "core" 
> transcripts
> > and 17880 "metacore" transcripts. Since the difference is 828
> > transcripts, I assume that you are only using the "metacore" 
> transcripts
> > with Partek. Thus, in xps you need to use exonlevel="metacore".
> >
> > 3. Probes used for background correction:
> > Package xps contains an internal function exonLevel() which is used to
> > convert parameter "exonlevel" to an integer. Because of your question I
> > have decided to make this function public in version xps_1.2.9, which
> > you should be able to download on Monday. The corresponding helpfile
> > "?exonLevel" will explain the different integers to be used.
> > For your convenience the integers are: core (=8192+1024), extended
> > (=4096+512), full (=2048+256), ambiguous (=128), affx(=60). Thus
> > "core+extended+full" is 16128.
> >
> > Coincidently, the question which probes to be used for background
> > correction was recently asked also at:
> > 
> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/69fdc2757894f290#
> >
> > Best regards
> > Christian
> >
> >
> > arnaud Le Cavorzin wrote:
> > > Dear Christian,
> > >
> > > Thank you for your answer.
> > >
> > > We have tried to perform rma normalization using
> > > exonlevel=c(16316,9216,9216) that is to say using all of the probes
> > > for the background correction and core probes for the two last steps.
> > > And the results seems to be better. We found results closed to those
> > > obtained with Partek, and after performing t.test and fdr correction
> > > we found the same result with Partek and with xps : no differencially
> > > expressed genes even if the p-values and fdr values are a slightly
> > > different.
> > >
> > > In fact there is always 827 genes more with xps than with Partek, so
> > > it is maybe the explanation of these differences.
> > > So our question : Is it possible to perform rma using only
> > > core+extended+full probes for the background correction? If yes what
> > > is the "number" needs for this? (core => 9216, metacore => 8192, all
> > > => 16316)
> > >
> > > Thank you very much
> > > Best regards
> > >
> > > Arnaud
> > >
> > >
> > > > Date: Wed, 1 Apr 2009 21:31:54 +0200
> > > > From: cstrato at aon.at
> > > > To: arnaudlc at msn.com
> > > > CC: bioconductor at stat.math.ethz.ch; delphine.rossille at chu-rennes.fr
> > > > Subject: Re: [BioC] Xps package : errors and RMA difference with
> > > Partek GS
> > > >
> > > > Dear Arnaud,
> > > >
> > > > Yes, xps uses the same background algorithm as APT (and Partek) but
> > > uses
> > > > by default only the probes defined in "exonlevel" for background
> > > > correction and quantile normalization, i.e. in your case only the
> > > "core"
> > > > probes are used.
> > > >
> > > > However, xps offers the possibility to select the probes to be 
> used for
> > > > background correction, quantile normalization and summarization
> > > > individually. For RMA normalization you can do:
> > > >
> > > > data.rma <- rma(data.exon, "ExonRMAbq16core", filedir=datdir,
> > > > background="antigenomic",
> > > > normalize=T, option="transcript",
> > > > exonlevel=c(16316,16316,9216))
> > > >
> > > > This means that probes "core+extended+full+ambiguous+affx" 
> (=16316) are
> > > > used for background correction and quantile normalization,
> > > respectively,
> > > > and probes "core" (=9216) are used for summarization.
> > > >
> > > > As you will see, the results will be more similar to the results
> > > > obtained with APT (see also Figures 17 vs 15 in APTvsXPS.pdf). 
> However,
> > > > it is my belief that it is better to use the same probes for all 
> three
> > > > steps. Maybe this is reflected in your results of finding
> > > differentially
> > > > expressed genes?
> > > >
> > > > Best regards
> > > > Christian
> > > >
> > > >
> > > > arnaud Le Cavorzin wrote:
> > > > > Hi.
> > > > >
> > > > > Thank you for your anwser.
> > > > >
> > > > > I have downloaded the last version of xps and performed the
> > > > > metaProbesets with success.
> > > > >
> > > > >
> > > > > So if I understand, xps use the same background correction 
> algorithm
> > > > > than Partek or APT. But the difference between these softwares 
> seems
> > > > > to be caused by the probes used for the background correction.
> > > > > Moreover it seems to get a difference not only for the background
> > > > > correction, but for the quantile normalization too, again due 
> to the
> > > > > different probes.
> > > > >
> > > > > In fact I can't get the same results on my samples with xps 
> package
> > > > > and with Partek.
> > > > > With Partek we found no expression differences for all of the 
> genes
> > > > > (after fdr correction), while xps found a little more than 
> 1100 genes
> > > > > who are differentially expressed (after using prefilter() and
> > > > > unifilter() function, with the same options than in Partek).
> > > > >
> > > > > And I don't know why.
> > > > >
> > > > > Thank you for your help and your answer,
> > > > > Best regards
> > > > >
> > > > > Arnaud
> > > > >
> > > > >
> > > > > > Date: Tue, 31 Mar 2009 22:04:42 +0200
> > > > > > From: cstrato at aon.at
> > > > > > To: arnaudlc at msn.com
> > > > > > CC: bioconductor at stat.math.ethz.ch; 
> delphine.rossille at chu-rennes.fr
> > > > > > Subject: Re: [BioC] Xps package : errors and RMA difference with
> > > > > Partek GS
> > > > > >
> > > > > > Dear Arnaud,
> > > > > >
> > > > > > Let me first answer your question how to export the 
> background data
> > > > > > (although an example is shown in vignette xps.pdf p.15):
> > > > > >
> > > > > > # 1. compute background (or rma)
> > > > > > data.bg <- bgcorrect(data.exon, "ExonRMABgrdCore", 
> filedir=datdir,
> > > > > > method="rma", select="antigenomic", 
> option="pmonly:epanechnikov",
> > > > > > params=c(16384), exonlevel="core")
> > > > > >
> > > > > > # 2. find background treenames for data.bg
> > > > > > getTreeNames(rootFile(data.bg))
> > > > > >
> > > > > > # 3. get background for all trees
> > > > > > export.root(rootFile(data.bg), schemeFile(data.bg),
> > > setName(data.bg),
> > > > > > "*", "rbg", "fBg", "BgrdAll.txt")
> > > > > > # or get background intensity for e.g. tree "BreastA.rbg"
> > > > > > export.root(rootFile(data.bg), schemeFile(data.bg),
> > > setName(data.bg),
> > > > > > "BreastA", "rbg", "fBg", "BgrdBreastA.txt")
> > > > > >
> > > > > > In addition you can also export the background subtracted
> > > intensities:
> > > > > > # 4. get background corrected intensities for all trees
> > > > > > export.root(rootFile(data.bg), schemeFile(data.bg),
> > > setName(data.bg),
> > > > > > "*", "int", "fInten", "IntenAll.txt")
> > > > > >
> > > > > > However, please note that only the "core" probes are corrected,
> > > so I am
> > > > > > not sure if you can use these data in another program.
> > > > > >
> > > > > >
> > > > > > Some notes on background correction:
> > > > > >
> > > > > > Please note that the above background correction does NOT use
> > > > > > "antigenomic" probes for background correction. The parameter
> > > > > > select="antigenomic" does only define the kind of MM probes
> > > (although
> > > > > > they are not used in this case). Which probes are used as PM
> > > probes is
> > > > > > defined by exonlevel="core", which means that only "core" 
> probes are
> > > > > > used for background correction.
> > > > > >
> > > > > > As you know, in the RMA background algorithm observed PM 
> probes are
> > > > > > modeled as the sum of a normal noise component and an 
> exponential
> > > > > signal
> > > > > > component. Since in above case only "core" probes are 
> selected as PM
> > > > > > probes, only these probes are used for background correction.
> > > This may
> > > > > > be the reason why the background data differ from the 
> background
> > > data
> > > > > > computed by APT, as explained in vignette APTvsXPS.pdf.
> > > > > >
> > > > > > If you want to compute the background using "genomic" or
> > > "antigenomic"
> > > > > > probes and the APT algorithm based on GC content of these 
> probes
> > > then
> > > > > > you need to use:
> > > > > > data.bg <- bgcorrect(data.exon, "ExonGCBgrdCore", 
> filedir=datdir,
> > > > > > method="gccontent", select="antigenomic", option="attenuatebg",
> > > > > > params=c(0.4, 0.005, -1.0), exonlevel="core")
> > > > > > or the dedicated function:
> > > > > > data.bg <- bgcorrect.gc(data.exon, "ExonGCBgrdCore", 
> filedir=datdir,
> > > > > > select="antigenomic", exonlevel="core")
> > > > > >
> > > > > >
> > > > > > Maybe one note on processing time:
> > > > > > My main goal was to allow processing of exon arrays on 
> computers
> > > with
> > > > > > 1GB RAM only, and to allow access to all interim data such as
> > > > > background
> > > > > > intensities and background-corrected probe intensities. Thus,
> > > all these
> > > > > > data are stored as root trees, which means that saving all these
> > > > > interim
> > > > > > data on HD is probably the time-consuming step.
> > > > > >
> > > > > > I hope that I could answer your questions.
> > > > > >
> > > > > > Best regards
> > > > > > Christian
> > > > > >
> > > > > >
> > > > > > arnaud Le Cavorzin wrote:
> > > > > > > Hi
> > > > > > >
> > > > > > > Thank you for your answer, I will download your last version
> > > when it
> > > > > > > will be available.
> > > > > > >
> > > > > > > We found that the difference between Partek GS and xps 
> package
> > > is the
> > > > > > > background correction.
> > > > > > > And Partek's support confirmed that Partek GS doesn't use
> > > genomic or
> > > > > > > antigenomic background correction but correction like 
> described by
> > > > > > > Professor Bolstad.
> > > > > > >
> > > > > > > But I have an other problem : I would like to perform 
> background
> > > > > > > correction with xps, using the function bgcorrect(), and 
> after to
> > > > > > > export the data.bg.rma for performing the normalization and
> > > > > > > summarization with Partek GS (that take less time than xps)
> > > > > > > I have tried with function export.expr(), export.data and 
> export()
> > > > > > > without success.
> > > > > > >
> > > > > > > How can I do to extract the data.bg.rma into a .txt file, if
> > > it was
> > > > > > > possible of course?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Best regards
> > > > > > >
> > > > > > > Arnaud
> >
>
> ------------------------------------------------------------------------
> Vous voulez savoir ce que vous pouvez faire avec le nouveau Windows 
> Live ? Lancez-vous ! 
> <http://www.microsoft.com/windows/windowslive/default.aspx>