[BioC] loged data or not loged previous to use normalize.quantile

Naomi Altman naomi at stat.psu.edu
Wed Apr 6 02:57:01 CEST 2005

```
Neither Rhonda nor I had it quite right.

Permutation tests require "exchangeability under the null hypothesis" which
means that when the null hypothesis is true, the distribution of the test
statistic does not depend on the treatment to which the data are assigned.

Independence is not enough. For example, if the data in one group are iid
N(a,5) and the data in the other group are iid N(b,25), then the permutation
distribution of the t-statistic under the hypothesis a=b does not provide an
appropriate null distribution.

But if exchangeability under the null holds after a transformation that does
not depend on the mean, permutation tests on the transformed data will be
correct.

--Naomi

At 11:51 AM 4/5/2005, Rhonda DeCook wrote:
>With respect to permutation tests...
>
>I'm under the impression that you only need independence, not the
>assumption of constant variance.
>
>The permutation test provides us with a distribution of the test statistic
>under the null hypothesis (equal means in the 2-sample scenario, i.e. all
>data were generated from one distribution, even though it may be an
>ugly-looking single distribution).  As long as all 'groupings' of the data
>into 2 groups are equally likely (which is provided by the independence
>assumption), this permutation distribution of the test statistic (e.g. a
>t-statistic here) gives us an idea of the test statistic's distribution
>under the null without the assumption of normality or constant variance.
>Computing a permutation p-value from this null distribution provides a
>p-value that has the usual behavior under the null, i.e. Uniform(0,1),
>though in a discrete manner.  When the alternative is true, the
>distribution of the p-value will have more mass near zero than the
>Uniform(0,1).
>
>If this logic doesn't apply to the microarray setting, please let me know.
>
>Rhonda
>
>
>
>
>
> > I just want to remind people that permutation tests, rank tests, etc. still
> > require i.i.d. errors.  So the variance needs to be stabilized even for
> > nonparametric tests.
> >
> > --Naomi
> >
> > At 01:32 PM 4/4/2005, Fangxin Hong wrote:
> > >Hi Marcelo,
> > >As Wolfgang mentioned, a nonparametric permutation test is an option
> > >when the t-distribution assumption is not valid.  But if you have few
> > >replicates (2-3), most permutation tests don't have power either. I
> > >would suggest you try the RankProd package, which would be powerful
> > >enough to detect differentially expressed genes with 2 replicates.
> > >
> > >Best,
> > >Fangxin
> > >
> > >
> > >
> > > > Hi Marcelo,
> > > >
> > > > the difference is that the power of the test you are doing can be
> > > > different when you consider the data on the "raw" or on the
> > > > log-transformed scale.
> > > >
> > > > Also, the p-value calculated by limma is based on the assumption that
> > > > the null-distribution of the test statistic is given by a
> > > > t-distribution; this assumption might be more or less true in both
> > > > cases.
> > > >
> > > > You are really doing two different tests: test A, say, consists of
> > > > applying the t-statistic to the untransformed intensities, test B, say,
> > > > applying the t-statistic to the transformed intensities.
> > > >
> > > > Then, if you want to use the t-distribution for getting p-values, you
> > > > need to make sure that the null distribution of your test statistic
> > > > is indeed (to good enough approximation) t-distributed. You can do this
> > > > e.g. by permutations. For that you need either a large number of
> > > > replicates, or to pool variance estimators across genes.
> > > >
> > > > If you don't want to make a parametric assumption for getting p-values,
> > > > you need a larger number of replicates; if you have these, you can for
> > > > example calculate a permutation p-value.
> > > >
> > > > So, there is really no "right" or "wrong" about transforming, or which
> > > > transformation -- as long as you don't violate the assumptions of the
> > > > subsequent tests. If the assumptions are met, then the procedure with
> > > > the highest power is preferable. And that depends very much on your
> > > > data (about which you have not told us much).
> > > >
> > > > Hope that helps.
> > > >
> > > > And here is another shameless plug: have a look at this paper:
> > > > Differential Expression with the Bioconductor Project
> > > > http://www.bepress.com/bioconductor/paper7
> > > >
> > > >    Best wishes
> > > >     Wolfgang
> > > >
> > > > Marcelo Luiz de Laia wrote:
> > > >> Dear Bioconductor friends,
> > > >>
> > > >> I have a question that I have not found an answer to. If you know of
> > > >> a paper/article that explains it, please tell me.
> > > >>
> > > >> I normalize our data using the normalize.quantile function.
> > > >>
> > > >> If I first transform our intensities (single channel) to log2, I
> > > >> don't get any differentially expressed genes in limma.
> > > >>
> > > >> But if I don't transform our data, I get some genes with p.value
> > > >> around 0.0001, which is great!
> > > >>
> > > >> Of course, when I transform the intensity data to log2, I get some
> > > >> NA.
> > > >>
> > > >> Why is there this difference? Am I wrong to do an analysis with
> > > >> data that are not logged?
> > > >>
> > > >> Thanks a lot
> > > >>
> > > >> Marcelo
> > > >>
> > > >> _______________________________________________
> > > >> Bioconductor mailing list
> > > >> Bioconductor at stat.math.ethz.ch
> > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >
> > > >
> > > > --
> > > > Best regards
> > > >    Wolfgang
> > > >
> > > > -------------------------------------
> > > > Wolfgang Huber
> > > > European Bioinformatics Institute
> > > > European Molecular Biology Laboratory
> > > > Cambridge CB10 1SD
> > > > England
> > > > Phone: +44 1223 494642
> > > > Fax:   +44 1223 494486
> > > > Http:  www.ebi.ac.uk/huber
> > > >
> > >
> > >
> > >--
> > >Fangxin Hong, Ph.D.
> > >Plant Biology Laboratory
> > >The Salk Institute
> > >10010 N. Torrey Pines Rd.
> > >La Jolla, CA 92037
> > >E-mail: fhong at salk.edu
> > >
> >
> > Naomi S. Altman                                814-865-3791 (voice)
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics                              814-863-7114 (fax)
> > Penn State University                         814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >

```
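
Naomi's point about unequal variances can be checked with a small simulation. The sketch below is not from the thread: it is plain Python (standard library only), it reads N(a,5) and N(b,25) as variances 5 and 25, and it assumes unequal group sizes (5 and 20, with the smaller group given the larger variance), which is where the problem is easiest to see. The function names `pooled_t`, `permutation_p` and `rejection_rate` are illustrative. It estimates how often a permutation test on the pooled t-statistic rejects at the 5% level when the means really are equal.

```python
import math
import random


def pooled_t(x, y):
    """Two-sample t-statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def permutation_p(x, y, n_perm, rng):
    """Two-sided permutation p-value based on random relabellings."""
    observed = abs(pooled_t(x, y))
    pooled = list(x) + list(y)
    n1 = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if abs(pooled_t(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    # Count the observed statistic itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


def rejection_rate(var1, var2, n1=5, n2=20, n_sim=300, n_perm=99,
                   alpha=0.05, seed=1):
    """Fraction of simulated null data sets (equal means) that are rejected."""
    rng = random.Random(seed)
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if permutation_p(x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Exchangeable under the null: both groups share one variance.
    print("equal variances:  ", rejection_rate(5, 5))
    # Not exchangeable: same mean, but variances 25 and 5.
    print("unequal variances:", rejection_rate(25, 5))
```

When exchangeability holds (equal variances), the rejection rate should sit near the nominal 0.05. With variances 25 and 5 arranged as above, it would be expected to come out well above that, which is the sense in which the permutation distribution is not an appropriate null distribution here.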