[R] Should I use full models when using Powersim?

Tue Feb 12 10:28:11 CET 2019

I tried using powerSim from the R package simr to estimate the number of participants I need for an experiment. I based the simulation on the data from my pilot study. The model I used is sketched below:

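library(lme4)  # glmer, glmerControl
library(simr)  # extend, powerCurve, fcompare

# Full model: fixed effects a, b and their interaction, with maximal
# random-effects structures by subject and by item
fit <- glmer(B ~ a + b + a:b +
             (1 + a + b + a:b | Subject) +
             (1 + a + b + a:b | Item),
           family = binomial(link = "logit"),
           data = data,
           control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 50000),
                                  tol = .0001))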

in which Subject and Item are the unique IDs of the subjects and items in the pilot study. I want to test how the power to detect the interaction term (a:b) changes as the number of participants grows. The code I am using is:

fit2 <- extend(fit, along = "Subject", n = 84)
sim <- powerCurve(fit2, test = fcompare(~ a + b), along = "Subject",
                  breaks = c(48, 60, 72, 84), nsim = 5000)
print(sim)

But the results of the simulation were rather bizarre. To begin with, the estimated power for the interaction decreased when the number of participants increased from 72 to 84, which I believe is at odds with the usual expectation that power increases with the number of participants. Second, I tried running the simulation with the full random-effects structure, but it is really slow (it took me weeks to get just one result). I was wondering whether I can use a simpler random-effects structure for the simulation.
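One way to judge whether a drop like this is larger than simulation noise is to put a binomial confidence interval around each power estimate. Below is a minimal sketch in base R. The success counts are hypothetical placeholders, not results from the simulation above, and the names successes, power_ci and mc_se are made up for illustration.

```r
# Hypothetical counts of significant simulations out of nsim runs;
# these are placeholders, not results from the simulation above.
nsim <- 5000
successes <- c(n72 = 4100, n84 = 4000)

# Exact (Clopper-Pearson) 95% interval for each estimated power
power_ci <- t(sapply(successes, function(x) {
  bt <- binom.test(x, nsim)
  c(power = unname(bt$estimate),
    lower = bt$conf.int[1],
    upper = bt$conf.int[2])
}))
print(round(power_ci, 3))

# Approximate Monte Carlo standard error of each power estimate
mc_se <- sqrt(power_ci[, "power"] * (1 - power_ci[, "power"]) / nsim)
print(round(mc_se, 4))
```

If the intervals for two sample sizes overlap substantially, the apparent decrease may simply reflect Monte Carlo error; if they do not, something else is probably going on.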

To reiterate my questions: first, why did my simulated power decrease as the number of participants increased? Is there something wrong with my code? Second, can I use a simpler random-effects structure for the simulation in order to save time? Thanks in advance!
------------------------
Chi Zhang

PhD Student
Department of Experimental Psychology
Ghent University
Henri Dunantlaan 2, B-9000 Gent, Belgium
Tel: +32 465386530
E-mail: chi.zhang using ugent.be
