[R] Odp: Unexpected Behavior (potentially) in t.test
Petr PIKAL
petr.pikal at precheza.cz
Mon Jun 23 16:34:29 CEST 2008
Hi
I tried your code with R 2.8.0 devel and gave up after about 4 hours (t1N
was about 18 by then), but until I interrupted it manually there was no
problem or error. Maybe you could try it with a newer R version. Also, I
did not debug or profile your code, so you may want to debug it yourself.
In particular, three nested loops seem quite inefficient to me, as does
growing Cgroup and Tgroup incrementally.
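
A minimal sketch of both suggestions, not a tested replacement for your
script. The names safe_p, one_run and failed are hypothetical helpers
introduced here for illustration; they are not in the original code. The
idea is to wrap t.test() in tryCatch() so that a failure is recorded
together with the data that caused it, and to draw all t1N + t2N values
per group up front instead of growing the vectors with c(). It assumes the
same continuation rule as your loop (.05 <= p <= contthreshold).

```r
# Hypothetical helper: return the p-value, or NA if t.test() signals an
# error. The offending samples and the message are kept for inspection.
failed <- list()

safe_p <- function(x, y) {
  tryCatch(
    t.test(x, y)[["p.value"]],
    error = function(e) {
      failed[[length(failed) + 1L]] <<- list(x = x, y = y,
                                             msg = conditionMessage(e))
      NA_real_
    }
  )
}

# One simulated experiment: draw t1N + t2N values per group once and
# subset the first t1N for the first look, rather than appending later.
one_run <- function(t1N, t2N, contthreshold = 0.25) {
  Cgroup <- rnorm(t1N + t2N)
  Tgroup <- rnorm(t1N + t2N)
  first <- seq_len(t1N)
  p1 <- safe_p(Cgroup[first], Tgroup[first])   # p-value at time 1
  p2 <- NA_real_                               # stays NA if no second look
  if (!is.na(p1) && p1 >= 0.05 && p1 <= contthreshold) {
    p2 <- safe_p(Cgroup, Tgroup)               # p-value at time 2
  }
  c(t1 = p1, t2 = p2)
}

set.seed(1)                                 # arbitrary seed, reproducibility only
res <- t(replicate(2000, one_run(16, 5)))   # 2000 runs here, not 200000
plainp <- ifelse(is.na(res[, "t2"]), res[, "t1"], res[, "t2"])
length(failed)                              # number of t.test() failures caught
```

If length(failed) is ever non-zero, failed[[1]]$x and failed[[1]]$y hold
the samples that triggered the error, which should make the cause easier
to find. Returning both p-values from each run also means no t2 value can
carry over from an earlier t1N/t2N combination; your loop never resets
t2pvals, so it appears that this can happen there.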
Regards
Petr
petr.pikal at precheza.cz
724008364, 581252140, 581252257
r-help-bounces at r-project.org wrote on 20.06.2008 20:55:08:
> Greetings,
>
> I have stumbled across some unexpected behavior (potentially a bug) in what I
> suspect to be R's (2.6.2 on Ubuntu Linux) t.test function; then again, the
> problem may lie in my code. I have shut down R and started it back up,
> re-run the code, and re-experienced the error. I have searched on Google for
> the abnormal termination error message "(stderr < 10 * .Machine$double.eps *
> max(abs(mx), abs(my))) stop("data are essentially constant")" but found only
> one instance, http://tolstoy.newcastle.edu.au/R/e2/help/07/06/18179.html,
> and the discussion there did not seem particularly helpful.
>
> I've included all of my code, amateurish though it may be. I have not
> isolated the faulty part, and to me it all looks pretty simple, so I'm not
> sure where I'm going wrong. For background, the goal of this code is to run
> a simulation to explore the problem space of inflation of Type I error when
> decisions to run or not to run more participants are made by preliminary
> looks at the data (as in Wagenmakers, 2007). This code is meant to examine
> the problem space given that there is no true difference between the groups
> (as is the case when both are generated from random draws from the normal
> distribution). I run an initial number of subjects in two groups (t1N); then,
> if p is < .25 on the t-test, I add t2N more subjects to each group and
> perform the t.test again. If p was > .25 at time 1, I stop. plainp
> simply stores the p-values from t2 (if it was performed) or from t1 (if t2
> was not performed). In the code I provide, t1N starts at 16, since this is
> about when the problem becomes more frequent. Please note that it takes
> quite a long while to fail, and depending on what the true cause is, it may
> not fail at all. On my system it fails before t1N advances to 17.
>
> Any suggestions as to how to avoid the error and instructions as to the
> cause of it would be appreciated. Thank you for your input and patience.
>
> logit <- function(p)
> {
>   # compute and return logit of p;
>   # if p=.5 then logit==0 else sign(logit)==(p>.5)
>   return( log(p/(1-p)) )
> }
>
> antilogit <- function(x)
> {
>   # compute and return antilogit of x;
>   # this returns a proportion p for which logit(p)==x;
>   return( exp(x)/(1+exp(x)) )
> }
>
>
> plainp <- c() #Clear the plainp value
> t1Nsim <- (100/5) * 1000 * 10 # random chance should provide 10000 cases at t1
> contthreshold <- .25 #p value below which we run more subjects
> t1pvals <- rep(NA,t1Nsim) #clear the pvalues
> t2pvals <- rep(NA,t1Nsim) #clear the pvalues
> t1N <- 10 #for debugging
> t2N <- 5 #for debugging
>
>
> for (t1N in 16:50) #Outer loop testing possible values for t1N
> for (t2N in 1:50) #Inner loop testing possible values for t2N
> {
>   print(paste("Checking with ",t1N," initial samples and ",t2N," extra samples",sep="")) #feedback
>   for (lcv in 1:t1Nsim) #Run simulation t1Nsim times...
>   {
>     if (lcv %% 20000 == 0) {print(paste((lcv/t1Nsim)*100,"%",sep=""))} #feedback
>     Cgroup <- rnorm(t1N) #Initial random draw for Group 1
>     Tgroup <- rnorm(t1N) #Initial random draw for Group 2
>     currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t1 p value
>     t1pvals[lcv] <- currentp #Store t1 p value
>     #If .05 <= p <= continue threshold then run more subjects
>     if ((currentp <= contthreshold) & (currentp >= .05)) {
>       Cgroup <- c(Cgroup,rnorm(t2N)) #Add t2N subjects to group 1
>       Tgroup <- c(Tgroup,rnorm(t2N)) #Add t2N subjects to group 2
>       currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t2 p value
>       t2pvals[lcv] <- currentp #store t2 p value
>     }
>   }
>   plainp <- ifelse(!is.na(t2pvals),{t2pvals},{t1pvals}) #Make sure we are looking at the right ps
>   table(t1pvals <= .05); round(summary(t1pvals),4) #debugging
>   hist(t1pvals) #debugging
>   table(plainp <= .05); round(summary(plainp),4) #debugging
>   hist(plainp, probability=TRUE,main=paste(t1N,"then",t2N)) #Histogram of interest
>   abline(a=1.00,b=0) #Baseline probability
>   dev.copy(jpeg,filename=paste("Sim with ",t1N, "start samples and ",t2N," extra samples.jpg",sep=""),height=600,width=800,bg="white") # Create the image
>   dev.off() #Save the image
>   chi <- rbind(table(t1pvals <= .05),table(plainp <= .05)) #debugging
>   chisq.test(chi) #debugging
>   explore <- data.frame(t1=t1pvals,t2=t2pvals,picked=plainp) #debugging
>   t.test(explore$picked,explore$t1) #debugging
>   t.test(logit(explore$picked),logit(explore$t1)) #debugging
> }
>
> ---
> Russell Pierce
> Psychology Department
> Graduate Student - Cognitive
> (951) 827-2553
> University of California, Riverside, 92521
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.