[R] Loss of numerical precision from conversion to list?
Fabian Scheipl
f.abian at gmx.net
Thu Jul 20 23:09:19 CEST 2006
I'm working on an R implementation of the simulation-based finite-sample null distribution of the (R)LR test in mixed models (i.e. testing for Var(RandomEffect) = 0) derived by C. M. Crainiceanu and D. Ruppert.
I'm in the early stages of this project. While comparing quick-and-dirty grid-search methods with more exact optim()/optimize()-based methods for finding the maximum of a part of the RLR test statistic, I stumbled upon the following problem:
It seems to me that R produces different results depending on whether originally identical numbers, involved in exactly the same computations, are read from a matrix or from a list.
(I need both: the matrix form for fast vectorized computation during the grid search, and the list-based form so that I can plug the function to be maximized into something like mapply(...,optim(foo),...). A rough sketch of what I mean follows below; I can elaborate if desired.)
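Purely as an illustration of that pattern (f is a made-up placeholder objective and w.list is dummy data, not the actual RLR statistic):

f <- function(lambda, w) sum(w*lambda/(1+lambda)) # placeholder objective for one simulated draw w
w.list <- replicate(10, rchisq(27, 1), simplify=FALSE) # dummy list of simulated draws
mapply(function(w) optimize(f, interval=c(0,10), w=w, maximum=TRUE)$maximum, w.list)

The matrix form of the same quantities is what lets me do the grid search with rowSums() instead of looping.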
However, the problem goes away once one number involved in the computation is changed from almost zero (on the order of 1e-15) to 4.
I'm completely mystified by this, especially since the number I change is NOT one of the numbers that are switched from matrix to list.
Here's the code:
library(nlme)
data(Orthodont) #108 dental measurements on 27 subjects
# m1<-lme(distance~age,random=~1|Subject,data=Orthodont)
# summary(m1)
# ...
# Random effects:
# Formula: ~1 | Subject
# (Intercept) Residual
# StdDev: 2.114724 1.431592 -> lambda.REML=2.114^2/1.431^2 = 2.182382
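# (If m1 is actually fitted, the same ratio can be read off the fit directly; the
#  indexing below assumes the usual nlme::VarCorr layout with rows
#  "(Intercept)"/"Residual" and a "Variance" column.)
# vc <- VarCorr(m1)
# lambda.REML <- as.numeric(vc["(Intercept)","Variance"])/as.numeric(vc["Residual","Variance"])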
#DesignMatrix for fixed Effects
X<-cbind(rep(1,108),Orthodont$age)
#DesignMatrix of RandomEffects
Z<-matrix(data=c(rep(1,4),rep(0,108)),nrow=108,ncol=27)
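#(sanity check: since Orthodont is ordered with 4 consecutive rows per subject,
# the same design matrix can also be built with kronecker)
Z.check<-kronecker(diag(27),rep(1,4))
all(Z==Z.check) #should be TRUE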
#Corr(RanEf)^0.5 = 27 x 27 Identity, since RandomIntercepts are independent
sqrt.Sigma<-diag(27)
K<-27 #number of subjects/ random intercepts
n<-nrow(X)
p<-ncol(X)
lambda0 <- 2.182382 #actually not a sensible choice as null hypothesis, but that doesn't pertain to the problem
# note: lambda, the value at which the statistic is evaluated below, is not defined in this excerpt; it has to be set beforehand
#Projection matrix of the fixed-effects model: Y -> residuals
P0<-diag(n)-X%*%solve(t(X)%*%X)%*%t(X)
mu<-eigen(sqrt.Sigma%*%t(Z)%*%P0%*%Z%*%sqrt.Sigma)$values
# mu
# [1] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00
#[11] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00
#[21] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 5.77316e-15
# ! Notice the last (27th) value very close to 0
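# (Side note: P0 is symmetric, so the matrix above is symmetric as well; one
#  could use the symmetric eigensolver and zap the rounding noise. Just a
#  sketch with a new name mu.sym; the rest of this mail keeps mu as computed above.)
# A<-sqrt.Sigma%*%t(Z)%*%P0%*%Z%*%sqrt.Sigma
# mu.sym<-eigen(A,symmetric=TRUE,only.values=TRUE)$values
# zapsmall(mu.sym) # the 5.77316e-15 shows up as an exact 0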
nsim<-10
set.seed(10)
#nsim x K array of ChiSq(1)-variates
w.k.sq.mat<-matrix(rchisq(nsim*K,1),nrow=nsim)
#nsim x 1 array of ChiSq(n-p-K)-variates
w.sum2<-rchisq(nsim,n-p-K)
### vectorized computation of nsim=10 realizations
### of a part of the RLR-statistic under the Null:
w.k.sq<- cbind(w.k.sq.mat,w.sum2) #nsim x (K+1)
#vector-based results:
num.v<- rowSums(((lambda-lambda0)*mu*w.k.sq[,-(K+1)])/(1+lambda*mu))
den.v<- rowSums(((1+lambda0*mu)*w.k.sq[,-(K+1)]) / (1+lambda*mu)) + w.k.sq[,K+1]
### list-based computation of nsim=10 realizations
### of a part of the RLR-statistic under the Null:
w.k.sq<-list()
length(w.k.sq)<-nsim
#put the nsim rows into list-slots:
for(i in 1:nsim) w.k.sq[[i]]<-c(w.k.sq.mat[i,],w.sum2[i])
num.l<-numeric(0)
den.l<-numeric(0)
for(i in 1:nsim)
{
num.l[i]<-sum(((lambda-lambda0)*mu*w.k.sq[[i]][-(K+1)])/(1+lambda*mu))
#exactly analogous to num.v & den.v, except list-elements instead of vector
den.l[i]<-sum(((1+lambda0*mu)*w.k.sq[[i]][-(K+1)]) / (1+lambda*mu)) + w.k.sq[[i]][K+1]
}
# Now the actual problem:
# Notice the discrepancies between the results of the vectorized computation
# and the results of the list-based computation.
# Since the discrepancies disappear if mu[27] is changed
# from 5.77316e-15 to 4, I'm guessing there must be a loss of precision
# somewhere in the conversion to "list". Or is there an entirely
# different problem?
num.l
# [1] -25.93322 -17.65486 -18.80239 -19.49974 ....
num.v
# [1] -23.84733 -17.62233 -27.22975 -19.50294 ....
den.l
# [1] 117.30246 92.59041 92.91491 112.90113 ...
den.v
# [1] 115.21657 92.55789 101.34228 112.90433 ...
#Now I set
mu[27]<-4
#and rerun the computations of num.l/num.v and den.l/den.v from above:
num.l
# [1] -26.25565 -17.67423 -27.47259 -20.97961 ...
num.v
# [1] -26.25565 -17.67423 -27.47259 -20.97961 ...
den.l
# [1] 117.62489 92.60979 101.58511 114.38100 ...
den.v
# [1] 117.62489 92.60979 101.58511 114.38100 ...
What I would like to know now is:
1) Which of the two calculations yields the more precise result?
Or rather:
2) How can I avoid these discrepancies in the future, since I need to be able to compare these two methods? (A sketch of the kind of comparison I mean follows below.)
And, most importantly,
3) What in R. A. Fisher's name is happening here?
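(Regarding 2, what I mean by "compare" is a tolerance-based check rather than exact equality, along these lines:)

max(abs(num.v-num.l)) # size of the discrepancy between the two methods
all.equal(num.v, num.l, tolerance=1e-8) # TRUE only if they agree to roughly 8 digits
all.equal(den.v, den.l, tolerance=1e-8)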
version information:
Version 2.3.1 (2006-06-01)
i386-pc-mingw32
.Machine$double.eps is 2.220446e-16 (does it matter?)
thanks for your time,
--
Fabian Scheipl
f.abian at gmx.net
"Feel free" – 10 GB Mailbox, 100 FreeSMS/Monat ...