[R] problem about mean function in ffbase package
Milan Bouchet-Valat
nalimilan at club.fr
Fri Aug 2 22:00:35 CEST 2013
On Thursday, 1 August 2013 at 00:10 +0800, Chaos Chen wrote:
> Hi all,
>
> I am getting inconsistent results from the mean function in the ffbase
> package and cannot figure out what's wrong.
>
> I have a simulated ff vector with 1000000000 numbers inside and want to
> calculate its mean. But the results are quite different.
>
> With the mean() function in the ffbase package, the mean is 152.6858.
> But with R's mean(), or by summing over chunks directly, I get 667.5595.
>
> Any ideas? Thank you in advance!
Could you provide a fully reproducible example with a shorter vector (I
cannot create such a large vector on my box)? Use set.seed() so that
runif() gives exactly the same values.
From quick tests here, the problem does not appear.
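
For reference, a reproducible example of the kind requested above might look like the following minimal sketch. It assumes the ff and ffbase packages are installed; the seed, the vector length `n`, the value range passed to runif() and the chunk size are arbitrary illustrative choices, not values taken from the original post.

```r
# Sketch only: assumes the ff and ffbase packages are installed.
library(ff)
library(ffbase)

set.seed(1)                          # makes runif() return the same values on every run
n  <- 1e6                            # hypothetical length, small enough for most machines
x  <- runif(n, min = 0, max = 1000)  # ordinary in-RAM numeric vector
fx <- as.ff(x)                       # ff copy of the same data

m_ram <- mean(x)                     # base R mean on the ordinary vector
m_ff  <- mean(fx)                    # mean of the ff vector (method supplied by ffbase)

# Mean from chunked sums, following the approach in the original post
total <- 0
for (i in chunk(fx, by = 1e5)) {
  total <- total + sum(fx[i])
}
m_chunk <- total / length(fx)

print(c(ram = m_ram, ff = m_ff, chunked = m_chunk))
print(all.equal(m_ram, m_ff))        # TRUE if the two means agree within tolerance
```

If the three values agree at this size, increasing `n` step by step should show at what length the results start to diverge.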
Regards
> Bayes Chen
>
> # F1 is an ffdf , F1$X1 is an ff vector
> > length(F1$X1)
> [1] 1000000000
>
> # Use mean() function in ffbase package
> > mean(F1$X1)
> [1] 152.6858
>
> > X2 = F1$X1[] # X2 is now a non-ff vector
> > length(X2)
> [1] 1000000000
> > mean(X2) # R's original mean function for ordinary vectors
> [1] 667.5595
>
> # calculate sum and then mean by chunks
> > chunks = chunk(F1$X1, by=5000000)
> > sumx = 0
> > for (i in chunks) {
> + sumx = sumx + sum(F1$X1[i])
> + }
> > sumx/length(F1$X1)
> [1] 667.5595
>
> ----------------------------------- below are some other trials
> > X2 = F1$X1[1:1000000]
> > mean(X2)
> [1] 59.43149
> > mean(as.ff(X2))
> [1] 59.43149
>
> > X2 = F1$X1[1:100000000]
> > mean(X2)
> [1] 59.41978
> > mean(as.ff(X2))
> [1] 59.42128
>
> > X2 = F1$X1[1:500000000]
> > mean(X2)
> [1] 60.53615
> > mean(as.ff(X2))
> [1] 57.72168
>
> > X2 = F1$X1[1:750000000]
> > mean(X2)
> [1] 59.37562
> > mean(as.ff(X2))
> [1] 57.81179
>
> > X2 = F1$X1[1:900000000]
> > mean(X2)
> [1] 57.0867
> > mean(as.ff(X2))
> [1] 57.44862
>
> > X3 = F1$X1[900000000:1000000000]
> > mean(X3)
> [1] 6161.814
> > mean(as.ff(X3))
> [1] 6161.797