[R] acf gives correlations > 1
Berwin A Turlach
berwin at maths.uwa.edu.au
Mon Sep 14 14:16:43 CEST 2009
G'day Steve,
On Mon, 14 Sep 2009 13:44:56 +0200
Steve Jones <steve at squaregoldfish.co.uk> wrote:
> Apologies for the missing data. It can be downloaded from here (22Kb):
> http://www.squaregoldfish.co.uk/sekrett/series.csv
Well, the Details section of acf's help page states:
By default, no missing values are allowed. If the 'na.action'
function passes through missing values (as 'na.pass' does), the
covariances are computed from the complete cases. This means that
the estimate computed may well not be a valid autocorrelation
sequence, and may contain missing values. [...]
And you have seem to have a massive amount of missing data:
R> dat <- scan(url("http://www.squaregoldfish.co.uk/sekrett/series.csv"))
Read 6940 items
R> mean(!is.na(dat))
[1] 0.02881844
And, not surprisingly, an even smaller proportion of consecutive,
non-missing observations.
R> mean(!is.na(dat[-1]) & !is.na(dat[-length(dat)]))
[1] 0.006340971
You can find out which formulae are used exactly by acf by studying the
code, but this might give you an idea about what is going on:
R> ind <- !is.na(dat)
R> (mu <- mean(dat[ind])) ## too lazy for mean(dat, na.rm=TRUE)
[1] 373.5165
R> (sig2 <- var(dat[ind])) ## ditto
[1] 463.4041
R> ind <- which(!is.na(dat[-1]) & !is.na(dat[-length(dat)]))
R> sum( (dat[ind]-mu) * (dat[ind+1] - mu)) / length(ind)
[1] 593.3041
R> sum( (dat[ind]-mu) * (dat[ind+1] - mu)) / length(ind) / sig2
[1] 1.280317
HTH
Cheers,
Berwin
========================== Full address ============================
Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019) +61 (8) 6488 3383 (self)
The University of Western Australia FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009 e-mail: berwin at maths.uwa.edu.au
Australia http://www.maths.uwa.edu.au/~berwin
More information about the R-help
mailing list