[R] acf gives correlations > 1

Mon Sep 14 14:16:43 CEST 2009

G'day Steve,

On Mon, 14 Sep 2009 13:44:56 +0200
Steve Jones <steve at squaregoldfish.co.uk> wrote:

> Apologies for the missing data. It can be downloaded from here (22Kb):
> http://www.squaregoldfish.co.uk/sekrett/series.csv

Well, the Details section of acf's help page states:

     By default, no missing values are allowed.  If the 'na.action'
     function passes through missing values (as 'na.pass' does), the
     covariances are computed from the complete cases.  This means that
     the estimate computed may well not be a valid autocorrelation
     sequence, and may contain missing values.  [...]

And you seem to have a massive amount of missing data:

R> dat <- scan(url("http://www.squaregoldfish.co.uk/sekrett/series.csv"))
Read 6940 items
R> mean(!is.na(dat))
[1] 0.02881844

And, not surprisingly, an even smaller proportion of consecutive, 
non-missing observations.

R> mean(!is.na(dat[-1]) & !is.na(dat[-length(dat)]))
[1] 0.006340971
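
If you want to see how the non-missing values cluster, rather than just the proportion of consecutive pairs, something along these lines might help. It is only a sketch on a made-up toy vector (`x` is hypothetical, not your series) and uses rle() to get the lengths of the non-missing runs:

```r
## Toy vector, made up for illustration only (not the series from the post)
x <- c(1, 2, NA, 3, NA, NA, 4, 5, 6, NA)

ok   <- !is.na(x)
runs <- rle(ok)

## Lengths of the runs of consecutive non-missing values
run_len <- runs$lengths[runs$values]

## A run of length k contributes k - 1 consecutive pairs
n_pairs <- sum(run_len - 1)

## Same count, computed the direct way used above
n_pairs_direct <- sum(ok[-1] & ok[-length(ok)])
```

Isolated observations (runs of length 1) contribute to the mean and variance but to none of the lag-1 products.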

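You can find out exactly which formulae acf uses by studying the
code, but this might give you an idea about what is going on. The mean
and variance below come from all non-missing values, while the lag-1
covariance comes only from the consecutive non-missing pairs, so
nothing forces their ratio to lie in [-1, 1]: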

R> ind <- !is.na(dat)
R> (mu <- mean(dat[ind]))  ## too lazy for mean(dat, na.rm=TRUE)
[1] 373.5165
R> (sig2 <- var(dat[ind])) ## ditto
[1] 463.4041
R> ind <- which(!is.na(dat[-1]) & !is.na(dat[-length(dat)]))
R> sum( (dat[ind]-mu) * (dat[ind+1] - mu)) / length(ind)
[1] 593.3041
R> sum( (dat[ind]-mu) * (dat[ind+1] - mu)) / length(ind) / sig2
[1] 1.280317
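
The same effect can be reproduced on a tiny made-up series. The following is only a sketch: `lag1_complete_cases` is a hypothetical helper that mimics the calculation above (it is not what acf calls internally), and `x` is invented so that the only consecutive pairs sit far from the overall mean. For comparison it also computes cor() on the pairs alone, which is an ordinary Pearson correlation and therefore stays within [-1, 1], although it is a different estimator from the one acf reports.

```r
## Hypothetical helper: mean and variance from all non-missing values,
## lag-1 covariance from consecutive non-missing pairs only.
lag1_complete_cases <- function(x) {
  ok  <- !is.na(x)
  mu  <- mean(x[ok])
  s2  <- var(x[ok])
  ind <- which(ok[-1] & ok[-length(x)])  # i such that x[i], x[i+1] both observed
  sum((x[ind] - mu) * (x[ind + 1] - mu)) / length(ind) / s2
}

## Made-up series: two pairs far from the mean, four isolated values at it
x <- c(10, 10, NA, 0, NA, 0, NA, 0, NA, 0, NA, -10, -10)

r_mixed <- lag1_complete_cases(x)

## Alternative: Pearson correlation of the pairs themselves
ok      <- !is.na(x)
ind     <- which(ok[-1] & ok[-length(x)])
r_pairs <- cor(x[ind], x[ind + 1])
```

Here r_mixed exceeds 1 because the products are averaged over the two extreme pairs only, whereas the variance is pulled down by the isolated values near the mean.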

HTH

Cheers,

	Berwin

========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin