[R-meta] Estimate variance from time series data

Tue Aug 14 22:50:19 CEST 2018

Hi Arne,

It is not entirely clear to me what you are trying to do. Do you want the mean and SD obtained by pooling the N1 measurements from timepoint 1 and the N1 measurements from timepoint 2 of the same group, so that the group now has 2*N1 measurements in total (or 3*N1 with three timepoints, and so on)? In that case, the same equation can be used as for independent subgroups.

For example:

### Suppose we have the mean, SD, and size of several subgroups, but we
### need the mean and SD of the total/combined groups. Code below shows
### what we need to compute to obtain this.

### simulate some data
n.total <- 100
grp <- sample(1:4, size=n.total, replace=TRUE)
y   <- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total
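Since Arne's table only reports N, mean and s.d. per timepoint, it may help to wrap the computation above in a small function that needs only those summary statistics. This is a sketch: pool_groups() is a hypothetical name (not from any package), and the check uses simulated data. Keep in mind that it gives the pooled mean and SD only; with repeated measurements, sum(ni) is not the right sample size for the sampling variance.

```r
## pool_groups() is a hypothetical helper, not a function from any package.
## It combines k subgroups using only their sizes (ni), means (mi), and SDs (sdi).
pool_groups <- function(ni, mi, sdi) {
  n.total <- sum(ni)
  # combined mean: weighted mean with the subgroup sizes as weights
  m.total <- sum(ni * mi) / n.total
  # total sum of squares = within-group SS + between-group SS
  ss <- sum((ni - 1) * sdi^2) + sum(ni * (mi - m.total)^2)
  c(n = n.total, mean = m.total, sd = sqrt(ss / (n.total - 1)))
}

## check against raw (simulated) data
set.seed(42)
x1 <- rnorm(20, mean = 5, sd = 1)
x2 <- rnorm(30, mean = 7, sd = 2)
res <- pool_groups(ni  = c(length(x1), length(x2)),
                   mi  = c(mean(x1), mean(x2)),
                   sdi = c(sd(x1), sd(x2)))
res
all.equal(unname(res["mean"]), mean(c(x1, x2)))  # TRUE
all.equal(unname(res["sd"]),   sd(c(x1, x2)))     # TRUE
```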

This would be the case for independent subgroups. Now let's simulate data for 50 individuals measured twice:

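library(MASS)  # for mvrnorm()

### 50 individuals, 2 timepoints, correlation of 0.8 between the timepoints
Y <- mvrnorm(50, mu=c(0,0), Sigma=matrix(c(1, .8, .8, 1), nrow=2))
y <- c(Y)                  # stack the timepoint 1 and timepoint 2 measurements
grp <- rep(1:2, each=50)   # subgroup = timepoint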

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total

Still works. However, when it comes to computing the sampling variance for m.total (or some function thereof), one cannot treat these two cases as the same. In the first case, we really have sum(ni) independent measurements, so var(y) / sum(ni) would be the correct sampling variance of m.total, but not so for the second case. You would need to know the correlation between the measurements over time to compute an appropriate sampling variance of m.total in the second case.
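To make the last point concrete, here is a minimal sketch for n individuals each measured at n.time timepoints. It assumes (an assumption, not something given in the question) that the variance sigma2 is the same at every timepoint and that any two timepoints correlate with the same rho (compound symmetry). Under those assumptions Var(m.total) = sigma2 * (1 + (n.time - 1) * rho) / (n * n.time), which reduces to the naive sigma2 / (n * n.time) only when rho = 0. The function names are hypothetical; the simulation reuses the settings of the example above (50 individuals, two timepoints, correlation 0.8).

```r
library(MASS)  # for mvrnorm()

## Hypothetical helpers (not from any package).
## Assumes equal variance sigma2 at each timepoint and a common correlation
## rho between any two timepoints (compound symmetry).
var_mean_repeated <- function(sigma2, n, n.time, rho) {
  sigma2 * (1 + (n.time - 1) * rho) / (n * n.time)
}

## naive version: treats all n * n.time measurements as independent
var_mean_naive <- function(sigma2, n, n.time) {
  sigma2 / (n * n.time)
}

var_mean_repeated(sigma2 = 1, n = 50, n.time = 2, rho = 0.8)  # 0.018
var_mean_naive(sigma2 = 1, n = 50, n.time = 2)                # 0.01

## simulation check: repeat the study many times and look at the
## spread of m.total (the mean over all n * n.time measurements)
set.seed(1)
n <- 50; n.time <- 2; rho <- 0.8
Sigma <- matrix(rho, n.time, n.time); diag(Sigma) <- 1
m.total <- replicate(5000, mean(mvrnorm(n, mu = rep(0, n.time), Sigma = Sigma)))
var(m.total)  # should be close to 0.018, not 0.01
```

If the raw data are available, an alternative that does not require assuming a correlation structure is to first average each individual's measurements over time and then take var() of those n averages divided by n.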

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Arne Janssen
Sent: Monday, 13 August, 2018 19:23
To: r-sig-meta-analysis using r-project.org
Subject: [R-meta] Estimate variance from time series data

Dear list members,

I am doing a meta-analysis with data that are often presented as
repeated measures of population densities, but authors sometimes also
give overall averages and s.d. or s.e. Because I want to combine these
data in one analysis, I am interested in the overall effect size of
the repeated measures, so I would like to combine all data of a time
series into one average and s.d. The time series are repeated several
times, yielding data of the following form:
Time    Treatment 1               Treatment 2
        N     Ave     s.d.        N     Ave     s.d.
1       N1    x1,1    sd1,1       N2    x2,1    sd2,1
2       N1    x1,2    sd1,2       N2    x2,2    sd2,2
...

What I want to obtain is one average and s.d. per treatment through time.
The average is straightforward, but I cannot come up with a calculation 
for the s.d.

The formula normally used for calculating the combined variance of two 
series of measurements:

$$\mathrm{Var} = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1) + n_1 (X - x_1)^2 + n_2 (X - x_2)^2}{n_1 + n_2 - 1}$$

does not seem to apply when combining measurements through time,
because this increases the number of replicates, which, in my opinion,
should be the number of time series and not the number of observations.

I hope I made myself clear, and would be very grateful if you could 
advise me on this matter.

Thanks very much in advance.
Arne Janssen