[R] relation in aggregated data
Petr PIKAL
petr.pikal at precheza.cz
Wed Jul 7 16:24:31 CEST 2010
Dear all
My question is more about statistics than about R, although it can be
demonstrated in R. It concerns the pros and cons of looking for a relationship
in aggregated data. I have two variables that may be related, and I
measure them regularly over some period (let us say a year), but I cannot
measure them at the same time. That is, I cannot measure x together with its
respective value of y; usually I have 3 or more values of x and only one
value of y per day.
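A minimal sketch of the data layout described above might look like this. The values are simulated and all names (x_raw, y_raw, x_daily, paired) are hypothetical, not taken from real measurements. It shows only one possible way to pair the two series, by collapsing x to a daily mean:

```r
# Hypothetical layout: 3 readings of x and 1 reading of y per day, 360 days.
# All values are simulated for illustration only.
set.seed(1)
days  <- 1:360
x_raw <- data.frame(day = rep(days, each = 3), x = rnorm(3 * length(days)))
y_raw <- data.frame(day = days, y = rnorm(length(days)))

# Collapse x to one value per day so that it can be paired with y
x_daily <- aggregate(x ~ day, data = x_raw, FUN = mean)
paired  <- merge(x_daily, y_raw, by = "day")
str(paired)
```

Daily pairing keeps 360 points, whereas quarterly aggregation of the same data would leave only 4.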
I can compute aggregated values (let us say quarterly). My questions are:
1. Is such an approach sound? Can I use it?
2. What problems could arise?
3. Is there any other method to inspect variables which may be
related but cannot be measured directly at the same time?
My opinion is that inspecting aggregated values is not very sound and that
there can be many traps, especially if there are only a few aggregated
values. You can see my examples below.
If you have an opinion on this issue, please let me know.
Best regards
Petr
Let us have a linear relation between x and y:
set.seed(555)
x <- rnorm(120)
y <- 5*x+3+rnorm(120)
plot(x, y)
As you can see, there is a clear relation, which is visible in the plot. Now I
make a factor for aggregation:
fac <- rep(1:4,each=30)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
plot(xprum, yprum)
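The raw and aggregated pictures can also be compared numerically rather than by eye. The sketch below repeats the steps above in self-contained form and computes the correlation for both versions. The use of cor() and cor.test() here is a suggested check, not part of the original example:

```r
# Same simulated data and aggregation as above
set.seed(555)
x   <- rnorm(120)
y   <- 5 * x + 3 + rnorm(120)
fac <- rep(1:4, each = 30)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)

cor(x, y)                        # correlation from 120 raw pairs
cor(xprum, yprum)                # correlation from only 4 group means
cor.test(xprum, yprum)$conf.int  # confidence interval based on n = 4
```

With only four aggregated points, the confidence interval is very wide, so a single coefficient computed from the group means carries little evidence either way.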
The relationship is completely gone. Now let us make some other fake data:
xn <- runif(120)*rep(1:4, each=30)
yn <- runif(120)*rep(1:4, each=30)
plot(xn,yn)
There is no visible relation: within each group xn and yn are independent, but
both scale with the aggregation factor.
xprumn <- tapply(xn, fac, mean)
yprumn <- tapply(yn, fac, mean)
plot(xprumn, yprumn)
Here you can see a seemingly perfect relation, which is due only to the
aggregation factor.
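One way to separate the effect of the aggregation factor from any relation between the variables themselves is to look within the groups. The sketch below is self-contained, so its random numbers differ from those in the example above, where rnorm() was called first. The names within_cor, agg_cor and centred_cor are hypothetical:

```r
# Fake data: independent uniforms, both scaled by the aggregation factor
set.seed(555)
fac <- rep(1:4, each = 30)
xn  <- runif(120) * fac
yn  <- runif(120) * fac

# Correlation separately within each level of the factor
within_cor <- sapply(split(data.frame(xn, yn), fac),
                     function(d) cor(d$xn, d$yn))
within_cor

# Correlation of the four group means
agg_cor <- cor(tapply(xn, fac, mean), tapply(yn, fac, mean))
agg_cor

# Correlation after subtracting each group's mean from its values
centred_cor <- cor(xn - ave(xn, fac), yn - ave(yn, fac))
centred_cor
```

If the within-group and centred correlations are near zero while the correlation of the group means is high, the apparent relation comes from the grouping variable, not from xn and yn.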