[R] Estimating correlation in multiple measures data

Thu Mar 24 20:45:59 CET 2011

Peter,

Regarding 1) I do not agree. See the following, simplified example:
x <- data.frame(ID=rep(1:2, each=4), Visit=rep(c(1:4), 2), 
ptA=c(7,8,9,10,17,18,19,20), ptB=c(5,6,7,8,21,20,19,18))

In this data frame you have only 2 patients with 4 visits each, but the 
correlation of ptA and ptB runs in opposite directions in these 2 patients. 
See the plot:
plot(ptB~ptA, x)

If you do 'cor.test(x$ptA, x$ptB)' you get a very high correlation 
(0.962) and a significant p-value (0.0001356). However, doing it by patient:
xx <- x[x$ID==1,]; cor.test(xx$ptA, xx$ptB)
xx <- x[x$ID==2,]; cor.test(xx$ptA, xx$ptB)
you get 2 opposite correlation values (1 and -1). So for patient 2 the 
individual-level correlation is _very_ far from the one estimated on the 
whole dataset. My question is: how can I estimate the correlation between 
ptA and ptB while taking the repeated measures on each patient into 
account?
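
To make the contrast concrete, here is a minimal sketch in base R. It computes the correlation separately for each patient, and then a pooled within-patient correlation obtained by subtracting each patient's own mean from ptA and ptB before correlating. The centering step is only an assumption about what "taking the multiple measures into account" could mean, not an established answer to the question. The names per_patient, ptA_c, ptB_c and r_within are made up for illustration.

```r
# Example data from above: 2 patients, 4 visits each
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Correlation computed separately within each patient
per_patient <- sapply(split(x, x$ID), function(d) cor(d$ptA, d$ptB))

# Subtract each patient's mean (ave() returns the group mean by default),
# so only the visit-to-visit variation within a patient remains
ptA_c <- x$ptA - ave(x$ptA, x$ID)
ptB_c <- x$ptB - ave(x$ptB, x$ID)

# Pooled within-patient correlation (illustrative, not a full model)
r_within <- cor(ptA_c, ptB_c)

print(per_patient)
print(r_within)
```

For this toy data the two patients cancel each other out, so the pooled within-patient value is essentially 0 even though the lumped correlation is about 0.96.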

Regarding 2) This is not as much of a problem. The simplest solution is to 
build one model with and one without the correlation and compare them with 
anova(). The p-value from anova() then indicates the significance of the 
correlation.
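
A minimal sketch of such a comparison, under one possible reading: ordinary lm() fits with patient as a factor, once without and once with ptA as a predictor of ptB. This fixed-effects setup is an assumption chosen to stay within base R; nested mixed models could be compared in the same way. The names m0, m1, cmp and p_value are illustrative.

```r
# Example data from above: 2 patients, 4 visits each
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Model without the ptA-ptB relationship: patient effect only
m0 <- lm(ptB ~ factor(ID), data = x)

# Model with the relationship: one common within-patient slope for ptA
m1 <- lm(ptB ~ factor(ID) + ptA, data = x)

# F test comparing the two nested fits
cmp <- anova(m0, m1)
p_value <- cmp[["Pr(>F)"]][2]
print(cmp)
```

With the toy data the opposite slopes cancel, so the common-slope term adds nothing and the p-value comes out near 1.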

Regarding 3) I know of this solution; the Bland & Altman paper in BMJ 
(1994) recommended it. I'm looking for something more sophisticated...
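
For reference, the averaging approach can be sketched in base R as below, using the toy data from point 1. The names pt_means and r_between are illustrative. With only two patients the correlation of the means is trivially +1 or -1, so this only shows the mechanics.

```r
# Example data from above: 2 patients, 4 visits each
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# One row per patient: mean of ptA and ptB over the 4 visits
pt_means <- aggregate(cbind(ptA, ptB) ~ ID, data = x, FUN = mean)
print(pt_means)

# Correlation across patients, treating each mean as one observation
r_between <- cor(pt_means$ptA, pt_means$ptB)
print(r_between)
```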

Best regards,

--
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413

On 3/24/2011 1:58 PM, Peter Langfelder wrote:
> I see, so it's more of a statistics than R question. A couple thoughts:
>
> 1. The fact that 4 measurements in each single patient are possibly
> highly related should not change the correlation, only the p-value.
> Here's an example: generate two variables a and b
>
> a = c(1:10);
> b = sample(a) + a
>
>> cor(a,b)
>            [,1]
> [1,] 0.4735424
>> cor (rep(a, 4), rep(b, 4))
>            [,1]
> [1,] 0.4735424
>
> Notice that the correlation of a,b, and the correlation of 4-times
> repeated a with 4-times repeated b is exactly the same.
>
> 2. The calculation of a p-value is more complicated and I don't have a
> good answer, but an upper bound on the p-value can be obtained by
> calculating the p-value pretending that there are only 10
> measurements. In the package WGCNA we have a function for that, it's
> called corPvalueStudent.
>
> 3. If the 4 measurements for each patient are very similar, you could
> simply average them, then proceed as if you had 10 independent
> measurements.
>
> Peter
>
> On Thu, Mar 24, 2011 at 10:38 AM, Michal Figurski
> <figurski at mail.med.upenn.edu> wrote:
>> Peter,
>>
>> This is actually too simple - it doesn't take into account the fact that the
>> data were measured several times on the same subject. This is one thing I
>> know for sure, that one should not just lump such data together and pretend
>> that each point comes from a different patient...
>>
>> --
>> Michal J. Figurski, PhD
>> HUP, Pathology & Laboratory Medicine
>> Biomarker Research Laboratory
>> 3400 Spruce St. 7 Maloney S
>> Philadelphia, PA 19104
>> tel. (215) 662-3413