[R] Correlations by group

Gabor Grothendieck ggrothendieck at gmail.com
Mon Jul 24 14:32:04 CEST 2006


On 7/24/06, Peter J. Lee <peterjl at bilkent.edu.tr> wrote:
> I'm aware that S N Krishna asked the same
> question. However, I have failed to implement the
> posted solution for running rank order
> correlations on multiple subsets of data using the by() function.
>
> Here is my problem:
>
> Take a set of data from two subjects, who
> provided numerical infant mortality (IM) estimates for five countries:
>
>         sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2) # grouping variable = 5 rows x 2 subjects
>         est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12) # response variable = 5 estimates x 2 subjects
>         im <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10) # actual IM values x 2 subjects
>         data <- cbind(sub, est, im)
>         data
>
> Using the by() function:
>
>         by(data, sub, function(x) cor(est, im, method = "spearman"))

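The calculation in your function does not depend on x: est and im
are taken from the full vectors in your workspace rather than from
the subset passed in, so it's giving a constant return value.  Try: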

  by(data, sub, function(x) cor(x[,2], x[,3], method = "spearman"))

or

   tapply(seq_along(sub), sub,
          function(i) cor(est[i], im[i], method = "spearman"))

or either of the following, which return correlation matrices instead
of single correlations:

   by(data[,2:3], sub, function(x) cor(x, method = "spearman"))
   by(data[,2:3], sub, cor, method = "spearman")
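
For reference, here is a self-contained sketch of the first suggestion
using the data from the question. It assumes the data are stored in a
data frame (called dat here, a name not used in the original post) so
that the columns can be referenced by name rather than by position.

```r
# Data from the question: two subjects, five countries each.
sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12)
im  <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)

# 'dat' is an illustrative name; a data frame allows access by column name.
dat <- data.frame(sub, est, im)

# x holds the rows for one subject, so the correlation is computed
# from x's columns and not from the full vectors in the workspace.
rho <- by(dat, dat$sub, function(x) cor(x$est, x$im, method = "spearman"))
rho
```

With these data this should give 0.1 for subject 1 and 0.9 for
subject 2, matching the coefficients expected in the question.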

>
> does result in two correlation coefficients. But
> instead of by subject, the est x im correlation
> for the entire set is reported, and then assigned
> to both subjects. This can be checked using:
>
>         cor(est, im, method = "spearman")
>
> Nevertheless, the true coefficients and p-values should be:
>
>         sub[1] cor.coef = 0.1 p > .1
>         sub[2] cor.coef = 0.9 p < .05
>
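
If the p-values are wanted as well, cor() alone will not supply them.
One possible sketch, assuming a data frame named dat (an illustrative
name) and using cor.test() on each subject's rows:

```r
# Data from the question, stored in a data frame ('dat' is an illustrative name).
dat <- data.frame(
  sub = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  est = c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12),
  im  = c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)
)

# Split the rows by subject, run cor.test() on each piece, and keep
# the Spearman estimate together with its p-value.
res <- sapply(split(dat, dat$sub), function(x) {
  ct <- cor.test(x$est, x$im, method = "spearman")
  c(rho = unname(ct$estimate), p.value = ct$p.value)
})
res
```

The result is a matrix with one column per subject. Note that
cor.test() performs a two-sided test by default, so the p-values may
differ from those quoted above if a one-sided test was intended; see
its alternative argument.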
> I find it peculiar that running a simple regression by groups does work:
>
>         by(data, sub, function(x) lm(est ~ im, data = x))
>
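
The lm() call works because of its data = x argument: the variables in
the formula are looked up in x first, so each fit sees only one
subject's rows. cor() has no data argument, but with() gives the same
kind of lookup. A minimal sketch, again assuming a data frame named
dat (an illustrative name):

```r
# Only the data frame is created here, with no separate est or im vectors,
# so the names below can only resolve to columns of the per-subject subset.
dat <- data.frame(
  sub = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  est = c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12),
  im  = c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)
)

# with(x, ...) evaluates est and im inside x, much as lm(..., data = x) does.
rho_with <- by(dat, dat$sub,
               function(x) with(x, cor(est, im, method = "spearman")))
rho_with
```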
> indicating that perhaps I'm using the wrong
> grouping function for correlations. I'm using a
> fairly standard Pentium 4 running Windows XP.
>
> On occasion I am required to calculate up to a
> quarter of a million individual correlations, so
> any help would be very much appreciated.
>
> Best wishes,
>
> Peter James Lee
> _________________________
>
> Peter James Lee
> Assistant Professor
>
> Psikoloji Bölümü
> Bilkent University
> Bilkent
> Ankara
> Turkey
> 06800
>
> e-mail: peterjl at bilkent.edu.tr
> office: (90) 312 290 1807
> home: (90) 312 290 3447
> website: http://www.bilkent.edu.tr/~peterjl/index.html
> _________________________