[R] a question about "by" and "ddply"

David Winsemius dwinsemius at comcast.net
Wed May 30 06:58:36 CEST 2012


On May 29, 2012, at 6:32 PM, jacaranda tree wrote:

> Hi all,
> I have a data set (df, n=10 for the sake of simplicity here) where I  
> have two continuous variables (age and weight) and I also have a  
> grouping variable (group, with two levels). I want to run  
> correlations for each group separately (kind of similar to "split  
> file" in SPSS). I've been experimenting with different functions,  
> and I was able to do this correctly using ddply function, but output  
> is a little bit difficult to read when I do the cor.test to get all  
> the data with p values, df, and pearson r (see below). I also tried  
> to do it with by function. Although, with by, it shows the data for  
> two groups separately, it seems like it calculates the same r for  
> both groups. Here is my code for both ddply and by, and the output  
> as well. I was wondering if there is a way to display the output  
> better with ddply or run the correlations correctly for each group  
> using by.
> Thanks in advance,
>

I would have imagined something along the lines of

lapply( split(df, df$group), function(x)
        cor.test(x[["age"]], x[["weight"]]) )

... but without an example it's only a guess.
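
For what it's worth, here is a minimal sketch of that idea on made-up data. The values below are invented for illustration and are not the poster's data; `tests`, `res`, and the column names in `res` are my own choices. It runs cor.test() within each level of group and then collects the pieces of each result into one row per group, which may be easier to read than the flattened ddply output.

```r
# Toy data, invented for illustration (not the poster's data)
df <- data.frame(
  group  = factor(rep(1:2, each = 5)),
  age    = c(21, 25, 30, 34, 40, 22, 27, 31, 36, 45),
  weight = c(60, 64, 70, 71, 80, 58, 66, 65, 75, 83)
)

# One cor.test per group; each x is the sub-data-frame for one level of group
tests <- lapply(split(df, df$group), function(x)
  cor.test(x[["age"]], x[["weight"]], method = "pearson"))

# Pull the components of each "htest" object into one row per group
res <- do.call(rbind, lapply(names(tests), function(g) {
  ct <- tests[[g]]
  data.frame(group = g,
             r     = unname(ct$estimate),
             t     = unname(ct$statistic),
             df    = unname(ct$parameter),
             p     = ct$p.value,
             ci.lo = ct$conf.int[1],
             ci.hi = ct$conf.int[2])
}))
res
```

Printing tests instead of res gives the usual full cor.test() display for each group, if that is preferred.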

-- 
David

> 1.with  "ddply"
> r<-ddply(df, .(group), summarise, "corr" = cor.test(age, weight,  
> method = "pearson"))
>
> Output:
>    Group                                 corr
> 1      1                                  Inf
> 2      1                                    3
> 3      1                                    0
> 4      1                                    1
> 5      1                                    0
> 6      1                            two.sided
> 7      1 Pearson's product-moment correlation
> 8      1                       age and weight
> 9      1                                 1, 1
> 10     2                             9.722211
> 11     2                                    3
> 12     2                          0.002311412
> 13     2                            0.9844986
> 14     2                                    0
> 15     2                            two.sided
> 16     2 Pearson's product-moment correlation
> 17     2                       age and weight
> 18     2                 0.7779640, 0.9990233
>
> 2. with "by"
> r <- by(df, group, FUN = function(x) cor.test(age, weight, method =  
> "pearson"))
>
> Output:
> Group: 1
>
>         Pearson's product-moment correlation
>
> data:  age and weight
> t = 6.4475, df = 8, p-value = 0.0001988
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
>  0.6757758 0.9802100
> sample estimates:
>       cor
> 0.9157592
>
> ------------------------------------------------------------
> Group: 2
>
>         Pearson's product-moment correlation
>
> data:  age and weight
> t = 6.4475, df = 8, p-value = 0.0001988
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
>  0.6757758 0.9802100
> sample estimates:
>       cor
> 0.9157592