[R] "tapply versus by" in function with more than 1 arguments

Henrique Dallazuanna wwwhsd at gmail.com
Wed Oct 1 17:59:25 CEST 2008


Try this:

sapply(by(dataf[,c("V1","V2")], dataf$class, cor), '[', 3)
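The 3 works because each element returned by by() is a 2x2 correlation matrix stored in column-major order, so position 3 is row 1, column 2, i.e. the V1/V2 correlation. Below is a small sketch of the same idea with the element picked out by name instead of by position. The data are rebuilt here with an arbitrary seed only so the snippet runs on its own (the values will not match those in the original post), and the names cors, by_index and by_name are just illustrative.

```r
# Example data in the same shape as the original post.
# The seed is an assumption, added only for reproducibility.
set.seed(1)
nr <- 10
dataf <- data.frame(
  V1 = rnorm(nr),
  V2 = rnorm(nr) * 2,
  V3 = runif(nr),
  class = sort(c(1, 1, 2, 2, 3, 3, sample(1:3, nr - 6, replace = TRUE)))
)

# by() returns one 2x2 correlation matrix per class
cors <- by(dataf[, c("V1", "V2")], dataf$class, cor)

# Position 3 in column-major order is row 1, column 2
by_index <- sapply(cors, "[", 3)

# Same values, selecting the element by row and column name
by_name <- sapply(cors, function(m) m["V1", "V2"])

print(by_index)
print(by_name)
```

Selecting by name is a little longer, but it does not depend on the column order of the data frame passed to by().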


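As for why the tapply() calls fail: tapply() splits only its first argument by the grouping factor, and everything after FUN is passed to FUN unchanged. So cor() receives one group's worth of V1 together with the whole V2 vector, hence "incompatible dimensions". One possible workaround, sketched below under the same assumptions as the original example (a made-up seed, illustrative names by_rows and by_split), is to let tapply() split the row indices and subset both columns inside the function.

```r
# Example data in the same shape as the original post.
# The seed is an assumption, added only for reproducibility.
set.seed(1)
nr <- 10
dataf <- data.frame(
  V1 = rnorm(nr),
  V2 = rnorm(nr) * 2,
  V3 = runif(nr),
  class = sort(c(1, 1, 2, 2, 3, 3, sample(1:3, nr - 6, replace = TRUE)))
)

# The original attempt: V1 is split by class, V2 is passed whole,
# so the lengths differ and cor() stops with an error.
attempt <- try(tapply(dataf$V1, dataf$class, cor, dataf$V2), silent = TRUE)
failed <- inherits(attempt, "try-error")

# Workaround: split the row indices, then subset both columns per group
by_rows <- tapply(seq_len(nrow(dataf)), dataf$class,
                  function(i) cor(dataf$V1[i], dataf$V2[i]))

# An equivalent formulation with split() and sapply()
by_split <- sapply(split(dataf, dataf$class),
                   function(d) cor(d$V1, d$V2))

print(failed)
print(by_rows)
print(by_split)
```

Both versions compute a single correlation per class directly, without building the full 2x2 matrix first.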

On Wed, Oct 1, 2008 at 9:21 AM, Cézar Freitas <cafanselmo12 at yahoo.com.br> wrote:
> Hi. I searched the list and didn't find anything similar to this. I have simplified my example as shown below:
>
> # I need to calculate the correlation (for example) between two columns of a data.frame, grouped by a third column, as below:
>
> #number of rows
> nr = 10
>
> # the third column (V3) is included to emphasize that I need the correlation between two variables only
> dataf = as.data.frame(matrix(c(rnorm(nr),rnorm(nr)*2,runif(nr),sort(c(1,1,2,2,3,3,sample(1:3,nr-6,replace=TRUE)))),ncol=4))
> names(dataf)[4] = "class"
>
> #> dataf
> #            V1         V2         V3 class
> #1   0.56933020  1.2529931 0.30774422     1
> #2   0.41702299 -1.6441547 0.76140046     1
> #3  -1.07671647 -4.8747575 0.43706944     1
> #4  -1.97701167  1.3015196 0.04390175     2
> #5   0.56501325  1.8597720 0.08174124     2
> #6   0.70068638  1.7922641 0.74730126     2
> #7  -1.39956177 -1.9918904 0.64521918     3
> #8   0.27086664  0.3745362 0.61026133     3
> #9   0.04282347  3.7360407 0.48696109     3
> #10 -0.34262654  0.7933674 0.09824913     3
>
> #I tried:
>
> tapply(dataf$V1, dataf$class, cor, dataf$V2)
> #Error FUN(X[[1L]], ...) : incompatible dimensions
>
> tapply(dataf$V1, dataf$class, cor, tapply(dataf$V2, dataf$class))
> #Error FUN(X[[1L]], ...) : incompatible dimensions
>
> #But using "by" I obtain:
>
> by(dataf[,c("V1","V2")], dataf$class, cor)
>
> #dataf$class: 1
> #        V1      V2
> #V1 1.00000 0.91777
> #V2 0.91777 1.00000
> #--------------------------------------------------------------------------------------------------
> #dataf$class: 2
> #         V1       V2
> #V1 1.000000 0.987857
> #V2 0.987857 1.000000
> #--------------------------------------------------------------------------------------------------
> #dataf$class: 3
> #          V1        V2
> #V1 1.0000000 0.7318938
> #V2 0.7318938 1.0000000
>
> # I am interested in the [1,2] element of each matrix, i.e. cor(V1, V2) per class, so I could pick out 0.91777, 0.987857 and 0.7318938 from this output, but I think tapply could work better if I can solve the problem.
>
> Thanks,
> Cezar



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


