[R] within-groups variance and between-groups variance

Coghlan, Avril A.Coghlan at ucc.ie
Thu Aug 25 17:02:15 CEST 2011


Hello,

I have been looking for functions for calculating the within-groups
variance and between-groups variance, for the case where you have
several numerical variables describing samples from a number of groups.

I didn't find such functions in R, so I wrote my own versions (see
below). I can calculate the within- and between-groups variance for the
Sepal.Length variable (iris[1]) in the "iris" data set, by typing:
> calcWithinGroupsVariance(iris[1],iris[5])
[1] 0.2650082
> calcBetweenGroupsVariance(iris[1],iris[5])
[1] 0.4300145

I am wondering, however, whether functions for doing this already exist
in R? I would prefer to use a standard R function if one exists.

Kind Regards,
Avril


Within-Groups Variance:
=======================

calcWithinGroupsVariance <- function(variable,groupvariable) 
      {
         # find out how many values the group variable can take
         groupvariable2 <- as.factor(groupvariable[[1]])
         levels <- levels(groupvariable2)
         numlevels <- length(levels)
         # get the mean and standard deviation for each group:
         numtotal <- 0
         denomtotal <- 0
         for (i in 1:numlevels)
         {
            leveli <- levels[i]
            levelidata <- variable[groupvariable==leveli,]
            levelilength <- length(levelidata)
            # get the standard deviation for group i
            # (the group mean is not needed for the within-groups variance):
            sdi <- sd(levelidata)
            numi <- (levelilength - 1)*(sdi * sdi)
            denomi <- levelilength
            numtotal <- numtotal + numi
            denomtotal <- denomtotal + denomi 
         } 
         # calculate the within-groups variance
         Vw <- numtotal / (denomtotal - numlevels) 
         return(Vw)
      } 
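
As a cross-check, the same pooled quantity can be sketched more compactly
with tapply(). This is only a sketch under assumptions: pooledWithinVariance
is a hypothetical helper name (not a standard R function), it takes a plain
numeric vector and a grouping vector rather than one-column data frames, and
it assumes every group has at least two observations. The comparison against
the residual mean square from anova() is included because, for a single
variable, the two should agree.

```r
# Hypothetical helper (not part of base R or of the functions in this post):
# pooled within-groups variance of numeric vector x, grouped by g.
pooledWithinVariance <- function(x, g) {
  g <- droplevels(as.factor(g))  # ignore unused factor levels
  n_i <- tapply(x, g, length)    # number of observations in each group
  v_i <- tapply(x, g, var)       # sample variance within each group
  # pooled variance: sum of (n_i - 1) * s_i^2, divided by (N - k)
  sum((n_i - 1) * v_i) / (sum(n_i) - nlevels(g))
}

vw <- pooledWithinVariance(iris$Sepal.Length, iris$Species)

# Residual mean square from a one-way ANOVA of the same variable
fit <- anova(lm(Sepal.Length ~ Species, data = iris))
msWithin <- fit["Residuals", "Mean Sq"]

print(vw)
print(msWithin)
```

If the two printed values match, the "Residuals" row of the anova() table
may be all that is needed for the within-groups variance of one variable.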

Between-Groups Variance:
========================

calcBetweenGroupsVariance <- function(variable,groupvariable) 
      {
         # find out how many values the group variable can take
         groupvariable2 <- as.factor(groupvariable[[1]])
         levels <- levels(groupvariable2)
         numlevels <- length(levels)
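         # calculate the overall grand mean; extract the column first,
         # because mean() applied directly to a data frame returns NA
         # (with a warning) in recent versions of R:
         grandmean <- mean(variable[[1]])
         # accumulate the size-weighted squared deviations of the group means: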
         numtotal <- 0
         denomtotal <- 0
         for (i in 1:numlevels)
         {
            leveli <- levels[i]
            levelidata <- variable[groupvariable==leveli,]
            levelilength <- length(levelidata)
            # get the mean for group i
            # (the standard deviation is not needed here):
            meani <- mean(levelidata)
            numi <- levelilength * ((meani - grandmean)^2)
            denomi <- levelilength
            numtotal <- numtotal + numi
            denomtotal <- denomtotal + denomi 
         } 
         # calculate the between-groups variance
         # (note: this divides by N - k, not by k - 1 as the ANOVA
         # between-groups mean square does)
         Vb <- numtotal / (denomtotal - numlevels) 
         Vb <- Vb[[1]]
         return(Vb)
      }
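
The between-groups calculation can be checked the same way. The sketch below
recomputes the between-groups sum of squares with tapply() and then divides
it by two different denominators: N - k, as the function above does, and
k - 1, which is the divisor used for the "Mean Sq" of the grouping term in a
one-way ANOVA table. The variable names (ssBetween, vbPosted, msBetween) are
illustrative only and do not come from any package.

```r
x <- iris$Sepal.Length
g <- iris$Species

groupMeans <- tapply(x, g, mean)    # mean of each group
groupSizes <- tapply(x, g, length)  # number of observations in each group

# between-groups sum of squares: sum of n_i * (mean_i - grand mean)^2
ssBetween <- sum(groupSizes * (groupMeans - mean(x))^2)

k <- nlevels(g)  # number of groups
n <- length(x)   # total number of observations

vbPosted  <- ssBetween / (n - k)  # divisor used by calcBetweenGroupsVariance
msBetween <- ssBetween / (k - 1)  # divisor used by the ANOVA mean square

# one-way ANOVA table for comparison
fit <- anova(lm(Sepal.Length ~ Species, data = iris))

print(vbPosted)
print(msBetween)
print(fit["Species", "Mean Sq"])
```

Which divisor is appropriate depends on how the between-groups variance is
defined for the analysis at hand; only the k - 1 version should match the
anova() output.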


