[R] within-groups variance and between-groups variance
Coghlan, Avril
A.Coghlan at ucc.ie
Thu Aug 25 17:02:15 CEST 2011
Hello,
I have been looking for functions that calculate the within-groups
variance and between-groups variance, for the case where you have
several numerical variables describing samples from a number of groups.
I couldn't find such functions in R, so I wrote my own (see below).
For example, I can calculate the within- and between-groups variance
for the Sepal.Length variable (iris[1]) in the "iris" data set by typing:
> calcWithinGroupsVariance(iris[1],iris[5])
[1] 0.2650082
> calcBetweenGroupsVariance(iris[1],iris[5])
[1] 0.4300145
I am wondering, however, whether functions for doing this already exist
in R. I would prefer to use a standard R function if one exists.
Kind Regards,
Avril
Within-Groups Variance:
=======================
calcWithinGroupsVariance <- function(variable,groupvariable)
{
   # 'variable' and 'groupvariable' are one-column data frames,
   # e.g. iris[1] and iris[5]; extract the underlying vectors
   values <- variable[[1]]
   groupvariable2 <- as.factor(groupvariable[[1]])
   # find out how many values the group variable can take
   grouplevels <- levels(groupvariable2)
   numlevels <- length(grouplevels)
   # accumulate the within-group sum of squares and the total sample size:
   numtotal <- 0
   denomtotal <- 0
   for (i in seq_len(numlevels))
   {
      leveli <- grouplevels[i]
      levelidata <- values[groupvariable2 == leveli]
      levelilength <- length(levelidata)
      # sum of squared deviations from the mean of group i,
      # computed as (n_i - 1) * (standard deviation of group i)^2:
      sdi <- sd(levelidata)
      numi <- (levelilength - 1)*(sdi * sdi)
      denomi <- levelilength
      numtotal <- numtotal + numi
      denomtotal <- denomtotal + denomi
   }
   # calculate the within-groups variance: SS_within / (N - k)
   Vw <- numtotal / (denomtotal - numlevels)
   return(Vw)
}
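As a cross-check, here is a minimal sketch of the same pooled calculation written without an explicit loop, using only base R (tapply, var, anova, lm). The names values, groups, n_i, var_i and Vw_check are illustrative and not part of the function above. It assumes every group has at least two observations.

```r
# Pooled within-groups variance for iris$Sepal.Length, grouped by Species.
values <- iris$Sepal.Length
groups <- iris$Species

n_i   <- tapply(values, groups, length)  # sample size of each group
var_i <- tapply(values, groups, var)     # sample variance of each group

# within-group sum of squares divided by (N - k)
Vw_check <- sum((n_i - 1) * var_i) / (sum(n_i) - length(n_i))

# The residual mean square of a one-way ANOVA is the same quantity.
tab <- anova(lm(Sepal.Length ~ Species, data = iris))
Vw_anova <- tab["Residuals", "Mean Sq"]

print(Vw_check)
print(Vw_anova)
```

Both values should agree with the 0.2650082 reported for calcWithinGroupsVariance(iris[1],iris[5]).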
Between-Groups Variance:
========================
calcBetweenGroupsVariance <- function(variable,groupvariable)
{
   # 'variable' and 'groupvariable' are one-column data frames,
   # e.g. iris[1] and iris[5]; extract the underlying vectors
   values <- variable[[1]]
   groupvariable2 <- as.factor(groupvariable[[1]])
   # find out how many values the group variable can take
   grouplevels <- levels(groupvariable2)
   numlevels <- length(grouplevels)
   # calculate the overall grand mean (mean() needs a numeric vector,
   # not a data frame):
   grandmean <- mean(values)
   # accumulate the between-group sum of squares and the total sample size:
   numtotal <- 0
   denomtotal <- 0
   for (i in seq_len(numlevels))
   {
      leveli <- grouplevels[i]
      levelidata <- values[groupvariable2 == leveli]
      levelilength <- length(levelidata)
      # squared distance of the mean of group i from the grand mean,
      # weighted by the size of group i:
      meani <- mean(levelidata)
      numi <- levelilength * ((meani - grandmean)^2)
      denomi <- levelilength
      numtotal <- numtotal + numi
      denomtotal <- denomtotal + denomi
   }
   # calculate the between-groups variance
   # (this divides by N - k, the same denominator as the within-groups
   # function; a one-way ANOVA table divides by k - 1 instead)
   Vb <- numtotal / (denomtotal - numlevels)
   return(Vb)
}
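The between-groups calculation can be cross-checked the same way. This is a minimal sketch using only base R. The names ss_between, vb_posted and vb_anova are illustrative and not part of the function above. It computes the between-group sum of squares once and divides it by two different denominators: N - k, as the function above does, and k - 1, which is what a one-way ANOVA table reports as the mean square for the grouping factor.

```r
# Between-groups sum of squares for iris$Sepal.Length, grouped by Species.
values <- iris$Sepal.Length
groups <- iris$Species

n_i    <- tapply(values, groups, length)  # sample size of each group
mean_i <- tapply(values, groups, mean)    # mean of each group
k <- length(n_i)                          # number of groups
N <- sum(n_i)                             # total number of observations

ss_between <- sum(n_i * (mean_i - mean(values))^2)

vb_posted <- ss_between / (N - k)  # denominator used in the function above
vb_anova  <- ss_between / (k - 1)  # denominator used in a one-way ANOVA table

# The "Species" row of the ANOVA table holds the k - 1 version.
tab <- anova(lm(Sepal.Length ~ Species, data = iris))

print(vb_posted)
print(vb_anova)
print(tab["Species", "Mean Sq"])
```

The first value should match the 0.4300145 reported for calcBetweenGroupsVariance(iris[1],iris[5]). The second is the one to compare against output from aov() or anova(), so which denominator is appropriate depends on how the result will be used.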