[R] Need to calculate standard deviation by groups

Zsuzsanna Papp Zsuzsanna.Papp at shaw.ca
Fri Dec 9 07:28:33 CET 2011


Hello, 

Please help me with this basic question. I have already spent two days
searching the internet and textbooks for an answer.
I will simplify my question to an example, rather than base it on the
original variable names.
I have a Dataset with 4 variables and 20000 cases. Variable A is an ID.
Variable B is a continuous numerical variable, unique to each A.
Variable C is a categorical factor with 6 possible levels. Variable D is
also a categorical factor, with 300 different levels.

I would like to create a new variable, E, which is the standard deviation
of B around the group means of B, with groups defined by C and D.

I had no problem creating such a column for the group means (with the
ave() function), but I cannot find a solution for another function like
sd that would assign the proper group value to each case.

I tried

Dataset$E <- with(Dataset, tapply(B, list(C,D),FUN=sd))

but it is wrong: it takes the 1800 different SD values, puts them in
column E, then repeats the same array of numbers below as many times as
needed to fill the column. The SD values do not correspond to the
proper groups.

How can I match these data (1800 different SD values) to their
corresponding cases in my original data?
Is there a shortcut to do this all in one line, as for the means with
the ave() function?
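
For illustration, here is a minimal sketch of two ways this could be
done, run on a small made-up data frame (the values below are
hypothetical stand-ins, not the real Dataset). The first passes sd to
ave() through its FUN argument, which has to be named because ave()
treats unnamed arguments as grouping variables. The second keeps the
tapply() table and looks up each case's cell by its C and D labels.

```r
# Hypothetical toy data standing in for the real Dataset
Dataset <- data.frame(
  A = 1:8,
  B = c(1, 3, 2, 6, 10, 14, 5, 5),
  C = factor(c("x", "x", "y", "y", "x", "x", "y", "y")),
  D = factor(c("p", "p", "p", "p", "q", "q", "q", "q"))
)

# Option 1: ave() with FUN = sd returns one value per case,
# namely the SD of B within that case's C-by-D group
Dataset$E <- with(Dataset, ave(B, C, D, FUN = sd))

# Option 2: build the C-by-D table of SDs, then index it with a
# two-column character matrix of (C label, D label) for each case
sds <- with(Dataset, tapply(B, list(C, D), sd))
Dataset$E2 <- sds[cbind(as.character(Dataset$C),
                        as.character(Dataset$D))]
```

Both columns should agree. Groups that contain a single case get NA,
because sd() is undefined for one observation.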

I also tried ddply, but I am doing something wrong (my R is on Linux and
I do not yet know how to get error messages, so I do not know what is
wrong with my lines).

Thank you for any help! Please give me as detailed a script as possible.

Zsuzsa


