[R] Calculating SD according to groups of rows
hadley wickham
h.wickham at gmail.com
Thu Nov 20 14:01:18 CET 2008
On Thu, Nov 20, 2008 at 2:20 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:
> pufftissue pufftissue <pufftissue <at> gmail.com> writes:
>
>>
>> What I am getting is indeed:
>>
>> 7200 23955 34563 8934
>> 16.39977 10.03896 11.234 14.02
>>
>> I'd like the final output to be:
>>
>> subject_id hr_Stand_Deviation
>> 7200 16.39977
>> 23955 10.03896
>> 34563 11.234
>> 8934 14.02
>>
>
> The hard way could go like this; I personally got used to it, but I admit
> it is one of the things that are unusually difficult in R.
>
> dat = data.frame(SUBJECT_ID=sample(letters[1:5],100,TRUE),HR=rnorm(100))
> sd.list = with(dat, tapply(HR, SUBJECT_ID, sd))
> data.frame(SUBJECT_ID=rownames(sd.list),sd=sd.list)
>
> I think Hadley Wickham tried to make life easier with the plyr package,
> so I thought something like the below would work out of the box.
> However, there must be something wrong with the syntax, the
> result is only "approximately" correct.
>
> Dieter
>
> library(plyr)
> daply(dat,.(SUBJECT_ID),sd)
> ddply(dat,.(SUBJECT_ID),sd)
Well, that applies sd to each subject's whole data frame, SUBJECT_ID
column included (like sd(dat)). You probably want:
ddply(dat,.(SUBJECT_ID), numcolwise(sd))
which calculates sd for numeric columns only, or
ddply(dat,.(SUBJECT_ID), function(df) sd(df$HR))
which calculates it for HR explicitly.
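
For the two-column layout asked for at the top of the thread (subject_id
next to hr_Stand_Deviation), a minimal sketch follows. The HR values are
simulated, not the poster's data, and only the column names are taken
from the thread. The base R part uses aggregate(); the plyr part is
guarded and assumes an installed plyr version that provides summarise().

```r
# Simulated stand-in data: four subject ids from the thread, 25 made-up
# HR readings per subject.
set.seed(1)
dat <- data.frame(
  SUBJECT_ID = rep(c(7200, 23955, 34563, 8934), each = 25),
  HR = rnorm(100, mean = 70, sd = 12)
)

# Base R: one row per subject, one column holding the SD of HR.
out <- aggregate(HR ~ SUBJECT_ID, data = dat, FUN = sd)
names(out) <- c("subject_id", "hr_Stand_Deviation")
print(out)

# plyr alternative (assumption: plyr is installed and has summarise()).
# Naming the summary column here avoids the default "V1" column name.
if (requireNamespace("plyr", quietly = TRUE)) {
  out_plyr <- plyr::ddply(dat, plyr::.(SUBJECT_ID), plyr::summarise,
                          hr_Stand_Deviation = sd(HR))
  print(out_plyr)
}
```

Both variants return a data frame with one row per subject, so the result
can be merged back or written out directly.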
Hadley
--
http://had.co.nz/