[R] Calculating SD according to groups of rows

hadley wickham h.wickham at gmail.com
Thu Nov 20 14:01:18 CET 2008


On Thu, Nov 20, 2008 at 2:20 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:
> pufftissue pufftissue <pufftissue <at> gmail.com> writes:
>
>>
>> What I am getting is indeed:
>>
>> 7200          23955        34563        8934
>> 16.39977 10.03896    11.234      14.02
>>
>> I'd like the final output to be:
>>
>> subject_id         hr_Stand_Deviation
>> 7200                  16.39977
>> 23955                10.03896
>> 34563                11.234
>> 8934                  14.02
>>
>
> The hard way could go like this; I personally got used to it, but I admit
> it is one of the things that are unusually difficult in R.
>
> dat = data.frame(SUBJECT_ID=sample(letters[1:5],100,TRUE),HR=rnorm(100))
> sd.list = with(dat, tapply(HR, SUBJECT_ID, sd))
> data.frame(SUBJECT_ID=rownames(sd.list),sd=sd.list)
>
> I think Hadley Wickham tried to make life easier with the plyr package,
> so I thought something like the below would work out of the box.
> However, there must be something wrong with the syntax, the
> result is only "approximately" correct.
>
> Dieter
>
> library(plyr)
> daply(dat,.(SUBJECT_ID),sd)
> ddply(dat,.(SUBJECT_ID),sd)

Well, that calculates sd on the whole data frame for each subject (like
sd(dat)), SUBJECT_ID column included. You probably want:

ddply(dat,.(SUBJECT_ID), numcolwise(sd))

which calculates sd for numeric columns only, or

ddply(dat,.(SUBJECT_ID), function(df) sd(df$HR))

which calculates it for HR explicitly.
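
If you also want the result column named as in the original request, a
minimal sketch along these lines might do. It assumes the simulated dat
from Dieter's example; the fixed seed and the names hr_sd and check are
only for illustration, and the final line cross-checks against the tapply
version.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(SUBJECT_ID = sample(letters[1:5], 100, TRUE),
                  HR = rnorm(100))

# Return a one-row data frame per subject so the result column gets a name
hr_sd <- ddply(dat, .(SUBJECT_ID), function(df) {
  data.frame(hr_Stand_Deviation = sd(df$HR))
})

# Cross-check against the tapply() approach, matching groups by name
check <- with(dat, tapply(HR, SUBJECT_ID, sd))
stopifnot(isTRUE(all.equal(hr_sd$hr_Stand_Deviation,
                           as.vector(check[as.character(hr_sd$SUBJECT_ID)]))))
```

hr_sd should then have one row per subject, with columns SUBJECT_ID and
hr_Stand_Deviation.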


Hadley

-- 
http://had.co.nz/


