[R] Normalizing grouped data in a data frame
Sandy Small
sandy.small at nhs.net
Fri Nov 9 16:22:11 CET 2007
Thank you very much.
That works nicely.
The trick I particularly needed was "within", which I didn't know about.
It is also nice to get a data frame out with "sparseby" instead of just a
multi-array with "by".
Sandy
Duncan Murdoch wrote:
> Sandy Small wrote:
>> Hi
>> I am a newbie to R but have tried a number of ways in R to do this
>> and can't find a good solution. (I could do it out of R in perl or
>> awk but would like to know how to do this in R).
>>
>> I have a large data frame (49 variables and 7000 observations); however,
>> for simplicity I can express it in the following data frame
>>
>> Base, Image, LVEF, ES_Time
>> A, 1, 4.32, 0.89
>> A, 2, 4.98, 0.67
>> A, 3, 3.7, 0.5
>> A, 3, 4.1, 0.8
>> B, 1, 7.4, 0.7
>> B, 3, 7.2, 0.8
>> B, 4, 7.8, 0.6
>> C, 1, 5.6, 1.1
>> C, 4, 5.2, 1.3
>> C, 5, 5.9, 1.2
>> C, 6, 6.1, 1.2
>> C, 7, 3.2, 1.1
>>
>> For each value of LVEF and ES_Time I would like to normalise the
>> value to the maximum for that factor grouped by Base or Image number,
>> adding an extra column to the data frame with the normalised value in
>> it.
>>
>> So for the Base = B group in the data frame (the data frame should
>> have the same length; I'm just showing the B part) I would get a
>> modified data frame as follows.
>>
>> Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
>> ...
>> B, 1, 7.4, 0.7, 7.4/7.8, 0.7/0.8
>> B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
>> B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
>> ...
>>
>> Where the results of the division would replace the division shown here.
>> I hope this makes sense.
>> If anyone can help I would be very grateful.
>>
> You want to look at the by(), tapply() or sparseby() functions (the
> latter in the reshape package, the others are in base R).
>
> For example, I think this untested code does what you want:
>
> newdf <- sparseby(olddf, c("Base", "Image"),
>                   function(subset)
>                     within(subset, {
>                       Norm_LVEF    <- LVEF / max(LVEF)
>                       Norm_ES_Time <- ES_Time / max(ES_Time)
>                     }))
>
> where olddf is the old dataframe, and newdf is newly created.
>
> Duncan Murdoch
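
For readers without the reshape package, the same per-group normalisation can be sketched in base R using ave() together with within(). This is only an illustration under stated assumptions: the sample data are retyped from the question, the entries "3." and "7." in the Image column are read as 3 and 7, and grouping is by Base alone, since that matches the worked B example in the question. Swap Base for Image in the ave() calls to group by image number instead.

```r
# Sample data retyped from the question ("3." and "7." assumed to mean 3 and 7).
olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# ave() returns a vector as long as its input, holding each row's group
# maximum, so dividing by it normalises every value to its group's maximum.
newdf <- within(olddf, {
  Norm_LVEF    <- LVEF / ave(LVEF, Base, FUN = max)
  Norm_ES_Time <- ES_Time / ave(ES_Time, Base, FUN = max)
})

# Show only the B rows, as in the question.
print(newdf[newdf$Base == "B", ])
```

The result keeps all the original rows and adds the two normalised columns, so no separate step is needed to reassemble the groups into one data frame.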