[R] Normalizing grouped data in a data frame

Fri Nov 9 16:22:11 CET 2007

Thank you very much.
That works nicely.
The trick I particularly needed was "within", which I didn't know about.
It is also nice to get a data frame out with "sparseby" instead of just a 
multi-way array with "by".
Sandy

Duncan Murdoch wrote:
> Sandy Small wrote:
>> Hi
>> I am a newbie to R but have tried a number of ways in R to do this 
>> and can't find a good solution. (I could do it out of R in perl or 
>> awk but would like to know how to do this in R).
>>
>> I have a large data frame (49 variables and 7000 observations); however, 
>> for simplicity I can express it as the following data frame:
>>
>> Base, Image, LVEF, ES_Time
>> A, 1,  4.32, 0.89
>> A, 2, 4.98, 0.67
>> A, 3, 3.7, 0.5
>> A, 3, 4.1, 0.8
>> B, 1, 7.4, 0.7
>> B, 3, 7.2, 0.8
>> B, 4, 7.8, 0.6
>> C, 1, 5.6, 1.1
>> C, 4, 5.2, 1.3
>> C, 5, 5.9, 1.2
>> C, 6, 6.1, 1.2
>> C, 7, 3.2, 1.1
>>
>> For each value of LVEF and ES_Time I would like to normalise the 
>> value to the maximum for that factor grouped by Base or Image number, 
>> adding an extra column to the data frame with the normalised value in 
>> it.
>>
>> So for the Base = B group in the data frame (the data frame should 
>> have the same length; I'm just showing the B part) I would get a 
>> modified data frame as follows.
>>
>> Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
>> ...
>> B, 1, 7.4, 0.7, 7.4/7.8, 0.7/0.8
>> B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
>> B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
>> ...
>>
>> Where the results of the division would replace the division shown here.
>> I hope this makes sense.
>> If anyone can help I would be very grateful.
>>   
> You want to look at the by(), tapply() or sparseby() functions (the 
> latter in the reshape package, the others are in base R).
>
> For example, I think this untested code does what you want:
>
> newdf <- sparseby(olddf, c("Base", "Image"),
>                   function(subset)
>                       within(subset, {
>                           Norm_LVEF <- LVEF / max(LVEF)
>                           Norm_ES_Time <- ES_Time / max(ES_Time)
>                       }))
>
> where olddf is the old dataframe, and newdf is newly created.
>
> Duncan Murdoch
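
For anyone finding this thread later, here is a minimal base-R sketch of the same per-group normalisation using ave(), without the reshape package. It is not the code from the reply above, just one possible alternative. It assumes grouping by Base only (as in the Base = B example in the question), and it assumes the sample rows are as quoted, with the separators in the fourth and last rows read as commas.

```r
# Sample data modelled on the example in the question (assumed values).
olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# ave() applies FUN within each group and returns a vector in the original
# row order, so each row is divided by the maximum of its own Base group.
newdf <- olddf
newdf$Norm_LVEF    <- olddf$LVEF    / ave(olddf$LVEF,    olddf$Base, FUN = max)
newdf$Norm_ES_Time <- olddf$ES_Time / ave(olddf$ES_Time, olddf$Base, FUN = max)

# Show the B rows, which correspond to the example in the question.
print(newdf[newdf$Base == "B", ])
```

To normalise within Image instead, the grouping argument olddf$Base would be replaced by olddf$Image. The row count and row order of newdf are the same as in olddf.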
