[R] scale subsets of grouped data in data frame

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat Aug 1 03:38:52 CEST 2009


Hi,

On Jul 31, 2009, at 7:17 PM, Noah Silverman wrote:

> Hello,
>
> I'm trying to duplicate what's an easy process in RapidMiner.
>
> In RM, we can simply use two operators:
>     subgroup iteration
>     attribute value selection (can use a regex for the attribute
> name.)
>
> I can do this in R with a lot of code and manual steps.  It would be
> really nice to find a more automated way.
>
> My data looks like this:
>
> group 	group_height 	group_weight 	height 	weight
> g22 	3.2 	8.896 	3.2 	8.896
> g22 	2.5 	6.95 	2.5 	6.95
> g22 	3.1 	8.618 	3.1 	8.618
> g49 	2.4 	6.672 	2.4 	6.672
> g49 	4.2 	11.676 	4.2 	11.676
> g49 	2.5 	6.95 	2.5 	6.95
> g55 	2.6 	7.228 	2.6 	7.228
> g55 	3.4 	9.452 	3.4 	9.452
> g55 	3.3 	9.174 	3.3 	9.174
>
> What I want to do is scale the data within each group.
> So, in pseudo-code:
>     for(group in groups){
>         if(column_name = regex(group_.*)){
>             data[column_name] = scale(data[group,column_name])
>         }
>     }
>
> This way I get "group-wise" normalization of my data, but still have the
> original values, which I will normalize "database-wide" for some
> comparisons.
>
> Can anybody help solve this one?
>
> -N


You can do this quite easily.

Just take what you learned from the last example re: scaling subsets,  
and play around with some of the functions you see in the ?grep help  
page. You'll be using those functions against the strings you get back  
from colnames(data).
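
As a rough sketch of that idea (not necessarily the approach from the earlier thread): the data frame below is rebuilt from the sample in the question, grep() picks out the column names that start with "group_", and ave() is one possible way to apply scale() within each group while keeping the original row order. The variable name group_cols and the choice of ave() are assumptions made for illustration.

```r
# Sample data copied from the question above.
data <- data.frame(
  group        = rep(c("g22", "g49", "g55"), each = 3),
  group_height = c(3.2, 2.5, 3.1, 2.4, 4.2, 2.5, 2.6, 3.4, 3.3),
  group_weight = c(8.896, 6.95, 8.618, 6.672, 11.676, 6.95, 7.228, 9.452, 9.174),
  height       = c(3.2, 2.5, 3.1, 2.4, 4.2, 2.5, 2.6, 3.4, 3.3),
  weight       = c(8.896, 6.95, 8.618, 6.672, 11.676, 6.95, 7.228, 9.452, 9.174)
)

# Select the columns whose names start with "group_".
# The plain "group" column does not match because it has no underscore.
group_cols <- grep("^group_", colnames(data), value = TRUE)

# Scale each selected column within each group.
# ave() applies FUN to each group's values and returns them in the
# original row order; as.numeric() drops the matrix that scale() returns.
for (col in group_cols) {
  data[[col]] <- ave(data[[col]], data$group,
                     FUN = function(x) as.numeric(scale(x)))
}
```

After this runs, the group_* columns hold the group-wise scaled values, while height and weight are untouched and can still be scaled across the whole data frame.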

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



