[R] Function on columns of a dataframe

Eik Vettorazzi E.Vettorazzi at uke.uni-hamburg.de
Fri Jul 9 16:26:44 CEST 2010


You are right. But maybe "aggregate" is close to the desired result?

# restrict to v1..v4 so that max() is not applied to x and cat
aggregate(bla[, -(1:2)], list(cat = bla$cat), max)
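
A minimal, self-contained sketch of the same idea using the formula interface of aggregate(). The data are rebuilt as in the original post; the set.seed(1) call and the name `res` are assumptions added here for reproducibility and illustration, they are not part of the original question. Column x is dropped first so that max() is only applied to v1..v4.

```r
# Rebuild the example data from the original post.
# set.seed() is an assumption added here so the values are reproducible.
set.seed(1)
x   <- 1:12
cat <- c("cat1", "cat5", "cat2", "cat2", "cat1", "cat5",
         "cat3", "cat4", "cat5", "cat2", "cat3", "cat6")
bla <- data.frame(x, cat,
                  v1 = rnorm(12, 0.5, 0.1),
                  v2 = rnorm(12, 0.3, 0.2),
                  v3 = rnorm(12, 0.4, 0.1),
                  v4 = rnorm(12, 0.6, 0.3))

# "." stands for every remaining column (v1..v4) once x is dropped;
# the result has one row per category and one column per variable.
res <- aggregate(. ~ cat, data = bla[, -1], FUN = max)
res
```

The result is an ordinary data frame with cat in the first column, so it can be merged or printed like the sqldf() output in the original post.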

On 09.07.2010 16:01, David Winsemius wrote:
>
> On Jul 9, 2010, at 9:46 AM, Eik Vettorazzi wrote:
>
>> Hi Nils,
>> have a look at
>> ?tapply
>> hth.
>
> Perhaps this will be part way there (I couldn't really figure out the
> desired structure of the final object):
> > lapply( bla[, -(1:2)], function(x) tapply(x, bla$cat, max) )
> $v1
>      cat1      cat2      cat3      cat4      cat5      cat6
> 0.4634519 0.4062700 0.4816403 0.6354560 0.6663811 0.5260832
>
> $v2
>      cat1      cat2      cat3      cat4      cat5      cat6
> 0.5274645 0.4282639 0.4996033 0.3558259 0.2154201 0.3934063
>
> $v3
>      cat1      cat2      cat3      cat4      cat5      cat6
> 0.6051479 0.4443707 0.3538144 0.3646292 0.5059900 0.3545962
>
> $v4
>      cat1      cat2      cat3      cat4      cat5      cat6
> 0.7586322 0.8419526 0.9456385 0.1907295 0.7573575 0.6412563
>
>
>>
>> On 09.07.2010 15:37, LogLord wrote:
>>> Hi,
>>>
>>> For each category, I would like to find the largest value in a column,
>>> and repeat this for each column (v1 - v4).
>>>
>>>
>>>> x=c(1:12)
>>>> cat=c("cat1","cat5","cat2","cat2","cat1","cat5","cat3","cat4","cat5","cat2","cat3","cat6")
>>>>
>>>> v1=rnorm(12,0.5,0.1)
>>>> v2=rnorm(12,0.3,0.2)
>>>> v3=rnorm(12,0.4,0.1)
>>>> v4=rnorm(12,0.6,0.3)
>>>> bla=data.frame(x,cat,v1,v2,v3,v4)
>>>> bla
>>>>
>>>    x  cat        v1         v2        v3         v4
>>> 1   1 cat1 0.4013144 0.54839317 0.3946393  0.8679266
>>> 2   2 cat5 0.4595873 0.45788906 0.4030078  0.5919596
>>> 3   3 cat2 0.4542865 0.21516928 0.2777649  0.6112099
>>> 4   4 cat2 0.4787950 0.06252512 0.5095611  0.6450795
>>> 5   5 cat1 0.4910746 0.56591049 0.5151813  0.8465181
>>> 6   6 cat5 0.4194397 0.16592579 0.4361643  0.6415192
>>> 7   7 cat3 0.6148564 0.32240342 0.2690108  0.7114133
>>> 8   8 cat4 0.6174652 0.28076152 0.4577064 -0.2567284
>>> 9   9 cat5 0.4775395 0.28611768 0.4660210  0.4634120
>>> 10 10 cat2 0.4802962 0.03715569 0.4506361  1.0063235
>>> 11 11 cat3 0.6495094 0.33303172 0.3352933  1.4390324
>>> 12 12 cat6 0.4891481 0.45355589 0.3880739  0.7831656
>>>
>>>>
>>> I can assign this by the sqldf() command for each column but I would
>>> like to
>>> automate this as I have many columns.
>>>
>>>
>>>> select=sqldf("select cat, max(v1) FROM bla GROUP BY cat")
>>>> select
>>>>
>>>   cat   max(v1)
>>> 1 cat1 0.4910746
>>> 2 cat2 0.4802962
>>> 3 cat3 0.6495094
>>> 4 cat4 0.6174652
>>> 5 cat5 0.4775395
>>> 6 cat6 0.4891481
>>>
>>>>
>>> Finally, I would like to have a data frame in which the cat column is
>>> followed by the maximum of each column.
>>>
>>> Thanks for your help!
>>>
>>
>> -- 
>> Eik Vettorazzi
>> Institut für Medizinische Biometrie und Epidemiologie
>> Universitätsklinikum Hamburg-Eppendorf
>>
>> Martinistr. 52
>> 20246 Hamburg
>>
>> T ++49/40/7410-58243
>> F ++49/40/7410-57790
>>
>
> David Winsemius, MD
> West Hartford, CT
>
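
The list returned by the quoted lapply()/tapply() call can also be bound into a single data frame. This is only a sketch under stated assumptions: set.seed(1) and the names `mx` and `out` are added here for illustration, and sapply() is swapped in for lapply() so that the per-column results are simplified to a matrix.

```r
# Rebuild the example data from the original post.
# set.seed() is an assumption added here so the values are reproducible.
set.seed(1)
x   <- 1:12
cat <- c("cat1", "cat5", "cat2", "cat2", "cat1", "cat5",
         "cat3", "cat4", "cat5", "cat2", "cat3", "cat6")
bla <- data.frame(x, cat,
                  v1 = rnorm(12, 0.5, 0.1),
                  v2 = rnorm(12, 0.3, 0.2),
                  v3 = rnorm(12, 0.4, 0.1),
                  v4 = rnorm(12, 0.6, 0.3))

# Per-category maxima for v1..v4; sapply() simplifies the four
# named vectors into a matrix with categories as row names.
mx <- sapply(bla[, -(1:2)], function(v) tapply(v, bla$cat, max))

# Move the row names into a proper "cat" column.
out <- data.frame(cat = rownames(mx), mx, row.names = NULL)
out
```

Whether this or aggregate() is preferable is mostly a matter of taste; both return one row per category with the column maxima next to it.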



