[R] Function on columns of a dataframe

David Winsemius dwinsemius at comcast.net
Fri Jul 9 16:47:35 CEST 2010


On Jul 9, 2010, at 10:26 AM, Eik Vettorazzi wrote:

> you are right. But maybe "aggregate" is close to the desired result?
>
> aggregate(bla, list(bla$cat), max)

Right. I couldn't get it to work until I removed the first two columns:

aggregate(bla[,-(1:2)], list(bla$cat), max)
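
A reproducible sketch of why the first two columns have to go, assuming "cat" is stored as a factor (stringsAsFactors = TRUE makes that explicit here; it is my assumption about the setup, not something stated in the thread). set.seed(42) is an addition so the numbers repeat; they will not match the output shown elsewhere in this thread.

```r
# Stand-in for the poster's data; the seed is an addition for repeatability.
set.seed(42)
x   <- 1:12
cat <- c("cat1", "cat5", "cat2", "cat2", "cat1", "cat5",
         "cat3", "cat4", "cat5", "cat2", "cat3", "cat6")
bla <- data.frame(x, cat,
                  v1 = rnorm(12, 0.5, 0.1),
                  v2 = rnorm(12, 0.3, 0.2),
                  v3 = rnorm(12, 0.4, 0.1),
                  v4 = rnorm(12, 0.6, 0.3),
                  stringsAsFactors = TRUE)

# With the factor column still present, max() is applied to it as well
# and the call stops with an error; try() captures it for inspection.
full <- try(aggregate(bla, list(bla$cat), max), silent = TRUE)
failed <- inherits(full, "try-error")

# Dropping x and cat leaves only numeric columns to summarise.
res <- aggregate(bla[, -(1:2)], list(bla$cat), max)
print(res)
```

max() is not defined for unordered factors, which is why the call only works once "cat" is no longer among the summarised columns.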

Then I got pretty much the same dataframe as I would have with:

as.data.frame(lapply( bla[, -(1:2)], function(x) tapply(x, bla$cat, max) ))
             v1        v2        v3        v4
cat1 0.4634519 0.5274645 0.6051479 0.7586322
cat2 0.4062700 0.4282639 0.4443707 0.8419526
cat3 0.4816403 0.4996033 0.3538144 0.9456385
cat4 0.6354560 0.3558259 0.3646292 0.1907295
cat5 0.6663811 0.2154201 0.5059900 0.7573575
cat6 0.5260832 0.3934063 0.3545962 0.6412563

Except that the aggregate version returns it with a "Group.1" column of
"cat"s, while the other version returns it with the "cat" names in the
rownames. A matter of taste?
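
If taste goes one way or the other, the two layouts convert into each other easily. A minimal sketch, assuming the same kind of data as in the original post (the seed and the object names by_agg, by_tap, tap_as_col and agg_as_rn are made up for illustration). Naming the grouping list also gives a "cat" column instead of "Group.1".

```r
# Stand-in data; the seed is an addition so the example repeats.
set.seed(42)
x   <- 1:12
cat <- c("cat1", "cat5", "cat2", "cat2", "cat1", "cat5",
         "cat3", "cat4", "cat5", "cat2", "cat3", "cat6")
bla <- data.frame(x, cat,
                  v1 = rnorm(12, 0.5, 0.1),
                  v2 = rnorm(12, 0.3, 0.2),
                  v3 = rnorm(12, 0.4, 0.1),
                  v4 = rnorm(12, 0.6, 0.3))

# aggregate(): a named 'by' list yields a "cat" column rather than "Group.1".
by_agg <- aggregate(bla[, -(1:2)], list(cat = bla$cat), max)

# lapply() + tapply(): the group labels end up in the row names.
by_tap <- as.data.frame(lapply(bla[, -(1:2)],
                               function(x) tapply(x, bla$cat, max)))

# Row names -> leading "cat" column.
tap_as_col <- cbind(cat = rownames(by_tap), by_tap)
rownames(tap_as_col) <- NULL

# "cat" column -> row names.
agg_as_rn <- by_agg[, -1]
rownames(agg_as_rn) <- as.character(by_agg$cat)

# Both layouts hold the same maxima, group by group.
same <- all(unlist(by_tap) == unlist(agg_as_rn))
print(tap_as_col)
print(agg_as_rn)
```

The first form (category column followed by one maximum per column) looks like what the original poster asked for.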

-- 
David.
>
> On 09.07.2010 16:01, David Winsemius wrote:
>>
>> On Jul 9, 2010, at 9:46 AM, Eik Vettorazzi wrote:
>>
>>> Hi Nils,
>>> have a look at
>>> ?tapply
>>> hth.
>>
>> Perhaps this will be part way there (I couldn't really figure out the
>> desired structure of the final object):
>>> lapply( bla[, -(1:2)], function(x) tapply(x, bla$cat, max) )
>> $v1
>>     cat1      cat2      cat3      cat4      cat5      cat6
>> 0.4634519 0.4062700 0.4816403 0.6354560 0.6663811 0.5260832
>>
>> $v2
>>     cat1      cat2      cat3      cat4      cat5      cat6
>> 0.5274645 0.4282639 0.4996033 0.3558259 0.2154201 0.3934063
>>
>> $v3
>>     cat1      cat2      cat3      cat4      cat5      cat6
>> 0.6051479 0.4443707 0.3538144 0.3646292 0.5059900 0.3545962
>>
>> $v4
>>     cat1      cat2      cat3      cat4      cat5      cat6
>> 0.7586322 0.8419526 0.9456385 0.1907295 0.7573575 0.6412563
>>
>>
>>>
>>> On 09.07.2010 15:37, LogLord wrote:
>>>> Hi,
>>>>
>>>> I would like to assign the largest value of a column to a specific
>>>> category and repeat this for each column (v1 - v4).
>>>>
>>>>
>>>>> x=c(1:12)
>>>>> cat=c("cat1","cat5","cat2","cat2","cat1","cat5","cat3","cat4","cat5","cat2","cat3","cat6")
>>>>>
>>>>> v1=rnorm(12,0.5,0.1)
>>>>> v2=rnorm(12,0.3,0.2)
>>>>> v3=rnorm(12,0.4,0.1)
>>>>> v4=rnorm(12,0.6,0.3)
>>>>> bla=data.frame(x,cat,v1,v2,v3,v4)
>>>>> bla
>>>>>
>>>>   x  cat        v1         v2        v3         v4
>>>> 1   1 cat1 0.4013144 0.54839317 0.3946393  0.8679266
>>>> 2   2 cat5 0.4595873 0.45788906 0.4030078  0.5919596
>>>> 3   3 cat2 0.4542865 0.21516928 0.2777649  0.6112099
>>>> 4   4 cat2 0.4787950 0.06252512 0.5095611  0.6450795
>>>> 5   5 cat1 0.4910746 0.56591049 0.5151813  0.8465181
>>>> 6   6 cat5 0.4194397 0.16592579 0.4361643  0.6415192
>>>> 7   7 cat3 0.6148564 0.32240342 0.2690108  0.7114133
>>>> 8   8 cat4 0.6174652 0.28076152 0.4577064 -0.2567284
>>>> 9   9 cat5 0.4775395 0.28611768 0.4660210  0.4634120
>>>> 10 10 cat2 0.4802962 0.03715569 0.4506361  1.0063235
>>>> 11 11 cat3 0.6495094 0.33303172 0.3352933  1.4390324
>>>> 12 12 cat6 0.4891481 0.45355589 0.3880739  0.7831656
>>>>
>>>>>
>>>> I can assign this by the sqldf() command for each column, but I would
>>>> like to automate this as I have many columns.
>>>>
>>>>
>>>>> select=sqldf("select cat, max(v1) FROM bla GROUP BY cat")
>>>>> select
>>>>>
>>>>  cat   max(v1)
>>>> 1 cat1 0.4910746
>>>> 2 cat2 0.4802962
>>>> 3 cat3 0.6495094
>>>> 4 cat4 0.6174652
>>>> 5 cat5 0.4775395
>>>> 6 cat6 0.4891481
>>>>
>>>>>
>>>> Finally, I would like to have a dataframe where the cat is followed by
>>>> each column maximum.
>>>>
>>>> Thanks for your help!
>>>>
>>>
>>> -- 
>>> Eik Vettorazzi
>>> Institut für Medizinische Biometrie und Epidemiologie
>>> Universitätsklinikum Hamburg-Eppendorf
>>>
>>> Martinistr. 52
>>> 20246 Hamburg
>>>
>>> T ++49/40/7410-58243
>>> F ++49/40/7410-57790
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT


