[R] Summary information by groups programming assistance

Ranney, Steven steven.ranney at montana.edu
Mon Dec 22 22:51:16 CET 2008

All - 

I have data that looks like

         psd Species  Lake Length Weight St.weight        Wr      Wr.1    vol
432 substock     SMB Clear    150  41.00      0.01  95.12438  95.10118 0.0105
433 substock     SMB Clear    152  39.00      0.01  86.72916  86.70692 0.0105
434 substock     SMB Clear    152  40.00      3.11  88.95298  82.03689 3.2655
435 substock     SMB Clear    159  48.00      0.04  92.42095  92.34393 0.0420
436 substock     SMB Clear    159  48.00      0.01  92.42095  92.40170 0.0105
437 substock     SMB Clear    165  47.00      0.03  80.38023  80.32892 0.0315
438 substock     SMB Clear    171  62.00      0.21  94.58105  94.26070 0.2205
439 substock     SMB Clear    178  70.00      0.01  93.91912  93.90571 0.0105
440 substock     SMB Clear    179  76.00      1.38 100.15760  98.33895 1.4490
441      S-Q     SMB Clear    180  75.00      0.01  97.09330  97.08035 0.0105
442      S-Q     SMB Clear    180  92.00      0.02 119.10111 119.07522 0.0210

where psd and Lake are categorical variables with five and four
categories, respectively.  I'd like to find the maximum vol, and the
Length associated with that maximum, for each psd category within each
lake.  In other words, I'd like to have a data frame that looks
something like

Lake      Category  Length  vol
Clear     substock  152     3.2655
Clear     S-Q       266     11.73
Clear     Q-P       330     14.89
Pickerel  substock  170     3.4965
Pickerel  S-Q       248     10.69
Pickerel  Q-P       335     25.62
Pickerel  P-M       415     32.62
Pickerel  M-T       442     17.25

To get this originally, I used

with(smb[smb$Lake == "Clear", ], tapply(vol, list(Length, psd), max))
with(smb[smb$Lake == "Enemy.Swim", ], tapply(vol, list(Length, psd), max))
with(smb[smb$Lake == "Pickerel", ], tapply(vol, list(Length, psd), max))
with(smb[smb$Lake == "Roy", ], tapply(vol, list(Length, psd), max))

and pulled the values I needed out by hand and put them into a .csv.
Unfortunately, I've got a number of other data sets on which I'll need
to do the same analysis.  A programmable alternative would be much
easier (and likely less error-prone) than extracting the values by
hand.  Ideally, the "Length" and "vol" data would end up in a data
frame so that I could then analyze them with nls.
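
One possible programmable route, offered as a minimal sketch rather than a
definitive answer: split the data by Lake and psd, keep the row with the
largest vol in each piece, and stack the results.  The small smb data frame
below is only a stand-in built from a few of the rows shown above, and the
names groups, max.rows and max.vol are made up for illustration.

```r
# Toy stand-in for the real `smb` data frame: six of the rows shown above,
# keeping only the columns needed here.
smb <- data.frame(
  psd    = c("substock", "substock", "substock", "substock", "S-Q", "S-Q"),
  Lake   = "Clear",
  Length = c(150, 152, 152, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 1.4490, 0.0105, 0.0210)
)

# Split the rows by every Lake x psd combination;
# drop = TRUE skips combinations that do not occur in the data.
groups <- split(smb, list(smb$Lake, smb$psd), drop = TRUE)

# Within each group keep the row holding the largest vol.
# which.max() returns the first such row if several rows tie.
max.rows <- lapply(groups, function(d) {
  d[which.max(d$vol), c("Lake", "psd", "Length", "vol")]
})

# Stack the per-group rows into a single data frame.
max.vol <- do.call(rbind, max.rows)
rownames(max.vol) <- NULL
print(max.vol)
```

On this toy subset the substock row is Length 152 with vol 3.2655, which
agrees with the first row of the hand-built table; the S-Q row differs from
that table only because most S-Q fish are left out of the subset.  The
resulting max.vol is an ordinary data frame, so its Length and vol columns
could be passed on to nls.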

Does anyone have any thoughts as to how I might accomplish this?  

Thanks in advance, 

Steven Ranney	
