[R] Reshape or Plyr?
Law, Jason
Jason.Law at portlandoregon.gov
Mon Apr 22 18:35:47 CEST 2013
Hi Bruce,
I work with a lot of similar data and have to do these types of things quite often. I find it helps to keep to vectorized code as much as possible. That is, do as many of the calculations as possible outside of the aggregation code. Here's one way:
library(reshape2)
# stick to a variable naming convention and you'll avoid a lot of simple code errors
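names(d) <- gsub('_', '.', tolower(names(d)), fixed = TRUE)
# long format: one row per record and measured variable
dm <- melt(d, measure.vars = c('ai', 'survey.time'))
# sum each measured variable within location and species
results <- dcast(dm, location.name + spec.code ~ variable, fun.aggregate = sum)
# relative abundance per 10 survey hours, computed outside the aggregation
results$ra <- results$ai / results$survey.time * 10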
The output:
      location.name spec.code ai survey.time         ra
1  079-f2p1-Acetuna      Buzz  8        72.8  1.0989011
2  079-f2p1-Acetuna    Eumspp  5        24.3  2.0576132
3  079-f2p1-Acetuna      Frag 18        12.1 14.8760331
4  079-f2p1-Acetuna    Molmol  1        12.1  0.8264463
5  079-f2p1-Acetuna    Molspp 28        72.8  3.8461538
6  079-f2p1-Acetuna    Myokea  1        12.2  0.8196721
7  079-f2p1-Acetuna    Nocalb 10        24.3  4.1152263
8  079-f2p1-Acetuna    Phyllo  4        36.4  1.0989011
9  079-f2p1-Acetuna    Ptedav  3        36.4  0.8241758
10 079-f2p1-Acetuna    Ptegym  6        36.4  1.6483516
11 079-f2p1-Acetuna    Ptepar  9        36.4  2.4725275
12 079-f2p1-Acetuna    Pteper  4        24.3  1.6460905
13 079-f2p1-Acetuna    Rhotum 30        36.4  8.2417582
14 079-f2p1-Acetuna    Sacbil 11        36.4  3.0219780
15 079-f2p1-Acetuna    Saclep 32        36.4  8.7912088
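As a sanity check on the first row, the Buzz figures can be reproduced by hand. The sketch below is only an illustration: the six values are copied from the Buzz records in the original post, and the names `buzz`, `total.ai`, `total.time` and `ra` are made up for this check.

```r
# The six Buzz records for 079-f2p1-Acetuna, copied from the original post
# (two records per night on 2/14, 2/15 and 2/16)
buzz <- data.frame(
  survey.time = c(12.1, 12.1, 12.1, 12.1, 12.2, 12.2),
  ai          = c(1, 1, 2, 2, 1, 1)
)

total.ai   <- sum(buzz$ai)           # summed activity index
total.time <- sum(buzz$survey.time)  # summed survey hours

# relative abundance standardized to 10 survey hours
ra <- total.ai / total.time * 10
```

This gives 8 / 72.8 * 10, which matches row 1 of the output. Note that Buzz and Molspp are listed twice per night in the posted data, so their summed survey time is 72.8 rather than 36.4. If those rows are exact duplicates the ratio is unchanged, but the ai and survey.time totals are doubled, so it may be worth checking whether the duplication is intended.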
For a simple aggregation like this, reshape2 is simple and fast. I tend to use plyr when things get more complicated.
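For comparison, here is a minimal sketch of the same aggregation written with plyr. It is not part of the solution above. The small data frame `d` is an illustrative subset of the posted records (Frag, Eumspp and Nocalb only), typed in with the lowercase dotted names already applied. The division is still done outside the aggregation step.

```r
library(plyr)

# Illustrative subset of the posted data (assumed already renamed)
d <- data.frame(
  location.name = "079-f2p1-Acetuna",
  spec.code     = c("Frag", "Eumspp", "Eumspp", "Nocalb", "Nocalb"),
  survey.time   = c(12.1, 12.1, 12.2, 12.1, 12.2),
  ai            = c(18, 1, 4, 6, 4)
)

# Sum ai and survey.time within each location and species
results <- ddply(d, .(location.name, spec.code), summarise,
                 ai          = sum(ai),
                 survey.time = sum(survey.time))

# Vectorized step, done once on the aggregated table
results$ra <- results$ai / results$survey.time * 10
```

For these three species the sums and ra values should agree with the corresponding rows of the reshape2 output. The ddply form becomes more attractive when each group needs a custom function rather than a plain sum.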
Jason Law
Statistician
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason.law at portlandoregon.gov
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bruce Miller
Sent: Saturday, April 20, 2013 6:55 AM
To: r-help at r-project.org
Subject: [R] Reshape or Plyr?
Hi all,
I have relative abundance data from >100 sites. This is from acoustic monitoring, and usually the data cover 2-3 nights, but in some cases may span months or years for each location.
The output from my management database is provided by species by night for each location, so the data frame looks like the one below. What I need to do is sum Survey_Time by SPEC_CODE for each location name, divide the summed AI values for each SPEC_CODE by the summed survey time to adjust for unit effort, then multiply by 10 to express relative abundance per 10 survey hours. How best to do this?
Using Plyr or reshape?
Location name SPEC_CODE Start_Day Survey_Time AI Std AI
079-f2p1-Acetuna Buzz 2/14/2012 12.1 1 0.8264463
079-f2p1-Acetuna Buzz 2/14/2012 12.1 1 0.8264463
079-f2p1-Acetuna Eumspp 2/14/2012 12.1 1 0.8264463
079-f2p1-Acetuna Frag 2/14/2012 12.1 18 14.87603
079-f2p1-Acetuna Molspp 2/14/2012 12.1 5 4.132231
079-f2p1-Acetuna Molspp 2/14/2012 12.1 5 4.132231
079-f2p1-Acetuna Phyllo 2/14/2012 12.1 2 1.652893
079-f2p1-Acetuna Ptedav 2/14/2012 12.1 1 0.8264463
079-f2p1-Acetuna Ptegym 2/14/2012 12.1 1 0.8264463
079-f2p1-Acetuna Ptepar 2/14/2012 12.1 2 1.652893
079-f2p1-Acetuna Rhotum 2/14/2012 12.1 6 4.958678
079-f2p1-Acetuna Sacbil 2/14/2012 12.1 6 4.958678
079-f2p1-Acetuna Saclep 2/14/2012 12.1 11 9.090909
079-f2p1-Acetuna Buzz 2/15/2012 12.1 2 1.652893
079-f2p1-Acetuna Buzz 2/15/2012 12.1 2 1.652893
079-f2p1-Acetuna Molmol 2/15/2012 12.1 1 0.8264463
079-f2p1-Acetuna Molspp 2/15/2012 12.1 7 5.785124
079-f2p1-Acetuna Molspp 2/15/2012 12.1 7 5.785124
079-f2p1-Acetuna Nocalb 2/15/2012 12.1 6 4.958678
079-f2p1-Acetuna Phyllo 2/15/2012 12.1 1 0.8264463
079-f2p1-Acetuna Ptedav 2/15/2012 12.1 1 0.8264463
079-f2p1-Acetuna Ptegym 2/15/2012 12.1 4 3.305785
079-f2p1-Acetuna Ptepar 2/15/2012 12.1 4 3.305785
079-f2p1-Acetuna Pteper 2/15/2012 12.1 3 2.479339
079-f2p1-Acetuna Rhotum 2/15/2012 12.1 7 5.785124
079-f2p1-Acetuna Sacbil 2/15/2012 12.1 2 1.652893
079-f2p1-Acetuna Saclep 2/15/2012 12.1 6 4.958678
079-f2p1-Acetuna Buzz 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Buzz 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Eumspp 2/16/2012 12.2 4 3.278688
079-f2p1-Acetuna Molspp 2/16/2012 12.2 2 1.639344
079-f2p1-Acetuna Molspp 2/16/2012 12.2 2 1.639344
079-f2p1-Acetuna Myokea 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Nocalb 2/16/2012 12.2 4 3.278688
079-f2p1-Acetuna Phyllo 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Ptedav 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Ptegym 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Ptepar 2/16/2012 12.2 3 2.459016
079-f2p1-Acetuna Pteper 2/16/2012 12.2 1 0.8196721
079-f2p1-Acetuna Rhotum 2/16/2012 12.2 17 13.93443
079-f2p1-Acetuna Sacbil 2/16/2012 12.2 3 2.459016
079-f2p1-Acetuna Saclep 2/16/2012 12.2 15 12.29508
Thanks for any suggestions. Excel will be a mess to try to do that.
Bruce