[R] trying to use a function in aggregate
arun
smartpink111 at yahoo.com
Thu Oct 25 22:07:31 CEST 2012
Hi,
In my previous solution the row order got messed up. I should have sorted the data by the grouping columns first.
Try this:
dat1<-read.table(text="
Trip_id Vessel CommonName Length Count
1 230 Sunlight ShadAmerican 19 1
2 230 Sunlight ShadAmerican 20 1
3 230 Sunlight ShadAmerican 21 1
4 230 Sunlight ShadAmerican 23 1
5 230 Sunlight ShadAmerican 26 1
6 230 Sunlight ShadAmerican 27 1
7 230 Sunlight ShadAmerican 30 2
8 230 Sunlight ShadAmerican 33 1
9 230 Sunlight ShadAmerican 34 1
10 230 Sunlight ShadAmerican 37 1
11 230 Sunlight HerringBlueback 20 1
12 230 Sunlight HerringBlueback 21 2
13 230 Sunlight HerringBlueback 22 5
14 230 Sunlight HerringBlueback 26 1
15 230 Sunlight Alewife 17 1
16 230 Sunlight Alewife 18 1
17 230 Sunlight Alewife 20 2
18 230 Sunlight Alewife 21 4
19 230 Sunlight Alewife 22 16
20 230 Sunlight Alewife 23 22
21 230 Sunlight Alewife 24 16
22 230 Sunlight Alewife 25 4
23 230 Sunlight Alewife 26 1
24 230 Sunlight Alewife 27 2
25 230 Sunlight Alewife 28 2
26 231 Western_Venture ShadAmerican 23 1
27 231 Western_Venture ShadAmerican 24 1
28 231 Western_Venture ShadAmerican 25 1
29 231 Western_Venture ShadAmerican 28 2
30 231 Western_Venture ShadAmerican 29 2
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[order(dat1$Trip_id,dat1$Vessel,dat1$CommonName,dat1$Length,dat1$Count),]
dat3<-dat2
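# proportion of each row's Count within its Trip_id/CommonName group
# note: unlist(tapply(...)) returns values cell by cell, not in row order, so it
# lines up with the rows only when the data are sorted to match (true for this subset)
dat3$Prop<-unlist(tapply(dat3$Count,list(dat3$Trip_id,dat3$CommonName),function(x) x/sum(x)))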
#Jean's method:
agg <- with(dat2, aggregate(data.frame(Total=Count), data.frame(Trip_id,
CommonName), sum))
# combine the totals with the full data frame
data2 <- merge(dat2, agg)
# then calculate proportions
data2$Prop <- data2$Count/data2$Total
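# drop the Total column (column 6), keeping Prop
data3<-data2[,-6]
# restore the column order: Trip_id, Vessel, CommonName, Length, Count, Prop
data4<-data3[,c(1,3,2,4:6)]
# reset the row names of dat3, which still carry the pre-sort row numbers
rownames(dat3)<-1:nrow(dat3)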
identical(dat3,data4)
#[1] TRUE
head(dat3)
# Trip_id Vessel CommonName Length Count Prop
#1 230 Sunlight Alewife 17 1 0.01408451
#2 230 Sunlight Alewife 18 1 0.01408451
#3 230 Sunlight Alewife 20 2 0.02816901
#4 230 Sunlight Alewife 21 4 0.05633803
#5 230 Sunlight Alewife 22 16 0.22535211
#6 230 Sunlight Alewife 23 22 0.30985915
head(data4)
# Trip_id Vessel CommonName Length Count Prop
#1 230 Sunlight Alewife 17 1 0.01408451
#2 230 Sunlight Alewife 18 1 0.01408451
#3 230 Sunlight Alewife 20 2 0.02816901
#4 230 Sunlight Alewife 21 4 0.05633803
#5 230 Sunlight Alewife 22 16 0.22535211
#6 230 Sunlight Alewife 23 22 0.30985915
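The tapply() step above only matches the rows when the data are sorted to agree with the order of the tapply() cells. As an aside that is not part of the code posted in this thread, base R's ave() returns its result in the original row order, so no sorting is needed. A minimal sketch on a small made-up data frame (the values in dat are hypothetical, chosen only to have unsorted groups):

```r
# hypothetical toy data, deliberately NOT sorted by group
dat <- data.frame(
  Trip_id    = c(230, 230, 230, 231, 231),
  CommonName = c("Alewife", "ShadAmerican", "Alewife",
                 "ShadAmerican", "ShadAmerican"),
  Count      = c(1, 1, 3, 2, 2),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each Trip_id/CommonName group and puts each
# value back in the row it came from
dat$Prop <- ave(dat$Count, dat$Trip_id, dat$CommonName,
                FUN = function(x) x / sum(x))
dat
```

Here the two Alewife rows of trip 230 get 1/4 and 3/4 even though another species sits between them.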
A.K.
----- Original Message -----
From: Jean V Adams <jvadams at usgs.gov>
To: Sally_roman <sroman at umassd.edu>
Cc: r-help at r-project.org
Sent: Thursday, October 25, 2012 2:45 PM
Subject: Re: [R] trying to use a function in aggregate
Sally,
It's great that you provided data and code. To make it even more
user-friendly for R-help readers, supply your data as R code, using (for
example) the dput() function.
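As a rough illustration of what dput() gives you (the data frame small below is a made-up example, not your data): it prints R code that rebuilds the object, so readers can paste it straight into their session.

```r
# hypothetical two-row data frame standing in for the real data
small <- data.frame(Trip_id = c(230L, 231L), Count = c(1L, 2L))

# prints a structure(...) call that can be pasted into an email
dput(small)

# round trip: capture the printed code, evaluate it, and compare
txt <- paste(capture.output(dput(small)), collapse = "")
rebuilt <- eval(parse(text = txt))
all.equal(small, rebuilt)
```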
You were getting all 1s because your code aggregates by trip, LENGTH,
and species. The data are already summarized by trip, LENGTH, and
species, so each group holds a single row and your myfun() function
computes count/count = 1 for every row. You could drop LENGTH from the
grouping and keep myfun(), but the results aren't pretty ...
with(data, aggregate(data.frame(Total=Count), data.frame(Trip_id,
CommonName), myfun))
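To see both effects on something tiny, here is a sketch using a made-up three-row data frame (toy and its values are hypothetical; myfun is the function from your post):

```r
# hypothetical toy data: one row per trip/species/length combination
toy <- data.frame(
  Trip_id    = c(1, 1, 1),
  CommonName = c("Alewife", "Alewife", "Alewife"),
  Length     = c(20, 21, 22),
  Count      = c(1, 2, 5),
  stringsAsFactors = FALSE
)
myfun <- function(x) x / sum(x)

# grouping by Length as well: every group is a single row,
# so x/sum(x) is always 1
by_len <- aggregate(toy["Count"],
                    toy[c("Trip_id", "CommonName", "Length")], myfun)
by_len$Count

# grouping without Length: one output row per trip/species, with all
# of that group's proportions packed into that single row
by_grp <- aggregate(toy["Count"], toy[c("Trip_id", "CommonName")], myfun)
by_grp
```

The second result has one row per group rather than one row per length, which is why it is awkward to join back to the original data.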
Instead, I suggest you use the aggregate function to calculate the
total counts, then merge these totals with your original data to calculate
the proportions.
# small subset of data
data <- structure(list(Trip_id = c(230L, 230L, 230L, 230L, 230L, 230L,
230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L,
230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 231L, 231L, 231L,
231L, 231L), Vessel = c("Sunlight", "Sunlight", "Sunlight", "Sunlight",
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight",
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight",
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight",
"Sunlight", "Sunlight", "Sunlight", "Western Venture", "Western Venture",
"Western Venture", "Western Venture", "Western Venture"), CommonName =
c("Shad,American",
"Shad,American", "Shad,American", "Shad,American", "Shad,American",
"Shad,American", "Shad,American", "Shad,American", "Shad,American",
"Shad,American", "Herring,Blueback", "Herring,Blueback",
"Herring,Blueback",
"Herring,Blueback", "Alewife", "Alewife", "Alewife", "Alewife",
"Alewife", "Alewife", "Alewife", "Alewife", "Alewife", "Alewife",
"Alewife", "Shad,American", "Shad,American", "Shad,American",
"Shad,American", "Shad,American"), Length = c(19L, 20L, 21L,
23L, 26L, 27L, 30L, 33L, 34L, 37L, 20L, 21L, 22L, 26L, 17L, 18L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 23L, 24L, 25L, 28L,
29L), Count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
5L, 1L, 1L, 1L, 2L, 4L, 16L, 22L, 16L, 4L, 1L, 2L, 2L, 1L, 1L,
1L, 2L, 2L)), .Names = c("Trip_id", "Vessel", "CommonName", "Length",
"Count"), row.names = c(NA, -30L), class = "data.frame")
# calculate the total count for each trip and Species
agg <- with(data, aggregate(data.frame(Total=Count), data.frame(Trip_id,
CommonName), sum))
# combine the totals with the full data frame
data2 <- merge(data, agg)
# then calculate proportions
data2$Prop <- data2$Count/data2$Total
data2
Jean
Sally_roman <sroman at umassd.edu> wrote on 10/25/2012 09:19:57 AM:
>
> Hi - I am using R v 2.13.0. I am trying to use the aggregate function to
> calculate the percent at length for each Trip_id and CommonName. Here is a
> small subset of the data.
> Trip_id Vessel CommonName Length Count
> 1 230 Sunlight Shad,American 19 1
> 2 230 Sunlight Shad,American 20 1
> 3 230 Sunlight Shad,American 21 1
> 4 230 Sunlight Shad,American 23 1
> 5 230 Sunlight Shad,American 26 1
> 6 230 Sunlight Shad,American 27 1
> 7 230 Sunlight Shad,American 30 2
> 8 230 Sunlight Shad,American 33 1
> 9 230 Sunlight Shad,American 34 1
> 10 230 Sunlight Shad,American 37 1
> 11 230 Sunlight Herring,Blueback 20 1
> 12 230 Sunlight Herring,Blueback 21 2
> 13 230 Sunlight Herring,Blueback 22 5
> 14 230 Sunlight Herring,Blueback 26 1
> 15 230 Sunlight Alewife 17 1
> 16 230 Sunlight Alewife 18 1
> 17 230 Sunlight Alewife 20 2
> 18 230 Sunlight Alewife 21 4
> 19 230 Sunlight Alewife 22 16
> 20 230 Sunlight Alewife 23 22
> 21 230 Sunlight Alewife 24 16
> 22 230 Sunlight Alewife 25 4
> 23 230 Sunlight Alewife 26 1
> 24 230 Sunlight Alewife 27 2
> 25 230 Sunlight Alewife 28 2
> 26 231 Western Venture Shad,American 23 1
> 27 231 Western Venture Shad,American 24 1
> 28 231 Western Venture Shad,American 25 1
> 29 231 Western Venture Shad,American 28 2
> 30 231 Western Venture Shad,American 29 2
>
> My code is:
> myfun<-function (x) x/sum(x)
> b<-with(data,aggregate(x=list(Percent=Count),by=list
> (Trip_id=Trip_id,Length=Length,Species=CommonName),
> FUN="myfun"))
>
> My issue is that the percent is not being calculated by Trip_id and CommonName.
> The result is that each row has a percent of 1, indicating that myfun is not
> dividing by the sum of counts within a Trip_id/CommonName group. Any help
> would be appreciated.
> Thank you