[R] trying to use a function in aggregate

arun smartpink111 at yahoo.com
Thu Oct 25 22:07:31 CEST 2012


Hi,
In my previous solution the row order got messed up; I should have sorted the data by the grouping columns first.
Try this:
dat1<-read.table(text="
 Trip_id          Vessel      CommonName Length Count
1      230        Sunlight    ShadAmerican    19    1
2      230        Sunlight    ShadAmerican    20    1
3      230        Sunlight    ShadAmerican    21    1
4      230        Sunlight    ShadAmerican    23    1
5      230        Sunlight    ShadAmerican    26    1
6      230        Sunlight    ShadAmerican    27    1
7      230        Sunlight    ShadAmerican    30    2
8      230        Sunlight    ShadAmerican    33    1
9      230        Sunlight    ShadAmerican    34    1
10    230        Sunlight    ShadAmerican    37    1
11    230        Sunlight HerringBlueback    20    1
12    230        Sunlight HerringBlueback    21    2
13    230        Sunlight HerringBlueback    22    5
14    230        Sunlight HerringBlueback    26    1
15    230        Sunlight          Alewife    17    1
16    230        Sunlight          Alewife    18    1
17    230        Sunlight          Alewife    20    2
18    230        Sunlight          Alewife    21    4
19    230        Sunlight          Alewife    22    16
20    230        Sunlight          Alewife    23    22
21    230        Sunlight          Alewife    24    16
22    230        Sunlight          Alewife    25    4
23    230        Sunlight          Alewife    26    1
24    230        Sunlight          Alewife    27    2
25    230        Sunlight          Alewife    28    2
26    231 Western_Venture    ShadAmerican    23    1
27    231 Western_Venture    ShadAmerican    24    1
28    231 Western_Venture    ShadAmerican    25    1
29    231 Western_Venture    ShadAmerican    28    2
30    231 Western_Venture    ShadAmerican    29    2
",sep="",header=TRUE,stringsAsFactors=FALSE)
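# sort the rows by the grouping columns first
dat2<-dat1[order(dat1$Trip_id,dat1$Vessel,dat1$CommonName,dat1$Length,dat1$Count),]
dat3<-dat2
# proportion of each Count within its Trip_id/CommonName group
# (caution: unlist(tapply()) returns the groups species by species, so the
# values line up with the rows here only because trip 231 holds just the
# species that sorts last; check the alignment before reusing this on other data)
dat3$Prop<-unlist(tapply(dat3$Count,list(dat3$Trip_id,dat3$CommonName),function(x) x/sum(x)))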


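# Jean's method, applied to the sorted data:

# total count for each trip and species
agg <- with(dat2, aggregate(data.frame(Total=Count),
    data.frame(Trip_id, CommonName), sum))
# combine the totals with the full data frame
data2 <- merge(dat2, agg)
# then calculate proportions
data2$Prop <- data2$Count/data2$Total
# drop the Total column (column 6 after the merge)
data3<-data2[,-6]
# restore the original column order (merge() puts the key columns first)
data4<-data3[,c(1,3,2,4:6)]
# reset the row names of dat3 so the two data frames can be compared
rownames(dat3)<-1:nrow(dat3)
identical(dat3,data4)
#[1] TRUE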

head(dat3)
#  Trip_id   Vessel CommonName Length Count       Prop
#1     230 Sunlight    Alewife     17     1 0.01408451
#2     230 Sunlight    Alewife     18     1 0.01408451
#3     230 Sunlight    Alewife     20     2 0.02816901
#4     230 Sunlight    Alewife     21     4 0.05633803
#5     230 Sunlight    Alewife     22    16 0.22535211
#6     230 Sunlight    Alewife     23    22 0.30985915
head(data4)
#  Trip_id   Vessel CommonName Length Count       Prop
#1     230 Sunlight    Alewife     17     1 0.01408451
#2     230 Sunlight    Alewife     18     1 0.01408451
#3     230 Sunlight    Alewife     20     2 0.02816901
#4     230 Sunlight    Alewife     21     4 0.05633803
#5     230 Sunlight    Alewife     22    16 0.22535211
#6     230 Sunlight    Alewife     23    22 0.30985915
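
The same proportions can also be computed without relying on row order. A minimal sketch, assuming base R's ave() and a small made-up data frame (the name toy and its values are illustrative only, not taken from the data above):

```r
# Toy data, deliberately NOT sorted by group (hypothetical values)
toy <- data.frame(
  Trip_id    = c(231L, 230L, 230L, 231L, 230L),
  CommonName = c("Alewife", "Shad", "Alewife", "Alewife", "Shad"),
  Count      = c(3L, 1L, 4L, 1L, 3L),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each Trip_id/CommonName group and puts the
# results back in the original row positions, so no sorting is needed
toy$Prop <- ave(toy$Count, toy$Trip_id, toy$CommonName,
                FUN = function(x) x / sum(x))
toy
```

Because ave() returns a vector aligned with its input, the new column can be assigned directly, whatever order the rows are in.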
A.K.





----- Original Message -----
From: Jean V Adams <jvadams at usgs.gov>
To: Sally_roman <sroman at umassd.edu>
Cc: r-help at r-project.org
Sent: Thursday, October 25, 2012 2:45 PM
Subject: Re: [R] trying ti use a function in aggregate

Sally,

It's great that you provided data and code.  To make it even more
user-friendly for R-help readers, supply your data as R code, using (for
example) the dput() function.
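
As a small illustration of that suggestion: dput() prints an R expression that recreates an object, so readers can paste it straight into their session. A sketch with a made-up three-row data frame (the name small and its values are hypothetical):

```r
small <- data.frame(
  Trip_id    = c(230L, 230L, 231L),
  CommonName = c("Alewife", "Alewife", "Shad,American"),
  Count      = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Print R code that rebuilds 'small'; this output is what goes in the post
dput(small)

# Round trip: evaluate the printed code and compare with the original
code <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(small, rebuilt)
```

Unlike a printed table, the dput() output keeps column types and embedded spaces or commas in strings intact.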

The reason you were getting all 1s with your code is that you had told it
to aggregate by trip, Length, and species.  But the data are already
summarized by trip, Length, and species, so every group contains a single
row and your myfun() function is calculating count/count = 1 for each row.
You could get rid of Length to use your myfun() function, but the results
aren't pretty ...

with(data, aggregate(data.frame(Total=Count), data.frame(Trip_id, 
CommonName), myfun))
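
To see why the original call returned all 1s, it may help to count the rows in each group. A sketch on a made-up subset (the name toy and its values are hypothetical): with Length among the grouping variables every group holds a single row, so x/sum(x) is always 1.

```r
toy <- data.frame(
  Trip_id    = c(230L, 230L, 230L, 231L, 231L),
  CommonName = c("Alewife", "Alewife", "Alewife", "Shad", "Shad"),
  Length     = c(17L, 18L, 20L, 23L, 24L),
  Count      = c(1L, 1L, 2L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Rows per group when Length is one of the grouping variables
n_with_length <- aggregate(Count ~ Trip_id + Length + CommonName,
                           data = toy, FUN = length)
n_with_length$Count

# With one row per group, x/sum(x) reduces to count/count
by_length <- aggregate(Count ~ Trip_id + Length + CommonName,
                       data = toy, FUN = function(x) x / sum(x))
by_length$Count

# Rows per group once Length is dropped from the grouping
n_without <- aggregate(Count ~ Trip_id + CommonName, data = toy, FUN = length)
n_without
```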

Instead, I suggest you use the aggregate() function to calculate the
total counts, then merge these totals with your original data to calculate
the proportions.

# small subset of data
data <- structure(list(Trip_id = c(230L, 230L, 230L, 230L, 230L, 230L, 
230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 
230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 231L, 231L, 231L, 
231L, 231L), Vessel = c("Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Western Venture", "Western Venture", 
"Western Venture", "Western Venture", "Western Venture"), CommonName = 
c("Shad,American", 
"Shad,American", "Shad,American", "Shad,American", "Shad,American", 
"Shad,American", "Shad,American", "Shad,American", "Shad,American", 
"Shad,American", "Herring,Blueback", "Herring,Blueback", 
"Herring,Blueback", 
"Herring,Blueback", "Alewife", "Alewife", "Alewife", "Alewife", 
"Alewife", "Alewife", "Alewife", "Alewife", "Alewife", "Alewife", 
"Alewife", "Shad,American", "Shad,American", "Shad,American", 
"Shad,American", "Shad,American"), Length = c(19L, 20L, 21L, 
23L, 26L, 27L, 30L, 33L, 34L, 37L, 20L, 21L, 22L, 26L, 17L, 18L, 
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 23L, 24L, 25L, 28L, 
29L), Count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 
5L, 1L, 1L, 1L, 2L, 4L, 16L, 22L, 16L, 4L, 1L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L)), .Names = c("Trip_id", "Vessel", "CommonName", "Length", 
"Count"), row.names = c(NA, -30L), class = "data.frame")

# calculate the total count for each trip and Species
agg <- with(data, aggregate(data.frame(Total=Count), data.frame(Trip_id, 
CommonName), sum))

# combine the totals with the full data frame
data2 <- merge(data, agg)

# then calculate proportions
data2$Prop <- data2$Count/data2$Total

data2


Jean



Sally_roman <sroman at umassd.edu> wrote on 10/25/2012 09:19:57 AM:
> 
> Hi - I am using R v 2.13.0.  I am trying to use the aggregate function to
> calculate the percent at length for each Trip_id and CommonName.  Here is a
> small subset of the data.
>    Trip_id          Vessel       CommonName Length Count
> 1      230        Sunlight    Shad,American     19     1
> 2      230        Sunlight    Shad,American     20     1
> 3      230        Sunlight    Shad,American     21     1
> 4      230        Sunlight    Shad,American     23     1
> 5      230        Sunlight    Shad,American     26     1
> 6      230        Sunlight    Shad,American     27     1
> 7      230        Sunlight    Shad,American     30     2
> 8      230        Sunlight    Shad,American     33     1
> 9      230        Sunlight    Shad,American     34     1
> 10     230        Sunlight    Shad,American     37     1
> 11     230        Sunlight Herring,Blueback     20     1
> 12     230        Sunlight Herring,Blueback     21     2
> 13     230        Sunlight Herring,Blueback     22     5
> 14     230        Sunlight Herring,Blueback     26     1
> 15     230        Sunlight          Alewife     17     1
> 16     230        Sunlight          Alewife     18     1
> 17     230        Sunlight          Alewife     20     2
> 18     230        Sunlight          Alewife     21     4
> 19     230        Sunlight          Alewife     22    16
> 20     230        Sunlight          Alewife     23    22
> 21     230        Sunlight          Alewife     24    16
> 22     230        Sunlight          Alewife     25     4
> 23     230        Sunlight          Alewife     26     1
> 24     230        Sunlight          Alewife     27     2
> 25     230        Sunlight          Alewife     28     2
> 26     231 Western Venture    Shad,American     23     1
> 27     231 Western Venture    Shad,American     24     1
> 28     231 Western Venture    Shad,American     25     1
> 29     231 Western Venture    Shad,American     28     2
> 30     231 Western Venture    Shad,American     29     2
> 
> My code is:
> myfun<-function (x) x/sum(x)
> b<-with(data,aggregate(x=list(Percent=Count),by=list
> (Trip_id=Trip_id,Length=Length,Species=CommonName),
> FUN="myfun"))
> 
> My issue is that the percent is not being calculated by Trip_id and CommonName.
> The result is that each row has a percent of 1, indicating that myfun is not
> dividing by the sum of counts within a Trip_id/CommonName group.  Any help
> would be appreciated.
> Thank you

