[R] Unique?
Francisco J. Zagmutt
gerifalte28 at hotmail.com
Thu May 11 19:10:15 CEST 2006
Hi Cameron
You need to be more specific when you ask a question so you can get a better
answer. Anyhow, when you say that you want to retain all the other
variables, do you mean that you want to add a new column to the dataset
that contains the calculated sum? If that is the case, you can use a
construction like:
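set.seed(1)  # make the simulated data reproducible
# Toy data: three trips with three records each
step4 <- data.frame(TRIPID = rep(c(111, 222, 333), 3), CONVUNIT = rpois(9, 40))
# Sum CONVUNIT within each TRIPID; the result is named by TRIPID
result <- tapply(step4$CONVUNIT, INDEX = step4$TRIPID, FUN = sum)
# Look up each row's group total by matching TRIPID against those names
step4[, "SUM"] <- result[match(step4[, "TRIPID"], names(result))]
step4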
  TRIPID CONVUNIT SUM
1    111       36 122
2    222       48 121
3    333       48 129
4    111       42 122
5    222       30 121
6    333       43 129
7    111       44 122
8    222       43 121
9    333       38 129
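
If the goal is instead one row per TRIPID with the other variables kept, here is a minimal sketch under some assumptions: it uses the same toy data, takes "retain the other variables" to mean keeping the first row of each TRIPID, and swaps the tapply/match step for ave(). The name step5 is only illustrative. Because it does not build and rbind one data frame per trip, it may cope better with a large data set, but I have not timed it on 446,000 rows.

```r
# Same toy data as in the example above
set.seed(1)
step4 <- data.frame(TRIPID = rep(c(111, 222, 333), 3),
                    CONVUNIT = rpois(9, 40))

# ave() returns a vector as long as its input, holding each row's group sum
step4$SUM <- ave(step4$CONVUNIT, step4$TRIPID, FUN = sum)

# Keep the first row of every TRIPID (with all its other columns),
# then overwrite CONVUNIT with the group total and drop the helper column
step5 <- step4[!duplicated(step4$TRIPID), ]
step5$CONVUNIT <- step5$SUM
step5$SUM <- NULL
step5
```

Note that for the other variables this keeps whatever values appear in the first record of each trip, so it only makes sense if those values are the same (or do not matter) within a TRIPID.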
Cheers
Francisco
>From: "Guenther, Cameron" <Cameron.Guenther at MyFWC.com>
>To: "Francisco J. Zagmutt" <gerifalte28 at hotmail.com>
>Subject: RE: [R] Unique?
>Date: Thu, 11 May 2006 12:08:31 -0400
>
>It is close but not quite what I want. I need to retain all of the
>other variables as well.
>
>
>Cameron Guenther, Ph.D.
>Associate Research Scientist
>FWC/FWRI, Marine Fisheries Research
>100 8th Avenue S.E.
>St. Petersburg, FL 33701
>(727)896-8626 Ext. 4305
>cameron.guenther at myfwc.com
>-----Original Message-----
>From: Francisco J. Zagmutt [mailto:gerifalte28 at hotmail.com]
>Sent: Wednesday, May 10, 2006 6:06 PM
>To: Guenther, Cameron; r-help at stat.math.ethz.ch
>Subject: RE: [R] Unique?
>
>If you only care about the sum of CONVUNIT by each TRIPID then you can
>use tapply i.e.:
>
>step4<-data.frame(TRIPID=rep(c(111,222,333),3),CONVUNIT=rpois(9,40))
>result<-tapply(step4$CONVUNIT,INDEX=step4$TRIPID,FUN=sum)
>result
>111 222 333
>115 107 123
>
>Is this what you wanted to do? I can't think of anything faster than
>tapply for your problem.
>
>I hope this helps
>
>Francisco
>
>
>
>
> >From: "Guenther, Cameron" <Cameron.Guenther at MyFWC.com>
> >To: <r-help at stat.math.ethz.ch>
> >Subject: [R] Unique?
> >Date: Wed, 10 May 2006 17:02:33 -0400
> >
> >
> >Hello,
> >I have a sample data set that looks like:
> >
> >YEAR MONTH DAY CONTINUE SPL       TIMEFISH TIMEUNIT AREA COUNTY DEPTH DEPUNIT GEAR    TRIPID      CONVUNIT
> >1992 1     26  1        SP0073928 8        H        7    25     4     NA      1000000 02163399054 161
> >1992 1     26  1        SP0073928 8        H        7    25     4     NA      1000000 02163399054 8
> >1992 1     26  2        SP0004228 8        H        7    25     4     NA      1000000 02163399054 161
> >1992 1     26  2        SP0004228 8        H        7    25     4     NA      1000000 02163399054 8
> >1992 1     25  NA       SP0052652 8        H        7    25     4     NA      1000000 02163399057 85
> >1992 1     26  NA       SP0037940 8        H        7    25     4     NA      1000000 02163399058 70
> >1992 1     27  NA       SP0072357 8        H        7    25     4     NA      1000000 02163399059 15
> >1992 1     27  NA       SP0072357 8        H        7    25     4     NA      1000000 02163399059 20
> >1992 1     27  NA       SP0026324 8        H        7    25     4     NA      1000000 02163399060 8
> >1992 1     28  1        SP0072357 8        H        7    25     4     NA      1000000 02163399062 200
> >
> >How can I use unique to extract only the rows that have repeated
> >TRIPIDs, that is, unique not for every variable but only for TRIPID? I
> >then want to condense the unique values by summing the CONVUNIT for each
> >unique value of TRIPID. I posted a similar question last week and
> >received a sufficient answer on how to do this without using unique.
> >The solution below worked just fine on this sample data set, but the
> >full data set has 446,000 rows of data and my computer and R simply
> >cannot handle the following code on data this large.
> >
> >conds<-by(Step4,Step4$TRIPID,function(x)
> >replace(x[1,],"CONVUNIT",sum(x$CONVUNIT)))
> >Step5<-do.call(rbind,conds)
> >
> >Thank you,
> >
> >Cameron Guenther, Ph.D.
> >Associate Research Scientist
> >FWC/FWRI, Marine Fisheries Research
> >100 8th Avenue S.E.
> >St. Petersburg, FL 33701
> >(727)896-8626 Ext. 4305
> >cameron.guenther at myfwc.com
> >
> >______________________________________________
> >R-help at stat.math.ethz.ch mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide!
> >http://www.R-project.org/posting-guide.html
>
>