[R] Summing rows based on criteria
PDXRugger
J_R_36 at hotmail.com
Mon Nov 2 18:56:07 CET 2009
I am attempting to clean up some land use building data and need to join some
buildings together making sure not to double count GIS slivers. The first
data.frame is the original, the 2nd adds all the acres for each identical
bldgid. I now want to
a) throw out all but one of the the cases where the Years and ImpValue are
Identical,
b) Sum the impvalues based on:
1) The Years are identical and the ImpValue are not
2)The ImpValues are identical and the Years are not
Resulting in the 3rd data frame.
Please consider the following
DF=cbind(Acres,Bldgid,Year,ImpValue)
DF<-as.data.frame(DF)
DF
Acres Bldgid Year ImpValue
1 100.00 1 1946 1000
2 101.00 2 1952 1400
3 100.00 3 1922 1300
4 130.00 4 1910 900
5 156.00 5 1955 5000
6 0.50 5 1955 1200
7 293.00 6 1999 500
8 300.00 7 1990 9000
9 0.09 7 1991 9000
10 100.00 8 2000 1000
11 12.50 8 2000 1000
#Aggregate acres where identical ids
dupbuild<-aggregate(DF$Acres,DF["Bldgid"],sum)
colnames(dupbuild)[2]<-"Acres"
#Add aggregated Acres to DF
DF$Acres<-dupbuild$Acres[match(DF$Bldgid,dupbuild$Bldgid)]
DF
Acres dgid Year ImpValue
1 100.00 1 1946 1000
2 101.00 2 1952 1400
3 100.00 3 1922 1300
4 130.00 4 1910 900
5 156.50 5 1955 5000
6 156.50 5 1955 1200
7 293.00 6 1999 500
8 300.09 7 1990 9000
9 300.09 7 1991 9000
10 112.50 8 2000 1000
11 112.50 8 2000 1000
#desired outcome data frame
Acres dgid Year ImpValue
1 100.00 1 1946 1000
2 101.00 2 1952 1400
3 100.00 3 1922 1300
4 130.00 4 1910 900
5 156.50 5 1955 7200 #combined 5 & 6
7 293.00 6 1999 500
8 300.09 7 1990 18000 #combined 8 & 9
10 112.50 8 2000 1000 #one case thrown out
So in this case the Impvalue are added together for rows 5 & 6 (from
dataframe example 2) b/c the years are identical and the Impvalue is not,
rows 8 and 9 have their Impvalue summed because there Years are identical
but the improvement value is not, and one of the cases is thrown out of rows
10 & 11 because they have identical years and ImpValue.
When rows are joined the Year value is no longer important but to remain
consistent i would like to keep the earliest (lowest) year. There will be
instances in the actual data where there are more than 2 cases to consider
if that makes any coding difference as i didnt include any in my example
data. It would also be useful to include a new column keeping track of how
many joined bldgids . I think i can figure that one out though.
Hope this is all clear. Thanks for the guidance and insights
JR
--
View this message in context: http://old.nabble.com/Summing-rows-based-on-criteria-tp26157755p26157755.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list