[R] Sum by column

Joshua Wiley jwiley.psych at gmail.com
Wed Jan 12 15:28:08 CET 2011


Hi Peter,

R has some fairly flexible ways of passing values of some variable (X)
by another (the INDEX) to different FUNctions.  Here is an example
using your data:

## your email data, in convenient form
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
12.94232, 6.749846, 15.484412, 29.644494, 43.211212), ECOCODE = structure(c(1L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 3L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AA0101",
"AT0101", "AT0801", "AT1201", "PA1201", "PA1301"), class = "factor")),
.Names = c("ED",
"ECOCODE"), class = "data.frame", row.names = c(NA, -27L))
## look at the structure of the data
str(dat)

## inside of "dat" (to avoid typing its name repeatedly)
## find the sum of ED at each level of ECOCODE
with(dat, tapply(X = ED, INDEX = ECOCODE, FUN = sum, na.rm = TRUE))

## should give something like
   AA0101    AT0101    AT0801    AT1201    PA1201    PA1301
 21.80947 236.83684  27.12871  34.48566 209.60328  35.67018

For documentation, look at:

?tapply
## similar in many ways though sometimes slightly more/less convenient
?by

Hope that helps,

Josh

On Wed, Jan 12, 2011 at 2:38 AM, Peter Francis <peterfrancis at me.com> wrote:
> Dear List,
>
> I have a question of convenience,
>
> I am looking to sum the values of one column based on another column - a example may help explain better!
>
> ED                      ECOCODE
> 21.809467       AA0101
> 36.229566       PA1201
> 51.861284       PA1201
> 11.36232        PA1201
> 27.264634       PA1201
> 12.261986       PA1201
> 46.519313       PA1201
> 7.815376        PA1201
> 2.810428        PA1201
> 13.478372       PA1201
> 35.670182       PA1301
> 27.128715       AT0801
> 19.010294       AT1201
> 15.475368       AT1201
> 18.597983       AT0101
> 29.292615       AT0101
> 6.749846        AT0101
> 14.981488       AT0101
> 14.93511        AT0101
> 14.93511        AT0101
> 21.040785       AT0101
> 8.271615        AT0101
> 12.94232        AT0101
> 6.749846        AT0101
> 15.484412       AT0101
> 29.644494       AT0101
> 43.211212       AT0101
>
> So for AA0101 it would be = 21.809467
>            AT1201 it would be = 19.010294+15.475368
>
> etc
>
> I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other.
>
> This is stored in a dataframe called ecoregion, i understand people like having code to change but i have none as i am a relative beginner! Sorry in advance!
>
> Thanks
>
> Peter
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



More information about the R-help mailing list