[R] Calculating total observations based on combinations of variable values

hadley wickham h.wickham at gmail.com
Wed Aug 27 21:43:14 CEST 2008


On Wed, Aug 27, 2008 at 12:11 PM, Josip Dasovic <jjd9 at sfu.ca> wrote:
> Hello:
>
> As someone making the move from STATA to R, I'm finding it difficult at times to perform basic tasks in R, so forgive me if I've missed an obvious and easily obtained solution to my problem.   I've searched the help guides and the archives and have not been able to find a solution that works.
>
> I have a data frame with thousands of observations that looks something like this:
>
> YEAR MONTH DAY   COUNTRY         REGION                  PROVINCE              CITY
> 1994     1  22 Sri Lanka     South Asia       Northern (Province)       Pungudutivu
> 1994     1  25 Sri Lanka     South Asia        Central (Province)             Kandy
> 1994     2  26 Sri Lanka     South Asia        Central (Province)             Kandy
> 1994     2  28 Sri Lanka     South Asia        Eastern (Province)         Wakianeri
> 1994     6  28 Sri Lanka     South Asia        Eastern (Province)        Valachenai
> 1994     6  31 Sri Lanka     South Asia        Central (Province)             Kandy
> 1995     3   1 Sri Lanka     South Asia          North (Province)       Kilinochchi
> 1995     3   6 Sri Lanka     South Asia        Western (Province)           Colombo
> 1995     7  15 Sri Lanka     South Asia       Northern (Province)          Mankulam
> 1995     7  23 Sri Lanka     South Asia       Northern (Province)       Point Pedro
> 1995     9  25 Sri Lanka     South Asia       Northern (Province)            Kilali
> ...
>
> What I would like to do is to calculate the total number of observations by unique combinations of the values of (some of the) variables above.
>
> For example, I would like to know how many observations (i.e. rows) have the values YEAR==1994 and MONTH==1.
>
> In the end, I'd like a table that looks like this:
>
> YEAR MONTH #OBS
> 1994     1    2
> 1994     2    2
> 1994     3    0
> 1994     4    0
> 1994     5    0
> 1994     6    2
> 1994     7    0
> 1994     8    0
> 1994     9    0
> 1994    10    0
> 1994    11    0
> 1994    12    0
> 1995     1    0
> 1995     2    0
> 1995     3    2
> 1995     4    0
> ...
>
> I do need to fill out the table with all the possible combinations, even where there are no observations with that combination in the data set.
> At first, it seemed like this would not be [...] I think that aggregate is probably the way to go, but there doesn't seem to be an appropriate summary function (FUN) available. Thanks in advance for any help in this matter,


For this, and other related problems, you might want to look at the
reshape package - http://had.co.nz/reshape
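
As a rough illustration of the kind of count being asked for, here is a
minimal sketch in base R (it does not use reshape). The data frame `df`
and the output column name `OBS` are made up for the example; only the
YEAR and MONTH values are taken from the rows quoted above. The key
assumption is that MONTH is turned into a factor with all 12 levels, so
that table() also reports the combinations that never occur.

```r
# Toy data: YEAR and MONTH values copied from the rows in the question
df <- data.frame(
  YEAR  = c(1994, 1994, 1994, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995),
  MONTH = c(1, 1, 2, 2, 6, 6, 3, 3, 7, 7, 9)
)

# Declare all 12 months as factor levels so empty months are not dropped
df$MONTH <- factor(df$MONTH, levels = 1:12)

# table() counts every combination of levels, including those with zero rows;
# as.data.frame() turns the cross-tabulation into one row per combination
counts <- as.data.frame(table(YEAR = df$YEAR, MONTH = df$MONTH),
                        responseName = "OBS")

# Sort by year, then month, to match the layout requested in the question
counts <- counts[order(counts$YEAR, counts$MONTH), ]
print(counts, row.names = FALSE)
```

Years that are missing from the data entirely would need the same
treatment as MONTH, i.e. a factor with the full set of years as levels.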

Hadley

-- 
http://had.co.nz/


