[R] Aggregate and cross tabulation
jim holtman
jholtman at gmail.com
Wed Oct 28 04:54:07 CET 2009
FIRST:
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
If you expect an answer, please provide the data. Here is one way of doing it:
> N <- 30
> x <- data.frame(A=sample(1:3, N, TRUE), B=sample(1:2, N, TRUE),
+ C=sample(1:2, N, TRUE), D=sample(1:4, N, TRUE), data=runif(N))
> require(reshape)
> x.m <- melt(x, measure='data')
> cast(x.m, A+B+C~D, mean)
   A B C          1         2         3          4
1  1 1 1 0.51473265 0.7396417 0.0853110        NaN
2  1 1 2 0.07246063 0.2939918       NaN        NaN
3  1 2 1        NaN       NaN 0.5297180 0.10505014
4  1 2 2        NaN 0.8383841       NaN        NaN
5  2 1 1        NaN       NaN 0.8016877 0.04152843
6  2 1 2 0.34448739       NaN       NaN 0.35757999
7  2 2 1 0.87943330       NaN 0.1431666 0.92051784
8  2 2 2        NaN       NaN 0.5008505        NaN
9  3 1 1 0.48216957 0.4230986       NaN 0.53786492
10 3 1 2        NaN 0.7602803       NaN 0.33989081
11 3 2 1 0.43471764       NaN 0.2642490        NaN
12 3 2 2        NaN 0.3665636       NaN 0.37875944
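
If you would rather stay in base R, the same layout can probably be reached by aggregating first and then reshaping to wide format. The following is only a sketch under a few assumptions: set.seed() is added so the example is repeatable, cross_tab() is a hypothetical helper name (not part of any package), and the column names mirror the example above. Passing sum instead of mean gives the second data frame that was asked for.

```r
# Base-R sketch: summarise `data` by A, B, C and spread the levels of D
# into columns. cross_tab() is a hypothetical helper written for this example.
set.seed(1)  # assumption: fixed seed so the example is repeatable
N <- 30
x <- data.frame(A = sample(1:3, N, TRUE), B = sample(1:2, N, TRUE),
                C = sample(1:2, N, TRUE), D = sample(1:4, N, TRUE),
                data = runif(N))

cross_tab <- function(df, fun) {
  # Long form: one row per observed A/B/C/D combination
  long <- aggregate(data ~ A + B + C + D, data = df, FUN = fun)
  # Wide form: one row per A/B/C, one column per level of D
  reshape(long, idvar = c("A", "B", "C"), timevar = "D",
          direction = "wide")
}

x.mean <- cross_tab(x, mean)  # cells hold mean(data)
x.sum  <- cross_tab(x, sum)   # cells hold sum(data)
x.mean
```

Combinations that do not occur in the data come out as NA here rather than NaN, and the D columns are named data.1, data.2, and so on. Note that this still builds the long aggregate as an intermediate step, so it does not avoid the larger data frame mentioned in the question.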
On Tue, Oct 27, 2009 at 8:32 PM, Jonathan Greenberg
<greenberg at ucdavis.edu> wrote:
> R-helpers:
>
> I have a data frame containing 4 factor variables (let's say A,B,C, and D)
> and 1 numerical variable (N). I would like to produce a cross-tabulated
> data frame in which A,B,C are individual columns, each factor of D is its
> own column, and the field is calculated as a given function of N (I would
> like to have two output data frames, one with the mean(N) and one with the
> sum(N)), e.g.:
>
> A,  B,  C,  D1,  D2,  ...,  DM
> A1  B1  C1  mean(N{A1,B1,C1,D1})  mean(N{A1,B1,C1,D2})  mean(N{A1,B1,C1,DM})
> A2  B1  C1  mean(N{A2,B1,C1,D1})  mean(N{A2,B1,C1,D2})  mean(N{A2,B1,C1,DM})
> etc...
>
> I can mostly do this with aggregate, e.g.
> output = aggregate(N,list(A,B,C,D),mean), but I can't get that last step of
> cross-tabulating the Ds to column headers. table() and xtabs() appear to
> just count, rather than giving me access to sum() and mean(). Any ideas?
> Ideally I'd like to do this in a single step, as the aggregate output
> (above) produces a much larger data frame than a cross-tabulated output
> would (in my particular case).
>
> --j
>
> --
>
> Jonathan A. Greenberg, PhD
> Postdoctoral Scholar
> Center for Spatial Technologies and Remote Sensing (CSTARS)
> University of California, Davis
> One Shields Avenue
> The Barn, Room 250N
> Davis, CA 95616
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?