[R] Using tapply to create a new table
Marc Schwartz
marc_schwartz at comcast.net
Fri Jan 26 19:47:04 CET 2007
Josh,
As per the "Value" section of ?tapply, when FUN returns a single atomic
value for each cell (as mean does), tapply() returns an array with one
entry per group, not a data frame. This is easily viewed by using:
tapply(set2$f1, set2$commonField, mean)
on its own, or:
str(tapply(set2$f1, set2$commonField, mean))
which will display the internal structure of the result. See ?str.
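As a minimal sketch of what that inspection shows, assuming a made-up stand-in for set2 (only the column names f1 and commonField come from this thread; the values are invented for illustration):

```r
# Hypothetical stand-in for set2; the data values are invented.
set2 <- data.frame(commonField = c("a", "a", "b", "b"),
                   f1          = c(1, 3, 10, 20))

# With a single grouping vector, tapply() returns a named
# one-dimensional array, not a data frame.
res <- tapply(set2$f1, set2$commonField, mean)
str(res)
is.data.frame(res)   # FALSE

# The group labels live in the names of the result, not in a column,
# so there is no "commonField" column for merge() to match on.
names(res)
```

That missing key column is the likely reason merge() cannot use the tapply() result directly.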
To use merge() in the fashion you seem to want, you would want to use
aggregate() and not tapply(). The former returns a data frame, where
the "key" or "by" values will be part of the output.
See ?aggregate for more information.
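A rough sketch of the aggregate() then merge() approach, one step at a time. The data frames below are invented stand-ins for set1 and set2, and the names set2.means and mean.f1 are made up for illustration:

```r
# Hypothetical stand-ins for set1 and set2; the data values are invented.
set1 <- data.frame(commonField = c("a", "b"),
                   n           = c(100, 200))
set2 <- data.frame(commonField = c("a", "a", "b", "b"),
                   f1          = c(1, 3, 10, 20))

# Step 1: summarise the detail data. aggregate() returns a data frame
# in which the grouping values are kept as an ordinary column, named
# here via the 'by' list.
set2.means <- aggregate(set2$f1,
                        by  = list(commonField = set2$commonField),
                        FUN = mean)

# Step 2: give the summary column (the second column) a descriptive name.
names(set2.means)[2] <- "mean.f1"

# Step 3: join the summary onto set1 using the shared key column.
result <- merge(set1, set2.means, by = "commonField")
result
```

Keeping each step in its own object also makes it easy to check the intermediate results with str().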
I would also recommend, both for the readability of the code you write
and to help clarify for yourself which objects are returned from each
step, that you not nest the function calls as you have below. There are
times when nesting makes sense, but there are times when the code would
end up being a good candidate for an Obfuscated R contest. :-)
HTH,
Marc
On Fri, 2007-01-26 at 13:21 -0500, Kalish, Josh wrote:
> Marc,
>
> Thanks for pointing out the merge function. That gets me part of the
> way there. The only thing is that I can't get the tapply() results
> into a format that merge() will take. For example:
>
> merge( set1 , tapply( set2$f1 , set2$commonField, mean ) ,
> by="commonField" )
>
> Gives me "Error in names... Unused arguments..."
>
> I'm not sure what the result of a tapply() exactly is, but it doesn't
> seem to be a table.
>
> Yeah, rank amateur questions...
>
> Thanks,
>
> Josh
>
> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz at comcast.net]
> Sent: Friday, January 26, 2007 1:08 PM
> To: Kalish, Josh
> Cc: 'r-help at stat.math.ethz.ch'
> Subject: Re: [R] Using tapply to create a new table
>
> On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote:
> > All,
> >
> > I'm sure that this is covered somewhere, but I can't seem to find a
> > good explanation. I have an existing table that contains information
> > grouped by date. It looks like this:
> >
> > Day       NumberOfCustomers  NumberOfComplaints
> > 20060512  10040              40
> > 20060513  32420              11
> > ...
> >
> >
> > I also have a table at the detail level, like so:
> >
> > Day       Meal     PricePaid  UsedCupon
> > 20060512  Fish     14         Y
> > 20060512  Chicken  20         N
> > ...
> >
> > Is there a simple way to create summaries on the detail table and then
> > join them into the first table above so that it looks like this:
> >
> > Day NumberOfCustomers NumberOfComplaints AveragePricePaid NumberUsingCupon
> >
> >
> > I can do a tapply to get what I want from the detail table, but I
> > can't figure out how to turn that into a table and join it back in.
> >
> >
> >
> > Thanks,
> >
> > Josh
>
> Skipping the steps of using tapply() or aggregate() to get the
> summarized data from the second data frame, you would then use merge()
> to perform a SQL-like 'join' operation:
>
> > DF.1
>        Day NumberOfCustomers NumberOfComplaints
> 1 20060512             10040                 40
> 2 20060513             32420                 11
>
> > DF.2
>        Day    Meal PricePaid UsedCupon
> 1 20060512    Fish        14         Y
> 2 20060512 Chicken        20         N
>
> > merge(DF.1, DF.2, by = "Day")
>        Day NumberOfCustomers NumberOfComplaints    Meal PricePaid
> 1 20060512             10040                 40    Fish        14
> 2 20060512             10040                 40 Chicken        20
>   UsedCupon
> 1         Y
> 2         N
>
>
> By default, only rows matching on the 'by' argument in both data
> frames will be in the result. See the 'all.x' and 'all.y' arguments to
> handle other scenarios of including non-matching rows.
>
> See ?merge, which BTW:
>
> help.search("join")
>
> would point you to, if you are familiar with the term from relational
> database operations.
>
> HTH,
>
> Marc Schwartz
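
The 'all.x' behaviour mentioned in the quoted message can be sketched with the two example data frames from this thread. The object names inner and left are made up for illustration:

```r
# Data copied from the example in the thread.
DF.1 <- data.frame(Day                = c(20060512, 20060513),
                   NumberOfCustomers  = c(10040, 32420),
                   NumberOfComplaints = c(40, 11))
DF.2 <- data.frame(Day       = c(20060512, 20060512),
                   Meal      = c("Fish", "Chicken"),
                   PricePaid = c(14, 20),
                   UsedCupon = c("Y", "N"))

# Default merge: only Days present in both data frames are kept,
# so 20060513 (which has no detail rows) is dropped.
inner <- merge(DF.1, DF.2, by = "Day")

# all.x = TRUE keeps every row of DF.1; the columns coming from DF.2
# are filled with NA where there is no matching Day.
left <- merge(DF.1, DF.2, by = "Day", all.x = TRUE)
left
```

Setting all.y = TRUE instead would keep every row of the second data frame.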