[R] Using tapply to create a new table

Marc Schwartz marc_schwartz at comcast.net
Fri Jan 26 19:47:04 CET 2007


Josh,

As per the "Value" section of ?tapply, it "returns a single atomic value
for each cell".  This is easily viewed by using:

  tapply(set2$f1, set2$commonField, mean)

in a stand alone fashion or:

  str(tapply(set2$f1, set2$commonField, mean))

which will display the internal structure of the result. See ?str

To use merge() in the fashion you seem to want, you would want to use
aggregate() and not tapply().  The former returns a data frame, where
the "key" or "by" values will be part of the output.

See ?aggregate for more information.

I would also recommend that both for the readability of the code you
write and to help clarify for yourself, the objects that are returned
from each step, that you not nest the function calls as you have below.

There are times when it makes sense, but there are times when the code
would end up being a good candidate for an Obfuscated R contest.  :-)

HTH,

Marc

On Fri, 2007-01-26 at 13:21 -0500, Kalish, Josh wrote:
> Marc, 
> 
> Thanks for pointing out the merge function.  That gets me part of the
> way there.  The only thing is that I can't get the tapply() results
> into a format that merge() will take.  For example:
> 
> merge( set1 , tapply( set2$f1 , set2$commonField, mean ) ,
> by="commonField" )
> 
> Gives me  "Error in names... Unused arguments..."
> 
> I'm not sure what the result of a tapply() exactly is, but it doesn't
> seem to be a table.
> 
> Yeah, rank amateur questions...
> 
> Thanks,
> 
> Josh
> 
> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz at comcast.net] 
> Sent: Friday, January 26, 2007 1:08 PM
> To: Kalish, Josh
> Cc: 'r-help at stat.math.ethz.ch'
> Subject: Re: [R] Using tapply to create a new table
> 
> On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote:
> > All,
> > 
> > I'm sure that this is covered somewhere, but I can't seem to find a 
> > good explanation.  I have an existing table that contains
> information 
> > grouped by date.  This is as so:
> > 
> > Day		NumberOfCustomers		NumberOfComplaints
> > 20060512	10040				40
> > 20060513	32420				11
> > ...
> > 
> > 
> > I also have a table at the detail level as so:
> > 
> > Day		Meal		PricePaid	UsedCupon
> > 20060512	Fish		14		Y
> > 20060512	Chicken	20		N
> > ...
> > 
> > Is there a simple way to create summaries on the detail table and
> then 
> > join them into the first table above so that it looks like this:
> > 
> > Day		NumberOfCustomers		NumberOfComplaints	AveragePricePaid
> > NumberUsingCupon
> > 
> > 
> > I can do a tapply to get what I want from the detail table, but I 
> > can't figure out how to turn that into a table and join it back in.
> > 
> > 
> > 
> > Thanks,
> > 
> > Josh
> 
> Skipping the steps of using tapply() or aggregate() to get the
> summarized data from the second data frame, you would then use merge()
> to perform a SQL-like 'join' operation:
> 
> > DF.1
>        Day NumberOfCustomers NumberOfComplaints
> 1 20060512             10040                 40
> 2 20060513             32420                 11
> 
> > DF.2
>        Day    Meal PricePaid UsedCupon
> 1 20060512    Fish        14         Y
> 2 20060512 Chicken        20         N
> 
> > merge(DF.1, DF.2, by = "Day")
>        Day NumberOfCustomers NumberOfComplaints    Meal PricePaid
> 1 20060512             10040                 40    Fish        14
> 2 20060512             10040                 40 Chicken        20
>   UsedCupon
> 1         Y
> 2         N
> 
> 
> By default, only rows matching on the 'by' argument in both data
> frames will be in the result. See the 'all.x' and 'all.y' arguments to
> handle other scenarios of including non-matching rows.
> 
> See ?merge, which BTW:
> 
>   help.search("join")
> 
> would point you to, if you are familiar with the term from relational
> data base operations.
> 
> HTH,
> 
> Marc Schwartz



More information about the R-help mailing list