# [R] Summary Statistics for data.frame

justin rapp jdrapp at gmail.com
Sat Jul 8 22:55:40 CEST 2006

When I attempt to use the mysummary function, I obtain the following error:

Error in var(x) : missing observations in cov/cor

When I use:
by(data.logistic, data.logistic$Ydrafted, summary)

I receive no errors. I cut and pasted your mysummary function directly
into my R console. Should I have made any adjustments to the code?

jdr
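
A note on the error above: the message is consistent with missing values (NA) reaching var() in at least one column. That would also explain why summary() runs cleanly, since it reports NAs instead of failing. Below is a minimal sketch of one possible adjustment, assuming missing values are the cause. The toy data, its column names, and the function name mysummary_na are illustrative assumptions, not taken from the thread.

```r
# Hypothetical toy data standing in for data.logistic (column names are assumed).
toy <- data.frame(
  Ydrafted = c(1980, 1980, 1981, 1981),
  Height   = c(72, NA, 75, 77),
  Weight   = c(200, 210, NA, 230)
)

# Variant of mysummary that drops missing values before computing each statistic.
# sapply() over the columns avoids the matrix coercion that apply() performs.
mysummary_na <- function(df) {
  num <- df[sapply(df, is.numeric)]   # keep numeric columns only
  sapply(num, function(x) c(mean     = mean(x, na.rm = TRUE),
                            variance = var(x, na.rm = TRUE)))
}

# Summarise only the measurement columns, grouped by draft year.
res <- by(toy[c("Height", "Weight")], toy$Ydrafted, mysummary_na)
res
```

Keeping only numeric columns also sidesteps a second possible problem: apply() converts a data.frame to a matrix first, so a non-numeric column such as Conference would turn every column into character data.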

On 7/8/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 7/8/2006 3:44 PM, justin rapp wrote:
> > I apologize for my constant questions but I am new to R and trying to
> > gain an appreciation for its capabilities.  The following task is easy
> > in Excel and I was hoping somebody could give me a quick explanation
> > for how it can be achieved in R so I can avoid having to switch
> > between the two applications.
> >
> > How do I find the summary statistics for one vector of the data.frame by
> > the levels of another of its vectors?
> >
> > For example, I have the following headings for my data.frame.
> > Conference
> > Year Drafted
> > Height
> > Weight
> > Ratio
> >
> > I would like to compute the mean Height, Weight, and Ratio as well
> > as their variances for each of the years under Year
> > Drafted (1980-2000).  What is the most efficient way of doing this?
>
> I think the quickest is
>
> by(mydf, mydf$Year, summary)
>
> but this won't give you the variance.  You'll need your own little
> function to calculate mean and variance, e.g.
>
> mysummary <- function(df) apply(df, 2,
>                 function(x) c(mean=mean(x), variance=var(x)))
>
> by(mydf, mydf$Year, mysummary)
>
> If you don't like the format of the output, you can play around with the
> mysummary function.  It will be applied to each subset of the
> data.frame, and the results will be put together into a list with one
> entry per level of mydf$Year.
>
>
> Duncan
>
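
To illustrate the last point in the quoted reply (one entry per level of the grouping variable), here is a small sketch on complete data. The data frame contents are made up for illustration, and the summary function uses sapply() in place of the apply() call from the reply so that it works column by column.

```r
# Hypothetical data with no missing values (values are illustrative only).
mydf <- data.frame(
  Year   = c(1980, 1980, 1981, 1981),
  Height = c(72, 74, 75, 77),
  Weight = c(200, 210, 220, 230)
)

# Mean and variance of every column in the subset passed in.
mysummary <- function(df) {
  sapply(df, function(x) c(mean = mean(x), variance = var(x)))
}

out <- by(mydf[c("Height", "Weight")], mydf$Year, mysummary)

names(out)      # the levels of Year, one entry each
out[["1981"]]   # matrix with rows mean/variance and columns Height/Weight
```

Each entry can be pulled out by its level name, which is convenient when only a few years are of interest.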