[R] as.data.frame.table() to convert by() output to a data frame
David Winsemius
dwinsemius at comcast.net
Thu Nov 26 06:17:59 CET 2009
On Nov 25, 2009, at 9:54 PM, Michael Ash wrote:
> I remain confused by the difference between
>
> library(MASS)
> data(Cars93)
>
> as.data.frame(tapply(Cars93$Price,
>     list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers), median))
> as.data.frame.table(tapply(Cars93$Price,
>     list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers), median))
>
The display may not make it obvious, but applying str() to both results
should. The first is similar to what one might get by cbind'ing the
results of the inner function (which would not work in this case on a
list object): you get two rows of 18 variables, all medians or NA. The
second has one row for each combination in the
(USA/non-USA)*(AirBags)*(Passengers) cross, together with the associated
median, which as.data.frame.table has labeled "Freq" since that is the
usual element of a contingency table. I think of as.data.frame.table as
simply another way of accomplishing as.data.frame(table()). Is that
not how you were intending it?
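
The wide-versus-long difference described above can be reproduced on a
small scale. This is only a sketch: it uses the built-in mtcars data as a
stand-in for Cars93, and the names arr, wide, long and median_mpg are
illustrative choices, not anything from the original question.

```r
# Median mpg by cylinders x gears x transmission (mtcars stands in for Cars93).
# Naming the list elements gives the array named dimnames.
arr <- tapply(mtcars$mpg,
              list(cyl = mtcars$cyl, gear = mtcars$gear, am = mtcars$am),
              median)
dim(arr)                     # a 3-d array, one cell per factor combination

# Default coercion: the array is flattened to a matrix, so the first
# factor becomes the rows and every other combination becomes a column.
wide <- as.data.frame(arr)

# Table coercion: one row per cell, factor columns plus the statistic.
# responseName replaces the default "Freq" label.
long <- as.data.frame.table(arr, responseName = "median_mpg")

str(wide)
str(long)

# Empty cells are NA; drop them if only observed groups are wanted.
long_observed <- long[!is.na(long$median_mpg), ]
```

Naming the elements of the index list is what gives the long form the
column names cyl, gear and am instead of Var1, Var2 and Var3.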
>
> I clearly want the latter, but that's not clear from the
> documentation.
Sometimes it is helpful to look at the code as well:
> as.data.frame.table
function (x, row.names = NULL, ..., responseName = "Freq",
    stringsAsFactors = TRUE)
{
    x <- as.table(x)
    ex <- quote(data.frame(do.call("expand.grid", c(dimnames(x),
        stringsAsFactors = stringsAsFactors)), Freq = c(x),
        row.names = row.names))
    names(ex)[3L] <- responseName
    eval(ex)
}
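
In other words, the function crosses the dimnames with expand.grid() and
attaches the cell values as one more column. A hedged sketch of the same
idea applied to a by() result follows. It uses the built-in warpbreaks
data in place of the poster's microdata, and ratio.by, ratio.df, manual
and "ratio" are made-up names for illustration.

```r
# 90-50 ratio of 'breaks' within each wool x tension group
# (warpbreaks stands in for the microdata in the original question).
ratio.by <- by(warpbreaks,
               list(wool = warpbreaks$wool, tension = warpbreaks$tension),
               function(d) unname(quantile(d$breaks, probs = 0.9) /
                                  quantile(d$breaks, probs = 0.5)))

# One row per group; responseName labels the statistic instead of "Freq".
ratio.df <- as.data.frame.table(ratio.by, responseName = "ratio")

# Roughly what the function does internally: cross the dimnames,
# then append the flattened cell values in the same order.
manual <- data.frame(expand.grid(dimnames(ratio.by)),
                     ratio = c(ratio.by))

ratio.df
```

This relies on the function passed to by() returning a single number per
group, so that by() simplifies its result to an array.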
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
>
>
> Best,
> Michael
>
>
> On Wed, Nov 25, 2009 at 5:55 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Nov 25, 2009, at 4:11 PM, Michael Ash wrote:
>>
>>> Dear all,
>>>
>>> This seems to be working, but I'd like to make sure that I'm not doing
>>> anything wrong.
>>>
>>> I am using by() to construct a complicated summary statistic by
>>> several factors in my data (specifically, the 90-50 income ratio by
>>> city and race).
>>>
>>> cityrace.by <- by(microdata, list(microdata$city,microdata$race),
>>> function (x) quantile(x$income, probs=0.9) / quantile(x$income,
>>> probs=0.5) )
>>>
>>> I would now like to use the data created by by() as a dataset with
>>> city-race as the unit of observation.
>>>
>>> However, cityrace.data <- as.data.frame(cityrace.by) does not work because
>>> "Error in as.data.frame.default(city.by) :
>>> cannot coerce class "by" into a data.frame"
>>>
>>> The following is not a documented use of as.data.frame.table(), but it
>>> seems to work. It gives the columns slightly strange names, including
>>> "Freq" for the statistic computed by by(), but otherwise the data frame
>>> is indexed by city and race with the 90-50 ratio as the variable:
>>>
>>> cityrace.data <- as.data.frame.table(cityrace.by)
>>
>> If the by-object you get happens to be a 2d array, then why not? Two-way
>> tables are matrices after all.
>>
>>> tt <- table(c(1,1), c(1,1))
>>> tt
>>
>>     1
>>   1 2
>>> is.matrix(tt)
>> [1] TRUE
>>
>>> --
>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>
> --
> Michael Ash, Associate Professor
> of Economics and Public Policy
> Department of Economics and CPPA
> University of Massachusetts
> Amherst, MA 01003
> Tel +1-413-545-6329 Fax +1-413-545-2921
> Email mash at econs.umass.edu
> http://people.umass.edu/maash