[R] as.data.frame.table() to convert by() output to a data frame

David Winsemius dwinsemius at comcast.net
Thu Nov 26 06:27:47 CET 2009


On Nov 26, 2009, at 12:17 AM, David Winsemius wrote:

>
> On Nov 25, 2009, at 9:54 PM, Michael Ash wrote:
>
>> I remain confused by the difference between
>>
>> library(MASS)
>> data(Cars93)
>>
>> as.data.frame(tapply(Cars93$Price,
>>     list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers), median))
>> as.data.frame.table(tapply(Cars93$Price,
>>     list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers), median))
>>
>
> The printed output may not show it, but applying str() to both should
> make it clear that the first is similar to what one might get by
> cbind'ing the results of the inner function (which would not work in
> this case on a list object).

I mis-wrote: tapply returned an array, although not one that would
allow simple cbinding. This is the code from as.data.frame.array, with x
being the passed object and d its dim():
         dn <- dimnames(x)
         dim(x) <- c(d[1L], prod(d[-1L]))
         if (!is.null(dn)) {
             if (length(dn[[1L]]))
                 rownames(x) <- dn[[1L]]
             for (i in 2L:length(d)) if (is.null(dn[[i]]))
                 dn[[i]] <- seq_len(d[i])
             colnames(x) <- interaction(expand.grid(dn[-1L]))
         }
         as.data.frame.matrix(x, row.names, optional, ...)
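
As a rough illustration of what that reshaping does (a sketch on a made-up 2 x 2 x 3 array; the names x, d, flat and df are chosen only for this example), the first dimension stays as rows and all remaining dimensions are folded into columns:

```r
# Toy 3-d array standing in for the tapply() result (hypothetical data)
x <- array(1:12, dim = c(2, 2, 3),
           dimnames = list(c("a", "b"), c("p", "q"), c("u", "v", "w")))
d <- dim(x)

# Same reshaping step as in the code above: keep dim 1, fold the rest
flat <- x
dim(flat) <- c(d[1L], prod(d[-1L]))

# as.data.frame() on the array should give a data frame of the same shape
df <- as.data.frame(x)
dim(flat)
dim(df)
```

The values are not rearranged, only the dim attribute changes, so each row of the result still corresponds to one level of the first grouping factor.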

> You get two rows of 18 variables, all medians or NA, while the second
> is the unique combinations of the (USA/non-USA)*(AirBags)*(Passengers)
> cross and their associated medians, which as.data.frame.table has
> labeled "Freq" since that is the usual element of a contingency table.
> I think of as.data.frame.table as simply another way of accomplishing
> as.data.frame(table()). Is that not how you were intending it?
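
To check the shapes rather than rely on the printout, something like the following should do (med, wide and long are names chosen here for illustration, not anything from the original post):

```r
library(MASS)   # provides Cars93

med <- tapply(Cars93$Price,
              list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers),
              median)

wide <- as.data.frame(med)        # dispatches to as.data.frame.array
long <- as.data.frame.table(med)  # one row per cell of the array

dim(med)     # Origin x AirBags x Passengers
dim(wide)    # dim(med)[1] rows, prod(dim(med)[-1]) columns
dim(long)    # prod(dim(med)) rows; factor columns plus "Freq"
names(long)
```

Cells with no cars in them show up as NA in both forms; the long form simply keeps one row for each such combination.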
>
>>
>> I clearly want the latter, but that's not clear from the  
>> documentation.
>
> Sometimes it is helpful to look at the code as well:
>
> > as.data.frame.table
> function (x, row.names = NULL, ..., responseName = "Freq",
>     stringsAsFactors = TRUE)
> {
>    x <- as.table(x)
>    ex <- quote(data.frame(do.call("expand.grid", c(dimnames(x),
>        stringsAsFactors = stringsAsFactors)), Freq = c(x),
>        row.names = row.names))
>    names(ex)[3L] <- responseName
>    eval(ex)
> }
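
Since responseName is an argument, the "Freq" label can be replaced at the call. A minimal sketch for the by() case, using Cars93 in place of the original microdata (which is not available here); the column name "p90.p50" and the object names ratio and ratio.df are made up for the example. Naming the elements of the INDICES list should also carry through to the factor columns:

```r
library(MASS)   # Cars93 stands in for the poster's microdata

# 90-50 price ratio by Origin and AirBags (analogous to income by city and race)
ratio <- by(Cars93,
            list(Origin = Cars93$Origin, AirBags = Cars93$AirBags),
            function(d) unname(quantile(d$Price, probs = 0.9) /
                               quantile(d$Price, probs = 0.5)))

# Long format, with a chosen name for the statistic instead of "Freq"
ratio.df <- as.data.frame.table(ratio, responseName = "p90.p50")
names(ratio.df)
head(ratio.df)
```

This relies on the function passed to by() returning a single number per group, so that the by object is a plain numeric array underneath.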
>
> -- 
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>>
>>
>> Best,
>> Michael
>>
>>
>> On Wed, Nov 25, 2009 at 5:55 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>> On Nov 25, 2009, at 4:11 PM, Michael Ash wrote:
>>>
>>>> Dear all,
>>>>
>>>> This seems to be working, but I'd like to make sure that I'm not doing
>>>> anything wrong.
>>>>
>>>> I am using by() to construct a complicated summary statistic by
>>>> several factors in my data (specifically, the 90-50 income ratio by
>>>> city and race).
>>>>
>>>> cityrace.by <- by(microdata, list(microdata$city,microdata$race),
>>>> function (x) quantile(x$income, probs=0.9) / quantile(x$income,
>>>> probs=0.5) )
>>>>
>>>> I would now like to use the data created by by() as a dataset with
>>>> city-race as the unit of observation.
>>>>
>>>> However, cityrace.data <- as.data.frame(cityrace.by) does not work because
>>>> "Error in as.data.frame.default(city.by) :
>>>> cannot coerce class "by" into a data.frame"
>>>>
>>>> The following is not a documented use of as.data.frame.table(), but it
>>>> seems to work.  It gives the columns slightly strange names, including
>>>> "Freq" for the statistic computed by by(), but otherwise the
>>>> data frame is indexed by city and race with the 90-50 ratio as the
>>>> variable:
>>>>
>>>> cityrace.data <- as.data.frame.table(cityrace.by)
>>>
>>> If the by-object you get happens to be a 2d array, then why not?
>>> Two-way tables are matrices after all.
>>>
>>>> tt <- table(c(1,1), c(1,1))
>>>> tt
>>>
>>>   1
>>> 1 2
>>>> is.matrix(tt)
>>> [1] TRUE
>>>
>>>> --
>>>
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>
>> -- 
>> Michael Ash, Associate Professor
>> of Economics and Public Policy
>> Department of Economics and CPPA
>> University of Massachusetts
>> Amherst, MA 01003
>> Tel +1-413-545-6329 Fax +1-413-545-2921
>> Email mash at econs.umass.edu
>> http://people.umass.edu/maash
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT