[R] as.data.frame(do.call(rbind, lapply)) produces something weird
William Dunlap
wdunlap at tibco.com
Fri Nov 9 22:24:51 CET 2012
Note that the column-wise conversion I suggested might be better
done on the matrix R before conversion to a data.frame. E.g.
> R <- rbind(
list(Letter="a", Integer=1L, Complex=1+1i),
list(Letter="b", Integer=2L, Complex=2+2i))
> Rconverted <- lapply(structure(seq_len(ncol(R)), names=colnames(R)),
    function(i) as(R[, i], class(R[[1, i]])))
> str(data.frame(Rconverted)) # use stringsAsFactors=FALSE if you like
'data.frame': 2 obs. of 3 variables:
$ Letter : Factor w/ 2 levels "a","b": 1 2
$ Integer: int 1 2
$ Complex: cplx 1+1i 2+2i
In any case, a long list uses much more memory than an atomic vector holding the same values.
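A minimal sketch of the same column-wise conversion wrapped in a function, assuming every column should take the class of its first element. The name listMatrixToDataFrame is hypothetical. It passes stringsAsFactors = FALSE explicitly. Since R 4.0.0 that is the default, so a current R shows Letter as chr rather than the Factor seen above.

```r
library(methods)  # as() lives in the methods package

# Hypothetical helper: turn a matrix of mode "list" (as produced by
# rbind()-ing lists) into a data.frame with atomic columns. Each column
# is coerced with as() to the class of its first element.
listMatrixToDataFrame <- function(m) {
  stopifnot(is.matrix(m), is.list(m))
  cols <- lapply(structure(seq_len(ncol(m)), names = colnames(m)),
                 function(i) as(m[, i], class(m[[1, i]])))
  data.frame(cols, stringsAsFactors = FALSE)
}

R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1+1i),
  list(Letter = "b", Integer = 2L, Complex = 2+2i))
Rdf <- listMatrixToDataFrame(R)
str(Rdf)
```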
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of William Dunlap
> Sent: Friday, November 09, 2012 12:24 PM
> To: sds at gnu.org; r-help at r-project.org
> Subject: Re: [R] as.data.frame(do.call(rbind, lapply)) produces something weird
>
> Your call to rbind() creates a matrix of mode "list". Thus every element
> can be of a different type, although you "know" that there is a pattern
> to the types. E.g.,
> > R <- rbind(
> list(Letter="a", Integer=1L, Complex=1+1i),
> list(Letter="b", Integer=2L, Complex=2+2i))
> > str(R)
> List of 6
> $ : chr "a"
> $ : chr "b"
> $ : int 1
> $ : int 2
> $ : cplx 1+1i
> $ : cplx 2+2i
> - attr(*, "dim")= int [1:2] 2 3
> - attr(*, "dimnames")=List of 2
> ..$ : NULL
> ..$ : chr [1:3] "Letter" "Integer" "Complex"
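A short sketch that confirms the point. The result of rbind() is a matrix whose storage mode is "list", and each cell keeps its own class. The variable name cellClasses is only illustrative.

```r
# rbind() of two lists gives a 2 x 3 matrix whose storage mode is "list"
R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1+1i),
  list(Letter = "b", Integer = 2L, Complex = 2+2i))
mode(R)   # "list"
dim(R)    # 2 3

# Class of every cell, laid out in the same shape as R
cellClasses <- matrix(vapply(R, function(cell) class(cell)[1], character(1)),
                      nrow = nrow(R), dimnames = dimnames(R))
cellClasses
```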
>
> data.frame(R), since R is a matrix, will make a data.frame containing
> the columns of R. It does not decide that, because each column is a list,
> it should do what data.frame(list(...)) would do; it just puts those
> columns, as is, into the data.frame that it creates:
>
> > Rdf <- data.frame(R)
> > str(Rdf)
> 'data.frame': 2 obs. of 3 variables:
> $ Letter :List of 2
> ..$ : chr "a"
> ..$ : chr "b"
> $ Integer:List of 2
> ..$ : int 1
> ..$ : int 2
> $ Complex:List of 2
> ..$ : cplx 1+1i
> ..$ : cplx 2+2i
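To spot such columns programmatically instead of reading str() output, one option is to apply is.list() to every column. This is a sketch rather than the only way, and isListCol is an illustrative name.

```r
R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1+1i),
  list(Letter = "b", Integer = 2L, Complex = 2+2i))
Rdf <- data.frame(R)

# TRUE for every column stored as a list rather than an atomic vector
isListCol <- vapply(Rdf, is.list, logical(1))
isListCol
```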
>
> You can convert each of those columns to its "natural" type (or at least
> to the type of its first element) with
>
> > for(i in seq_along(Rdf)) Rdf[[i]] <- as(Rdf[[i]], class(Rdf[[i]][[1]]))
> > str(Rdf)
> 'data.frame': 2 obs. of 3 variables:
> $ Letter : chr "a" "b"
> $ Integer: int 1 2
> $ Complex: cplx 1+1i 2+2i
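The same loop can be sketched with lapply(). Assigning to Rdf[] replaces the columns while keeping the data.frame's names and row names. It performs the identical as() coercion, so it has the same length-1 requirement.

```r
library(methods)  # as() lives in the methods package

R <- rbind(
  list(Letter = "a", Integer = 1L, Complex = 1+1i),
  list(Letter = "b", Integer = 2L, Complex = 2+2i))
Rdf <- data.frame(R)

# Coerce every list column to the class of its first element
Rdf[] <- lapply(Rdf, function(col) as(col, class(col[[1]])))
str(Rdf)
```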
>
> Note that as(list(...), atomicType) does the conversion if every element
> of list(...) has length 1 and throws an error otherwise. That is probably
> a good check in this case. unlist() would give the same result, perhaps
> more quickly, if the list has the structure you expect, but it would
> silently give bad results if some element of the list did not have length 1.
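The difference can be shown with a small sketch. The lists good and bad are made-up examples. vapply() with FUN.VALUE = integer(1) is included as a further strict alternative, which was not part of the original reply.

```r
library(methods)  # as() lives in the methods package

good <- list(1L, 2L)
bad  <- list(1L, 2:3)   # second element has length 2

as(good, "integer")     # 1 2
unlist(good)            # 1 2, the same result when every element has length 1

# as() refuses a list element of length 2 ...
asResult <- tryCatch(as(bad, "integer"), error = function(e) e)
inherits(asResult, "error")   # TRUE

# ... while unlist() silently returns 3 values for 2 list elements
unlist(bad)             # 1 2 3

# vapply() also errors unless each element is a single integer
vapplyResult <- tryCatch(vapply(bad, identity, integer(1)),
                         error = function(e) e)
inherits(vapplyResult, "error")  # TRUE
```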
>
> Is that what you are looking for?
>
> Note that storing things in a list takes a lot more memory than storing
> them as atomic vectors, so your technique may not scale up very well.
> > object.size(as.list(1:1e6)) / object.size(1:1e6)
> 13.9998700013 bytes
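A sketch of the same size comparison, assuming a 100,000-element vector is enough to show the effect. It uses sample.int() to get an ordinary integer vector, because in R 3.5.0 and later 1:n may be stored as a compact sequence. The exact ratio depends on the platform and R version, so no specific value is claimed here.

```r
# Compare the memory used by a list of length-1 integers with that of
# an ordinary integer vector holding the same values.
n <- 1e5
x <- sample.int(n)                      # ordinary (non-compact) integer vector
asVector <- as.numeric(object.size(x))
asList   <- as.numeric(object.size(as.list(x)))
ratio <- asList / asVector
ratio
```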
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> > Of Sam Steingold
> > Sent: Friday, November 09, 2012 11:22 AM
> > To: r-help at r-project.org
> > Subject: [R] as.data.frame(do.call(rbind,lapply)) produces something weird
> >
> > The following code:
> > --8<---------------cut here---------------start------------->8---
> > > myfun <- function (x) list(x=x,y=x*x)
> > > z <- as.data.frame(do.call(rbind,lapply(1:3,function(x)
> > c(a=paste("a",x,sep=""),as.list(unlist(list(b=myfun(x),c=myfun(x*x*x))))))))
> > > z
> > a b.x b.y c.x c.y
> > 1 a1 1 1 1 1
> > 2 a2 2 4 8 64
> > 3 a3 3 9 27 729
> > --8<---------------cut here---------------end--------------->8---
> > The appearance of z is good, but str() and summary() betray some weirdness:
> > --8<---------------cut here---------------start------------->8---
> > > str(z)
> > 'data.frame': 3 obs. of 5 variables:
> > $ a :List of 3
> > ..$ : chr "a1"
> > ..$ : chr "a2"
> > ..$ : chr "a3"
> > $ b.x:List of 3
> > ..$ : int 1
> > ..$ : int 2
> > ..$ : int 3
> > $ b.y:List of 3
> > ..$ : int 1
> > ..$ : int 4
> > ..$ : int 9
> > $ c.x:List of 3
> > ..$ : int 1
> > ..$ : int 8
> > ..$ : int 27
> > $ c.y:List of 3
> > ..$ : int 1
> > ..$ : int 64
> > ..$ : int 729
> > --8<---------------cut here---------------end--------------->8---
> > How do I ensure that the columns of z are atomic vectors, as in
> > --8<---------------cut here---------------start------------->8---
> > > z <- data.frame(a=c("a1","a2","a3"),b.x=c(1,2,3),b.y=c(1,4,9),c.x=c(1,8,27),c.y=c(1,64,729))
> > > z
> > a b.x b.y c.x c.y
> > 1 a1 1 1 1 1
> > 2 a2 2 4 8 64
> > 3 a3 3 9 27 729
> > > str(z)
> > 'data.frame': 3 obs. of 5 variables:
> > $ a : Factor w/ 3 levels "a1","a2","a3": 1 2 3
> > $ b.x: num 1 2 3
> > $ b.y: num 1 4 9
> > $ c.x: num 1 8 27
> > $ c.y: num 1 64 729
> > --8<---------------cut here---------------end--------------->8---
> > thanks!
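One possible answer, sketched here as an alternative to converting the columns afterwards: have the function given to lapply() return a one-row data.frame instead of a list. Then do.call(rbind, ...) dispatches to the data.frame method, and every column stays atomic. The name rows is illustrative. Building many one-row data frames can be slow when there are a great many rows.

```r
myfun <- function(x) list(x = x, y = x * x)

# Build each row as a one-row data.frame (atomic columns), then rbind them
rows <- lapply(1:3, function(x)
  data.frame(a = paste("a", x, sep = ""),
             as.list(unlist(list(b = myfun(x), c = myfun(x * x * x)))),
             stringsAsFactors = FALSE))
z <- do.call(rbind, rows)
str(z)
```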
> >
> > --
> > Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.