[R] as.data.frame(do.call(rbind, lapply)) produces something weird

William Dunlap wdunlap at tibco.com
Fri Nov 9 22:24:51 CET 2012


Note that the column-wise conversion I suggested might be better
done on the matrix R before conversion to a data.frame.  E.g.

> R <- rbind(
           list(Letter="a", Integer=1L, Complex=1+1i),
           list(Letter="b", Integer=2L, Complex=2+2i))
> Rconverted <- lapply(structure(seq_len(ncol(R)), names=colnames(R)), function(i)as(R[,i], class(R[[1,i]])))
> str(data.frame(Rconverted)) # use stringsAsFactors=FALSE if you like
'data.frame':   2 obs. of  3 variables:
 $ Letter : Factor w/ 2 levels "a","b": 1 2
 $ Integer: int  1 2
 $ Complex: cplx  1+1i 2+2i
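The same conversion can be wrapped in a reusable function. This is a minimal sketch with three assumptions: the name listMatrixToDataFrame is hypothetical and does not come from this thread, each column's first element is assumed to have the class the whole column should have, and the sketch defaults to stringsAsFactors = FALSE.

```r
library(methods)  # as() lives here; attached by default in interactive R

# Hypothetical helper: turn a list-mode matrix (e.g. from rbind() of lists)
# into a data.frame with atomic columns by coercing each column to the
# class of its first element.
listMatrixToDataFrame <- function(m, stringsAsFactors = FALSE) {
  stopifnot(is.matrix(m), is.list(m))
  cols <- lapply(structure(seq_len(ncol(m)), names = colnames(m)),
                 function(i) as(m[, i], class(m[[1, i]])))
  data.frame(cols, stringsAsFactors = stringsAsFactors)
}

R <- rbind(list(Letter = "a", Integer = 1L, Complex = 1+1i),
           list(Letter = "b", Integer = 2L, Complex = 2+2i))
df <- listMatrixToDataFrame(R)
str(df)
```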

In any case, a long list will use a lot more memory than the equivalent atomic vectors.
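If that memory cost matters, one alternative is to avoid the list matrix entirely. Note that this approach was not proposed in the thread; it is swapped in here. The idea is to have the per-row function from the original question return a one-row data.frame and then rbind() those. data.frame() expands a named list argument such as b = myfun(x) into columns b.x and b.y, so the column names match the question's. This is only a sketch: rbind()-ing many small data frames can itself be slow, so for large inputs building each column directly as a vector is likely better.

```r
# myfun is taken from the original question
myfun <- function(x) list(x = x, y = x * x)

# Each lapply() call returns a one-row data.frame with atomic columns;
# the named list arguments b and c expand to b.x, b.y, c.x, c.y.
rows <- lapply(1:3, function(x)
  data.frame(a = paste("a", x, sep = ""),
             b = myfun(x),
             c = myfun(x * x * x),
             stringsAsFactors = FALSE))

# rbind() on data.frames (rather than on lists) keeps the columns atomic
z <- do.call(rbind, rows)
str(z)
```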

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of William Dunlap
> Sent: Friday, November 09, 2012 12:24 PM
> To: sds at gnu.org; r-help at r-project.org
> Subject: Re: [R] as.data.frame(do.call(rbind, lapply)) produces something weird
> 
> Your call to rbind() creates a matrix of mode "list".  Thus every element
> can be of a different type, although you "know" that there is a pattern
> to the types.  E.g.,
>   > R <- rbind(
>           list(Letter="a", Integer=1L, Complex=1+1i),
>           list(Letter="b", Integer=2L, Complex=2+2i))
>   > str(R)
>   List of 6
>    $ : chr "a"
>    $ : chr "b"
>    $ : int 1
>    $ : int 2
>    $ : cplx 1+1i
>    $ : cplx 2+2i
>    - attr(*, "dim")= int [1:2] 2 3
>    - attr(*, "dimnames")=List of 2
>     ..$ : NULL
>     ..$ : chr [1:3] "Letter" "Integer" "Complex"
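The following sketch confirms this structure using only base R. The result is a matrix, but its mode is "list", so each cell is a separate length-1 vector that carries its own class.

```r
R <- rbind(list(Letter = "a", Integer = 1L, Complex = 1+1i),
           list(Letter = "b", Integer = 2L, Complex = 2+2i))

is.matrix(R)   # TRUE: rbind() produced a matrix ...
mode(R)        # "list": ... but a matrix of mode "list"
dim(R)         # 2 3

# The class is stored per cell, not per column
cellClasses <- vapply(R[1, ], class, character(1))
cellClasses
```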
> 
> data.frame(R), since R is a matrix, will make a data.frame containing
> the columns of R.  It does not decide that, since each column is a list,
> it should do what data.frame(list(...)) would do; it just sticks those
> columns, as is, into the data.frame that it creates:
> 
>   > Rdf <- data.frame(R)
>   > str(Rdf)
>   'data.frame':   2 obs. of  3 variables:
>    $ Letter :List of 2
>     ..$ : chr "a"
>     ..$ : chr "b"
>    $ Integer:List of 2
>     ..$ : int 1
>     ..$ : int 2
>    $ Complex:List of 2
>     ..$ : cplx 1+1i
>     ..$ : cplx 2+2i
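Printing such a data.frame looks normal, as the original question shows. A check like the one below reveals list columns. The helper name listColumns is hypothetical and is introduced only for this sketch.

```r
R <- rbind(list(Letter = "a", Integer = 1L, Complex = 1+1i),
           list(Letter = "b", Integer = 2L, Complex = 2+2i))
Rdf <- data.frame(R)

# Hypothetical helper: names of the columns of a data.frame that are lists
listColumns <- function(df) names(df)[vapply(df, is.list, logical(1))]

listColumns(Rdf)   # here every column is a list
```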
> 
> You can convert those columns to their "natural" type, at least
> the type of their first element, with
> 
>   > for(i in seq_along(Rdf)) Rdf[[i]] <- as(Rdf[[i]], class(Rdf[[i]][[1]]))
>   > str(Rdf)
>   'data.frame':   2 obs. of  3 variables:
>    $ Letter : chr  "a" "b"
>    $ Integer: int  1 2
>    $ Complex: cplx  1+1i 2+2i
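The same per-column conversion can be written without an explicit loop. Assigning into Rdf[] replaces the columns while keeping the data.frame's class and row names. This is a minimal sketch that assumes, as above, that each column's first element has the intended class.

```r
library(methods)  # for as()

R <- rbind(list(Letter = "a", Integer = 1L, Complex = 1+1i),
           list(Letter = "b", Integer = 2L, Complex = 2+2i))
Rdf <- data.frame(R)

# Coerce every list column to the class of its first element;
# assigning into Rdf[] keeps the data.frame structure intact
Rdf[] <- lapply(Rdf, function(col) as(col, class(col[[1]])))
str(Rdf)
```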
> 
> Note that as(list(...), atomicType) does the conversion if every element
> of list(...) has length 1 and throws an error otherwise.  That is probably
> a good check in this case.  unlist() would give the same result, perhaps
> more quickly, if the list has the structure you expect but would silently
> give bad results if some element of the list did not have length one.
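The difference shows up on a small malformed list. This sketch uses an integer target type. With a character target, as.character() deparses longer elements instead of failing, so the same check may not hold there.

```r
library(methods)  # for as()

# A list column like those in Rdf, but with one element of length 2
bad <- list(1L, 2:3, 4L)

# unlist() silently flattens it: 3 elements in, 4 values out
flat <- unlist(bad)
length(flat)

# as() stops with an error instead, because not every element has length 1
res <- tryCatch(as(bad, "integer"), error = function(e) "error")
res
```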
> 
> Is that what you are looking for?
> 
> Note that storing things in a list takes a lot more memory than storing
> them as atomic vectors, so your technique may not scale up very well.
>   > object.size(as.list(1:1e6)) / object.size(1:1e6)
>   13.9998700013 bytes
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> > Of Sam Steingold
> > Sent: Friday, November 09, 2012 11:22 AM
> > To: r-help at r-project.org
> > Subject: [R] as.data.frame(do.call(rbind,lapply)) produces something weird
> >
> > The following code:
> > --8<---------------cut here---------------start------------->8---
> > > myfun <- function (x) list(x=x,y=x*x)
> > > z <- as.data.frame(do.call(rbind,lapply(1:3,function(x)
> > c(a=paste("a",x,sep=""),as.list(unlist(list(b=myfun(x),c=myfun(x*x*x))))))))
> > > z
> >    a b.x b.y c.x c.y
> > 1 a1   1   1   1   1
> > 2 a2   2   4   8  64
> > 3 a3   3   9  27 729
> > --8<---------------cut here---------------end--------------->8---
> > the appearance of z is good, but str() and summary() betray some weirdness:
> > --8<---------------cut here---------------start------------->8---
> > > str(z)
> > 'data.frame':	3 obs. of  5 variables:
> >  $ a  :List of 3
> >   ..$ : chr "a1"
> >   ..$ : chr "a2"
> >   ..$ : chr "a3"
> >  $ b.x:List of 3
> >   ..$ : int 1
> >   ..$ : int 2
> >   ..$ : int 3
> >  $ b.y:List of 3
> >   ..$ : int 1
> >   ..$ : int 4
> >   ..$ : int 9
> >  $ c.x:List of 3
> >   ..$ : int 1
> >   ..$ : int 8
> >   ..$ : int 27
> >  $ c.y:List of 3
> >   ..$ : int 1
> >   ..$ : int 64
> >   ..$ : int 729
> > --8<---------------cut here---------------end--------------->8---
> > how do I ensure that the columns of z are vectors, as in
> > --8<---------------cut here---------------start------------->8---
> > > z <-
> > data.frame(a=c("a1","a2","a3"),b.x=c(1,2,3),b.y=c(1,4,9),c.x=c(1,8,27),c.y=c(1,64,729))
> > > z
> >    a b.x b.y c.x c.y
> > 1 a1   1   1   1   1
> > 2 a2   2   4   8  64
> > 3 a3   3   9  27 729
> > > str(z)
> > 'data.frame':	3 obs. of  5 variables:
> >  $ a  : Factor w/ 3 levels "a1","a2","a3": 1 2 3
> >  $ b.x: num  1 2 3
> >  $ b.y: num  1 4 9
> >  $ c.x: num  1 8 27
> >  $ c.y: num  1 64 729
> > --8<---------------cut here---------------end--------------->8---
> > thanks!
> >
> > --
> > Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 