[R] apply --> data.frame
William Dunlap
wdunlap at tibco.com
Fri Aug 31 20:38:52 CEST 2012
It is hard to help when you don't give an example of your input data
and what you want to be computed (in a form one can source or copy
into an R session). Is the following something like what you are doing?
Suppose you have a function that takes a file name and
returns a list of things of various types extracted from the
file. A toy example would be
    fileExtract <- function(fileName) {
        fi <- file.info(fileName)
        byte0 <- if (fi$isdir || fi$size < 1) {
            NA_integer_
        } else {
            readBin(fileName, what="integer", size=1, n=1)
        }
        list(Name=basename(fileName), IsDir=fi$isdir, Size=fi$size,
             FirstByte=byte0, ModTime=fi$mtime)
    }
Then you can get the list of rows that you want converted to a data.frame
with
rows <- lapply(dir(R.home(), full.names=TRUE), fileExtract)
E.g., I get
> dput(rows[1:2])
list(structure(list(Name = "bin", IsDir = TRUE, Size = 0, FirstByte = NA_integer_,
ModTime = structure(1343316337, class = c("POSIXct", "POSIXt"
))), .Names = c("Name", "IsDir", "Size", "FirstByte", "ModTime"
)), structure(list(Name = "CHANGES", IsDir = FALSE, Size = 28204,
FirstByte = 87L, ModTime = structure(1340406834, class = c("POSIXct",
"POSIXt"))), .Names = c("Name", "IsDir", "Size", "FirstByte",
"ModTime")))
Note that the j'th element of each row has a fixed type.
You want a data.frame with columns named "Name", "IsDir",
"Size", "FirstByte", and "ModTime" where the i'th row contains the
data in rows[[i]].
If that is what you want then here is a function that does a pretty good job of it:
    f <- function (listOfRows,
                   nItemsPerRow = unique(vapply(listOfRows, length, 0)),
                   col.names = names(rowTemplate),
                   rowTemplate = listOfRows[[1]],
                   ...)
    {
        stopifnot(length(nItemsPerRow) == 1, nItemsPerRow == length(rowTemplate))
        if (is.null(col.names)) {
            col.names <- sprintf("V%d", seq_len(nItemsPerRow))
        }
        else {
            stopifnot(nItemsPerRow == length(col.names))
        }
        columns <- lapply(structure(seq_len(nItemsPerRow), names = col.names),
            FUN = function(i) {
                v <- vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]])
                if (is.matrix(v)) { # for when length(rowTemplate[[i]])>1
                    v <- t(v)
                }
                v
            })
        data.frame(columns, ...)
    }
E.g.,
> str(f(rows))
'data.frame': 19 obs. of 5 variables:
$ Name : Factor w/ 19 levels "bin","CHANGES",..: 1 2 3 4 5 6 7 8 9 10 ...
$ IsDir : logi TRUE FALSE FALSE TRUE TRUE TRUE ...
$ Size : num 0 28204 18351 0 0 ...
$ FirstByte: int NA 87 9 NA NA NA NA 101 NA 82 ...
$ ModTime : num 1.34e+09 1.34e+09 1.34e+09 1.34e+09 1.34e+09 ...
Note that the POSIXct item, ModTime, got converted to numeric because
vapply returns a plain vector of the template's underlying type (double
here) and drops the class attribute.
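One possible workaround for the lost class, offered only as a sketch: copy the template's attributes back onto the vector that vapply returns. The toy rows and the names v_raw and v_fixed are made up for illustration; this step is not part of the function above.

```r
# Two toy rows, each holding a POSIXct value (hypothetical data).
rows <- list(
  list(Name = "a", ModTime = as.POSIXct("2012-08-31 12:00:00", tz = "UTC")),
  list(Name = "b", ModTime = as.POSIXct("2012-08-31 13:00:00", tz = "UTC"))
)
template <- rows[[1]]$ModTime

# vapply checks type and length against the template, but the result
# is a bare double vector with no class attribute.
v_raw <- vapply(rows, function(Row) Row[["ModTime"]], template)

# Copy the template's attributes (class and tzone) back onto the result.
v_fixed <- v_raw
attributes(v_fixed) <- attributes(template)

class(v_raw)
class(v_fixed)
```

Inside f, the same idea would mean restoring attributes on v before returning it, and only when v is not a matrix.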
An advantage of vapply is that it will do some type checking:
> f(list(list(a=1,b=11), list(a=2,b="Twelve")))
Error in vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]]) :
values must be type 'double',
but FUN(X[[2]]) result is type 'character'
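For contrast, here is a small sketch of what happens without that check. It uses sapply on the same two rows, which coerces silently instead of stopping; the variable names are chosen for illustration only.

```r
rows <- list(list(a = 1, b = 11), list(a = 2, b = "Twelve"))

# sapply simplifies whatever it gets: the number 11 is silently
# turned into the string "11" so that both values fit one vector.
s <- sapply(rows, function(Row) Row[["b"]])

# vapply refuses, because the second value is not a double.
# The error is caught here so that the script keeps running.
msg <- tryCatch(
  vapply(rows, function(Row) Row[["b"]], numeric(1)),
  error = function(e) conditionMessage(e)
)

str(s)
msg
```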
It will also deal with things like the following, where each row element
contains a few vectors and you want each vector element in its
own column:
> str(f(list(list(1:2, 1+1i, letters[1:3]), list(11:12, 11+11i, letters[4:6]))))
'data.frame': 2 obs. of 6 variables:
$ V1.1: int 1 11
$ V1.2: int 2 12
$ V2 : cplx 1+1i 11+11i
$ V3.1: Factor w/ 2 levels "a","d": 1 2
$ V3.2: Factor w/ 2 levels "b","e": 1 2
$ V3.3: Factor w/ 2 levels "c","f": 1 2
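The reason for the t(v) step in f can be seen in isolation. This is a minimal sketch with made-up rows: when the template has length greater than 1, vapply returns a matrix with one column per list element, so it has to be transposed to get one row per list element.

```r
# Three toy rows, each holding one integer vector of length 2.
rows <- list(list(1:2), list(11:12), list(21:22))

# One column per row of input: the result is 2 x 3.
m <- vapply(rows, function(Row) Row[[1]], integer(2))
dim(m)

# After transposing, each input row is a matrix row (3 x 2), and
# data.frame() splits the matrix into columns V1.1 and V1.2.
df <- data.frame(V1 = t(m))
names(df)
```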
There are other ways to do this, but I don't know if this is the problem
you want to solve.
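As one example of another way (a sketch only, and not necessarily the best choice for long lists): turn each row into a one-row data.frame and rbind them. The rows below are a cut-down, made-up version of the ones above. This keeps each column's type but does none of the checking that vapply does.

```r
rows <- list(
  list(Name = "bin", IsDir = TRUE, Size = 0, FirstByte = NA_integer_),
  list(Name = "CHANGES", IsDir = FALSE, Size = 28204, FirstByte = 87L)
)

# Each list becomes a one-row data.frame with one column per element.
perRow <- lapply(rows, function(Row) {
  as.data.frame(Row, stringsAsFactors = FALSE)
})

# Stack the one-row data.frames.
df <- do.call(rbind, perRow)
str(df)
```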
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sam Steingold
> Sent: Friday, August 31, 2012 9:11 AM
> To: r-help at r-project.org; David Winsemius
> Subject: Re: [R] apply --> data.frame
>
> > * David Winsemius <qjvafrzvhf at pbzpnfg.arg> [2012-08-30 10:14:34 -0700]:
> >
> >> str( as.data.frame( do.call(rbind, strsplit(c("a,1","b,2","c,3"),
> > ",") ) , stringsAsFactors=FALSE) )
> > 'data.frame': 3 obs. of 2 variables:
> > $ V1: chr "a" "b" "c"
> > $ V2: chr "1" "2" "3"
>
> do.call/rbind appeared to be TRT. I tried it and got a data frame with
> list columns (instead of vectors);
>
> as.data.frame(do.call(rbind,lapply(list.files(...), function (name) {
> ....
> c(name,list(num1,num2,num3), # num* come from some calculations above
> strsplit(sub("[^-]*(train|test)[^-]*(-(S)?pca([0-9]*))?-s([0-9]*)c([0-9.]*)\\.score",
> "\\1,\\3,\\4,\\5,\\6",name),",")[[1]])
> })), stringsAsFactors = FALSE)
>
> 'data.frame': 2 obs. of 8 variables:
> $ file :List of 2
> ..$ : chr "zzz_test_0531_0630-Spca181-s0c10.score"
> ..$ : chr "zzz_train_0531_0630-Spca181-s0c10.score"
> $ lift.quality:List of 2
> ..$ : num 0.59
> ..$ : num 0.621
> $ proficiency :List of 2
> ..$ : num 0.0472
> ..$ : num 0.0472
> $ set :List of 2
> ..$ : chr "test"
> ..$ : chr "train"
> $ scale :List of 2
> ..$ : chr "S"
> ..$ : chr "S"
> $ pca :List of 2
> ..$ : chr "181"
> ..$ : chr "181"
> $ s :List of 2
> ..$ : chr "0"
> ..$ : chr "0"
> $ c :List of 2
> ..$ : chr "10"
> ..$ : chr "10"
>
> I guess the easiest way is to replace c(...list()...) with c(...) but
> that would mean converting num1,num2,num3 to string and back which I
> want to avoid for aesthetic reasons. Any better suggestions?
>
> thanks a lot!
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://jihadwatch.org http://thereligionofpeace.com
> http://palestinefacts.org http://ffii.org http://pmw.org.il
> I don't have an attitude problem. You have a perception problem.