[R] Convert list to data frame while controlling column types
Alexander Shenkin
ashenkin at ufl.edu
Fri Aug 21 21:41:43 CEST 2009
Thanks everyone for their replies, both on- and off-list. I should
clarify, since I left out some important information. My original
data frame has some numeric columns, which gsub turns into character
when I replace blank (whitespace-only) strings with NAs. Thus, when I go
back to a data frame, those (now character) columns get converted to
factors. I recently added stringsAsFactors = FALSE so that they come back
as characters instead, which makes things a bit easier. I wrote the
column-type reset function below, but it feels kludgey, so I was wondering
if there is some other way to tell as.data.frame how to handle each column.
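Another option is to avoid the type loss in the first place. The sketch below assumes that only character and factor columns can contain blank strings. It applies the NA replacement only to those columns and assigns the result back with `df[] <- lapply(...)`, which keeps the data.frame and leaves numeric columns untouched. `blank_to_na()` and the toy data are hypothetical and only stand in for `final_dataf`.

```r
# blank_to_na() is a hypothetical helper, not code from the thread.
# It replaces whitespace-only strings with NA, but only in character and
# factor columns; numeric, logical and date-time columns are returned as-is.
blank_to_na <- function(x) {
  if (is.character(x)) {
    x[grepl("^\\s*$", x)] <- NA
  } else if (is.factor(x)) {
    # Rebuild the factor without its blank levels: values that used a dropped
    # level become NA, and the remaining levels keep their original order.
    lv <- levels(x)
    x <- factor(as.character(x),
                levels = lv[!grepl("^\\s*$", lv)],
                ordered = is.ordered(x))
  }
  x
}

# Toy stand-in for final_dataf (assumed data, for illustration only).
df <- data.frame(h1 = c(1.5, 2, NA),
                 tag = c("121AL", " ", ""),
                 treatment = factor(c("I", "", "M")),
                 stringsAsFactors = FALSE)

# Assigning into df[] keeps the data.frame structure and row names.
df[] <- lapply(df, blank_to_na)
str(df)
```

Because no column is ever run through gsub, there is nothing to restore afterwards, and the reset step becomes unnecessary.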
str(final_dataf)
'data.frame': 1127 obs. of 43 variables:
$ block : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
$ treatment : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 1 1 1 1 ...
$ transect : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
$ tag : chr NA "121AL" "122AL" "123AL" ...
...
$ h1 : num NA NA NA NA NA NA NA NA NA NA ...
...
reset_col_types <- function (df, col_types) {
  # Function to reset column types in data frames. col_types can be
  # constructed by using lapply(df, class)
  coerce_fun = list (
    "character" = `as.character`,
    "factor" = `as.factor`,
    "numeric" = `as.numeric`,
    "integer" = `as.integer`,
    "POSIXct" = `as.POSIXct`,
    "logical" = `as.logical` )
  for (i in seq_along(df)) {
    df[[i]] = coerce_fun[[ col_types[i] ]]( df[[i]] )  # apply coerce function
  }
  return(df)
}
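A more compact variant of the same idea, sketched here assuming the same coercer table, replaces the index loop with `Map()`. It also stops with a clear message when a class has no coercer, instead of failing with "attempt to apply non-function". The name `reset_col_types2` is hypothetical.

```r
# reset_col_types2() is a hypothetical rewrite, not code from the thread.
reset_col_types2 <- function(df, col_types) {
  coerce_fun <- list(character = as.character, factor = as.factor,
                     numeric = as.numeric, integer = as.integer,
                     POSIXct = as.POSIXct, logical = as.logical)
  # One class name per column; Map() would otherwise silently recycle.
  stopifnot(length(col_types) == ncol(df))
  unknown <- setdiff(col_types, names(coerce_fun))
  if (length(unknown) > 0)
    stop("no coercion function for class(es): ",
         paste(unknown, collapse = ", "))
  # Map() walks the columns and class names in parallel; assigning into df[]
  # keeps the result a data.frame.
  df[] <- Map(function(col, cls) coerce_fun[[cls]](col), df, col_types)
  df
}

df <- data.frame(n = c("1.5", "2"), f = c("a", "b"),
                 stringsAsFactors = FALSE)
df <- reset_col_types2(df, c("numeric", "factor"))
str(df)
```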
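col_types = lapply(final_dataf, class)
# POSIXct columns carry two classes, "POSIXct" and "POSIXt"; pick "POSIXct"
# explicitly rather than relying on their order. Otherwise take the last
# class (e.g. "factor" for an ordered factor).
col_types = lapply(col_types, function(x) if ("POSIXct" %in% x) "POSIXct" else x[length(x)])
names(col_types)=NULL
col_types = unlist(col_types)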
final_dataf = as.data.frame(lapply(final_dataf, function(x) gsub('^\\s*$', NA, x)),
                            stringsAsFactors = FALSE)
final_dataf = reset_col_types(final_dataf, col_types)
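If the exact original classes are not needed, `utils::type.convert()` can re-infer each column's type from the all-character data. This is roughly what read.table does when colClasses is not given. The sketch below uses an assumed toy data frame in place of the gsub output. It is looser than restoring the saved classes: a double column holding only whole numbers comes back as integer, and with `as.is = TRUE` former factor columns come back as character.

```r
# Toy stand-in for the all-character result of the gsub() step (assumed data).
chr_df <- data.frame(h1 = c("1.5", NA, "3"),
                     count = c("4", "5", NA),
                     tag = c("121AL", NA, "123AL"),
                     stringsAsFactors = FALSE)

# Re-guess each column's type; as.is = TRUE keeps text columns as character
# instead of turning them into factors.
guessed <- chr_df
guessed[] <- lapply(chr_df, utils::type.convert, as.is = TRUE)
sapply(guessed, class)
```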
Thanks,
Allie
On 8/21/2009 10:54 AM, Steve Lianoglou wrote:
> Hi Allie,
>
> On Aug 21, 2009, at 11:47 AM, Alexander Shenkin wrote:
>
>> Hello all,
>>
>> I have a list which I'd like to convert to a data frame, while
>> maintaining control of the columns' data types (akin to the colClasses
>> argument in read.table). My numeric columns, for example, are getting
>> converted to factors by as.data.frame. Is there a way to do this, or
>> will I have to do as I am doing right now: allow as.data.frame to coerce
>> column-types as it sees fit, and then convert them back manually?
>
> This doesn't sound right ... are there characters buried in your
> numeric columns somewhere that might be causing this?
>
> I'm pretty sure this shouldn't happen, and a small test case here goes
> along with my intuition:
>
> R> a <- list(a=1:10, b=rnorm(10), c=LETTERS[1:10])
> R> df <- as.data.frame(a)
> R> sapply(df, is.factor)
> a b c
> FALSE FALSE TRUE
>
> Can you check to see if your data's wonky somehow?
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
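The factor in Steve's test case comes from the stringsAsFactors default, which changed from TRUE to FALSE in R 4.0.0. The sketch below passes the argument explicitly so that the column classes are the same on any R version. It reuses the list from his example.

```r
a <- list(a = 1:10, b = rnorm(10), c = LETTERS[1:10])

# Explicit stringsAsFactors gives version-independent column classes.
df_chr <- as.data.frame(a, stringsAsFactors = FALSE)
df_fac <- as.data.frame(a, stringsAsFactors = TRUE)
sapply(df_chr, class)
sapply(df_fac, class)
```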
More information about the R-help mailing list