[R] Convert list to data frame while controlling column types
David Winsemius
dwinsemius at comcast.net
Fri Aug 21 22:04:00 CEST 2009
On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
> Thanks everyone for their replies, both on- and off-list. I should
> clarify, since I left out some important information. My original
> dataframe has some numeric columns, which get changed to character by
> gsub when I replace spaces with NAs.
If you had used the replacement form is.na(x) <- value, that would not
happen to a true _numeric_ vector (but, of course, a numeric vector in a
data.frame could not contain spaces, so you are probably not using
precise terminology). You would be well advised to include the actual
code rather than loose terminology that is subject to your and our
misinterpretation.
?is.na
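A minimal sketch of the difference, using made-up vectors rather than Allie's data: gsub() always returns a character vector, so passing a numeric column through it changes the column's type, whereas the is.na<- replacement function marks elements as missing and leaves the type alone. The vector names and the blank-matching regex are illustrative assumptions.

```r
# Hypothetical example vectors, not the original data
x_num <- c(1.5, NA, 3)
x_chr <- c("121AL", "  ", "")

# gsub() coerces its input to character, even when nothing matches
class(gsub("^\\s*$", NA, x_num))   # "character"

# is.na<- sets the flagged elements to NA and keeps the vector's type
is.na(x_chr) <- grepl("^\\s*$", x_chr)
x_chr                              # "121AL" NA NA

y <- x_num
is.na(y) <- 3                      # the index form works on numerics too
class(y)                           # "numeric"
```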
I am guessing that you were using read.table() on the original data,
in which case you should look at the colClasses parameter.
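If the data did come from a file, a sketch like the following lets read.table() assign the column classes and turn blank cells into NA in one step. The inline text, the column names and the chosen classes are made up for illustration, and the text argument is used only to keep the example self-contained (with real data you would pass a file name); colClasses and na.strings are standard read.table() arguments.

```r
# Illustrative CSV content; in practice this would be a file
txt <- "tag,treatment,h1
,I,
121AL,M,2.5
122AL,N,"

# colClasses fixes each column's type up front; na.strings = "" makes
# blank character cells NA (blank numeric cells are NA anyway)
df <- read.table(text = txt, header = TRUE, sep = ",",
                 colClasses = c("character", "factor", "numeric"),
                 na.strings = c("NA", ""))
str(df)
```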
--
David Winsemius
> Thus, in going back to a
> dataframe, those (now character) columns get converted to factors. I
> recently added stringsAsFactors = FALSE to get characters instead, which
> makes things a bit easier. I wrote the column-type reset function below,
> but it feels kludgey, so I was wondering if there was some other way to
> specify how as.data.frame should handle the columns.
>
> str(final_dataf)
> 'data.frame': 1127 obs. of 43 variables:
> $ block : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
> $ treatment : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 1 1 1 1 ...
> $ transect : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
> $ tag : chr NA "121AL" "122AL" "123AL" ...
> ...
> $ h1 : num NA NA NA NA NA NA NA NA NA NA ...
> ...
>
> reset_col_types <- function (df, col_types) {
> # Function to reset column types in data frames. col_types is a
> # character vector with one class name per column, constructed
> # from lapply(df, class) as shown below.
>
> coerce_fun = list (
> "character" = `as.character`,
> "factor" = `as.factor`,
> "numeric" = `as.numeric`,
> "integer" = `as.integer`,
> "POSIXct" = `as.POSIXct`,
> "logical" = `as.logical` )
>
> for (i in seq_along(df)) {
> df[,i] = coerce_fun[[ col_types[i] ]]( df[,i] ) # apply coerce function
> }
> return(df)
> }
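The class() of a POSIXct column has two elements, most specific first, so taking the last element yields "POSIXt", which has no entry in a lookup table like coerce_fun. A minimal sketch of building the col_types vector from the first element with vapply(); the example data frame is made up.

```r
# Hypothetical data frame with character, numeric and POSIXct columns
df <- data.frame(tag  = c("121AL", "122AL"),
                 h1   = c(1.5, 2),
                 when = as.POSIXct(c("2009-08-21", "2009-08-22"), tz = "UTC"),
                 stringsAsFactors = FALSE)

class(df$when)   # "POSIXct" "POSIXt": most specific class first

# One class name per column, taking the first (most specific) element
col_types <- unname(vapply(df, function(col) class(col)[1], character(1)))
col_types        # "character" "numeric" "POSIXct"
```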
>
> col_types = lapply(final_dataf, class)
> col_types = lapply(col_types, function(x) x[1]) # for POSIXct, take the
> # more specific class (class() lists the most specific first)
> names(col_types)=NULL
> col_types = unlist(col_types)
>
> final_dataf = as.data.frame(lapply(final_dataf, function(x)
> gsub('^\\s*$',NA,x)), stringsAsFactors = FALSE)
> final_dataf = reset_col_types(final_dataf, col_types)
>
> Thanks,
> Allie
>
>
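One way around the round trip through as.data.frame() is to assign into df[], which keeps the data frame and replaces only its columns, and to touch only the columns that can actually hold blanks. A minimal sketch under those assumptions; blanks_to_na is a hypothetical helper name, and treating whitespace-only strings as blank is carried over from the gsub() pattern in the code above.

```r
# Hypothetical helper: blank or whitespace-only entries become NA,
# while every column keeps its original class
blanks_to_na <- function(df) {
  is_blank <- function(s) grepl("^\\s*$", s)
  df[] <- lapply(df, function(x) {
    if (is.factor(x)) {
      # Setting a level to NA drops it and makes its values NA
      levels(x)[is_blank(levels(x))] <- NA
    } else if (is.character(x)) {
      is.na(x) <- is_blank(x)
    }
    # Numeric, logical, POSIXct etc. columns are returned unchanged
    x
  })
  df
}

d <- data.frame(block = factor(c("2", " ", "2")),
                tag   = c("", "121AL", "122AL"),
                h1    = c(NA, 1.5, 2),
                stringsAsFactors = FALSE)
out <- blanks_to_na(d)
str(out)
```

Because numeric, logical and POSIXct columns pass through untouched, no type-reset step is needed afterwards, and assigning NA to a factor level keeps the order of the remaining levels.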
> On 8/21/2009 10:54 AM, Steve Lianoglou wrote:
>> Hi Allie,
>>
>> On Aug 21, 2009, at 11:47 AM, Alexander Shenkin wrote:
>>
>>> Hello all,
>>>
>>> I have a list which I'd like to convert to a data frame, while
>>> maintaining control of the columns' data types (akin to the
>>> colClasses argument in read.table). My numeric columns, for example,
>>> are getting converted to factors by as.data.frame. Is there a way to
>>> do this, or will I have to do as I am doing right now: allow
>>> as.data.frame to coerce column types as it sees fit, and then convert
>>> them back manually?
>>
>> This doesn't sound right ... are there characters buried in your
>> numeric columns somewhere that might be causing this?
>>
>> I'm pretty sure this shouldn't happen, and a small test case here goes
>> along with my intuition:
>>
>> R> a <- list(a=1:10, b=rnorm(10), c=LETTERS[1:10])
>> R> df <- as.data.frame(a)
>> R> sapply(df, is.factor)
>> a b c
>> FALSE FALSE TRUE
>>
>> Can you check to see if your data's wonky somehow?
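Steve's check, rerun as a sketch with stringsAsFactors stated explicitly (its default became FALSE in R 4.0.0, so current R also prints FALSE for c). The last lines illustrate his guess about stray characters: a single non-numeric entry makes c() coerce the whole vector to character, and only character columns are candidates for factor conversion.

```r
set.seed(1)  # reproducible rnorm() values
a <- list(a = 1:10, b = rnorm(10), c = LETTERS[1:10])

df_chr <- as.data.frame(a, stringsAsFactors = FALSE)
df_fac <- as.data.frame(a, stringsAsFactors = TRUE)
sapply(df_chr, is.factor)   # a FALSE, b FALSE, c FALSE
sapply(df_fac, is.factor)   # a FALSE, b FALSE, c TRUE

# A single blank string turns an otherwise numeric vector into character
h1 <- c(1.5, 2, " ")
class(h1)                   # "character"
```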
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
More information about the R-help mailing list