[R] Convert list to data frame while controlling column types

Alexander Shenkin ashenkin at ufl.edu
Sun Aug 23 17:29:48 CEST 2009


On 8/23/2009 9:58 AM, David Winsemius wrote:
> I still have problems with this statement. As I understand R, this
> should be impossible. I have looked at both your postings and neither of
> them clarify the issues. How can you have blanks or spaces in an R
> numeric vector?


Just because I search numeric columns doesn't mean that my regex matches
them!  I posted some info on my data frame in an earlier email:

    str(final_dataf)
    'data.frame':   1127 obs. of  43 variables:
     $ block      : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
     $ treatment  : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 ...
     $ transect   : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
     $ tag        : chr  NA "121AL" "122AL" "123AL" ...
    ...
     $ h1         : num  NA NA NA NA NA NA NA NA NA NA ...
    ...

You can see that I do indeed have some numeric columns.  And while I
search them for spaces, I only do so because my dataset isn't so large
as to require me to exclude them from the search.  If my dataset grows
too big at some point, I will exclude numeric columns, and other columns
which cannot hold blanks or spaces.
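
For reference, a minimal sketch of what excluding the numeric columns could look like. The helper name `blank_to_na` and the toy data frame `demo` are made up for illustration and are not part of the real `final_dataf` script; the sketch assumes only character and factor columns can contain blanks.

```r
# Hypothetical helper: mark empty or whitespace-only entries as NA.
blank_to_na <- function(x) {
  is.na(x) <- grep("^\\s*$", x)
  x
}

# Toy data standing in for the real data frame (an assumption).
demo <- data.frame(a = c(1, 2, 3, 4, 5),
                   b = c("a", "", "c", "d", " "),
                   stringsAsFactors = FALSE)

# Only character and factor columns can hold blanks, so search just those.
searchable <- vapply(demo,
                     function(x) is.character(x) || is.factor(x),
                     logical(1))
demo[searchable] <- lapply(demo[searchable], blank_to_na)
```

Assigning back into `demo[searchable]` leaves the numeric columns untouched, so their types are never at risk.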

To clarify further with an example:

> df = data.frame(a=c(1,2,3,4,5),b=c("a","","c","d"," "))
> df = as.data.frame(lapply(df, function(x){ is.na(x) <-
+ grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
> df
  a    b
1 1    a
2 2 <NA>
3 3    c
4 4    d
5 5 <NA>
> str(df)
'data.frame':   5 obs. of  2 variables:
 $ a: num  1 2 3 4 5
 $ b: Factor w/ 5 levels ""," ","a","c",..: 3 NA 4 5 NA
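
Note that the `str()` output still lists `""` and `" "` among the factor levels even though no row uses them any more. If that matters, the unused levels can be dropped afterwards. A small sketch, assuming a standalone factor `f` (a hypothetical name) with the same values as column `b`:

```r
# Same values as column b in the example above.
f <- factor(c("a", "", "c", "d", " "))

# Blank and whitespace-only entries become NA, but their levels remain.
is.na(f) <- grep("^\\s*$", f)
levels(f)   # still includes the "" and " " levels

# droplevels() removes levels that no longer occur in the data.
f <- droplevels(f)
levels(f)   # "a" "c" "d"
```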

And one final clarification: I left out "as.data.frame" in my previous
solution.  So it now becomes:

> final_dataf = as.data.frame(lapply(final_dataf, function(x){ is.na(x)
+ <- grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
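
The reason `as.data.frame` is needed is that `lapply` on a data frame returns a plain list. One common alternative, shown here only as a sketch on toy data (`blank_to_na` and `demo` are hypothetical names), is to assign into `demo[]`, which keeps the data frame structure and avoids the column conversion step altogether:

```r
# Hypothetical helper: mark empty or whitespace-only entries as NA.
blank_to_na <- function(x) {
  is.na(x) <- grep("^\\s*$", x)
  x
}

demo <- data.frame(a = c(1, 2, 3),
                   b = c("x", " ", ""),
                   stringsAsFactors = FALSE)

# lapply alone gives back a list, not a data frame.
as_list <- lapply(demo, blank_to_na)
class(as_list)   # "list"

# Assigning into demo[] replaces the columns but keeps the data frame.
demo[] <- lapply(demo, blank_to_na)
class(demo)      # "data.frame"
```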

Hope that clarifies things, and thanks for your help.

Thanks,
Allie


On 8/23/2009 9:58 AM, David Winsemius wrote:
> 
> On Aug 23, 2009, at 2:47 AM, Alexander Shenkin wrote:
> 
>> On 8/21/2009 3:04 PM, David Winsemius wrote:
>>>
>>> On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
>>>
>>>> Thanks everyone for their replies, both on- and off-list.  I should
>>>> clarify, since I left out some important information.  My original
>>>> dataframe has some numeric columns, which get changed to character by
>>>> gsub when I replace spaces with NAs.
>>>
>>> If you used is.na() <-  that would not happen to a true _numeric_ vector
>>> (but, of course, a numeric vector in a data.frame could not have spaces,
>>> so you are probably not using precise terminology).
>>
>> I do have true numeric columns, but I loop through my entire dataframe
>> looking for blanks and spaces for convenience.
> 
> I still have problems with this statement. As I understand R, this
> should be impossible. I have looked at both your postings and neither of
> them clarify the issues. How can you have blanks or spaces in an R
> numeric vector?
> 
> 
>>
>>> You would be well
>>> advised to include the actual code rather than applying loose
>>> terminology subject to your and our misinterpretation.
>>
>> I did include code in my previous email.  Perhaps you were looking for
>> different parts.
>>
>>>
>>> ?is.na
>>>
>>>
>>> I am guessing that you were using read.table() on the original data, in
>>> which case you should look at the colClasses parameter.
>>>
>>
>> yep - I use read.csv, and I do use colClasses.  But as I mentioned
>> earlier, gsub converts those columns to characters.  Thanks for the tip
>> about is.na() <-.  I'm now using the following, thus side-stepping the
>> whole "controlling as.data.frame's column conversion" issue:
>>
>> final_dataf = lapply(final_dataf, function(x){ is.na(x) <-
>> + grep('^\\s*$',x); return(x) })
> 
> 
> Good that you have a solution.
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>



