[R] Convert list to data frame while controlling column types
Alexander Shenkin
ashenkin at ufl.edu
Sun Aug 23 17:29:48 CEST 2009
On 8/23/2009 9:58 AM, David Winsemius wrote:
> I still have problems with this statement. As I understand R, this
> should be impossible. I have looked at both your postings and neither of
> them clarify the issues. How can you have blanks or spaces in an R
> numeric vector?
Just because I search numeric columns doesn't mean that my regex matches
them! I posted some info on my data frame in an earlier email:
> str(final_dataf)
'data.frame': 1127 obs. of 43 variables:
$ block : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
$ treatment : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 ...
$ transect : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
$ tag : chr NA "121AL" "122AL" "123AL" ...
...
$ h1 : num NA NA NA NA NA NA NA NA NA NA ...
...
You can see that I do indeed have some numeric columns. And while I
search them for spaces, I only do so because my dataset isn't so large
as to require me to exclude them from the search. If my dataset grows
too big at some point, I will exclude numeric columns, and other columns
which cannot hold blanks or spaces.
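Excluding the numeric columns from the search, as described above, could look roughly like the following sketch. The data frame `toy` and the name `can_be_blank` are made up for illustration and stand in for final_dataf; only character and factor columns are treated as able to hold blanks.

```r
# Hypothetical toy data standing in for final_dataf
toy <- data.frame(h1  = c(1.5, NA, 3),
                  tag = c("121AL", " ", ""),
                  stringsAsFactors = FALSE)

# Columns that can actually hold blank strings: character or factor
can_be_blank <- vapply(toy,
                       function(x) is.character(x) || is.factor(x),
                       logical(1))

# Replace empty or whitespace-only entries with NA in those columns only
toy[can_be_blank] <- lapply(toy[can_be_blank], function(x) {
  is.na(x) <- grep("^\\s*$", x)
  x
})

str(toy)
```

The numeric column h1 is never passed to grep here, so it is neither searched nor coerced.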
To clarify further with an example:
> df = data.frame(a=c(1,2,3,4,5),b=c("a","","c","d"," "))
> df = as.data.frame(lapply(df, function(x){ is.na(x) <-
+ grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
> df
  a    b
1 1    a
2 2 <NA>
3 3    c
4 4    d
5 5 <NA>
> str(df)
'data.frame': 5 obs. of 2 variables:
$ a: num 1 2 3 4 5
$ b: Factor w/ 5 levels ""," ","a","c",..: 3 NA 4 5 NA
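Note that the str() output above still lists "" and " " among the levels of b, even though no row uses them any more. A minimal sketch of one way to drop them, assuming that is wanted, is to re-create the factor. The call sets stringsAsFactors = TRUE explicitly because R 4.0.0 and later no longer convert strings to factors by default.

```r
# Reproduce the example with the factor conversion made explicit
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("a", "", "c", "d", " "),
                 stringsAsFactors = TRUE)

# Same replacement as in the example: blank entries become NA
df <- as.data.frame(lapply(df, function(x) {
  is.na(x) <- grep("^\\s*$", x)
  x
}), stringsAsFactors = FALSE)

levels(df$b)   # the unused levels "" and " " are still present

# Re-creating the factor keeps only the levels that still occur
df$b <- factor(df$b)
levels(df$b)
```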
And one final clarification: I left out "as.data.frame" in my previous
solution. So it now becomes:
> final_dataf = as.data.frame(lapply(final_dataf, function(x){ is.na(x)
+ <- grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
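As a rough alternative to wrapping the result in as.data.frame, the list returned by lapply can be assigned back into the existing data frame, which sidesteps the column conversion altogether. In the sketch below, `blank_to_na` and `final_toy` are hypothetical names used only for illustration, and the final check confirms that the column classes are unchanged.

```r
# Hypothetical helper: set empty or whitespace-only entries to NA
blank_to_na <- function(x) {
  is.na(x) <- grep("^\\s*$", x)
  x
}

# Hypothetical stand-in for final_dataf with mixed column types
final_toy <- data.frame(block = factor(c("2", "2", "2")),
                        tag   = c(NA, "121AL", " "),
                        h1    = c(NA, 2.5, 3),
                        stringsAsFactors = FALSE)

classes_before <- sapply(final_toy, class)

# Assigning into final_toy[] keeps the data frame itself, so
# as.data.frame() is never called on the list
final_toy[] <- lapply(final_toy, blank_to_na)

# Column classes are the same before and after the replacement
stopifnot(identical(classes_before, sapply(final_toy, class)))
```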
Hope that clarifies things, and thanks for your help.
Thanks,
Allie
On 8/23/2009 9:58 AM, David Winsemius wrote:
>
> On Aug 23, 2009, at 2:47 AM, Alexander Shenkin wrote:
>
>> On 8/21/2009 3:04 PM, David Winsemius wrote:
>>>
>>> On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
>>>
>>>> Thanks everyone for their replies, both on- and off-list. I should
>>>> clarify, since I left out some important information. My original
>>>> dataframe has some numeric columns, which get changed to character by
>>>> gsub when I replace spaces with NAs.
>>>
>>> If you used is.na() <- that would not happen to a true _numeric_ vector
>>> (but, of course, a numeric vector in a data.frame could not have spaces,
>>> so you are probably not using precise terminology).
>>
>> I do have true numeric columns, but I loop through my entire dataframe
>> looking for blanks and spaces for convenience.
>
> I still have problems with this statement. As I understand R, this
> should be impossible. I have looked at both your postings and neither of
> them clarify the issues. How can you have blanks or spaces in an R
> numeric vector?
>
>
>>
>>> You would be well
>>> advised to include the actual code rather than applying loose
>>> terminology subject to your and our misinterpretation.
>>
>> I did include code in my previous email. Perhaps you were looking for
>> different parts.
>>
>>>
>>> ?is.na
>>>
>>>
>>> I am guessing that you were using read.table() on the original data, in
>>> which case you should look at the colClasses parameter.
>>>
>>
>> yep - I use read.csv, and I do use colClasses. But as I mentioned
>> earlier, gsub converts those columns to characters. Thanks for the tip
>> about is.na() <-. I'm now using the following, thus side-stepping the
>> whole "controlling as.data.frame's column conversion" issue:
>>
>> final_dataf = lapply(final_dataf, function(x){ is.na(x) <-
>> + grep('^\\s*$',x); return(x) })
>
>
> Good that you have a solution.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>