[R] Convert list to data frame while controlling column types

Petr PIKAL petr.pikal at precheza.cz
Mon Aug 24 09:06:27 CEST 2009


Hi

r-help-bounces at r-project.org wrote on 23.08.2009 17:29:48:

> On 8/23/2009 9:58 AM, David Winsemius wrote:
> > I still have problems with this statement. As I understand R, this
> > should be impossible. I have looked at both your postings and neither of
> > them clarify the issues. How can you have blanks or spaces in an R
> > numeric vector?
> 
> 
> Just because I search numeric columns doesn't mean that my regex matches
> them!  I posted some info on my data frame in an earlier email:
> 
>     str(final_dataf)
>     'data.frame':   1127 obs. of  43 variables:
>      $ block      : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
>      $ treatment  : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 ...
>      $ transect   : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
>      $ tag        : chr  NA "121AL" "122AL" "123AL" ...
>     ...
>      $ h1         : num  NA NA NA NA NA NA NA NA NA NA ...
>     ...
> 
> You can see that I do indeed have some numeric columns.  And while I

Well, AFAICS the columns you show are 3 factors, 1 character and 1 numeric
(h1, which is all NA in the rows displayed). If you want to change block
and transect to numeric you can use

df$block <- as.numeric(as.character(df$block))
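
A small sketch of why the as.character() step matters, using a made-up
factor (f is an assumption for illustration, not the poster's data):
as.numeric() applied directly to a factor returns the internal level
codes rather than the numbers the labels stand for.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10", "4"))

# as.numeric() alone gives the internal integer codes of the levels
codes <- as.numeric(f)

# going through as.character() recovers the values the labels stand for
values <- as.numeric(as.character(f))

# an equivalent form that converts each level only once
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

The second form, as.numeric(levels(f))[f], should give the same result
and may be a little faster on long factors with few levels.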


> search them for spaces, I only do so because my dataset isn't so large
> as to require me to exclude them from the search.  If my dataset grows
> too big at some point, I will exclude numeric columns, and other columns
> which cannot hold blanks or spaces.
> 
> To clarify further with an example:
> 
> > df = data.frame(a=c(1,2,3,4,5),b=c("a","","c","d"," "))
> > df = as.data.frame(lapply(df, function(x){ is.na(x) <-
> + grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
> > df
>   a    b
> 1 1    a
> 2 2 <NA>
> 3 3    c
> 4 4    d
> 5 5 <NA>

which can also be done by

levels(df[,2])[1:2] <- NA

though this is less general, as it relies on the blank levels being the
first two.
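
A possible generalization of the levels approach, sketched on the same
toy data: the blank levels are picked by pattern instead of by position.
The column b is built with factor() explicitly, which is an assumption
made here so the sketch behaves the same in R versions where
data.frame() no longer converts strings to factors.

```r
# Same toy data as in the quoted example, with b explicitly a factor
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = factor(c("a", "", "c", "d", " ")))

# Pick the blank or whitespace-only levels by pattern, not by position
blank <- grepl("^\\s*$", levels(df$b))

# Setting a level to NA drops it and turns its values into NA
levels(df$b)[blank] <- NA

str(df)
```

Column a is not touched at all, so it stays numeric.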


> > str(df)
> 'data.frame':   5 obs. of  2 variables:
>  $ a: num  1 2 3 4 5
>  $ b: Factor w/ 5 levels ""," ","a","c",..: 3 NA 4 5 NA
> 
> And one final clarification: I left out "as.data.frame" in my previous
> solution.  So it now becomes:
> 
> > final_dataf = as.data.frame(lapply(final_dataf, function(x){ is.na(x)
> + <- grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)

Again, this does not clarify much. In your first data frame the second
column is a factor with some levels you want to convert to NA, which can
be done in several ways.

Your final_dataf is the same case as df.
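
One of those ways, as a sketch only: assigning the lapply() result into
df[] keeps the object a data frame, so no as.data.frame() call (and no
column conversion by it) is involved. The toy data are the ones from the
quoted example; stringsAsFactors = FALSE is set here as an assumption,
to show the character case.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("a", "", "c", "d", " "),
                 stringsAsFactors = FALSE)

# Assigning into df[] replaces the columns but keeps the data.frame itself
df[] <- lapply(df, function(x) {
  is.na(x) <- grep("^\\s*$", x)   # mark blank or whitespace-only entries as NA
  x
})

str(df)
```

The numeric column never matches the pattern, so it passes through
unchanged and keeps its type.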

Columns which should be numeric but are read as factor/character by
read.table most likely contain some values which cannot strictly be
considered numeric. You see them quite often in data from Excel-like
programs; some examples are

1..2, o.5, 12.o5, spaces, "-" etc.

and you usually need to handle them by hand.
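
Before fixing them by hand it helps to list them. A minimal sketch,
assuming the column has been read as character (the vector x below is
made up from the examples above):

```r
# Hypothetical column as read.table might deliver it: text, not numbers
x <- c("1.5", "1..2", "o.5", "12.o5", " ", "-", "3", NA)

# Try the conversion; the coercion warnings are expected here
num <- suppressWarnings(as.numeric(x))

# Entries that are present but do not convert are the ones to inspect
bad <- !is.na(x) & is.na(num)

print(x[bad])
```

For a factor column the same check could be run on levels(x) instead.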

Regards
Petr

> 
> Hope that clarifies things, and thanks for your help.
> 
> Thanks,
> Allie
> 
> 
> On 8/23/2009 9:58 AM, David Winsemius wrote:
> > 
> > On Aug 23, 2009, at 2:47 AM, Alexander Shenkin wrote:
> > 
> >> On 8/21/2009 3:04 PM, David Winsemius wrote:
> >>>
> >>> On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
> >>>
> >>>> Thanks everyone for their replies, both on- and off-list.  I should
> >>>> clarify, since I left out some important information.  My original
> >>>> dataframe has some numeric columns, which get changed to character by
> >>>> gsub when I replace spaces with NAs.
> >>>
> >>> If you used is.na() <-  that would not happen to a true _numeric_ vector
> >>> (but, of course, a numeric vector in a data.frame could not have spaces,
> >>> so you are probably not using precise terminology).
> >>
> >> I do have true numeric columns, but I loop through my entire dataframe
> >> looking for blanks and spaces for convenience.
> > 
> > I still have problems with this statement. As I understand R, this
> > > should be impossible. I have looked at both your postings and neither of
> > them clarify the issues. How can you have blanks or spaces in an R
> > numeric vector?
> > 
> > 
> >>
> >>> You would be well
> >>> advised to include the actual code rather than applying loose
> >>> terminology subject to your and our misinterpretation.
> >>
> >> I did include code in my previous email.  Perhaps you were looking for
> >> different parts.
> >>
> >>>
> >>> ?is.na
> >>>
> >>>
> >>> I am guessing that you were using read.table() on the original data, in
> >>> which case you should look at the colClasses parameter.
> >>>
> >>
> >> yep - I use read.csv, and I do use colClasses.  But as I mentioned
> >> earlier, gsub converts those columns to characters.  Thanks for the tip
> >> about is.na() <-.  I'm now using the following, thus side-stepping the
> >> whole "controlling as.data.frame's column conversion" issue:
> >>
> >> final_dataf = lapply(final_dataf, function(x){ is.na(x) <-
> >> + grep('^\\s*$',x); return(x) })
> > 
> > 
> > Good that you have a solution.
> > 
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



