[R] Convert list to data frame while controlling column types
Petr PIKAL
petr.pikal at precheza.cz
Mon Aug 24 09:06:27 CEST 2009
Hi
r-help-bounces at r-project.org wrote on 23.08.2009 17:29:48:
> On 8/23/2009 9:58 AM, David Winsemius wrote:
> > I still have problems with this statement. As I understand R, this
> > should be impossible. I have looked at both your postings and neither of
> > them clarifies the issues. How can you have blanks or spaces in an R
> > numeric vector?
>
>
> Just because I search numeric columns doesn't mean that my regex matches
> them! I posted some info on my data frame in an earlier email:
>
> str(final_dataf)
> 'data.frame': 1127 obs. of 43 variables:
> $ block : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
> $ treatment : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 ...
> $ transect : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
> $ tag : chr NA "121AL" "122AL" "123AL" ...
> ...
> $ h1 : num NA NA NA NA NA NA NA NA NA NA ...
> ...
>
> You can see that I do indeed have some numeric columns. And while I
Well, AFAICS you have a data frame with 3 columns which are factors and 1
which is character. I do not see any numeric column. If you want to change
block and transect to numeric you can use
df$block <- as.numeric(as.character(df$block))
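A small sketch of why the as.character() step matters, using a made-up factor f (not taken from your data): calling as.numeric() directly on a factor returns the internal level codes, not the labels.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "2", "2", "30"))

codes  <- as.numeric(f)                 # internal integer codes of the levels
values <- as.numeric(as.character(f))   # the labels converted to numbers

print(codes)
print(values)
```

Comparing the two printed vectors shows whether a column was converted through its labels or through its codes.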
> search them for spaces, I only do so because my dataset isn't so large
> as to require me to exclude them from the search. If my dataset grows
> too big at some point, I will exclude numeric columns, and other columns
> which cannot hold blanks or spaces.
>
> To clarify further with an example:
>
> > df = data.frame(a=c(1,2,3,4,5),b=c("a","","c","d"," "))
> > df = as.data.frame(lapply(df, function(x){ is.na(x) <-
> + grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
> > df
> a b
> 1 1 a
> 2 2 <NA>
> 3 3 c
> 4 4 d
> 5 5 <NA>
which can also be done by
levels(df[,2])[1:2] <- NA
though this is less general, since it hard-codes which levels are blank
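A sketch of the levels-based approach on the same toy data, under two assumptions: stringsAsFactors = TRUE is set explicitly (newer R versions create character columns by default), and blank_levels_to_na is a hypothetical helper name. It selects the blank levels with the same regex instead of fixed positions, and df[] <- keeps the result a data frame with each column's type intact.

```r
# Toy data from the example above; factors requested explicitly
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("a", "", "c", "d", " "),
                 stringsAsFactors = TRUE)

# Set every empty or whitespace-only level of a factor to NA;
# non-factor columns are returned unchanged
blank_levels_to_na <- function(x) {   # hypothetical helper name
  if (is.factor(x)) {
    levels(x)[grepl("^\\s*$", levels(x))] <- NA
  }
  x
}

# df[] <- keeps df a data frame and preserves each column's class
df[] <- lapply(df, blank_levels_to_na)
str(df)
```

Because only the levels are touched, numeric columns are never coerced to character along the way.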
> > str(df)
> 'data.frame': 5 obs. of 2 variables:
> $ a: num 1 2 3 4 5
> $ b: Factor w/ 5 levels ""," ","a","c",..: 3 NA 4 5 NA
>
> And one final clarification: I left out "as.data.frame" in my previous
> solution. So it now becomes:
>
> > final_dataf = as.data.frame(lapply(final_dataf, function(x){ is.na(x)
> + <- grep('^\\s*$',x); return(x) }), stringsAsFactors = FALSE)
Again, this does not clarify much. In your first data frame the second
column is a factor with some levels that you want to convert to NA, which
can be done in several ways.
Your final_dataf is the same case as df.
Columns which should be numeric but are read as factor/character by
read.table most likely contain some values which cannot strictly be
considered numeric. You see them quite often in data from Excel-like
programs; some examples are
1..2, o.5, 12.o5, spaces, "-", etc.
and you usually need to handle them by hand.
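One way to locate such entries before fixing them by hand, sketched on a made-up character vector x (not from your data): convert with as.numeric() and list the values that fail to convert although they were not NA to begin with.

```r
# Hypothetical raw column as it might arrive from a spreadsheet export
x <- c("1.5", "1..2", "o.5", "12.o5", " ", "-", "7", NA)

# as.numeric() yields NA (with a warning) for anything it cannot parse
num <- suppressWarnings(as.numeric(x))

# Entries that were present but did not convert
bad <- !is.na(x) & is.na(num)
print(data.frame(row = which(bad), value = x[bad]))
```

The same check can be run on each column that was expected to be numeric, for example on as.character() of a factor column.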
Regards
Petr
>
> Hope that clarifies things, and thanks for your help.
>
> Thanks,
> Allie
>
>
> On 8/23/2009 9:58 AM, David Winsemius wrote:
> >
> > On Aug 23, 2009, at 2:47 AM, Alexander Shenkin wrote:
> >
> >> On 8/21/2009 3:04 PM, David Winsemius wrote:
> >>>
> >>> On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
> >>>
> >>>> Thanks everyone for their replies, both on- and off-list. I should
> >>>> clarify, since I left out some important information. My original
> >>>> dataframe has some numeric columns, which get changed to character by
> >>>> gsub when I replace spaces with NAs.
> >>>
> >>> If you used is.na() <- that would not happen to a true _numeric_ vector
> >>> (but, of course, a numeric vector in a data.frame could not have spaces,
> >>> so you are probably not using precise terminology).
> >>
> >> I do have true numeric columns, but I loop through my entire dataframe
> >> looking for blanks and spaces for convenience.
> >
> > I still have problems with this statement. As I understand R, this
> > should be impossible. I have looked at both your postings and neither of
> > them clarifies the issues. How can you have blanks or spaces in an R
> > numeric vector?
> >
> >
> >>
> >>> You would be well
> >>> advised to include the actual code rather than applying loose
> >>> terminology subject to your and our misinterpretation.
> >>
> >> I did include code in my previous email. Perhaps you were looking for
> >> different parts.
> >>
> >>>
> >>> ?is.na
> >>>
> >>>
> >>> I am guessing that you were using read.table() on the original data, in
> >>> which case you should look at the colClasses parameter.
> >>>
> >>
> >> yep - I use read.csv, and I do use colClasses. But as I mentioned
> >> earlier, gsub converts those columns to characters. Thanks for the tip
> >> about is.na() <-. I'm now using the following, thus side-stepping the
> >> whole "controlling as.data.frame's column conversion" issue:
> >>
> >> final_dataf = lapply(final_dataf, function(x){ is.na(x) <-
> >> + grep('^\\s*$',x); return(x) })
> >
> >
> > Good that you have a solution.
> >
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> >
>