[R] Trim trailing space from data.frame factor variables
Marc Schwartz
marc_schwartz at comcast.net
Thu Aug 16 18:29:57 CEST 2007
The easiest way might be to modify the lapply() call as follows:
d[] <- lapply(d, function(x) if (is.factor(x)) factor(sub(" +$", "", x)) else x)
> str(d)
'data.frame': 60 obs. of 3 variables:
$ x: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ y: num 7.01 8.33 5.48 6.51 5.61 ...
$ f: Factor w/ 3 levels "lev1","lev2",..: 1 1 1 1 1 1 1 1 1 1 ...
This way the coercion back to a factor takes place within the loop as
needed.
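
As a variation on the call above, here is a minimal sketch that edits the factor levels in place instead of rebuilding the factor. The helper name trim_factor_levels and the small example data frame are made up for illustration and are not part of the original thread. The difference is that factor(sub(...)) recomputes the levels from the trimmed strings in sorted order, whereas assigning to levels() keeps the existing level order.

```r
# Hypothetical helper (not from the thread): strip trailing spaces from the
# levels of a factor, leaving the level order and non-factor columns untouched.
trim_factor_levels <- function(x) {
  if (is.factor(x)) {
    levels(x) <- sub(" +$", "", levels(x))
  }
  x
}

# Small illustrative data frame; only 'f' has levels with a trailing space.
d <- data.frame(
  x = gl(2, 3),
  y = c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5),
  f = gl(3, 2, labels = paste("lev", 1:3, " ", sep = ""))
)

# d[] keeps the data.frame structure while replacing each column.
d[] <- lapply(d, trim_factor_levels)
str(d)
```

If two levels differ only by trailing spaces, they end up with the same label after trimming and are merged into one level, which is usually what is wanted here.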
Note that I also meant to type sub() and not grep() below. Both return a
character vector (grep() only if 'value = TRUE'), and neither has an
argument to override that behavior.
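
To make the return types concrete, here is a small check on a toy factor. The variable names are made up for illustration. It also shows the ifelse() problem described in the quoted message below: because is.factor() returns a single TRUE or FALSE, ifelse() returns a result of length one.

```r
# Toy factor; two of the three values carry a trailing space.
f <- factor(c("a ", "b", "c "))

# sub() coerces the factor to character and returns a character vector.
trimmed <- sub(" +$", "", f)
class(trimmed)

# grep() with value = TRUE returns the matching elements, also as character.
matched <- grep(" +$", as.character(f), value = TRUE)
class(matched)

# ifelse() takes its shape from the test, and is.factor(f) has length one,
# so only the first trimmed element comes back.
first_only <- ifelse(is.factor(f), sub(" +$", "", f), f)
length(first_only)

# A plain if / else evaluates the whole branch and keeps every element.
all_elems <- if (is.factor(f)) sub(" +$", "", f) else f
length(all_elems)
```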
HTH,
Marc
On Thu, 2007-08-16 at 19:19 +0300, Lauri Nikkinen wrote:
> Thanks Marc! What would be the easiest way to coerce char-variables
> back to factor-variables? Is there a way to prevent the coercion in
> d[] <- lapply(d, function(x) if (is.factor(x)) sub(" +$", "", x) else x) ?
>
>
>
> -Lauri
>
>
>
> 2007/8/16, Marc Schwartz <marc_schwartz at comcast.net>:
> On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
> > Hi folks,
> >
> > I would like to trim the trailing spaces in my factor variables using lapply
> > (described in this post by Marc Schwartz:
> > http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is
> > not functioning (in this example there is only one factor with trailing
> > spaces):
>
> Ayep....as I noted in that post, it was untested....my error.
>
> The problem is that by using ifelse() as I did, the test for the column
> being a factor returns a single result, not one result per element.
> Hence, the appropriate conditional code is only performed on the first
> element in each column, rather than being vectorized on the entire
> column.
>
> > y1 <- rnorm(20) + 6.8
> > y2 <- rnorm(20) + (1:20* 1.7 + 1)
> > y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> > y <- c(y1,y2,y3)
> > x <- gl(5,12)
> > f <- gl(3,20, labels=paste("lev", 1:3, " ", sep=""))
> > d <- data.frame (x=x,y=y, f=f)
> > str(d)
> >
> > d[] <- lapply(d, function(x) ifelse(is.factor(x), sub(" +$", "", x), x))
> > str(d)
> >
> > How should I modify this?
>
> Try this instead:
>
> d[] <- lapply(d, function(x) if (is.factor(x)) sub(" +$", "", x) else x)
>
> > str(d)
> 'data.frame': 60 obs. of 3 variables:
> $ x: chr "1" "1" "1" "1" ...
> $ y: num 6.70 4.42 8.03 4.90 6.98 ...
> $ f: chr "lev1" "lev1" "lev1" "lev1" ...
>
> Note that by using grep(), the factors are coerced to character vectors
> as expected. You would need to coerce back to factors if you need them
> as such.
>
> HTH,
>
> Marc Schwartz
>