[R] Function to trim blanks from a string(s)?
Marc Schwartz
marc_schwartz at comcast.net
Mon Aug 6 22:42:22 CEST 2007
On Mon, 2007-08-06 at 21:23 +0100, Prof Brian Ripley wrote:
> I am sure Marc knows that ?sub has examples of trimming trailing space and
> whitespace in various styles.
Indeed, though leading spaces are not covered there, so I thought I
would take a minute or two to provide both, plus the combination of the
two using gsub().
> On Mon, 6 Aug 2007, Marc Schwartz wrote:
>
> > On Mon, 2007-08-06 at 12:15 -0700, adiamond wrote:
> >> I feel like an idiot posting this because every language I've ever seen has a
> >> string function that trims blanks off strings (off the front or back or
> >> both).
>
> Some very common languages do not, though. It is an exercise in Kernighan
> & Ritchie (the original C reference), and an FAQ entry for Perl.
>
> >> Ideally, it would process whole data frames/matrices, etc., but I don't
> >> even see one that processes a single string. But I've searched and I don't
> >> even see that. There's a strtrim function but it does something completely
> >> different.
> >
> > If you want to do this while initially importing the data into R using
> > one of the read.table() family of functions, see the 'strip.white'
> > argument in ?read.table, which would do an entire data frame in one
> > call.
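A minimal sketch of the strip.white route, assuming comma-separated data passed inline through the text= argument (the sample data are made up for illustration). Per ?read.table, strip.white only takes effect when sep is specified, and it applies to unquoted character fields.

```r
# Made-up comma-separated data with stray blanks around the fields
csv <- "name,city\n  Alice ,  Boston\nBob  ,Chicago  "

# strip.white = TRUE removes leading/trailing white space from unquoted
# character fields; it only applies because 'sep' is given explicitly.
# stringsAsFactors = FALSE keeps the columns as character in older R too.
df <- read.table(text = csv, sep = ",", header = TRUE,
                 strip.white = TRUE, stringsAsFactors = FALSE)

df$name  # expected: "Alice" "Bob"
df$city  # expected: "Boston" "Chicago"
```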
> >
> > Otherwise, the easiest way to do it would be to use sub() or gsub()
> > along the lines of the following:
> >
> > # Strip leading space
> > sub("^ +", "", YourTextVector)
> >
> >
> > # Strip trailing space
> > sub(" +$", "", YourTextVector)
> >
> >
> > # Strip both
> > gsub("(^ +)|( +$)", "", YourTextVector)
> >
> > Examples of use:
> >
> >> sub("^ +", "", " Leading Space")
> > [1] "Leading Space"
> >
> >
> >> sub(" +$", "", "Trailing Space ")
> > [1] "Trailing Space"
> >
> >
> >> gsub("(^ +)|( +$)", "", " Leading and Trailing Space ")
> > [1] "Leading and Trailing Space"
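For convenience the three patterns can be wrapped in a single function. This is a sketch only: trim_spaces() is a hypothetical name, not a base R function, and like the patterns above it strips literal spaces only.

```r
# Hypothetical helper (not part of base R) wrapping the three patterns above.
# Like the originals, it removes literal spaces only, not tabs or newlines.
trim_spaces <- function(x, which = c("both", "left", "right")) {
  which <- match.arg(which)  # defaults to "both"
  switch(which,
         left  = sub("^ +", "", x),
         right = sub(" +$", "", x),
         both  = gsub("(^ +)|( +$)", "", x))
}

x <- c("  Leading Space", "Trailing Space  ", "  Both  ")
trim_spaces(x)           # strips both ends
trim_spaces(x, "left")   # strips leading spaces only
trim_spaces(x, "right")  # strips trailing spaces only
```

Later versions of R (3.2.0 onwards) also provide trimws(x, which = c("both", "left", "right")) in base, which handles common white space characters (spaces, tabs, carriage returns, newlines) rather than just spaces.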
> >
> >
> > See ?sub, which also documents gsub().
> >
> > Note that the above will only strip spaces, not all white space.
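If tabs, newlines and other white space should be removed as well, one option (assuming that is what counts as "blank" in your data) is to use the POSIX class [[:space:]] in place of the literal space. trim_ws() below is a hypothetical helper name.

```r
# [[:space:]] matches spaces, tabs, newlines, carriage returns, etc.,
# so this trims all leading and trailing white space, not just spaces.
trim_ws <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

trim_ws(c("\t Tab and newline \n", "  plain  "))
```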
> >
> > You can then use the relevant call within one of the *apply() family of
> > functions to loop over columns or rows as needed.
>
> Well, arrays are vectors and so can be done by
>
> A[] <- sub(....., A)
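A concrete instance of the A[] idiom, using the "strip both" pattern on a small made-up character matrix. Assigning into A[] replaces the contents while keeping A's dim attribute.

```r
# Made-up 2 x 2 character matrix with padded entries
A <- matrix(c(" a", "b ", "  c  ", "d"), nrow = 2)

# Strip leading and trailing spaces from every cell, keeping A a matrix
A[] <- gsub("(^ +)|( +$)", "", A)
A
```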
>
> and data frames with character columns by
>
> A[] <- lapply(A, function(x) sub(....., x))
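The data frame version, again with the "strip both" pattern and made-up data. It assumes every column is character; stringsAsFactors = FALSE is set explicitly because older versions of R defaulted to converting strings to factors.

```r
# Made-up data frame whose columns are all character
df <- data.frame(x = c(" a ", "b  "), y = c("  c", "d "),
                 stringsAsFactors = FALSE)

# lapply() returns a list of cleaned columns; df[] <- keeps it a data frame
df[] <- lapply(df, function(col) gsub("(^ +)|( +$)", "", col))
df
```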
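Right. One could probably use it on mixed data frames along the lines of
the following (untested):
A[] <- lapply(A, function(x) if (is.character(x) || is.factor(x))
                               sub(....., x) else x)
And leave out the "|| is.factor(x)" if one only wanted character columns
affected. Note the use of if/else rather than ifelse() here: the test is a
single TRUE/FALSE per column, and ifelse() returns a result the same length
as its test, so it would return only the first element of each column.
Also, sub() returns a character vector, so factor columns come back as
character.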
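A runnable version of the mixed data frame case, with the "strip both" pattern filled in for the ..... placeholder and made-up data. It uses if/else rather than ifelse() because the test is one TRUE/FALSE per column. Since gsub() returns character, the factor column comes back as character, while the integer column is left alone.

```r
# Made-up mixed data frame: integer, character and factor columns
df <- data.frame(id = 1:2,
                 name = c(" Alice ", "Bob  "),
                 grp = factor(c(" x", "y ")),
                 stringsAsFactors = FALSE)

# Trim only character/factor columns; other columns pass through unchanged
df[] <- lapply(df, function(col) {
  if (is.character(col) || is.factor(col)) gsub("(^ +)|( +$)", "", col) else col
})
str(df)
```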
Thanks,
Marc