[R] Regular Expression help

Gabor Grothendieck ggrothendieck at gmail.com
Sat Nov 8 23:09:59 CET 2008


I suspect strapply is only relatively slow on short strings, where
it doesn't matter anyway; for long strings, performance would
likely be dominated by the underlying regexp operations.  I know that
users are applying the package to very long strings because I once had
to lift the 25,000 character limit after receiving complaints about it.
The expressiveness and brevity of strapply (it would be the shortest
solution were it not for the length of the word simplify) offset any
disadvantage in my view.

On Sat, Nov 8, 2008 at 5:02 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> Gabor Grothendieck wrote:
>> For the problem at hand I think I would use your solution
>> which is both easily understood and fastest.  On the
>> other hand the tapply based solutions are coordinate
>> free (i.e. no explicit mucking with indices) and readily
>> generalize to more than 2 groups -- just replace [^pq] with
>> [^pqr], say.
>>
>>
>
> for sure, mine was optimized towards the case, not towards generalizability.
> the gsubfn one is a loser, though.
>
> but the first one *is* easily generalizable, e.g.,
>
> letters = "pqrs"
> sapply(sprintf("^[^%s]*%s", letters, unlist(strsplit(letters,
> split=""))), grep, x=x, value=TRUE)
>
> while an order of magnitude faster than the tapply ones.
>
> vQ
>
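
For readers following along, here is a minimal base-R sketch of the two
generalizations discussed above: one anchored grep pattern per letter,
and a tapply-style grouping on the first matching letter.  The input
vector x and the names keys, chars, by_grep and by_tapply are
hypothetical, since the original data are not shown in this message,
and the tapply line is only one plausible reading of the "[^pq]" idea,
not necessarily the code posted earlier in the thread.

```r
# Hypothetical input; the x used in the thread is not shown in this message.
x <- c("aaqbp", "pzzq", "qq", "xpyq", "rsp", "none")

# Assumed set of group letters (avoids masking the built-in 'letters').
keys  <- c("p", "q", "r", "s")
chars <- paste(keys, collapse = "")

# grep-based version: one anchored pattern per letter, e.g. "^[^pqrs]*p",
# which matches strings whose first key letter is that letter.
patterns <- sprintf("^[^%s]*%s", chars, keys)
by_grep  <- lapply(setNames(patterns, keys), grep, x = x, value = TRUE)

# tapply-based version: delete every non-key character, then group each
# string by the first key letter that remains. Strings containing no key
# letter are dropped before grouping.
first     <- substr(gsub(sprintf("[^%s]", chars), "", x), 1, 1)
has_key   <- nzchar(first)
by_tapply <- tapply(x[has_key], first[has_key], c, simplify = FALSE)
```

Adding a group only requires adding a letter to keys.  Note that the
grep version returns an empty character vector for a letter that never
comes first, whereas the tapply version omits that group altogether.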


