[R] Using regular expressions to detect clusters of consonants in a string

Mark Heckmann mark.heckmann at gmx.de
Wed Jul 1 11:07:22 CEST 2009


Hi Gabor,

thanks for this great advice. Just one more question:
I cannot find in the documentation for gsubfn or strapply how to switch off
case sensitivity for the regex, as the ignore.case = TRUE argument does in
gregexpr. Is there a way?

TIA,
Mark 

-------------------------------

Mark Heckmann
+ 49 (0) 421 - 1614618
www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
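
One way to sidestep the case-sensitivity question is sketched below in base R only. It does not rely on any gsubfn or strapply option: it either passes ignore.case = TRUE to gregexpr, uses the inline (?i) flag with perl = TRUE, or simply lower-cases the input first. The variable names are illustrative, and regmatches() assumes a reasonably recent R version.

```r
# Three base-R ways to match consonant runs regardless of case.
s <- "MyString"
pattern <- "[bcdfghjklmnpqrstvwxyz]+"

# 1. The ignore.case argument of gregexpr
runs_arg <- regmatches(s, gregexpr(pattern, s, ignore.case = TRUE))[[1]]

# 2. The inline (?i) flag, which requires the Perl-compatible engine
runs_flag <- regmatches(s, gregexpr(paste0("(?i)", pattern), s, perl = TRUE))[[1]]

# 3. Lower-case the input, then match with the unchanged pattern
runs_lower <- regmatches(tolower(s), gregexpr(pattern, tolower(s)))[[1]]

nchar(runs_arg)
```

The third variant changes only the input string, so it should carry over to any function that takes a pattern and a string, whatever options that function exposes.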




-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] 
Sent: Tuesday, 30 June 2009 18:31
To: Mark Heckmann
Cc: r-help at r-project.org
Subject: Re: [R] Using regular expressions to detect clusters of consonants in a string

Try this:

library(gsubfn)
s <- "mystring"
strapply(s, "[bcdfghjklmnpqrstvwxyz]+", nchar)[[1]]

which returns a vector of consonant string lengths.
Now apply your algorithm to that.
See http://gsubfn.googlecode.com for more.
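
A minimal sketch of the "apply your algorithm" step, written in base R rather than with gsubfn so that it runs without extra packages. It assumes the goal is to count consonant runs of exactly length two; the function names consonant_run_lengths and count_two_runs are made up for this illustration.

```r
# Lengths of all maximal consonant runs in a string (case-insensitive).
consonant_run_lengths <- function(s) {
  runs <- regmatches(s, gregexpr("[bcdfghjklmnpqrstvwxyz]+", s,
                                 ignore.case = TRUE))[[1]]
  nchar(runs)
}

# Number of runs that are exactly two consonants long.
count_two_runs <- function(s) sum(consonant_run_lengths(s) == 2)

words <- c("hallo", "chess", "screw", "hall", "abstraction")
sapply(words, count_two_runs)
```

Under this reading "hallo" and "hall" give 1, "chess" gives 2 and "screw" gives 0. For "abstraction" the run lengths are 4, 2, 1, so the four-letter run "bstr" contributes nothing but the separate run "ct" still counts once.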

On Tue, Jun 30, 2009 at 11:30 AM, Mark Heckmann<mark.heckmann at gmx.de> wrote:
> Hi,
>
> I want to parse a string and count the occurrences where exactly two
> consonants clump together. Consider for example the word "hallo". Here I
> want the algorithm to return 1. For "chess" I want it to return 2. For the
> word "screw" the result should be negative, as it is a clump of three
> consonants, not two. Also, for the word "abstraction" I do not want the
> algorithm to detect a two-consonant cluster twice. In this case the result
> should be negative as well, as it is four consonants in a row.
>
> str <- "hallo"
> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
> extended = TRUE)[[1]]
>
> [1] 3
> attr(,"match.length")
> [1] 3
>
> The result is correct. Now I change the word to "hall"
>
> str <- "hall"
> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
> extended = TRUE)[[1]]
>
> [1] -1
> attr(,"match.length")
> [1] -1
>
> Here my expression fails. How can I write a correct regex to do this? I
> always encounter problems at the beginning or end of a string.
>
> Also:
>
> str <- "abstraction"
> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
> extended = TRUE)[[1]]
>
> [1] 4 7
> attr(,"match.length")
> [1] 3 3
>
> This also fails.
>
> Thanks in advance,
> Mark
>
> -------------------------------
> Mark Heckmann
> www.markheckmann.de
> R-Blog: http://ryouready.wordpress.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
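
The quoted pattern requires a vowel after the two consonants, which is why it misses "hall" and matches inside longer runs such as "bstr". A possible regex-only alternative, sketched here under the assumption that a "cluster" means exactly two consonants with no consonant on either side, uses Perl-style lookarounds. The names cons, pair_pattern and count_isolated_pairs are illustrative only.

```r
# Consonant character class, reused to build the lookaround pattern.
cons <- "[bcdfghjklmnpqrstvwxyz]"

# Two consonants, not preceded and not followed by another consonant.
pair_pattern <- paste0("(?<!", cons, ")", cons, "{2}(?!", cons, ")")

count_isolated_pairs <- function(s) {
  m <- gregexpr(pair_pattern, s, perl = TRUE, ignore.case = TRUE)[[1]]
  sum(m > 0)  # gregexpr returns -1 when nothing matches
}

sapply(c("hallo", "hall", "chess", "screw", "abstraction"), count_isolated_pairs)
```

Because the lookarounds also succeed at the start and end of the string, this handles the boundary cases without special treatment. It still reports one match for "abstraction", namely the run "ct".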



