[R] Regular expression help

Duncan Murdoch murdoch.duncan at gmail.com
Mon Oct 9 18:15:07 CEST 2017


On 09/10/2017 11:23 AM, Ulrik Stervbo wrote:
> Hi Duncan,
> 
> why not split on / and take the correct elements? It is not as elegant 
> as regex but could do the trick.

Thanks for the suggestion.  The files that the two real examples came 
from had about 5000 and 60000 lines respectively, and there are likely 
many thousands of lines in general, so I was thinking that would be too 
slow, as it would involve nested strsplit() calls.  But in fact, it's 
not so bad, so I might go with it.  Here's a stab at it:

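# lines: the lines to be split, e.g. the lines starting with "f" in
# http://sci.esa.int/science-e/www/object/doc.cfm?fobjectid=54726
# The two examples from my original message stand in for them here.
lines <- c("f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587",
           "f 1067 28680 24462")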

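# Split each line into words, then each word into its "/"-separated parts.
l2 <- strsplit(lines, " ")
l3 <- lapply(l2, function(x) {
         y <- strsplit(x, "/")
         # Keep the third part of a/b/c words; a and a/b words become "".
         sapply(y, function(z) if (length(z) == 3) z[3] else "")
       })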
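
Note that l3 is a list of character vectors (one per line), not the 
single strings asked for in the original message.  A minimal sketch of 
collapsing them back with paste() follows.  The function name 
third_fields is hypothetical, and fixed = TRUE and vapply() are my own 
choices for the sketch rather than anything tested on the real files:

```r
# Hypothetical helper: keep the third "/"-separated field of each word,
# or "" if the word does not have exactly three fields, then rejoin the
# words of each line with single spaces.
third_fields <- function(lines) {
  words <- strsplit(lines, " ", fixed = TRUE)
  vapply(words, function(x) {
    parts <- strsplit(x, "/", fixed = TRUE)
    thirds <- vapply(parts,
                     function(z) if (length(z) == 3) z[3] else "",
                     character(1))
    paste(thirds, collapse = " ")
  }, character(1))
}

# The two examples quoted from the original message.
examples <- c("f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587",
              "f 1067 28680 24462")
result <- third_fields(examples)
```

On those two examples this should give " 587 587 587 587" and "   ", 
matching the transformations described in the original message.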

Duncan

> 
> Best,
> Ulrik
> 
> On Mon, 9 Oct 2017 at 17:03 Duncan Murdoch <murdoch.duncan at gmail.com>
> wrote:
> 
>     I have a file containing "words" like
> 
> 
>     a
> 
>     a/b
> 
>     a/b/c
> 
>     where there may be multiple words on a line (separated by spaces).  The
>     a, b, and c strings can contain non-space, non-slash characters. I'd
>     like to use gsub() to extract the c strings (which should be empty if
>     there are none).
> 
>     A real example is
> 
>     "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
> 
>     which I'd like to transform to
> 
>     " 587 587 587 587"
> 
>     Another real example is
> 
>     "f 1067 28680 24462"
> 
>     which should transform to "   ".
> 
>     I've tried a few different regexprs, but am unable to find a way to say
>     "transform words by deleting everything up to and including the 2nd
>     slash" when there might be zero, one or two slashes.  Any suggestions?
> 
>     Duncan Murdoch
> 


