[R] Regular expression help

David Winsemius dwinsemius at comcast.net
Tue Oct 10 18:09:17 CEST 2017


> On Oct 9, 2017, at 6:08 PM, Georges Monette <georges at yorku.ca> wrote:
> 
> How about this (I'm showing it as a pipe because it's easier to read that way):
> 
> library(magrittr)
> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587" %>%
>   strsplit(' ') %>%
>   unlist %>%
>   sub('^[^/]*/*','',.) %>%
>   sub('^[^/]*/*','',.) %>%
>   paste(collapse = ' ')

I'm old-school R, so I don't find that particularly readable. I read the later specification as saying that each line begins with an f, so the fourth item after a strsplit() is the target.

This seemed more readable to me:

Lines <- readLines(url("http://sci.esa.int/science-e/www/object/doc.cfm?fobjectid=54726"))
lines <- Lines[ grepl("^f", Lines) ]

str(lines)
# chr [1:62908] "f 14327 6959 18747" "f 8258 15598 18980" "f 27662 21871 21939" ...

l2 <- strsplit(lines, " ")  # in that file the separators were spaces
l3 <- sapply(l2[1:3], function(x) if (length(x) == 4) x[4] else "")
l3
#[1] "18747" "18980" "21939"

# Remove the `[1:3]` to get the entire result.
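
For anyone who wants to try the idea without downloading the file, here is a minimal offline sketch of the same fourth-field approach. The three "f" lines are copied from the str() output above; the "v" line is made up purely to exercise the grepl() filter, and vapply() is swapped in for sapply() only to pin down the return type.

```r
# Sample lines from the str() output above; the "v ..." line is invented
# for illustration and is not taken from the real file.
Lines <- c("f 14327 6959 18747", "f 8258 15598 18980",
           "f 27662 21871 21939", "v 0.1 0.2 0.3")
lines <- Lines[grepl("^f", Lines)]

fields <- strsplit(lines, " ", fixed = TRUE)

# vapply() always returns a character vector, one element per line;
# lines that do not have exactly four fields yield "".
fourth <- vapply(fields,
                 function(x) if (length(x) == 4) x[4] else "",
                 character(1))
fourth
#[1] "18747" "18980" "21939"
```

Note that a line such as the "147/1315/587 ..." example has five fields, so this approach returns "" for it.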

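If the per-word reading of the original question is the one that is wanted (drop everything up to and including the second slash in each word), a base-R sketch without the pipe might look like the following. The function name third_part is hypothetical, and the pattern uses `/?` rather than `/*` so that an empty middle field (as in "a//c") is not swallowed; that choice is my assumption, not something stated in the thread.

```r
# Hypothetical helper, not from the thread: for every space-separated word,
# keep only what follows the second slash ("" if there is no second slash).
third_part <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  for (i in 1:2) {
    # each pass strips one leading "xxx/" (or a whole slash-free word)
    words <- sub("^[^/]*/?", "", words)
  }
  paste(words, collapse = " ")
}

third_part("f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587")
#[1] " 587 587 587 587"
third_part("f 1067 28680 24462")
#[1] "   "
```

Applying it to a whole file would be something like vapply(lines, third_part, character(1), USE.NAMES = FALSE).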

Best,
David.

> 
> Georges Monette
> 
> -- 
> Georges Monette, PhD P.Stat.(SSC) | Associate Professor. Faculty of Science, Department of Mathematics & Statistics | North 626 Ross Building | York University | 4700 Keele Street, Toronto, ON M3J 1P3 | Telephone: 416-736-5250 | Fax: 416-736-5757 | E-Mail: georges at yorku.ca
> 
> 
> On 2017-10-09 11:02 AM, Duncan Murdoch wrote:
>> I have a file containing "words" like
>> 
>> 
>> a
>> 
>> a/b
>> 
>> a/b/c
>> 
>> where there may be multiple words on a line (separated by spaces).  The a, b, and c strings can contain non-space, non-slash characters. I'd like to use gsub() to extract the c strings (which should be empty if there are none).
>> 
>> A real example is
>> 
>> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
>> 
>> which I'd like to transform to
>> 
>> " 587 587 587 587"
>> 
>> Another real example is
>> 
>> "f 1067 28680 24462"
>> 
>> which should transform to "   ".
>> 
>> I've tried a few different regexprs, but am unable to find a way to say "transform words by deleting everything up to and including the 2nd slash" when there might be zero, one or two slashes.  Any suggestions?
>> 
>> Duncan Murdoch
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
