[R] Regular expression help
David Winsemius
dwinsemius at comcast.net
Tue Oct 10 18:09:17 CEST 2017
> On Oct 9, 2017, at 6:08 PM, Georges Monette <georges at yorku.ca> wrote:
>
> How about this (I'm showing it as a pipe because it's easier to read that way):
>
> library(magrittr)
> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587" %>%
> strsplit(' ') %>%
> unlist %>%
> sub('^[^/]*/*','',.) %>%
> sub('^[^/]*/*','',.) %>%
> paste(collapse = ' ')
I'm old-school R, so I don't find that particularly readable. I read the later specification as saying that each line begins with an f, so the fourth item after a strsplit() becomes the target.
This seemed more readable to me:
Lines <- readLines(url("http://sci.esa.int/science-e/www/object/doc.cfm?fobjectid=54726"))
lines <- Lines[ grepl("^f", Lines) ]
str(lines)
# chr [1:62908] "f 14327 6959 18747" "f 8258 15598 18980" "f 27662 21871 21939" ...
l2 <- strsplit(lines, " ") # in that file the separators were spaces
l3 <- sapply(l2[1:3], function(x) if (length(x) == 4) x[4] else "")
l3
#[1] "18747" "18980" "21939"
# Remove the `[1:3]` to get the entire result.
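For lines in the slash-separated form from the original question, where each word is a, a/b or a/b/c, picking the fourth item is not enough. Below is a minimal gsub() sketch of "delete everything up to and including the 2nd slash". The helper name extract_c is hypothetical, and the pattern assumes that the "a" part of every word is non-empty and that words are separated by single spaces; perl = TRUE is used so that the greedy matching order is predictable.

```r
# Hypothetical helper (not from the thread): keep only the "c" part of each
# a/b/c word and replace words without a second slash by the empty string.
extract_c <- function(x) {
  # [^ /]+    the "a" part (assumed non-empty)
  # /?[^ /]*  an optional slash followed by the "b" part
  # /?        an optional second slash
  # ([^ ]*)   whatever is left of the word: the "c" part, possibly empty
  gsub("[^ /]+/?[^ /]*/?([^ ]*)", "\\1", x, perl = TRUE)
}

x <- c("f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587",
       "f 1067 28680 24462")
res <- extract_c(x)
res[1]          # " 587 587 587 587"
nchar(res[2])   # 3: only the three separating spaces remain
```

Because every word is matched and replaced as a whole, the spaces between words are left untouched, which is why the second example reduces to spaces only.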
Best;
David.
>
> Georges Monette
>
> --
> Georges Monette, PhD P.Stat.(SSC) | Associate Professor. Faculty of Science, Department of Mathematics & Statistics | North 626 Ross Building | York University | 4700 Keele Street, Toronto, ON M3J 1P3 | Telephone: 416-736-5250 | Fax: 416-736-5757 | E-Mail: georges at yorku.ca
>
>
> On 2017-10-09 11:02 AM, Duncan Murdoch wrote:
>> I have a file containing "words" like
>>
>>
>> a
>>
>> a/b
>>
>> a/b/c
>>
>> where there may be multiple words on a line (separated by spaces). The a, b, and c strings can contain non-space, non-slash characters. I'd like to use gsub() to extract the c strings (which should be empty if there are none).
>>
>> A real example is
>>
>> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
>>
>> which I'd like to transform to
>>
>> " 587 587 587 587"
>>
>> Another real example is
>>
>> "f 1067 28680 24462"
>>
>> which should transform to " ".
>>
>> I've tried a few different regexprs, but am unable to find a way to say "transform words by deleting everything up to and including the 2nd slash" when there might be zero, one or two slashes. Any suggestions?
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law