[R] extracting values from txt with regular expression

Rui Barradas ruipbarradas at sapo.pt
Fri Jun 8 10:18:27 CEST 2012


Hello,

Just put the entire regexp between parentheses, so that "\\1" in the 
replacement has a capturing group to refer to.

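# The outer ( ) capture each number; "%&" is appended as a split marker.
extracted <- strsplit(
    gsub("([+-]?(?:\\d+(?:\\.\\d*)|\\.\\d+)(?:[eE][+-]?\\d+)?)", "\\1%&", txt_line),
    "%&")

extracted

# Keep what follows the "=" in each piece, i.e. the number (still as text).
sapply(strsplit(unlist(extracted), "="), "[", 2)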
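
To see the difference the outer parentheses make, here is a small sketch that runs the pattern both ways and then converts the result to numbers. The spacing inside txt_line is an assumption (the mailer wrapped the quoted line), and the names num, without_group, with_group and values are only for illustration.

```r
# Sample line; the exact spacing is an assumption.
txt_line <- " PERCENT DISCREPANCY =           0.01     PERCENT DISCREPANCY =          -0.05"

# The number pattern from the thread, kept in one place.
num <- "[+-]?(?:\\d+(?:\\.\\d*)|\\.\\d+)(?:[eE][+-]?\\d+)?"

# No capturing group: "\\1" is empty, so each number is replaced by "%&" alone.
without_group <- gsub(num, "\\1%&", txt_line)

# Whole pattern wrapped in ( ): "\\1" now holds the matched number.
with_group <- gsub(paste0("(", num, ")"), "\\1%&", txt_line)

extracted <- strsplit(with_group, "%&")

# as.numeric() ignores the leading blanks left over from the split.
values <- as.numeric(sapply(strsplit(unlist(extracted), "="), "[", 2))
values   # the two numbers, 0.01 and -0.05
```

The (?: ) groups inside the pattern only group, they never capture, which is why the original version had nothing for "\\1" to return.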


As for speed, I believe that this might take longer. It will have to 
match a regular expression, then substitute, then split. A routine like 
the one I sent is usually faster by an order of magnitude or more. I 
first wrote one around 20 years ago; I can now write it with my eyes 
closed, and it consistently beats the alternatives. But there's no harm 
in trying, or in combining strategies.
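
If you want to time it yourself, something along these lines might do. It is only a sketch: the test data are made up (one sample line repeated, spacing assumed), via_gsub and via_regmatches are hypothetical helper names, and the regmatches() variant is an alternative that was not part of this thread, not the routine I sent earlier.

```r
# Made-up test data: one sample line repeated many times.
txt_line <- " PERCENT DISCREPANCY =           0.01     PERCENT DISCREPANCY =          -0.05"
lines <- rep(txt_line, 2000)

num <- "[+-]?(?:\\d+(?:\\.\\d*)|\\.\\d+)(?:[eE][+-]?\\d+)?"

# Approach from this thread: mark each number, split on the marker,
# then split on "=" and keep the second piece.
via_gsub <- function(x) {
  marked <- gsub(paste0("(", num, ")"), "\\1%&", x)
  pieces <- unlist(strsplit(marked, "%&"))
  as.numeric(sapply(strsplit(pieces, "="), "[", 2))
}

# Alternative (an assumption, not from the thread): extract the matches directly.
via_regmatches <- function(x) {
  as.numeric(unlist(regmatches(x, gregexpr(num, x))))
}

system.time(a <- via_gsub(lines))
system.time(b <- via_regmatches(lines))
all.equal(a, b)   # both approaches should return the same numbers
```

Timings will depend on the machine and on the real file, so run it on your own data before drawing conclusions.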

Good luck.

Rui Barradas

On 08-06-2012 04:52, emorway wrote:
> Hi Dan and Rui,  Thank you for the suggestions, both were very helpful.
> Rui's code was quite fast...there is one more thing I want to explore for my
> own edification, but first I need some help fixing the code below, which is
> a slight modification to Dan's suggestion.  It'll no doubt be tough to beat
> the time Rui's code finished the task in, but I'm willing to try.  First, I
> need to fix the following, which 'peels' the wrong bit of text from
> "txt_line".  Instead of extracting as it now does (shown below), can the
> code be modified to extract the values 0.01 and -0.05, and store them in the
> variable 'extracted'?
>
> txt_line<-" PERCENT DISCREPANCY =           0.01     PERCENT DISCREPANCY =
> -0.05"
> extracted <-
> strsplit(gsub("[+-]?(?:\\d+(?:\\.\\d*)|\\.\\d+)(?:[eE][+-]?\\d+)?","\\1%&",txt_line),"%&")
> extracted
> #[1] " PERCENT DISCREPANCY =           "    "     PERCENT DISCREPANCY =
> "
>
>
>


