[R] Good Package(s) for String and URL processing?

Tobias Verbeke tobias.verbeke at openanalytics.eu
Fri Jul 2 08:19:18 CEST 2010


On 07/02/2010 05:51 AM, Erik Iverson wrote:
> Ralf B wrote:
>> Are there packages that allow improved String and URL processing?
>> E.g. extract parts of a URL such as sub-domains, top-level domain,
>> protocols (e.g. https, http, ftp), file type based on endings, check
>> if a URL is valid or not, etc...
>>
>> I am currently only using split and paste. Are there better and more
>> efficient ways to handle strings, e.g. to find sub-strings or to do
>> pattern matching?
>> What packages do you use if you have to do a lot of String processing
>> and you don't have the option to go to another language such as Perl
>> or Python?
>
>
> Well, much of the power of Perl is built on top of regular expressions,
> which R also supports.
>
> See ?regex for more details, and also the R functions ?grep, ?sub, etc.
>
> I can also highly recommend the book "Mastering Regular Expressions". It
> does not cover R explicitly, but what you learn there can be applied
> directly to R. Regexes go a long way toward finding substrings and
> matching patterns.
>
> You might find some things in RCurl helpful:
>
> http://www.omegahat.org/RCurl/
>
> Probably others...

Including gsubfn by Gabor Grothendieck
and stringr by Hadley Wickham

http://cran.r-project.org/web/packages/gsubfn/index.html
http://cran.r-project.org/web/packages/stringr/index.html
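
For the URL part of the question, a rough sketch in base R may already go a
long way. The helper name parse_url and its regular expression are my own
assumptions for illustration, not part of any package mentioned above. It
needs an R version that provides regexec() and regmatches(), it only handles
simple scheme://host/path URLs, and it naively treats the last dot-separated
label of the host as the top-level domain (so "co.uk" style suffixes are not
handled).

```r
# Hypothetical helper (not from any package): split a simple URL into parts
# with one regular expression, using base R only.
parse_url <- function(url) {
  # group 1: protocol, group 2: host (may include a port), group 3: path
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]+)([^?#]*)"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  if (length(m) == 0) return(NULL)  # no match: not a URL of this simple form

  host <- m[3]
  path <- m[4]
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  n <- length(labels)

  list(
    protocol  = m[2],
    host      = host,
    # naive assumption: the last label is the top-level domain
    tld       = labels[n],
    # everything before the last two labels counts as the sub-domain
    subdomain = if (n > 2) paste(labels[seq_len(n - 2)], collapse = ".") else "",
    path      = path,
    # file type from the ending of the path, "" if there is none
    extension = if (grepl("\\.[A-Za-z0-9]+$", path)) sub("^.*\\.", "", path) else ""
  )
}

url   <- "http://cran.r-project.org/web/packages/stringr/index.html"
parts <- parse_url(url)
parts$protocol   # "http"
parts$tld        # "org"
parts$extension  # "html"

# Plain substring search and replacement, also base R:
grepl("stringr", url, fixed = TRUE)  # TRUE
sub("^https?://", "", url)           # strips the protocol prefix

# A crude validity check: NULL means the pattern did not match
is.null(parse_url("not a url"))      # TRUE
```

This only checks the shape of the string; whether the URL actually resolves
is a different question, for which something like RCurl would be needed.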

Best,
Tobias


