[R] Q about strsplit and regexp

Liaw, Andy andy_liaw at merck.com
Wed Oct 20 15:43:15 CEST 2004


Thanks to Barry Rawlingson, Peter Dalgaard, Jean-Pierre Muller, Dimitris
Rizopoulos, John Fox, and Stephen Upton for comments and suggestions.  Looks
like there's no easier way than to strip the spaces before splitting the
fields.  Several people suggested deleting the empty strings afterwards.  In
my particular application, there are typically thousands of fields, and I'd
think stripping leading (and maybe trailing) spaces in the original string
should be more efficient than computing nchar() on all fields afterwards.
(Although in reality it hardly makes any difference for me:  I'm only doing
this once, not a gazillion times...)
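
For the record, here is a minimal sketch of the two approaches compared above,
using the example string from my original question.  The variable names are
made up for illustration, and I'm not making any timing claims:

```r
x <- "  a b    c "

# Approach 1: strip the leading spaces first, then split on runs of spaces.
# (strsplit() does not return a trailing "" for a delimiter at the end of
# the string, so only the leading spaces need to be removed.)
fields1 <- strsplit(sub("^ +", "", x), " +")[[1]]

# Approach 2: split first, then drop the empty strings afterwards.
parts <- strsplit(x, " +")[[1]]
fields2 <- parts[nchar(parts) > 0]

# Both approaches give the same fields for this input.
identical(fields1, fields2)
```

The second approach has to call nchar() on every field, which is the extra
work I was hoping to avoid when there are thousands of fields.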

So, in summary, I'm sticking with what I had originally.  Prof. Fox's
function for nuking leading and trailing whitespace will come in handy,
though.
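
A helper of that kind might look roughly like the sketch below.  To be clear,
this is not the function Prof. Fox posted; the name trim_ws and the choice of
the [[:space:]] class (rather than a literal space) are assumptions made here
for illustration:

```r
# Hypothetical helper (not Prof. Fox's function): remove leading and
# trailing whitespace from each element of a character vector.
trim_ws <- function(s) gsub("^[[:space:]]+|[[:space:]]+$", "", s)

# Trim first, then split on runs of whitespace.
fields <- strsplit(trim_ws("  a b    c "), "[[:space:]]+")[[1]]
fields
```

Using [[:space:]] also covers tabs and newlines, which may or may not be what
one wants for a given file.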

Thanks again to all!

Best,
Andy

> From: Liaw, Andy
> 
> Dear R-help,
> 
> This one is probably a piece of cake for regexp masters.  I'd like to split
> a character vector (for simplicity, say of length one for now) that contains
> fields delimited by an arbitrary number of white spaces (e.g.,
> "  a b    c ").  How do I get the character vector that contains the fields?
> In the example I gave, I've tried:
> 
> > strsplit("  a b    c ", " +")
> [[1]]
> [1] ""  "a" "b" "c"
> 
> I do not want that empty string at the beginning, but couldn't figure out
> how to strip the leading white spaces, other than with something ugly like:
> 
> > strsplit(sub("^ +", "", "  a b    c "), " +")
> [[1]]
> [1] "a" "b" "c"
> 
> Can some kind soul point me to a simpler way?  TIA!!
> 
> Best,
> Andy
> 
> Andy Liaw, PhD
> Biometrics Research      PO Box 2000, RY33-300     
> Merck Research Labs           Rahway, NJ 07065
> andy_liaw <at> merck.com          732-594-0820
> 



