[R] applying strsplit to a whole column

Joshua Wiley jwiley.psych at gmail.com
Wed Aug 4 21:38:56 CEST 2010


On Wed, Aug 4, 2010 at 12:03 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I am sorry, someone said that strsplit automatically works on a
> column. How exactly does it work?

I was not suggesting it was the ideal choice in this case, merely that
a loop was not required.  I tend to use strsplit() more for doing
things like breaking up addresses (where I want all parts saved).  If
each element is broken up into an equal number of pieces (as in your
example), then something like this would extract the first and second
parts:

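x <- data.frame(nam1 = c("bbb..aba","ccc..abb","ddd..abc","eee..abd"),
                stringsAsFactors = FALSE)

# "\\.\\." matches two literal dots; every element splits into exactly two
# pieces, so the unlisted vector alternates first part, second part, ...
parts <- unlist(strsplit(x[[1]], split = "\\.\\."))
parts[seq(1, 2*length(x[[1]]), 2)]  # first parts:  "bbb" "ccc" "ddd" "eee"
parts[seq(2, 2*length(x[[1]]), 2)]  # second parts: "aba" "abb" "abc" "abd"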
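Here is a sketch of a more general alternative, which was not part of the original reply. vapply() walks the list returned by strsplit() and picks the n-th piece of each element, so it does not rely on every element splitting into the same number of pieces. The helper name nth_part and its NA-for-missing behaviour are assumptions made for illustration.

```r
# Hypothetical helper (not a base R function): return the n-th piece of each
# split element. Elements with fewer than n pieces yield NA instead of
# shifting every later result, as the seq()-indexing approach would.
nth_part <- function(s, n, split = "..") {
  pieces <- strsplit(s, split = split, fixed = TRUE)  # fixed = TRUE: ".." is literal
  vapply(pieces,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

x <- data.frame(nam1 = c("bbb..aba", "ccc..abb", "ddd..abc", "eee..abd"),
                stringsAsFactors = FALSE)

nth_part(x$nam1, 1)  # "bbb" "ccc" "ddd" "eee"
nth_part(x$nam1, 2)  # "aba" "abb" "abc" "abd"
```

When every element is known to contain the separator, the shorter sapply(strsplit(x$nam1, "..", fixed = TRUE), "[", 1) does the same job for the first part.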

> For example, if I want to grab just the first (or the second) part of
> the string in nam1 that should be split based on ".."
> x<-data.frame(nam1=c("bbb..aba","ccc..abb","ddd..abc","eee..abd"),
> stringsAsFactors=FALSE)
> str(x)
> strsplit(x[[1]],split="\\..")
> str(strsplit(x[[1]],split="\\.."))
>
> I am getting a list - hence, it looks like I have to go in a loop...?
>
> Thank you!
> Dimitri
>
>
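strsplit() is already vectorized over the column. A single call splits every element and returns a list with one character vector per input string, so getting a list back does not mean a loop is needed. The following minimal sketch assumes that every element splits into exactly two pieces, as in this example. It binds that list into a matrix and stores the matrix columns back in the data frame. The column names part1 and part2 are illustrative only.

```r
x <- data.frame(nam1 = c("bbb..aba", "ccc..abb", "ddd..abc", "eee..abd"),
                stringsAsFactors = FALSE)

# One call splits every element; the result is a list of character vectors.
pieces <- strsplit(x$nam1, split = "..", fixed = TRUE)

# Assumes each element has exactly 2 pieces; rbind() stacks them row-wise
# into a 4 x 2 character matrix.
m <- do.call(rbind, pieces)

# Store each column of pieces as a new column (illustrative names).
x$part1 <- m[, 1]
x$part2 <- m[, 2]
x
```

rbind() recycles shorter vectors instead of padding them with NA. If the piece counts might differ, check lengths(pieces) first.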
> On Wed, Aug 4, 2010 at 2:39 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Thank you very much, everyone!
>> Dimitri
>>
>> On Wed, Aug 4, 2010 at 2:10 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>> On Aug 4, 2010, at 1:42 PM, Dimitri Liakhovitski wrote:
>>>
>>>> I am sorry, I'd like to split my column ("names") such that all the
>>>> beginning of a string ("X..") is gone and only the rest of the text is
>>>> left.
>>>
>>> I could not tell whether it was the string "X.." or the pattern "X.." that
>>> was your goal for matching and removal.
>>>>
>>>> x<-data.frame(names=c("X..aba","X..abb","X..abc","X..abd"))
>>>> x$names<-as.character(x$names)
>>>
>>> a) Instead of "names", which is a heavily used function name, use
>>> something more specific. Otherwise you get:
>>>> names(x)
>>> "names"  # and thereby avoid list comments about canines.
>>>
>>> b) Instead of letting data.frame() convert the strings to a factor and
>>> then coercing them back with as.character(), use stringsAsFactors = FALSE.
>>>
>>>> x<-data.frame(nam1=c("X..aba","X..abb","X..abc","X..abd"),
>>>> stringsAsFactors=FALSE)
>>> # This is the pattern version:
>>>
>>>> x$nam1 <- gsub("X..",'', x$nam1)
>>>> x
>>>  nam1
>>> 1   aba
>>> 2   abb
>>> 3   abc
>>> 4   abd
>>>
>>> This is the string version:
>>>> x<-data.frame(nam1=c("X......aba","X.y.abb","X..abc","X..abd"),
>>>> stringsAsFactors=FALSE)
>>>>  x$nam1 <- gsub("X\\.+",'', x$nam1)
>>>> x
>>>   nam1
>>> 1   aba
>>> 2 y.abb
>>> 3   abc
>>> 4   abd
>>>
>>>
>>>> (x)
>>>> str(x)
>>>>
>>>> Can't figure out how to apply strsplit in this situation - without
>>>> using a loop. I hope it's possible to do it without a loop - is it?
>>>
>>> --
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>>
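In the pattern version above, "X.." is a regular expression in which each "." matches any single character. That means it would strip "Xyz" just as readily as "X..". Also, gsub() removes matches anywhere in the string, not only at the start. The sketch below is not part of David's reply. It uses a made-up vector y to compare three options: the regex, a literal match with fixed = TRUE, and sub() with a "^" anchor and escaped dots.

```r
# Made-up test strings: a real prefix, a look-alike, and a mid-string match.
y <- c("X..aba", "Xyzabb", "ab X..c")

# Regex: "." matches any character, so "Xyz" is removed too,
# and gsub() also strips the match in the middle of "ab X..c".
gsub("X..", "", y)                # "aba" "abb" "ab c"

# Literal string: only the exact characters "X.." are removed, anywhere.
gsub("X..", "", y, fixed = TRUE)  # "aba" "Xyzabb" "ab c"

# Anchored regex with escaped dots: only a leading "X.." is removed.
sub("^X\\.\\.", "", y)            # "aba" "Xyzabb" "ab X..c"
```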
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


