[R] Counting the occurrences of a character within a string

Florent D. flodel at gmail.com
Fri Dec 2 05:26:20 CET 2011


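Inefficient, maybe, but what you suggest does not work if a string
ends with a slash: strsplit() does not produce a final empty string,
so the count comes out one too low. (A leading slash is fine, since
it yields a leading "".)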
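
A quick way to check this, offered as a sketch rather than a definitive comparison. The vector x below is made up for illustration; the two counts are the strsplit one-liner and the gsub/nchar approach quoted further down. In current R it is the trailing slash (and a lone "/") where the strsplit count falls short, because strsplit() does not produce a final empty string; a leading slash yields a leading "" and still counts correctly.

```r
# Illustrative edge cases: plain, leading slash, trailing slash, slash only
x <- c("abc/def", "/abc", "abc/", "/")

# Count via strsplit: number of pieces minus one
by_split <- sapply(strsplit(x, "/"), length) - 1

# Count via deleting every non-slash character and measuring what is left
by_gsub <- nchar(gsub("[^/]", "", x))

# Side-by-side view of the two counts
data.frame(x, by_split, by_gsub)
```

Each of the four strings contains exactly one slash, so any row where by_split differs from by_gsub marks a case the strsplit count gets wrong.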

On Thu, Dec 1, 2011 at 11:11 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> strsplit is certainly an alternative, but your approach is
> unnecessarily complicated and inefficient. Do this, instead:
>
> sapply(strsplit(x,"/"),length)-1
>
> Cheers,
> Bert
>
> On Thu, Dec 1, 2011 at 7:44 PM, Florent D. <flodel at gmail.com> wrote:
>> Resending my code, not sure why the linebreaks got eaten:
>>
>>> x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"), stringsAsFactors = FALSE)
>>> count.slashes <- function(string)sum(unlist(strsplit(string, NULL)) == "/")
>>> within(x, Col2 <- vapply(Col1, count.slashes, 1))
>>         Col1 Col2
>> 1     abc/def    1
>> 2 ghi/jkl/mno    2
>>
>>
>> On Thu, Dec 1, 2011 at 10:32 PM, Florent D. <flodel at gmail.com> wrote:
>>> I used within and vapply:
>>>
>>> x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"), stringsAsFactors = FALSE)
>>> count.slashes <- function(string)sum(unlist(strsplit(string, NULL)) ==
>>> "/")within(x, Col2 <- vapply(Col1, count.slashes, 1))
>>>          Col1 Col21     abc/def    12 ghi/jkl/mno    2
>>>
>>> On Thu, Dec 1, 2011 at 1:05 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>>>> ## It's not a data frame -- it's just a vector.
>>>>
>>>>> x
>>>> [1] "abc/def"     "ghi/jkl/mno"
>>>>> gsub("[^/]","",x)
>>>> [1] "/"  "//"
>>>>> nchar(gsub("[^/]","",x))
>>>> [1] 1 2
>>>>>
>>>>
>>>> ?gsub
>>>> ?nchar
>>>>
>>>> -- Bert
>>>>
>>>> On Thu, Dec 1, 2011 at 8:32 AM, Douglas Esneault
>>>> <Douglas.Esneault at mecglobal.com> wrote:
>>>>> I am new to R but am an experienced SAS user, and I was hoping to get some help on counting the occurrences of a character within a string at the row level.
>>>>>
>>>>> My data frame, x, is structured as below:
>>>>>
>>>>> Col1
>>>>> abc/def
>>>>> ghi/jkl/mno
>>>>>
>>>>> I found this code on the board but it counts all occurrences of "/" in the dataframe.
>>>>>
>>>>> chr.pos <- which(unlist(strsplit(x,NULL))=='/')
>>>>> chr.count <- length(chr.pos)
>>>>> chr.count
>>>>> [1] 3
>>>>>
>>>>> I'd like to append a column, say cnt, that has the count of "/" for each row.
>>>>>
>>>>> Can anyone point me in the right direction or offer some code to do this?
>>>>>
>>>>> Thanks in advance for the help.
>>>>>
>>>>> Doug Esneault
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Bert Gunter
>>>> Genentech Nonclinical Biostatistics
>>>>
>>>> Internal Contact Info:
>>>> Phone: 467-7374
>>>> Website:
>>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

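For the original question of appending a per-row count column, one possible sketch that avoids the trailing-slash problem is to count matches directly with gregexpr() and regmatches(). The column name cnt comes from the original post; the third row ("trailing/") is an assumption added here to exercise the edge case, and this is only one of several ways to do it.

```r
# Toy data frame matching the layout in the original question,
# plus one illustrative row that ends with a slash
x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno", "trailing/"),
                stringsAsFactors = FALSE)

# gregexpr() finds every literal "/" in each string;
# regmatches() returns the matched substrings, one vector per row
# (an empty vector when a row has no match)
hits <- regmatches(x$Col1, gregexpr("/", x$Col1, fixed = TRUE))

# The number of matches per row is the count to append
x$cnt <- vapply(hits, length, integer(1))
x
```

Using fixed = TRUE treats the pattern as a literal string, so the same code should work for characters such as "." that are special in regular expressions.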

