[R] count occurrence and distance of characters in string

David Winsemius dwinsemius at comcast.net
Fri Nov 5 03:11:29 CET 2010


On Nov 4, 2010, at 8:06 PM, Charles C. Berry wrote:

> On Fri, 5 Nov 2010, Immanuel wrote:
>
>> Hey,
>>
>> thanks for the answer. Actually, I had already typed an example
>> but deleted it since I thought it was superfluous.
>> regards
>>
>> ---------
>> string <- "kjokllokkoadddo"
>>
>> # f1(string, "o") should return that "o" was found 4 times

Other ways:

sum(unlist(strsplit(string, "")) == "o")
[1] 4
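
The one-liner above can be wrapped into the f1() that was asked for. This is only a minimal sketch: the name f1 comes from the request, while the function body and the argument name ch are my own assumptions about what is wanted (a single character to count).

```r
# Hypothetical helper: count how often the single character `ch`
# occurs in `string`.
f1 <- function(string, ch) {
  chars <- strsplit(string, "")[[1]]  # split into single characters
  sum(chars == ch)                    # TRUE counts as 1
}

string <- "kjokllokkoadddo"
f1(string, "o")  # 4
```

Comparing with == rather than matching a pattern means characters such as "." are counted literally.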

>> # f2(string, "o") should return the distances between the "o"s
>> # found: 3, 2, 4
>> ---------

diff(grep("o", strsplit(string, "")[[1]])) - 1
[1] 3 2 4
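
As a function, this might look like the sketch below. The name f2 is taken from the request; the body and the argument name ch are assumptions on my part. It uses which() with == instead of grep(), because grep() treats its pattern as a regular expression unless fixed = TRUE is given, so a character like "." would match every position.

```r
# Hypothetical helper: number of other characters between
# consecutive occurrences of the single character `ch`.
f2 <- function(string, ch) {
  chars <- strsplit(string, "")[[1]]  # split into single characters
  pos <- which(chars == ch)           # positions where `ch` occurs
  diff(pos) - 1L                      # gap = position difference minus one
}

f2("kjokllokkoadddo", "o")  # 3 2 4
```

With fewer than two occurrences, diff() has nothing to subtract and the result is a zero-length vector.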


>
> In that case, I'd use split:
>
>> res <- split(seq(nchar(string)),unlist(strsplit(string,'')))
>> length(res[['o']])
> [1] 4
>> ## or
>> sapply(res,length)
> a d j k l o
> 1 3 1 4 2 4
>> diff(res[['o']])-1
> [1] 3 2 4
>> # or
>> sapply(sapply(res,diff),"-",1)
> $a
> numeric(0)
>
> $d
> [1] 0 0
>
> $j
> numeric(0)
>
> $k
> [1] 2 3 0
>
> $l
> [1] 0
>
> $o
> [1] 3 2 4
>
>>
> Chuck
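
A variant of the split() approach that avoids sapply() for the list-valued step may be a little more predictable: sapply() simplifies its result when it can, so the shape of what it returns depends on the data. This is only a sketch; the names chars, counts and gaps are mine, not part of the code quoted above.

```r
string <- "kjokllokkoadddo"
chars <- strsplit(string, "")[[1]]

# Positions of every distinct character, as a named list.
res <- split(seq_along(chars), chars)

# Occurrence counts: always a named integer vector.
counts <- vapply(res, length, integer(1))

# Gaps between consecutive occurrences: always a named list.
gaps <- lapply(res, function(pos) diff(pos) - 1L)

counts[["o"]]  # 4
gaps[["o"]]    # 3 2 4
```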
>
>
>>
>>
>> On 11/05/2010 12:28 AM, Charles C. Berry wrote:
>>> On Thu, 4 Nov 2010, Immanuel wrote:
>>>
>>>> Hello all,
>>>>
>>>> I want to know how often one character occurs in a given string
>>>> and the distance between every two occurrences (distance = number
>>>> of other characters between them).
>>>
>>> You should provide "commented, minimal, self-contained, reproducible
>>> code" as asked.
>>>
>>> This matters especially for a question like this one, which has many
>>> simple answers that RespondeRs will shower you with if only you give
>>> them a starting point.
>>>
>>> Use tapply, strsplit, seq, nchar, unlist, diff, "-", and table for
>>> one way.
>>>
>>> Chuck
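
One possible reading of that hint, put together from the functions it lists. This is a sketch of how the pieces might fit, not necessarily what was intended; the variable names are assumptions.

```r
string <- "kjokllokkoadddo"
chars <- unlist(strsplit(string, ""))  # single characters

# table() counts the occurrences of each character.
counts <- table(chars)

# tapply() groups the positions 1..nchar(string) by character and
# applies diff() minus one to each group of positions.
gaps <- tapply(seq(nchar(string)), chars, function(pos) diff(pos) - 1)

counts[["o"]]  # 4
gaps[["o"]]    # 3 2 4
```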
>>>
>>>>
>>>> thanks
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> Charles C. Berry                            Dept of Family/Preventive Medicine
>>> cberry at tajo.ucsd.edu                      UC San Diego
>>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>>
>>>
>>>
>>
>>
>

David Winsemius, MD
West Hartford, CT


