[R] matching a sequence in a vector?

jim holtman jholtman at gmail.com
Thu Feb 16 02:04:17 CET 2012


Yet another solution (I think):

> patrn<- c(1,2,3,4)
> exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
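> # one row per window of length(patrn); column 1 is the window's start index,
> # and rows run from the last window back to the first
> indx <- embed(rev(seq_along(exmpl)), length(patrn))
> # TRUE for each window whose values equal patrn element by element
> matches <- apply(indx, 1, function(.indx){
+     all(exmpl[.indx] == patrn)
+ })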
> indx[matches, 1L]
[1] 23 13  6
>
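A minimal sketch that wraps the embed() idea above in a reusable function. The name find_seq_embed is hypothetical. The sketch assumes numeric vectors, returns the start indices in increasing order, returns an empty result when the pattern is longer than the vector (embed() would otherwise stop with an error), and treats windows containing NA as non-matches.

```r
# Hypothetical helper: start indices of every occurrence of `pattern` in `x`,
# built on the embed() trick from the post above.
find_seq_embed <- function(x, pattern) {
  n <- length(pattern)
  # embed() errors when the window is longer than x, so guard first
  if (n < 1L || n > length(x)) return(integer(0))
  # one row per window of length n; column 1 holds the window's start index
  idx <- embed(rev(seq_along(x)), n)
  # isTRUE() turns an NA comparison (NA in x) into a non-match
  hit <- apply(idx, 1, function(i) isTRUE(all(x[i] == pattern)))
  # rows run from the last window to the first, so sort the starts
  sort(idx[hit, 1L])
}

patrn <- c(1, 2, 3, 4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
find_seq_embed(exmpl, patrn)  # 6 13 23
```

apply() still calls an R function once per window, so this stays a per-window loop in disguise; it is written for readability rather than speed.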


On Wed, Feb 15, 2012 at 6:32 PM, Redding, Matthew
<Matthew.Redding at deedi.qld.gov.au> wrote:
> Thank you all for your great and creative solutions.
>
> There is definitely more than one way to skin a cat.
>
> A colleague alerted me to another solution:
>
> seq.strt <- which(sapply(1:(length(exmpl) - length(patrn) + 1), function(i)
>   isTRUE(all.equal(patrn, exmpl[i + 0:(length(patrn) - 1)]))))
>
> Apparently this came from a post in the help archive that my searches missed.
>
> I think the solutions you have put up are more readable.
>
> Kind regards
>
> Matt
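The colleague's one-liner can be unpacked into a small function. This is a sketch only: find_seq_sliding is a hypothetical name, and vapply() and seq_len() stand in for sapply() and 1:(...) so the result is always a logical vector and a pattern longer than exmpl yields zero windows instead of counting down through 1:0 or 1:-1. Note that all.equal() compares doubles with a small numeric tolerance, so nearly equal values also count as a match; identical() would require exact equality (including the same storage type).

```r
# Hypothetical helper unpacking the colleague's one-liner: test every window
# of length(pattern) consecutive elements with all.equal().
find_seq_sliding <- function(x, pattern) {
  n <- length(pattern)
  if (n < 1L) return(integer(0))
  # number of candidate start positions; 0 when pattern is longer than x
  n_win <- max(length(x) - n + 1L, 0L)
  hits <- vapply(seq_len(n_win), function(i) {
    # all.equal() returns TRUE or a character description of the
    # differences; isTRUE() maps the latter to FALSE
    isTRUE(all.equal(pattern, x[i + 0:(n - 1L)]))
  }, logical(1))
  which(hits)
}

patrn <- c(1, 2, 3, 4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
find_seq_sliding(exmpl, patrn)  # 6 13 23
```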
>
>
>
>>-----Original Message-----
>>From: r-help-bounces at r-project.org
>>[mailto:r-help-bounces at r-project.org] On Behalf Of Berend Hasselman
>>Sent: Thursday, 16 February 2012 1:35 AM
>>To: Martin Morgan
>>Cc: r-help at r-project.org
>>Subject: Re: [R] matching a sequence in a vector?
>>
>>
>>On 15-02-2012, at 15:27, Martin Morgan wrote:
>>
>>> On 02/14/2012 11:45 PM, Petr Savicky wrote:
>>>> On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
>>>>> Hi All,
>>>>>
>>>>>
>>>>> I've been trawling through the documentation and listserv archives
>>>>> on this topic -- but as yet have not found a solution.  I'm sure
>>>>> this is pretty simple with R, but I cannot work out how without
>>>>> resorting to ugly nested loops.
>>>>>
>>>>> As far as I can tell, grep, match, and %in% are not the correct
>>>>> tools.
>>>>>
>>>>> Question:
>>>>> given these vectors --
>>>>> patrn<- c(1,2,3,4)
>>>>> exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
>>>>>
>>>>> how do I get the desired answer by finding the occurrences of the
>>>>> pattern and returning their starting indices:
>>>>> 6, 13, 23
>>>
>>> match(exmpl, patrn) returns, for each element of exmpl, its index in
>>> patrn; wherever patrn occurs, consecutive indexes differ by 1
>>>
>>>  n = length(patrn)
>>>  r = rle(diff(match(exmpl, patrn)) == 1)
>>>
>>> We're looking for a run of TRUEs of length n - 1 (3 here), and can
>>> find the ends of those runs (as positions in the vector of diffs) with
>>> cumsum(r$lengths)
>>>
>>>  cumsum(r$lengths)[r$values & r$lengths == (n - 1)] - (n - 2)
>>>
>>> Seems like there could be edge cases that I'm missing...
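The match()/diff()/rle() steps can be collected into one function. A minimal sketch: find_seq_rle is a hypothetical name, and the stopifnot() guard makes explicit two assumptions the approach relies on: the pattern has at least two elements, and no value repeats within it (match() only reports the first position of a repeated value, so the +1 steps would break). NA steps, which come from elements of exmpl that are not in patrn, are set to FALSE so they can never leak into the final index.

```r
# Hypothetical helper collecting the match()/diff()/rle() steps.
find_seq_rle <- function(x, pattern) {
  n <- length(pattern)
  # assumptions: at least two elements and no repeated values in pattern
  stopifnot(n >= 2L, !anyDuplicated(pattern))
  # position of each element of x within pattern (NA when absent)
  pos <- match(x, pattern)
  # TRUE where the next element of x is the next element of pattern
  step <- diff(pos) == 1L
  step[is.na(step)] <- FALSE
  r <- rle(step)
  # index (within step) of the last element of each run
  ends <- cumsum(r$lengths)
  # a full match is a run of exactly n - 1 TRUE steps; its last step sits at
  # index start + n - 2, so subtract n - 2 to recover the start
  ends[r$values & r$lengths == n - 1L] - (n - 2L)
}

patrn <- c(1, 2, 3, 4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
find_seq_rle(exmpl, patrn)  # 6 13 23
```

Because pattern positions only run from 1 to n, a run of +1 steps can never be longer than n - 1, and a run of exactly n - 1 steps must start at position 1, which is why the equality test is sufficient.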
>>
>>Clever.
>>However, it is quite slow.
>>
>>Berend
>>
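One way to put numbers on the speed remark is to time the approaches on a long vector. The sketch below uses a different, fully vectorized technique that was not proposed in the thread: it makes length(patrn) passes over the data, comparing a shifted slice of the vector against one pattern element at a time, so there is no per-window R function call. find_seq_shift is a hypothetical name, and any timings will depend on the data and the machine.

```r
# Hypothetical helper using shifted comparisons instead of a per-window loop.
find_seq_shift <- function(x, pattern) {
  n <- length(pattern)
  m <- length(x) - n + 1L          # number of candidate start positions
  if (n < 1L || m < 1L) return(integer(0))
  hit <- rep(TRUE, m)
  for (k in seq_len(n)) {
    # element k of the window starting at i is x[i + k - 1]
    hit <- hit & (x[seq_len(m) + k - 1L] == pattern[k])
  }
  which(hit)                       # which() ignores NA comparisons
}

patrn <- c(1, 2, 3, 4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
find_seq_shift(exmpl, patrn)  # 6 13 23

# Timing on a long random vector; results vary with data and hardware
set.seed(1)
big <- sample(1:10, 1e6, replace = TRUE)
system.time(find_seq_shift(big, patrn))
```

Wrapping Martin's rle() one-liner or Jim's embed() version in system.time() the same way gives a side-by-side comparison on the same input.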
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


