[R] matching subvectors in vector sets

David Winsemius dwinsemius at comcast.net
Sat Apr 18 18:16:25 CEST 2009


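xlist <- list()
for (i in 2:length(x.s)) {
    # each row of x.seq is a run of i consecutive indices into x.s
    x.seq <- embed(length(x.s):1, i)
    xlist[[i]] <- table(apply(x.seq, 1, function(z) {
        paste(x.s[z], collapse = ":")
    }))
}
# xlist[[i]] holds the counts for runs of length i; xlist[[1]] stays NULL
xlist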
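
The same loop can also be written with lapply(), which builds the list in one step and leaves no empty first slot. This is only a sketch and not part of the code above: the name count_windows and its min.len argument are made up for illustration, and it assumes x.s has at least min.len elements.

```r
# Hypothetical helper (name and arguments are illustrative only).
# Returns a list with one table per run length, named by that length.
count_windows <- function(x.s, min.len = 2) {
    stopifnot(length(x.s) >= min.len)
    lens <- min.len:length(x.s)
    res <- lapply(lens, function(i) {
        # each row of x.seq is a run of i consecutive indices into x.s
        x.seq <- embed(length(x.s):1, i)
        table(apply(x.seq, 1, function(z) paste(x.s[z], collapse = ":")))
    })
    names(res) <- lens
    res
}

x <- "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
x.s <- unlist(strsplit(x, ":"))
res <- count_windows(x.s)
res[["2"]]   # counts for pairs
```

Because the list is named by run length, res[["3"]] gives the table for trios, and so on.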

--  
David Winsemius
On Apr 18, 2009, at 11:46 AM, Albert Vilella wrote:

> That works very well. How do I store the results in a variable instead of
> printing them?
>
> On Fri, Apr 17, 2009 at 5:51 PM, jim holtman <jholtman at gmail.com>  
> wrote:
>
>> How about this:
>>
>>> x <- "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>>> x.s <- unlist(strsplit(x, ":"))
>>> for (i in 2:length(x.s)){
>> +     x.seq <- embed(length(x.s):1, i)
>> +     print(table(apply(x.seq, 1, function(z){
>> +         paste(x.s[z], collapse=":")
>> +     })))
>> + }
>>
>> A00096:A00096 A00096:A02178 A02178:A02178 A02178:A07776
>>             3             1             1             1
>>
>> A00096:A00096:A00096 A00096:A00096:A02178 A00096:A02178:A02178
>>                    2                    1                    1
>> A02178:A02178:A07776
>>                    1
>>
>> A00096:A00096:A00096:A00096 A00096:A00096:A00096:A02178
>>                           1                           1
>> A00096:A00096:A02178:A02178 A00096:A02178:A02178:A07776
>>                           1                           1
>>
>> A00096:A00096:A00096:A00096:A02178 A00096:A00096:A00096:A02178:A02178
>>                                  1                                  1
>> A00096:A00096:A02178:A02178:A07776
>>                                  1
>>
>> A00096:A00096:A00096:A00096:A02178:A02178
>>                                         1
>> A00096:A00096:A00096:A02178:A02178:A07776
>>                                         1
>>
>> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>>                                                1
>>
>>
>> On Fri, Apr 17, 2009 at 9:33 AM, Albert Vilella <avilella at gmail.com>
>> wrote:
>>> Starting with the first entry:
>>> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>>>
>>> and supposing there aren't any other subvectors identical in the set, the
>>> algorithm will slide through the vector, first in pairs, then in trios, then
>>> in sets of four, etc, and count the occurrences:
>>>
>>> A00096:A00096
>>> 3
>>> A00096:A02178
>>> 1
>>> A02178:A02178
>>> 1
>>> A02178:A07776
>>> 1
>>> A00096:A00096:A00096
>>> 2
>>> A00096:A00096:A02178
>>> 1
>>> A00096:A02178:A02178
>>> 1
>>> A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096
>>> 1
>>> A00096:A00096:A00096:A02178
>>> 1
>>> A00096:A00096:A02178:A02178
>>> 1
>>> A00096:A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096:A02178
>>> 1
>>> A00096:A00096:A00096:A02178:A02178
>>> 1
>>> A00096:A00096:A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096:A02178:A02178
>>> 1
>>> A00096:A00096:A00096:A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>>> 1
>>>
>>>
>>>
>>>
>>> On Fri, Apr 17, 2009 at 1:04 PM, jim holtman <jholtman at gmail.com>  
>>> wrote:
>>>>
>>>> Can you provide the output that you would expect from the data you
>>>> gave.  I am not sure what you mean by a 'subvector'.
>>>>
>>>> On Fri, Apr 17, 2009 at 5:25 AM, Albert Vilella  
>>>> <avilella at gmail.com>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I've got a list of ~20000 elements that look like this:
>>>>>
>>>>> [1]
>>>>> "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>>>>>
>>>>> [2]
>>>>> "A00046:A00076:A01101:A04146:A05671:A07169"
>>>>>
>>>>> [3]
>>>>> "A00038:A00932:A02185:A02370:A02818:A02818:A02818:A02818:A04732:A07142:A07142"
>>>>>
>>>>> [4]
>>>>> "A00096:A01352:A01352:A02023:A05001:A05001:A07776"
>>>>>
>>>>> [5]
>>>>> "A00036:A00047:A00059:A00503:A00904:A00904:A00904:A01023:A01023:A01399:A02029:A03941:A07679"
>>>>>
>>>>> [6]
>>>>> "A00041:A00533:A00855:A02178:A02178:A02178:A05671:A05671:A05671:A05671:A05671:A05671:A05671"
>>>>> ...
>>>>>
>>>>> And I would like to have a table with the frequency of occurrences for
>>>>> matching subvectors in all elements, i.e., not only the number of times
>>>>> a vector is found but also how many times a subvector (of at least 2
>>>>> ids) is found.
>>>>>
>>>>> How can I do that?
>>>>> Thanks in advance,
>>>>> Albert.
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jim Holtman
>>>> Cincinnati, OH
>>>> +1 513 646 9390
>>>>
>>>> What is the problem that you are trying to solve?
>>>
>>>
>>
>>
>>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
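
The original question asked for one frequency table across all elements of the list, not just one string. A rough sketch of one way to pool the counts follows. It is not the code from this thread: the helper name all_windows and its min.len argument are invented for illustration, it uses explicit start positions instead of embed(), and the two strings below merely stand in for the full list of ~20000.

```r
# Hypothetical helper: every contiguous run of at least min.len ids in one string.
all_windows <- function(s, min.len = 2) {
    ids <- unlist(strsplit(s, ":"))
    n <- length(ids)
    if (n < min.len) return(character(0))
    unlist(lapply(min.len:n, function(i) {
        starts <- seq_len(n - i + 1)   # first index of each run of length i
        vapply(starts, function(k) paste(ids[k:(k + i - 1)], collapse = ":"),
               character(1))
    }))
}

# Two of the strings from the original post, standing in for the full list.
elements <- c("A00096:A00096:A00096:A00096:A02178:A02178:A07776",
              "A00096:A01352:A01352:A02023:A05001:A05001:A07776")

# Pool the runs from every element and count them in a single table.
pooled <- table(unlist(lapply(elements, all_windows)))
head(sort(pooled, decreasing = TRUE))
```

A string of n ids yields n*(n-1)/2 runs, so with long strings and many elements the pooled table can get large; capping the run length may be worth considering in that case.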



