[R] matching subvectors in vector sets
David Winsemius
dwinsemius at comcast.net
Sat Apr 18 18:16:25 CEST 2009
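# Store each table in a list instead of printing it.
# x.s is the split id vector from Jim's code quoted below.
xlist <- list()
for (i in 2:length(x.s)) {
    # each row of x.seq indexes one window of i consecutive ids
    x.seq <- embed(length(x.s):1, i)
    xlist[[i]] <- table(apply(x.seq, 1, function(z) {
        paste(x.s[z], collapse = ":")
    }))
}
# xlist[[i]] holds the counts for windows of length i; xlist[[1]] stays NULL
xlist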
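
The original question was about all ~20000 elements rather than a single string. Here is a minimal sketch of one way to pool the counts, assuming the strings sit in a character vector (called `x.all` here, a made-up name, as is the helper `subvector.keys`). It uses the same `embed()` indexing as the loop above, but collects the pasted keys first and tabulates once at the end:

```r
# Hypothetical helper: return every contiguous run of at least `min.len`
# ids found in one colon-separated string, as pasted keys.
subvector.keys <- function(s, min.len = 2) {
    ids <- unlist(strsplit(s, ":", fixed = TRUE))
    n <- length(ids)
    if (n < min.len) return(character(0))
    unlist(lapply(min.len:n, function(k) {
        # embed() on the reversed index gives one row per window of k ids,
        # with the positions in forward order
        idx <- embed(n:1, k)
        apply(idx, 1, function(z) paste(ids[z], collapse = ":"))
    }))
}

# Two of the strings from the original post, standing in for the full list
x.all <- c("A00096:A00096:A00096:A00096:A02178:A02178:A07776",
           "A00096:A01352:A01352:A02023:A05001:A05001:A07776")

# Pool the keys from every element, then count them once
counts <- table(unlist(lapply(x.all, subvector.keys)))
counts[["A00096:A00096"]]
```

`counts` is an ordinary table, so `sort(counts, decreasing = TRUE)` would list the most frequent subvectors first. The number of keys grows quadratically with the number of ids in each string, so very long elements may need some care.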
--
David Winsemius
On Apr 18, 2009, at 11:46 AM, Albert Vilella wrote:
> That works very well. How do I store the results in a variable
> instead of printing them?
>
> On Fri, Apr 17, 2009 at 5:51 PM, jim holtman <jholtman at gmail.com>
> wrote:
>
>> How about this:
>>
>>> x <- "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>>> x.s <- unlist(strsplit(x, ":"))
>>> for (i in 2:length(x.s)){
>> + x.seq <- embed(length(x.s):1, i)
>> + print(table(apply(x.seq, 1, function(z){
>> + paste(x.s[z], collapse=":")
>> + })))
>> + }
>>
>> A00096:A00096 A00096:A02178 A02178:A02178 A02178:A07776
>>             3             1             1             1
>>
>> A00096:A00096:A00096 A00096:A00096:A02178 A00096:A02178:A02178
>>                    2                    1                    1
>> A02178:A02178:A07776
>>                    1
>>
>> A00096:A00096:A00096:A00096 A00096:A00096:A00096:A02178
>>                           1                           1
>> A00096:A00096:A02178:A02178 A00096:A02178:A02178:A07776
>>                           1                           1
>>
>> A00096:A00096:A00096:A00096:A02178 A00096:A00096:A00096:A02178:A02178
>>                                  1                                  1
>> A00096:A00096:A02178:A02178:A07776
>>                                  1
>>
>> A00096:A00096:A00096:A00096:A02178:A02178
>>                                         1
>> A00096:A00096:A00096:A02178:A02178:A07776
>>                                         1
>>
>> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>>                                                1
>>
>>
>> On Fri, Apr 17, 2009 at 9:33 AM, Albert Vilella <avilella at gmail.com>
>> wrote:
>>> Starting with the first entry:
>>> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>>>
>>> and supposing there aren't any other identical subvectors in the set,
>>> the algorithm will slide through the vector, first in pairs, then in
>>> trios, then in sets of four, etc., and count the occurrences:
>>>
>>> A00096:A00096
>>> 3
>>> A00096:A02178
>>> 1
>>> A02178:A02178
>>> 1
>>> A02178:A07776
>>> 1
>>> A00096:A00096:A00096
>>> 2
>>> A00096:A00096:A02178
>>> 1
>>> A00096:A02178:A02178
>>> 1
>>> A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096
>>> 1
>>> A00096:A00096:A00096:A02178
>>> 1
>>> A00096:A00096:A02178:A02178
>>> 1
>>> A00096:A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096:A02178
>>> 1
>>> A00096:A00096:A00096:A02178:A02178
>>> 1
>>> A00096:A00096:A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096:A02178:A02178
>>> 1
>>> A00096:A00096:A00096:A02178:A02178:A07776
>>> 1
>>> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>>> 1
>>>
>>>
>>>
>>>
>>> On Fri, Apr 17, 2009 at 1:04 PM, jim holtman <jholtman at gmail.com>
>>> wrote:
>>>>
>>>> Can you provide the output that you would expect from the data you
>>>> gave. I am not sure what you mean by a 'subvector'.
>>>>
>>>> On Fri, Apr 17, 2009 at 5:25 AM, Albert Vilella
>>>> <avilella at gmail.com>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I've got a list of ~20000 elements that look like this:
>>>>>
>>>>> [1] "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>>>>> [2] "A00046:A00076:A01101:A04146:A05671:A07169"
>>>>> [3] "A00038:A00932:A02185:A02370:A02818:A02818:A02818:A02818:A04732:A07142:A07142"
>>>>> [4] "A00096:A01352:A01352:A02023:A05001:A05001:A07776"
>>>>> [5] "A00036:A00047:A00059:A00503:A00904:A00904:A00904:A01023:A01023:A01399:A02029:A03941:A07679"
>>>>> [6] "A00041:A00533:A00855:A02178:A02178:A02178:A05671:A05671:A05671:A05671:A05671:A05671:A05671"
>>>>> ...
>>>>>
>>>>> And I would like to have a table with the frequency of occurrences
>>>>> for matching subvectors in all elements, i.e., not only the number
>>>>> of times a vector is found but also how many times a subvector (of
>>>>> at least 2 ids) is found.
>>>>>
>>>>> How can I do that?
>>>>> Thanks in advance,
>>>>> Albert.
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jim Holtman
>>>> Cincinnati, OH
>>>> +1 513 646 9390
>>>>
>>>> What is the problem that you are trying to solve?
>>>
>>>
>>
>>
>>
>>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT