[R] matching subvectors in vector sets

jim holtman jholtman at gmail.com
Fri Apr 17 18:51:36 CEST 2009


How about this:

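> x <- "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
> x.s <- unlist(strsplit(x, ":"))  # split the string into individual ids
> for (i in 2:length(x.s)){  # i = window length, from pairs up to the full vector
+     # each row of x.seq holds the indices of one window of i consecutive ids
+     x.seq <- embed(length(x.s):1, i)
+     # paste each window back into a string and count identical windows
+     print(table(apply(x.seq, 1, function(z){
+         paste(x.s[z], collapse=":")
+     })))
+ }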

A00096:A00096 A00096:A02178 A02178:A02178 A02178:A07776
            3             1             1             1

A00096:A00096:A00096 A00096:A00096:A02178 A00096:A02178:A02178
                   2                    1                    1
A02178:A02178:A07776
                   1

A00096:A00096:A00096:A00096 A00096:A00096:A00096:A02178
                          1                           1
A00096:A00096:A02178:A02178 A00096:A02178:A02178:A07776
                          1                           1

A00096:A00096:A00096:A00096:A02178 A00096:A00096:A00096:A02178:A02178
                                 1                                  1
A00096:A00096:A02178:A02178:A07776
                                 1

A00096:A00096:A00096:A00096:A02178:A02178
                                        1
A00096:A00096:A00096:A02178:A02178:A07776
                                        1

A00096:A00096:A00096:A00096:A02178:A02178:A07776
                                               1
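
In case the embed() call looks opaque, here is a small illustration of what it does with a reversed index vector. This is only a sketch on a made-up four-element vector (the values "a" to "d" are not from the data in this thread):

```r
x.s <- c("a", "b", "c", "d")

# embed() on the reversed indices 4:1 with dimension 2 gives one row per
# window; within each row the indices run left to right in increasing order.
idx <- embed(length(x.s):1, 2)
print(idx)

# Each row indexes one pair of neighbouring elements (last pair first).
pairs <- apply(idx, 1, function(z) paste(x.s[z], collapse = ":"))
print(pairs)
```

The windows come out starting from the end of the vector, which does not matter here because table() only counts them.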


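The loop above prints one table per window length for a single element. To get one table over all elements, something along these lines might work. It is a rough sketch, not tested on the full ~20000 elements: the helper name subvectors() and its arguments are made up for illustration, and it assumes that every occurrence should be counted, including repeats within the same element.

```r
# Hypothetical helper: return every run of at least `min_len` consecutive
# ids in one ":"-separated string, each run pasted back into a string.
subvectors <- function(s, min_len = 2, sep = ":") {
  ids <- unlist(strsplit(s, sep, fixed = TRUE))
  n <- length(ids)
  if (n < min_len) return(character(0))  # too short to contain any run
  unlist(lapply(min_len:n, function(k) {
    starts <- seq_len(n - k + 1)         # start position of each window
    vapply(starts,
           function(i) paste(ids[i:(i + k - 1)], collapse = sep),
           character(1))
  }))
}

# Two of the elements from the original question
x <- c("A00096:A00096:A00096:A00096:A02178:A02178:A07776",
       "A00096:A01352:A01352:A02023:A05001:A05001:A07776")

# Pool the windows from every element, then tabulate once
counts <- table(unlist(lapply(x, subvectors)))
print(head(sort(counts, decreasing = TRUE)))
```

If the count should instead be the number of elements that contain a given subvector, wrapping the helper's result in unique() before pooling would be one way to do that.
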
On Fri, Apr 17, 2009 at 9:33 AM, Albert Vilella <avilella at gmail.com> wrote:
> Starting with the first entry:
> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>
> and supposing no other entry in the set contains identical subvectors, the
> algorithm would slide along the vector, first in pairs, then in trios, then
> in sets of four, etc., and count the occurrences:
>
> A00096:A00096
> 3
> A00096:A02178
> 1
> A02178:A02178
> 1
> A02178:A07776
> 1
> A00096:A00096:A00096
> 2
> A00096:A00096:A02178
> 1
> A00096:A02178:A02178
> 1
> A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096
> 1
> A00096:A00096:A00096:A02178
> 1
> A00096:A00096:A02178:A02178
> 1
> A00096:A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096:A02178
> 1
> A00096:A00096:A00096:A02178:A02178
> 1
> A00096:A00096:A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096:A02178:A02178
> 1
> A00096:A00096:A00096:A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096:A02178:A02178:A07776
> 1
>
>
>
>
> On Fri, Apr 17, 2009 at 1:04 PM, jim holtman <jholtman at gmail.com> wrote:
>>
>> Can you provide the output that you would expect from the data you
>> gave.  I am not sure what you mean by a 'subvector'.
>>
>> On Fri, Apr 17, 2009 at 5:25 AM, Albert Vilella <avilella at gmail.com> wrote:
>> > Hi,
>> >
>> > I've got a list of ~20000 elements that look like this:
>> >
>> > [1] "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>> > [2] "A00046:A00076:A01101:A04146:A05671:A07169"
>> > [3] "A00038:A00932:A02185:A02370:A02818:A02818:A02818:A02818:A04732:A07142:A07142"
>> > [4] "A00096:A01352:A01352:A02023:A05001:A05001:A07776"
>> > [5] "A00036:A00047:A00059:A00503:A00904:A00904:A00904:A01023:A01023:A01399:A02029:A03941:A07679"
>> > [6] "A00041:A00533:A00855:A02178:A02178:A02178:A05671:A05671:A05671:A05671:A05671:A05671:A05671"
>> > ...
>> >
>> > I would like a table with the frequency of matching subvectors across
>> > all elements, i.e., not only the number of times a whole vector is
>> > found but also how many times each subvector (of at least 2 ids) is
>> > found.
>> >
>> > How can I do that?
>> > Thanks in advance,
>> > Albert.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



