[R] Making a table: collapsing across sub-strings
dieter_vanderelst at emailengine.org
Thu Oct 4 09:46:54 CEST 2007
A sub-string can occur anywhere in the main string.
I think I could use table() and then add the counts together, but I don't
know how to access the counts in the result of table().
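A minimal sketch of getting the counts out of a table() result, using a made-up vector `strings` (hypothetical example data). A one-dimensional table behaves much like a named integer vector, so counts can be looked up by name and summed:

```r
# Hypothetical example data; substitute your own character vector
strings <- c("ababaaa", "abab", "abab", "ccd", "ababaaa", "cc")
tab <- table(strings)

names(tab)                       # the distinct strings, sorted
as.vector(tab)                   # the counts, in the same order as names(tab)
tab[["abab"]]                    # count for a single string, looked up by name
sum(tab[c("abab", "ababaaa")])   # add the counts of several strings together
```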
Another problem is that there might be a hierarchy in the strings. That
is, string a might be a sub-string of b while b might be a sub-string of c.
So, when checking the strings, I would have to start with the longest string
and find all of its sub-strings, then check the second-longest string, and
so on...
But I cannot find a way to order strings by their length.
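One way to order strings by length, sketched with made-up data: nchar() returns the number of characters in each string, and order() turns those lengths into an index that sorts the vector longest first.

```r
# Hypothetical example data
strings <- c("ab", "ababaaa", "abab", "c")

# nchar() counts characters; order(..., decreasing = TRUE) puts the longest first
by_length <- strings[order(nchar(strings), decreasing = TRUE)]
by_length
# gives "ababaaa", "abab", "ab", "c"
```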
jim holtman wrote:
> How do you determine whether one string is a subset of another? Does it
> only match at the beginning, or anywhere? How large is your set of
> strings? Can you use table as you describe, then determine what the
> groupings of subsets are, and just add the numbers together?
> You can use grep/regexpr to determine whether one string is a subset of
> another.
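A small sketch of the grep/regexpr suggestion, with made-up strings. Setting `fixed = TRUE` treats the pattern as a literal string rather than a regular expression, which matters if the strings could contain characters such as `.` or `(`. grepl() answers yes/no, while regexpr() gives the position of the first match (-1 when there is none). Because `pattern` takes a single string, checking every pair needs a loop such as sapply().

```r
# Hypothetical example data
long <- c("ababaaa", "ccd", "xyz")

# Does "abab" occur anywhere inside each element of long?
grepl("abab", long, fixed = TRUE)
# [1]  TRUE FALSE FALSE

# Start position of the first match, or -1 if there is no match
regexpr("abab", long, fixed = TRUE)

# Pairwise check over a set of strings:
# contains[i, j] is TRUE when u[j] occurs inside u[i]
u <- c("ababaaa", "abab", "ab", "ccd")
contains <- sapply(u, function(s) grepl(s, u, fixed = TRUE))
contains
```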
> On 10/3/07, Dieter Vanderelst <dieter_vanderelst at emailengine.org> wrote:
>> Hi list,
>> I'm currently processing textual data and I would really appreciate some
>> help with one of my problems.
>> I have a set of strings and I want to count how often each of these
>> strings appears in this set.
>> This is not very difficult and can be done as:
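A minimal sketch of such a count, assuming the strings are held in a character vector (`strings` below is made-up example data). table() counts each distinct string, sort() orders the table by count, and as.data.frame() turns it into a two-column data frame.

```r
# Hypothetical example data
strings <- c("ababaaa", "abab", "abab", "ccd", "ababaaa", "ababaaa")

freq <- table(strings)            # one count per distinct string
sort(freq, decreasing = TRUE)     # most frequent strings first

# Two-column data frame: the string in "strings", its count in "Freq"
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
freq_df
```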
>> However, I also want to collapse across sub-strings. That is, I want a
>> sub-string ss of string S to be counted as an occurrence of string S.
>> So, 'abab' should be included in the count of 'ababaaa' and should not
>> be listed as a separate entry in the frequency table.
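A sketch of one way to collapse the counts, following the longest-first idea from the thread. `collapse_substrings()` is a hypothetical helper, and it rests on two assumptions the thread leaves open: matching is literal and may occur anywhere in the longer string, and a string contained in several unrelated longer strings is credited only to the longest of them (the first in the length ordering on ties). Because containment is transitive, every string ends up credited to a "maximal" string, one that does not occur inside any other string in the set.

```r
# Hypothetical helper; assumes literal matching anywhere in the string and
# credits a shared sub-string only to the longest maximal string containing it
collapse_substrings <- function(strings) {
  counts <- table(strings)
  u <- names(counts)
  # Longest strings first, so longer containers are tried before shorter ones
  u <- u[order(nchar(u), decreasing = TRUE)]
  # A string is "maximal" if it does not occur inside any other distinct string
  is_sub <- vapply(u, function(s) any(u != s & grepl(s, u, fixed = TRUE)),
                   logical(1))
  maximal <- u[!is_sub]
  result <- setNames(integer(length(maximal)), maximal)
  for (s in u) {
    # Longest maximal string containing s (s itself when s is maximal)
    target <- maximal[grepl(s, maximal, fixed = TRUE)][1]
    result[[target]] <- result[[target]] + counts[[s]]
  }
  result
}

# Hypothetical example data
strings <- c("ababaaa", "abab", "ab", "abab", "ccd", "cc", "xyz")
collapse_substrings(strings)
# ababaaa gets 4 (itself, abab twice, ab), ccd gets 2 (itself, cc), xyz gets 1
```

The all-pairs grepl() check grows quadratically with the number of distinct strings, which is why the size of the set matters.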
>> Does anybody have a pointer to a way to do this? I have been checking
>> out the CRAN packages for handling DNA sequences, but they have not
>> really brought me closer to a solution.
>> Dieter Vanderelst
>> Eindhoven University of Technology
>> Faculty of Industrial Design
>> Designed Intelligence Group
>> Den Dolech 2
>> 5612 AZ Eindhoven
>> The Netherlands
>> Tel +31 40 247 91 11