[Bioc-devel] alphabetFrequency on AAString

Hervé Pagès hpages at fhcrc.org
Fri Dec 20 09:32:06 CET 2013


Hi Michael,

On 11/12/2013 11:31 AM, Hervé Pagès wrote:
> Hi Michael,
>
> On 11/12/2013 10:27 AM, Michael Lawrence wrote:
>> Seems like the output could be more consistent with the behavior on
>> DNAStringSet, i.e., the counts could be named.
>>
>> > alphabetFrequency(AAString("CYGGAGTRQ"))
>>   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 3 0 0
>>  [75] 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> Right. There is actually no specific method for AAString objects. The
> more generic method for XString objects is being called here. I'll
> change this.

This is done in Biostrings 2.31.7:

 > alphabetFrequency(x[[4]])
     A     R     N     D     C     Q     E     G     H     I     L     K     M
     3     3     4     3     1     2     3     2     4     4     3     2     1
     F     P     S     T     W     Y     V     U     O     B     Z     X     *
     1     1     4     2     1     3     2     0     0     0     0     0     0
     -     + other
     0     0     0

 > alphabetFrequency(x)
       A R N D C Q E G H I L K M F P S T W Y V U O B Z X * - + other
  [1,] 0 2 1 3 5 0 0 2 1 1 0 2 2 1 1 1 1 2 0 0 0 0 0 0 0 0 0 0     0
  [2,] 3 1 1 2 0 0 0 0 1 2 2 3 1 0 0 3 1 0 2 0 0 0 0 0 0 0 0 0     0
  [3,] 1 2 3 3 2 4 0 2 4 3 0 1 3 4 4 5 0 2 3 1 0 0 0 0 0 0 0 0     0
  [4,] 3 3 4 3 1 2 3 2 4 4 3 2 1 1 1 4 2 1 3 2 0 0 0 0 0 0 0 0     0
  [5,] 1 2 1 1 2 2 1 1 0 1 1 2 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0     0
  [6,] 1 0 2 1 0 0 0 2 1 0 2 2 3 2 0 0 1 2 0 0 0 0 0 0 0 0 0 0     0
  [7,] 1 0 2 1 1 1 1 1 0 1 1 0 1 1 0 2 1 1 1 3 0 0 0 0 0 0 0 0     0
  [8,] 0 3 1 1 1 2 0 1 0 1 0 1 3 5 1 2 0 0 2 2 0 0 0 0 0 0 0 0     0
  [9,] 0 1 3 2 1 1 3 1 2 2 0 1 1 0 3 2 2 1 2 3 0 0 0 0 0 0 0 0     0
[10,] 0 0 0 1 0 1 2 1 3 3 0 2 2 1 1 2 3 5 3 1 0 0 0 0 0 0 0 0     0

The reason there is an "other" column is that the amino acid alphabet
is not enforced (yet), so an AAString can still contain letters outside
it, and those get counted under "other".
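
To make the role of the "other" column concrete, here is a minimal
sketch in base R. It is not how Biostrings implements
alphabetFrequency(); count_letters() and aa_letters are hypothetical
names used only for illustration, and the letters are simply copied
from the column header of the output above.

```r
# Hypothetical illustration in base R; NOT the Biostrings implementation.
# Letters copied from the column header of the alphabetFrequency() output.
aa_letters <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I", "L", "K",
                "M", "F", "P", "S", "T", "W", "Y", "V", "U", "O", "B", "Z",
                "X", "*", "-", "+")

count_letters <- function(s, alphabet = aa_letters) {
  # Split the string into single characters.
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  # Position of each character in the alphabet, NA if it is not in it.
  idx <- match(chars, alphabet)
  # One named count per alphabet letter.
  counts <- tabulate(idx[!is.na(idx)], nbins = length(alphabet))
  names(counts) <- alphabet
  # Everything outside the alphabet lands in a single "other" bucket.
  c(counts, other = sum(is.na(idx)))
}

freq <- count_letters("CYGGAGTRQ")
freq[c("C", "G", "other")]      # C = 1, G = 3, other = 0

# "#" and "j" are not in the alphabet, so they are counted as "other".
freq_bad <- count_letters("CYGGAGTRQ#j")
freq_bad[["other"]]             # 2
```

If the alphabet were enforced, the second call would presumably be
rejected up front and the "other" bucket would not be needed.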

Cheers,
H.

>
> H.
>
>
>>
>> Thanks,
>> Michael
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


