[R] calculate the frequencies of the Amino Acids

David Winsemius dwinsemius at comcast.net
Sat Jan 2 07:43:25 CET 2010


On Jan 2, 2010, at 12:55 AM, che wrote:

>
> I know it would be better to ask R to generate the data, but I need to
> work with this particular file, because it holds data for some amino
> acids and I can't change it. So I need R to go through the sequences
> one by one and give me the count of each letter in each sequence. I am
> quite confused about using "i" and "j", and about how to iterate over
> both of them so that they work together. I attached sequence.txt to my
> original message, and I will attach it here again just in case. Thanks
> for your help.
> http://n4.nabble.com/file/n997087/sequence.txt sequence.txt

Sorry. I did not read to the very end. My apologies; hopefully the
following one-liner will make up for my dereliction of attention.
>
> che wrote:
>>
>> Could someone please help me sort this out? I am trying to write R code
>> to calculate the frequencies of the amino acids in 9 different
>> sequences. I want the code to read the sequences from an external text
>> file, and I used the following code to do so:
>> x<-read.table("sequence.txt",header=FALSE)
>>
>> Then I defined a vector of the 20 amino acids as follows:
>> AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
>> I am using the following code to calculate the frequencies:

After copy-pasting the sequences from a browser window to a character  
object, "seqnc", I then processed it:

 > seqlines <- readLines(textConnection(seqnc))
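
If the sequences are already in a file rather than pasted into a character object, readLines() can read them directly. A minimal sketch, assuming one sequence per line; the temporary file and the two short sequences are made up and only stand in for sequence.txt:

```r
# Hypothetical stand-in for sequence.txt: two short, made-up sequences.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ACDAK", "GGHA", ""), tmp)

# readLines() returns one character string per line of the file.
seqlines_demo <- readLines(tmp)

# Drop empty lines, which can show up at the end of a pasted file.
seqlines_demo <- seqlines_demo[nzchar(seqlines_demo)]

unlink(tmp)
seqlines_demo
```

With the real file, the same call would presumably be readLines("sequence.txt").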

# Then for the first sequence:

 > table(strsplit(seqlines[1], vector())  )

  A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15
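
Note that the table above has no columns for C and H, because table() only reports letters that actually occur. One possible way to get all 20 amino acids, with zeros for the absent ones, is to count against a factor with fixed levels. A small sketch; the sequence "ACDAAKY" is made up for illustration and AA is the vector from the original question:

```r
# The 20 one-letter amino acid codes from the original question.
AA <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Made-up short sequence, used only for illustration.
s <- "ACDAAKY"

# Split into single characters, then count against the fixed set of levels.
chars  <- strsplit(s, "")[[1]]
counts <- table(factor(chars, levels = AA))

counts[["A"]]  # 3
counts[["H"]]  # 0: listed in the table although absent from the sequence
```

Because every result then has the same 20 entries in the same order, the counts for different sequences line up column by column.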

# For "mass production": the names that resulted from my first effort
# were a bit unwieldy (> 200 characters long), so I unnamed the result:

unname( sapply(seqlines, function(x) table(strsplit(x, vector() ) ) )  )

[[1]]

  A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

[[2]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  5 15 25  6 35  7 24 23 32  9 12 15 10 17 14 13 36  2 13

[[3]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 17 24  7 36  7 24 24 32  9 13 14  9 17 12 14 36  2 12

[[4]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 16 25  5 35  6 24 23 33  8 12 15  9 17 17 12 35  2 15

[[5]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  4 15  6 21 30  3 19 23 22  8  8  8 14 17 14 12 24  5 12

[[6]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
30  3 13  4 16 22  2 17 16 17  6  6  7 11 15 11 12 18  3 11

[[7]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
39  5 21  8 22 39  2 23 29 25 10  8  7 13 22 14 21 25  7 16

[[8]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  4 17  6 19 30  2 20 24 21  8  7  7 12 17 14 16 21  5 14

[[9]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
35  4 17  6 18 31  3 20 23 21  8  7  7 12 18 12 17 21  5 13

[[10]]

A
5
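
Since the tables in this list have different lengths, sapply() cannot simplify them to a matrix. If a single matrix with one row per sequence is wanted, a sketch along the following lines might do; the helper name count_aa and the three sequences are made up for illustration, and relative frequencies are obtained with prop.table():

```r
# The 20 one-letter amino acid codes from the original question.
AA <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Made-up sequences standing in for seqlines.
seqs <- c("ACDAAKY", "GGHAC", "A")

# Hypothetical helper: integer counts of all 20 amino acids in one sequence.
count_aa <- function(s) {
  chars <- strsplit(s, "")[[1]]
  as.vector(table(factor(chars, levels = AA)))
}

# vapply() gives a 20 x length(seqs) matrix; transpose to one row per sequence.
counts <- t(vapply(seqs, count_aa, integer(20), USE.NAMES = FALSE))
colnames(counts) <- AA

# Relative frequencies: each row divided by its own total.
freqs <- prop.table(counts, margin = 1)
```

Here counts[2, "G"] is 2 and every row of freqs sums to 1.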

-- 
David.


>>
>> frequency<-function(X)
>> {
>> y<-rep(0,20)
>> for(j in 1:nchar(as.character(x$V1[i]))){
>> for(i in 1:9){
>>
>> 	res<-which(AA==substr(x$V1[i],j,j))
>> 	y[res]=y[res]+1
>> 	}
>> 	}
>> return(y)
>> }
>>
>> But this code is not actually working: it reads only one sequence, and
>> I don't know why the loop is not working for "i", which is supposed to
>> read the nine rows of the file sequence.txt. The sequence.txt file is
>> attached to this message.
>>
>> cheers
>> http://n4.nabble.com/file/n997072/sequence.txt sequence.txt
>>
>
> -- 
> View this message in context: http://n4.nabble.com/caculate-the-frequencies-of-the-Amino-Acids-tp997072p997087.html
> Sent from the R help mailing list archive at Nabble.com.
>

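For anyone who wants to keep the explicit loops from the quoted frequency() function, three things in it look suspect: the argument is named X while the body uses x, the outer loop's bound refers to i before the inner loop has defined it, and a single vector y accumulates counts for all rows together. A possible repair is sketched below; the name frequency_fixed and the test sequences are made up, and it assumes per-sequence counts are what was wanted:

```r
# The 20 one-letter amino acid codes from the original question.
AA <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Hypothetical corrected version: loop over sequences first (i),
# then over the positions within the current sequence (j).
frequency_fixed <- function(seqs) {
  # One row per sequence, one column per amino acid.
  y <- matrix(0L, nrow = length(seqs), ncol = length(AA),
              dimnames = list(NULL, AA))
  for (i in seq_along(seqs)) {
    s <- as.character(seqs[i])
    for (j in seq_len(nchar(s))) {
      res <- which(AA == substr(s, j, j))
      y[i, res] <- y[i, res] + 1L
    }
  }
  y
}

out <- frequency_fixed(c("ACDAAKY", "GGHAC"))
```

With the data frame from the question, the call would presumably be frequency_fixed(x$V1).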

