[R] calculate the frequencies of the Amino Acids
David Winsemius
dwinsemius at comcast.net
Sat Jan 2 07:43:25 CET 2010
On Jan 2, 2010, at 12:55 AM, che wrote:
>
> I know it would be better to ask R to generate the data, but I need to
> use this particular file, because it is data for some amino acids and I
> can't alter it. So I need R to go through the sequences one by one and
> give me the count of each letter in each sequence. I am quite confused
> about using "i" and "j", and how to iterate over both of them so that
> they work together. I attached sequence.txt to my original message, and
> I am attaching it here again just in case. Thanks for your help.
> http://n4.nabble.com/file/n997087/sequence.txt sequence.txt
Sorry, I did not read to the very end. My apologies; hopefully the
following one-liner will make up for my dereliction of attention.
>
> che wrote:
>>
>> Could someone please help me sort this out? I am trying to write R code
>> to calculate the frequencies of the amino acids in 9 different
>> sequences. I want the code to read the sequences from an external text
>> file, and I used the following code to do so:
>> x<-read.table("sequence.txt",header=FALSE)
>>
>> Then I defined a vector of the 20 amino acids as follows:
>> AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
>> I am using the following code to calculate the frequencies:
After copy-pasting the sequences from a browser window to a character
object, "seqnc", I then processed it:
> seqlines <- readLines(textConnection(seqnc))
# Then for the first sequence:
> table(strsplit(seqlines[1], vector()) )
 A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15
# For "mass production": the names that resulted from my first effort
# were a bit unwieldy (> 200 characters long), so I unnamed the result:
unname( sapply(seqlines, function(x) table(strsplit(x, vector() ) ) ) )
[[1]]
 A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15
[[2]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  5 15 25  6 35  7 24 23 32  9 12 15 10 17 14 13 36  2 13
[[3]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 17 24  7 36  7 24 24 32  9 13 14  9 17 12 14 36  2 12
[[4]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 16 25  5 35  6 24 23 33  8 12 15  9 17 17 12 35  2 15
[[5]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  4 15  6 21 30  3 19 23 22  8  8  8 14 17 14 12 24  5 12
[[6]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
30  3 13  4 16 22  2 17 16 17  6  6  7 11 15 11 12 18  3 11
[[7]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
39  5 21  8 22 39  2 23 29 25 10  8  7 13 22 14 21 25  7 16
[[8]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  4 17  6 19 30  2 20 24 21  8  7  7 12 17 14 16 21  5 14
[[9]]
 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
35  4 17  6 18 31  3 20 23 21  8  7  7 12 18 12 17 21  5 13
[[10]]
A
5
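The list entries above have different lengths because table() only reports
letters that actually occur (the first sequence has no C or H, for example).
If a rectangular result with one column per amino acid is wanted, one option
is to count against a fixed alphabet. This is only a minimal sketch and not
the code that produced the output above: the helper name count_aa and the two
toy sequences are made up for illustration, and AA is the 20-letter vector
from the original post.

```r
# Assumption: 'seqlines' holds one sequence per element, as readLines() would
# return it; two toy sequences stand in for the real file here.
seqlines <- c("ACDAAY", "WWKLC")

# The 20 one-letter amino acid codes, as defined in the original post.
AA <- c('A','C','D','E','F','G','H','I','K','L',
        'M','N','P','Q','R','S','T','V','W','Y')

# Hypothetical helper: split one sequence into single characters and count
# them against fixed factor levels, so absent residues are reported as 0.
count_aa <- function(s) {
  tab <- table(factor(strsplit(s, "")[[1]], levels = AA))
  setNames(as.integer(tab), AA)
}

# One row per sequence, one column per amino acid.
freq <- do.call(rbind, lapply(seqlines, count_aa))

# Relative frequencies: each row divided by its own sequence length.
prop <- freq / rowSums(freq)
```

Letters outside AA become NA in the factor and are silently dropped from the
counts, so a stray line in the input would show up as a row that sums to less
than nchar() of that line.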
--
David.
>>
>> frequency<-function(X)
>> {
>> y<-rep(0,20)
>> for(j in 1:nchar(as.character(x$V1[i]))){
>> for(i in 1:9){
>>
>> res<-which(AA==substr(x$V1[i],j,j))
>> y[res]=y[res]+1
>> }
>> }
>> return(y)
>> }
>>
>> But this code is not actually working: it reads only one sequence.
>> I don't know why the loop is not working for "i", which is supposed to
>> read the nine rows of the file sequence.txt. The sequence.txt file is
>> attached to this message.
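On the "i" and "j" question, a few things in the quoted function look like
likely culprits: the argument is named X but the body uses x; the outer loop
calls nchar(as.character(x$V1[i])) before the inner loop has defined i, so it
picks up whatever i happens to exist in the workspace; and a single vector y
collects the counts for all rows together. A possible rearrangement is
sketched below, with the loop over rows on the outside and one row of counts
per sequence. The function name aa_frequency and the toy data frame are
assumptions for illustration only.

```r
# Assumption: toy data standing in for
#   x <- read.table("sequence.txt", header = FALSE)
x <- data.frame(V1 = c("ACDAAY", "WWKLC"), stringsAsFactors = FALSE)

AA <- c('A','C','D','E','F','G','H','I','K','L',
        'M','N','P','Q','R','S','T','V','W','Y')

# Hypothetical name, chosen to avoid masking stats::frequency().
aa_frequency <- function(x) {
  n <- nrow(x)
  # One row of counts per sequence, one column per amino acid.
  y <- matrix(0L, nrow = n, ncol = length(AA), dimnames = list(NULL, AA))
  for (i in seq_len(n)) {               # outer loop: the sequences (rows)
    s <- as.character(x$V1[i])          # V1 may be a factor after read.table
    for (j in seq_len(nchar(s))) {      # inner loop: positions in sequence i
      res <- which(AA == substr(s, j, j))
      y[i, res] <- y[i, res] + 1L       # no-op if the letter is not in AA
    }
  }
  y
}

counts <- aa_frequency(x)
```

The key point is that i has to be fixed before j can range over the length of
sequence i, which is why the row loop goes on the outside.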
>>
>> cheers
>> http://n4.nabble.com/file/n997072/sequence.txt sequence.txt
>>