[R] R loop for applying an equation to each unique category

arun smartpink111 at yahoo.com
Mon Nov 4 21:04:34 CET 2013


Hi, you could use:
dat1 <- structure(list(Haplotype = c("H1", "H1", "H1", "H2", "H2", "H2", 
"H3", "H3", "H3", "H4", "H4", "H4", "H4", "H4", "H4"), Frequency = c(0.8278, 
0.02248, 0.1494, 0.8238, 0.02248, 0.1497, 0.1497, 0.02248, 0.8244, 
0.628, 0.02248, 0.1483, 0.1637, 0.01081, 0.01798)), .Names = c("Haplotype", 
"Frequency"), class = "data.frame", row.names = c(NA, -15L))

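# Note: `pi` below is R's built-in constant (3.14159...), not a frequency;
# if pi_hi in the question means the frequency itself, drop the `pi*` factors.
with(dat1, tapply(Frequency, list(Haplotype), function(x) sum(pi*x*log2(1/(pi*x)))))
# natural-log version, matching the log() in the question:
#with(dat1, tapply(Frequency, list(Haplotype), function(x) sum(pi*x*log(1/(pi*x)))))
#or
sapply(split(dat1[,-1], dat1$Haplotype), function(x) sum(pi*x*log2(1/(pi*x))))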
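A minimal sketch of the same tapply() approach, assuming (as the question defines it) that π_hi is the haplotype frequency itself rather than R's constant `pi`. The helper name `sic` and its `base` argument are hypothetical, introduced only for illustration: the default `exp(1)` gives the natural-log version written in the question, and `base = 2` gives bits like the log2() line above.

```r
# Assumption: pi_hi in the question is the frequency itself, so no factor
# of R's constant `pi` is applied. `sic` is a hypothetical helper name.
sic <- function(p, base = exp(1)) {
  # Shannon information: sum over i of p_i * log(1 / p_i)
  sum(p * log(1 / p, base = base))
}

dat1 <- data.frame(
  Haplotype = c("H1", "H1", "H1", "H2", "H2", "H2",
                "H3", "H3", "H3", "H4", "H4", "H4", "H4", "H4", "H4"),
  Frequency = c(0.8278, 0.02248, 0.1494, 0.8238, 0.02248, 0.1497,
                0.1497, 0.02248, 0.8244, 0.628, 0.02248, 0.1483,
                0.1637, 0.01081, 0.01798)
)

# One value per unique haplotype, natural log (as in the question)
sic_nat <- with(dat1, tapply(Frequency, Haplotype, sic))
# Same grouping, in bits; extra arguments to tapply() are passed on to sic()
sic_bits <- with(dat1, tapply(Frequency, Haplotype, sic, base = 2))
sic_nat
sic_bits
```

Passing `base` through tapply()'s `...` keeps a single helper for both units instead of two near-identical anonymous functions.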
A.K.




Hi all. I am seeking help writing an R loop to calculate Shannon's information content (SIC) 
for every unique haplotype. The data have the haplotypes in 
column 1 and the frequency of each haplotype in column 2. As you can see in the 
example data, which has just 4 unique haplotypes, each haplotype appears a different 
number of times, with a frequency for each row. The 
frequencies within each haplotype H* sum to approximately 1. The equation for SIC is 

SIC = \sum_i \pi_{h_i} \log(1/\pi_{h_i})

where \pi_{h_i} is the frequency of haplotype h_i. 
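For reference, the base of the logarithm only rescales the result: natural logs give the value in nats, log2 (as in the log2() version of the reply above) gives it in bits, and the two differ by the constant factor ln 2, since log_2(x) = ln(x)/ln(2).

```latex
\mathrm{SIC}_{\text{bits}}
  = \sum_i \pi_{h_i} \log_2\frac{1}{\pi_{h_i}}
  = \frac{1}{\ln 2} \sum_i \pi_{h_i} \ln\frac{1}{\pi_{h_i}}
  = \frac{\mathrm{SIC}_{\text{nats}}}{\ln 2}
```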



Haplotype	Frequency 
H1	0.8278 
H1	0.02248 
H1	0.1494 
H2	0.8238 
H2	0.02248 
H2	0.1497 
H3	0.1497 
H3	0.02248 
H3	0.8244 
H4	0.628 
H4	0.02248 
H4	0.1483 
H4	0.1637 
H4	0.01081 
H4	0.01798 
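If the table is pasted as text, one way to get it into R is read.table() with its `text` argument. This is a minimal sketch, assuming the first line is a header and the columns are separated by whitespace (the default `sep = ""` accepts both tabs and spaces); the variable name `txt` is illustrative.

```r
# The example table as a string; the first line holds the column names.
txt <- "Haplotype Frequency
H1 0.8278
H1 0.02248
H1 0.1494
H2 0.8238
H2 0.02248
H2 0.1497
H3 0.1497
H3 0.02248
H3 0.8244
H4 0.628
H4 0.02248
H4 0.1483
H4 0.1637
H4 0.01081
H4 0.01798"

# header = TRUE uses the first line as column names;
# stringsAsFactors = FALSE keeps Haplotype as character.
dat1 <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(dat1)
# Check that each haplotype's frequencies sum to roughly 1
tapply(dat1$Frequency, dat1$Haplotype, sum)
```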

In this example, the SIC for H1 would be 

(0.8278*log(1/0.8278)) + (0.02248*log(1/0.02248)) + (0.1494*log(1/0.1494)) 
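The H1 value can be checked term by term in R. This sketch assumes each π_hi is simply the listed frequency (no multiplication by R's constant `pi`) and uses the natural log, as in the question; the name `p_h1` is illustrative.

```r
# Frequencies of the three H1 rows
p_h1 <- c(0.8278, 0.02248, 0.1494)
# Per-row contributions p * log(1/p)
terms <- p_h1 * log(1 / p_h1)
terms
# SIC for H1, about 0.526 (natural log)
sum(terms)
```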

and the final output should give 4 SIC values, one corresponding to each unique haplotype. 

I believe lapply() is the right way forward, 
but my R skills are too elementary for me to know what to do next. Thank you 
for any help.
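Since the question asks for a loop or lapply(), here is a minimal sketch of both, again assuming π_hi is the frequency itself and using the natural log. The names `haps`, `sic_loop`, `groups` and `sic_lapply` are illustrative only.

```r
dat1 <- data.frame(
  Haplotype = c("H1", "H1", "H1", "H2", "H2", "H2",
                "H3", "H3", "H3", "H4", "H4", "H4", "H4", "H4", "H4"),
  Frequency = c(0.8278, 0.02248, 0.1494, 0.8238, 0.02248, 0.1497,
                0.1497, 0.02248, 0.8244, 0.628, 0.02248, 0.1483,
                0.1637, 0.01081, 0.01798)
)

# 1. Explicit for loop over the unique haplotypes
haps <- unique(dat1$Haplotype)
sic_loop <- numeric(length(haps))
names(sic_loop) <- haps
for (h in haps) {
  p <- dat1$Frequency[dat1$Haplotype == h]  # frequencies for this haplotype
  sic_loop[h] <- sum(p * log(1 / p))
}

# 2. lapply() over the groups produced by split(), then flatten to a vector
groups <- split(dat1$Frequency, dat1$Haplotype)
sic_lapply <- unlist(lapply(groups, function(p) sum(p * log(1 / p))))

sic_loop
sic_lapply
```

Both give one named value per haplotype; the split()/lapply() form avoids preallocating and indexing by hand.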

