[R] R loop for applying an equation to each unique category
arun
smartpink111 at yahoo.com
Mon Nov 4 21:04:34 CET 2013
Hi, you could use:
dat1 <- structure(list(Haplotype = c("H1", "H1", "H1", "H2", "H2", "H2",
"H3", "H3", "H3", "H4", "H4", "H4", "H4", "H4", "H4"), Frequency = c(0.8278,
0.02248, 0.1494, 0.8238, 0.02248, 0.1497, 0.1497, 0.02248, 0.8244,
0.628, 0.02248, 0.1483, 0.1637, 0.01081, 0.01798)), .Names = c("Haplotype",
"Frequency"), class = "data.frame", row.names = c(NA, -15L))
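# Per-haplotype sum with log base 2 (here `pi` is R's constant 3.14159...)
with(dat1, tapply(Frequency, list(Haplotype), function(x) sum(pi*x*log2(1/(pi*x)))))
# Natural-log variant:
#with(dat1, tapply(Frequency, list(Haplotype), function(x) sum(pi*x*log(1/(pi*x)))))
# or, equivalently, split the Frequency column by haplotype:
sapply(split(dat1[,-1], dat1$Haplotype), function(x) sum(pi*x*log2(1/(pi*x))))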
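If a data frame of results is more convenient than a named vector, the same per-group calculation can be sketched with aggregate(). The helper name `sic` and the column name `SIC` are assumptions made for illustration; the data are the values from the question.

```r
# Example data from the question
dat1 <- data.frame(
  Haplotype = rep(c("H1", "H2", "H3", "H4"), times = c(3, 3, 3, 6)),
  Frequency = c(0.8278, 0.02248, 0.1494, 0.8238, 0.02248, 0.1497,
                0.1497, 0.02248, 0.8244, 0.628, 0.02248, 0.1483,
                0.1637, 0.01081, 0.01798),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same term as in the tapply() call,
# where `pi` is R's built-in constant and the log is base 2.
sic <- function(x) sum(pi * x * log2(1 / (pi * x)))

# One row per haplotype, with the summed value in the second column
res <- aggregate(Frequency ~ Haplotype, data = dat1, FUN = sic)
names(res)[2] <- "SIC"
res
```

The result has four rows, one per unique haplotype, and should match the named vector returned by tapply().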
A.K.
Hi all. I am seeking help in writing an R loop to calculate Shannon's information content (SIC)
for every unique haplotype. The data includes the haplotypes in
column 1 and frequency of haplotypes in column 2. As you can see in the
example data with just 4 unique haplotypes, there are different numbers
of each haplotype, with a frequency corresponding to each one. The
frequencies within each haplotype H* sum to 1. The equation for SIC is
SIC = Σ_i π_{h_i} * log(1/π_{h_i})
where π_{h_i} is the frequency of haplotype h_i.
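Read literally, this definition treats π as the symbol for a frequency, not as the constant 3.14159... The worked example further down in the question multiplies each frequency by π, so it is unclear which reading is intended. As a minimal sketch under the first reading (an assumption), with a hypothetical helper `shannon` and the log base left as a parameter because the question does not specify one:

```r
# Assumption: pi_{h_i} is the frequency itself, so R's constant `pi` is not used.
# `shannon` is a hypothetical helper name; base = 2 gives bits, base = exp(1) gives nats.
shannon <- function(p, base = 2) sum(p * log(1 / p, base = base))

h1 <- c(0.8278, 0.02248, 0.1494)  # H1 frequencies from the example data

shannon(h1)                  # log base 2
shannon(h1, base = exp(1))   # natural log
```

A frequency of exactly 0 would produce NaN here, so such entries would need to be dropped first.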
Haplotype Frequency
H1 0.8278
H1 0.02248
H1 0.1494
H2 0.8238
H2 0.02248
H2 0.1497
H3 0.1497
H3 0.02248
H3 0.8244
H4 0.628
H4 0.02248
H4 0.1483
H4 0.1637
H4 0.01081
H4 0.01798
In this example, the SIC for H1 would be
(π*0.8278*log(1/(π*0.8278))) + (π*0.02248*log(1/(π*0.02248))) + (π*0.1494*log(1/(π*0.1494)))
and the final output should give 4 SIC values, one corresponding to each unique haplotype.
I believe using lapply() is the correct way forward,
but my R skills are too elementary for me to know what to do next. Thank you
for any help.
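For reference, the loop and the lapply() route asked about here could look roughly like the sketch below. It follows the worked example in multiplying each frequency by R's constant `pi`; the helper name `sic` and the choice of log base 2 are assumptions (use log() for the natural log).

```r
# Example data from the question
dat1 <- data.frame(
  Haplotype = rep(c("H1", "H2", "H3", "H4"), times = c(3, 3, 3, 6)),
  Frequency = c(0.8278, 0.02248, 0.1494, 0.8238, 0.02248, 0.1497,
                0.1497, 0.02248, 0.8244, 0.628, 0.02248, 0.1483,
                0.1637, 0.01081, 0.01798),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum of pi * x * log2(1 / (pi * x)) over one haplotype
sic <- function(x) sum(pi * x * log2(1 / (pi * x)))

# Explicit loop over the unique haplotypes
haps <- unique(as.character(dat1$Haplotype))
out <- numeric(length(haps))
names(out) <- haps
for (h in haps) {
  out[h] <- sic(dat1$Frequency[dat1$Haplotype == h])
}
out

# The same calculation with lapply() over the frequencies split by haplotype
out2 <- unlist(lapply(split(dat1$Frequency, dat1$Haplotype), sic))
out2
```

Both `out` and `out2` are named numeric vectors with one value per unique haplotype.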