[R] Some code to run Hutcheson t-test using R
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Thu Sep 10 13:04:17 CEST 2020
Hello,
Sorry, there's an instruction missing. See inline.
Às 11:44 de 10/09/20, Rui Barradas escreveu:
> If you want a function automating Karl's code, here it is. It returns an
> object of S3 class "htest", R's standard for hypothesis tests functions.
> The returned object can then be subset in the usual ways, ht$statistic,
> ht$parameter, ht$p.value, etc.
>
>
> library(QSutils)
>
> hutcheson.test <- function(x1, x2){
> dataname1 <- deparse(substitute(x1))
> dataname2 <- deparse(substitute(x2))
> method <- "Hutcheson's t-test for Shannon diversity equality"
> alternative <- "the diversities of the two samples are not equal"
> h1 <- Shannon(x1)
> varh1 <- ShannonVar(x1)
> n1 <- sum(x1)
> h2 <- Shannon(x2)
> varh2 <- ShannonVar(x2)
> n2 <- sum(x2)
> degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
> tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> p.value <- 2*pt(-abs(tstat), df = degfree)
> ht <- list(
> statistic = c(t = tstat),
> parameter = c(df = degfree),
> p.value = p.value,
> alternative = alternative,
> method = method,
> data.name = paste(dataname1, dataname2, sep = ", ")
> )
> class(ht) <- "htest"
> ht
> }
>
> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
>
> hutcheson.test(earlier, later)
>
>
>
> With the data you provided:
>
>
> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> bird_1959 <- c(0,0,14,59,26,68,0)
> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>
> hutcheson.test(bird_1956, bird_1957)
>
>
>
>
> Note that like David said earlier, there might be better ways to
> interpret Shannon's diversity index. If h is the sample's diversity,
> exp(h) gives the number of equally-common species with equivalent
> diversity.
>
>
> s1 <- Shannon(earlier)
> s2 <- Shannon(later)
> c(earlier = s1, later = s2)
> exp(c(earlier = s1, later = s2)) # Both round to 3
> eq_common <- rep(1, 3) # Can be 1 specimen or any other number
> Shannon(eq_common) # Slightly greater than the samples'
> diversity
>
>
# Create a list with all the data
birds <- mget(ls(pattern = "^bird"))
> round(exp(sapply(birds, Shannon))) # Your data
Hope this helps,
Rui Barradas
>
>
> #-------------------------------------
>
>
> Earlier Karl wrote [1] that
>
>
> Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> that explains the minor numerical differences obtained with the code
> above and the published variances.
>
>
> I don't believe the published variances were computed with the published
> variance estimator. The code below computes the variances like QSutils
> and with formula (4) in Hutcheson's paper. The latter does not give the
> same results.
>
> var_est <- function(n){
> s <- length(n)
> N <- sum(n)
> p <- n/N
> i <- p != 0
> inv.p <- numeric(s)
> inv.p[i] <- 1/p[i]
> log.p <- numeric(s)
> log.p[i] <- log(p[i])
> #
> term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> term2 <- (s - 1)/(2*N^2)
> #
> numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p *
> log.p)
> denom3 <- 6*N^3
> term3 <- numer3/denom3
> list(
> Bioc = term1 + term2,
> Hutch = term1 + term2 + term3
> )
> }
>
> Vh1 <- var_est(earlier)
> Vh1
> all.equal(ShannonVar(earlier), Vh1$Bioc)
> ShannonVar(earlier) - Vh1$Bioc # FAQ 7.31
>
> Vh2 <- var_est(later)
> Vh2
> identical(ShannonVar(later), Vh2$Bioc) # TRUE
>
>
>
> [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
>> Update:
>> I can see that you used the function Shannon from the package QSutils.
>> This would supplement the iNext package I used and solve the problem.
>> Thank you.
>>
>> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
>> <marongiu.luigi using gmail.com> wrote:
>>>
>>> Thank you very much for the code, that was very helpful.
>>> I got the article by Hutcheson -- I don't know if I can distribute it
>>> , given the possible copyrights, or if I can attach it here -- but it
>>> does not report numbers directly: it refers to a previous article
>>> counting bird death on a telegraph each year. The numbers
>>> are:
>>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
>>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
>>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
>>> bird_1959 <- c(0,0,14,59,26,68,0)
>>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>>>
>>> This for sake of the argument.
>>> As for my problem, I implemented the Shannon index with the package
>>> iNext, which only gives me the index itself and the 95% CI. Even when
>>> I implemented it with vegan, I only got the index. Essentially I don't
>>> have a count of species I could feed into the Hutcheson's. Is there a
>>> way to extract these data? Or to run a Hutcheson's on the final index?
>>> Thank you
>>>
>>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
>>> <karl.schilling using uni-bonn.de> wrote:
>>>>
>>>> Dear Luigi,
>>>>
>>>> below some code I cobbled together based on the Hutcheson paper you
>>>> mentioned. I was lucky to find code to calculate h and, importantly,
>>>> its
>>>> variance in the R-package QSutils - you may find it on the Bioconductor
>>>> website.
>>>>
>>>> here is the code, along with an example. I also attach the code as an
>>>> R-file.
>>>>
>>>> Hope that helps.
>>>> All my best
>>>>
>>>> Karl
>>>> PS don't forget to adjust for multiple testing if you compare more than
>>>> two groups.
>>>> K
>>>>
>>>>
>>>> # load package needed
>>>> # QSutils is on Bioconductor
>>>> library(QSutils)
>>>>
>>>> # here some exemplary data - these are the data from Pilou 1966 that
>>>> are
>>>> used
>>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
>>>>
>>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
>>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
>>>> # numbers of the first example used by Hutcheson were unfortunately not
>>>> # available to me
>>>>
>>>> # here starts the code ; you may replace the variables "earlier" and
>>>> "later"
>>>> # by your own numbers.
>>>>
>>>> # calculate h, var(h) etc
>>>> h1 <- Shannon(earlier)
>>>> varh1 <- ShannonVar(earlier)
>>>> n1 <- sum (earlier)
>>>> h2 <- Shannon(later)
>>>> varh2 <- ShannonVar(later)
>>>> n2 <- sum (later)
>>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
>>>>
>>>> # compare numbers with those in the paper
>>>> h1
>>>> h2
>>>> varh1
>>>> varh2
>>>> # I assume that minor numerical differences are due to differences
>>>> in the
>>>> # numerical precision of computers in the early seventies and today
>>>> / KS
>>>>
>>>> # this is the actual t-test
>>>> t <- (h1-h2) /sqrt(varh1 + varh2)
>>>> p <- 2*pt(-abs(t),df= degfree)
>>>> p
>>>>
>>>> # that's it
>>>> # Best
>>>> # Karl
>>>> --
>>>> Karl Schilling, MD
>>>> Professor of Anatomy and Cell Biology
>>>> Anatomisches Institut
>>>> Rheinische Friedrich-Wilhelms-Universität
>>>> Nussallee 10
>>>>
>>>> D-53115 Bonn
>>>> Germany
>>>>
>>>> phone ++49-228-73-2602
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Luigi
>>
>>
>>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list