[R-sig-eco] (no subject)

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon Oct 17 15:59:08 CEST 2011

On Mon, 2011-10-17 at 15:33 +0200, Kerstin Kober wrote:
> Hi,
> This is a question from an R newbie!
> I’ve got a very extensive data set and will need to run a large number of Mann-Whitney U tests (in R: wilcox.test) between several sets of data. I am trying to automate this as far as possible so I won’t need to run each test one by one. I already asked somebody for help, who provided a short piece of code that lets me run several tests one after another. However, I will need to adjust this code because there are missing values and it stops every time it encounters one.
> The data (I’ve attached a small file with some example data, so you can get an idea of what it looks like and run code on it):
> We’ve got bird density data from various locations, collected over the course of several years. One location (“0”) is our control, which is supposed to be tested against each of the other locations (“1” – “3”). We would like to run this test only on data collected during the same year, so we would test “0” vs “1” in 1980, “0” vs “1” in 1981 etc.
> This is the code I’ve got so far, which works fine as long as data exist for all years in all locations:
> for(i in 1980:1983){
>     for(j in 1:3){ 
>         tmp<- d[d$year == i & d$hotspot %in% c(0, j), ]
>         print(c(i, j))
>         print(wilcox.test(densities~factor(hotspot), data= tmp))
>     }
> }

Two options (not immediately clear where the missingness issue arises):

1) From ?wilcox.test we note that the formula method accepts (as is
common to most R functions that employ a formula interface) an na.action
argument, so you could use:

print(wilcox.test(densities~factor(hotspot), data= tmp, 
                  na.action = na.omit))

for example. See ?na.omit for details and other possible options. But
this should have defaulted to na.omit. What does:

> getOption("na.action")
[1] "na.omit"

return for you?

2) If the problem is that the `tmp` you are creating contains no rows or
certain locations are missing, then wrap the `print(wilcox.test(....))`
in `try(....)`. This will try to evaluate the function and catch any
errors, allowing your loop to continue should an error happen.
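
A minimal sketch of option 2, run on made-up data rather than your file
(the data frame built below and the counter `n_skipped` are only for
illustration; the column names are taken from your code):

```r
# Hypothetical example data: no hotspot 2 in 1981, nothing at all in 1982
set.seed(1)
d <- expand.grid(year = 1980:1983, hotspot = 0:3, rep = 1:5)
d$densities <- runif(nrow(d))
d <- d[!(d$year == 1981 & d$hotspot == 2) & d$year != 1982, ]

n_skipped <- 0
for (i in 1980:1983) {
    for (j in 1:3) {
        tmp <- d[d$year == i & d$hotspot %in% c(0, j), ]
        print(c(i, j))
        # try() catches the error so the loop can carry on
        res <- try(wilcox.test(densities ~ factor(hotspot), data = tmp),
                   silent = TRUE)
        if (inherits(res, "try-error")) {
            n_skipped <- n_skipped + 1
            cat("test skipped:",
                conditionMessage(attr(res, "condition")), "\n")
        } else {
            print(res)
        }
    }
}
```

With `silent = TRUE` the error message is not printed by try() itself,
so the sketch prints it by hand; drop that argument if you would rather
see R's own error output.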

Having given you the rope, duty demands that I at least mention that
this seems a strange, if not downright dangerous, thing to be doing. Not
least because you are not doing any adjustment of p-values. I most
certainly wouldn't believe that the p-values printed in the output have
their usual meaning. Especially if the number of tests performed is, as
you say, large.

If you save each of the wilcox.test objects you could grab the p-values
later and adjust them: ?p.adjust
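
Something along these lines would do it. This is only a sketch on
made-up data (the data frame `d`, the results table `res` and the
choice of Holm's method are my assumptions, not a recommendation for
your analysis):

```r
# Hypothetical example data, standing in for the real file
set.seed(42)
d <- expand.grid(year = 1980:1983, hotspot = 0:3, rep = 1:6)
d$densities <- rexp(nrow(d))

# One row per year-by-location comparison against the control ("0")
res <- expand.grid(hotspot = 1:3, year = 1980:1983)
res$p.value <- NA_real_

for (k in seq_len(nrow(res))) {
    tmp <- d[d$year == res$year[k] &
             d$hotspot %in% c(0, res$hotspot[k]), ]
    fit <- try(wilcox.test(densities ~ factor(hotspot), data = tmp),
               silent = TRUE)
    # Comparisons that fail keep p.value = NA
    if (!inherits(fit, "try-error")) {
        res$p.value[k] <- fit$p.value
    }
}

# Adjust all stored p-values together; see ?p.adjust for other methods
res$p.holm <- p.adjust(res$p.value, method = "holm")
print(res)
```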



> Now, the problem is that I don’t have data from all years in all locations: during some years some locations are missing and some years are missing entirely.
> I tried to find my way through the R help files, but because I am not sure where exactly I would need to insert information about dealing with missing values (is the problem with the for-loop or in the wilcox.test, possibly in both?), I don’t know how to do this.
> If anybody has an idea how to do this, please let me know.
> Thank you very much for your help!!!
> Kerstin

 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
