[R-sig-eco] RDA Permutations

Thu Apr 5 18:33:52 CEST 2018

Hi

In the process of developing a multivariate data analysis toolkit for Python (humbly said, a R-vegan little reptilian sister), I'm trying to implement RDA permutations. I could mimic the 'by=axis' option of the anova.cca function thanks to code published in the supplementary material of LOtB (2011). However, I could not generate results similar to the 'by=term' option, which tests the significance of environmental features. Prof. Legendre wrote me that (if I understood correctly) I must perform a linear regression between species and one environmental feature at the time, compute the original F-score, then permute: the proportion of the permuted F-scores which are equal or superior to the original is the p-value of the associated environmental feature. I coded it with R for this exercise. Still, I'm far from results returned by vegan. I would appreciate any input that might help me to code it right.

# Load data from Gist
doubs_env = read.csv('https://gist.githubusercontent.com/essicolo/da940788378316eb180680b25ea11033/raw/6de2262b80f5b46b2b9b79cd05cd541404cae044/DoubsEnv.csv')
doubs_spe = read.csv('https://gist.githubusercontent.com/essicolo/95ed343adf89c715d42c622080011228/raw/e488bb9eaff95243bd3247d1f17c203456033918/DoubsSpe.csv')

# Scale data
Y = scale(as.matrix(doubs_spe[, -1]), center=TRUE, scale=TRUE)
X = scale(as.matrix(doubs_env[, c('pH', 'dur', 'pho', 'nit', 'amm', 'oxy', 'dbo')]), center=TRUE, scale=TRUE)
n = nrow(Y)

# Rank of X
m = qr(X)$rank

# Permutations
nperm = 199
for (i in 1:m) {
  feature_i = X[, i] # isolate the tested feature
  qr_Xi = qr(feature_i) # qr decomposition of vector X
  Y_hat_i = qr.fitted(qr_Xi, Y) # fitted linear regression
  Y_res_i = qr.resid(qr_Xi, Y) # residuals
  rsq_i = sum(Y_hat_i**2) / sum(Y**2) # r-squared
  F_score[i] = (rsq_i/m) / ((1-rsq_i) / (n-m-1)) # original F-score

  F_score_perm = c()
  for (j in 1:nperm) {
    Y_perm = Y_hat_i + Y_res_i[sample(n), ] # reduced permutation model
    Y_hat_perm = qr.fitted(qr_Xi, Y_perm) # fitted linear regression
    Y_res_perm = qr.resid(qr_Xi, Y_perm) # residuals
    rsq_perm = sum(Y_hat_perm**2) / sum(Y_perm**2) # r-squared
    F_score_perm[j] = (rsq_perm/m) / ((1-rsq_perm) / (n-m-1)) # permuted F-score
  }
  p_value[i] = (1 + sum(F_score_perm >= F_score[i])) / (1 + nperm) # proportion of equal or superior including original (which justifies the + 1)
}
p_value

library(vegan)
rda_vegan = rda(Y ~ pH+dur+pho+nit+amm+oxy+dbo,
                data=doubs_env, scale = TRUE)
anova(rda_vegan, by='term', permutations = 199)

Thanks,

Serge-Étienne Parent, ing., Ph.D.
Professeur adjoint
Université Laval,
Département des sols et de génie agroalimentaire,
Faculté des sciences de l'agriculture et de l'alimentation,
Pavillon Paul-Comtois,
2425 rue de l'Agriculture, Local 2301

| 
| 
| 
|  |  |

 |

 |
| 
|  | 
essicolo/nuee

nuee - :bird: Multivariate data analysis toolkit for numerical ecology with Python
 |

 |

 |

	[[alternative HTML version deleted]]