[R] PCA with NA

Birgit Lemcke birgit.lemcke at systbot.uzh.ch
Fri Nov 23 20:02:48 CET 2007


Thanks to all for your help.

Just to complete this:

In my case the NAs mean that I have no information on this
character in this species. These are not ecological data, so I have
to deal with the NAs somehow without replacing them with zero.

I think Thibaut's help is very useful.

Thanks a lot

Birgit


On 23.11.2007 at 17:26, Thibaut Jombart wrote:

> Birgit Lemcke wrote:
>
>> Dear all,
>> (Mac OS X 10.4.11, R 2.6.0)
>> I have a quantitative dataset with a lot of NAs in it. So many
>> that it is not possible to delete all rows with NAs, nor all
>> variables with NAs.
>> Is there a function for principal component analysis that can
>> deal with so many NAs?
>>
>> Thanks in advance
>>
>> Birgit
>>
>>
>> Birgit Lemcke
>> Institut für Systematische Botanik
>> Zollikerstrasse 107
>> CH-8008 Zürich
>> Switzerland
>> Ph: +41 (0)44 634 8351
>> birgit.lemcke at systbot.uzh.ch
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hi,
>
> in a centred PCA, missing data can be replaced by the mean of the
> available data for each variable: after centring, the imputed
> entries become 0 and so carry no information.
> Let X be your analysed matrix (variables in columns).
>
> ##
> X <- matrix(runif(300), ncol = 10)   # 30 individuals x 10 variables
> idx <- sample(1:nrow(X), 5)          # pick 5 individuals
> X[idx, ] <- NA                       # and remove all their data
> sum(is.na(X))
> [1] 50
>
> library(ade4)
> dudi.pca(X, center = TRUE, scale = FALSE)
> Error in dudi.pca(X, center = TRUE, scale = FALSE) : na entries in table
> ##
>
> Now we replace the missing values:
>
> ##
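> # replace the NAs of one variable by the mean of its available values
> f1 <- function(vec) {
>     m <- mean(vec, na.rm = TRUE)
>     vec[is.na(vec)] <- m
>     return(vec)
> }
>
> Y <- apply(X, 2, f1)   # apply f1 to each column (variable)
>
> pcaY <- dudi.pca(Y, center = TRUE, scale = FALSE, nf = 2, scannf = FALSE)
>
> s.label(pcaY$li)                              # map of the individuals
> sunflowerplot(pcaY$li[idx, 1:2], add = TRUE)  # the 5 NA individuals coincide; the petals count them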
> ##
>
> All individuals whose values were missing are placed at the
> non-informative point, i.e. at the origin.
>
> Regards,
>
> Thibaut.
>
> -- 
> ######################################
> Thibaut JOMBART
> CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
> Universite Lyon 1
> 43 bd du 11 novembre 1918
> 69622 Villeurbanne Cedex
> Tel.: 04.72.43.29.35
> Fax : 04.72.43.13.88
> jombart at biomserv.univ-lyon1.fr
> http://lbbe.univ-lyon1.fr/-Jombart-Thibaut-.html?lang=en
> http://pbil.univ-lyon1.fr/software/adegenet/

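Why the imputed individuals in Thibaut's reply end up at the origin can be checked with a minimal sketch. It assumes base R's `prcomp()` in place of ade4's `dudi.pca()` (so no extra package is needed); the seed and the helper name `impute_mean` are illustrative only. Replacing the NAs of a column by that column's mean leaves the mean unchanged, so every imputed cell is exactly 0 after centring, and a row that was entirely NA gets a score of 0 on every axis.

```r
# Sketch only: mean imputation + centred PCA with base R's prcomp()
# (assumption: prcomp() stands in for ade4::dudi.pca() used in the thread).
set.seed(1)                               # illustrative seed, for reproducibility
X <- matrix(runif(300), ncol = 10)        # 30 individuals x 10 variables
idx <- sample(seq_len(nrow(X)), 5)        # 5 individuals with no data at all
X[idx, ] <- NA

# Hypothetical helper: replace the NAs of one variable by the mean
# of its available values (same idea as f1 in the thread).
impute_mean <- function(vec) {
  vec[is.na(vec)] <- mean(vec, na.rm = TRUE)
  vec
}
Y <- apply(X, 2, impute_mean)             # apply column by column

# Imputing the mean does not change the column means, so after
# centring every imputed cell is 0 (up to floating-point error).
Yc <- scale(Y, center = TRUE, scale = FALSE)

pca <- prcomp(Y, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]                    # coordinates on the first two axes

# The fully imputed rows are all zeros after centring, so their
# scores are 0 on every axis: they sit at the origin of the map.
print(round(scores[idx, ], 12))
```

Rows with only a few NAs are not sent to the origin; their imputed cells simply contribute nothing along those variables. Mean imputation does not estimate the missing values, and with many NAs it shrinks the variance of the affected variables.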


