[R] NAs introduced by coercion in dist()

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Wed May 2 17:34:15 CEST 2007


Silvia Lomascolo wrote:
> It was suggested that the 'NAs introduced by coercion' message might be
> warning me that my data are not what they should be.  I checked this using
> str(PeaksMatrix), as suggested, and the data seem to be what I thought they
> were: 
>
> 'data.frame':   335 obs. of  127 variables:
>  $ Code   : Factor w/ 335 levels "A1MR","A1MU",..: 1 2 3 4 5 6 7 8 9 10 ...
>  $ P3.70  : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ P3.97  : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ P4.29  : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ P4.90  : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ P6.30  : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ P6.45  : num  7.73 0 0 0 0 0 4.03 0 0 0 ...
>  $ P6.55  : num  0 0 0 0 0 0 0 0 0 0 ...
>
> ...
>
> I do have 335 observations and 127 variables that are named P3.70, P3.97, P4.29,
> etc.  This was a relief, but I still don't know whether the distance matrix
> is what it should be.  I tried 'str(dist.PxMx)', which is the name of my
> distance matrix, but I get something that does not mean much to me, an
> inexperienced R user:
>
> Class 'dist'  atomic [1:55945] 329.6 194.9 130.1  70.7 116.9 ...
>   ..- attr(*, "Size")= int 335
>   ..- attr(*, "Labels")= chr [1:335] "1" "2" "3" "4" ...
>   ..- attr(*, "Diag")= logi FALSE
>   ..- attr(*, "Upper")= logi FALSE
>   ..- attr(*, "method")= chr "euclidean"
>   ..- attr(*, "call")= language dist(x = PeaksMatrix, method = "euclidean",
> diag = FALSE, upper = FALSE,      p = 2)
>
> Any more suggestions, please?
>
>
>   
Actually, you seem to have 126 numeric variables plus a factor called "Code",
which has non-numeric levels. That factor is most likely what triggers the
coercion warning, so I think you probably want to drop that column before
calculating distances.
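
A minimal sketch of that, using a small made-up data frame in place of the
real PeaksMatrix (the values and the helper names numeric_cols and peaks_num
are illustrative assumptions, not taken from the actual data). As far as I
know, dist() converts its argument with as.matrix(), and a data frame that
contains a factor turns into a character matrix, which would explain the
coercion warning. Keeping only the numeric columns avoids that:

```r
# Toy stand-in for PeaksMatrix; the numbers are invented for illustration
PeaksMatrix <- data.frame(
  Code  = factor(c("A1MR", "A1MU", "A2MR")),
  P3.70 = c(0, 0, 3),
  P6.45 = c(7.73, 0, 4)
)

# Keep only the numeric columns; this drops the factor "Code"
numeric_cols <- sapply(PeaksMatrix, is.numeric)
peaks_num <- PeaksMatrix[, numeric_cols, drop = FALSE]

# Optional: reuse the sample codes as row names so they label the distances
rownames(peaks_num) <- as.character(PeaksMatrix$Code)

# Euclidean distances between rows; no factor column, so nothing to coerce
dist.PxMx <- dist(peaks_num, method = "euclidean")

# View the result as an ordinary square matrix to check it by eye
dist_mat <- as.matrix(dist.PxMx)
print(dist_mat)
```

With the real data, something like as.matrix(dist.PxMx)[1:5, 1:5] shows just
a corner of the 335 x 335 matrix, which is easier to read than the str()
output.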
> Silvia Lomascolo wrote:
>   
>> I work with Windows and use R version 2.4.1. I am JUST starting to learn
>> this program...
>>
>> I get this warning message 'NAs introduced by coercion' while trying to
>> build a distance matrix (to be analyzed with NMDS later) from a 336 x 100
>> data matrix.  The original matrix has lots of zeros and no missing values,
>> but I don't think this should matter.
>>
>> I searched this forum and people have suggested that the warning should be
>> ignored, but when I try to print the distance matrix I only get the row
>> numbers (the matrix seems to be 'empty') and I'm not able to judge
>> whether the matrix worked or not.
>>
>> To get the distance matrix I wrote:
>> dist.PxMx <- dist (PeaksMatrix, method='euclidean', diag=FALSE,
>> upper=FALSE)
>>
>> I tried including the p argument (included in the help for dist()) and
>> leaving it out, but that didn't seem to change anything.  I think that's
>> required for one distance measure though, not for euclidean dist. 
>>
>> Should I really ignore this warning? If so, why am I not able to see
>> the distance matrix?
>>
>>     
>
>   


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


