[R] error in rowSums:'x' must be numeric
Jari Oksanen
jarioksa at sun3.oulu.fi
Fri Nov 11 08:48:25 CET 2005
On Thu, 2005-11-10 at 16:49 +0100, Illyes Eszter wrote:
> Dear All,
>
> It's Eszter again from Hungary. I could not solve my problem from
> yesterday, so I still have to ask your help.
>
> I have a binary dataset of vegetation samples and species as a comma
> separated file. I would like to calculate the Jaccard distance of the
> dataset. I have the following error message:
>
> Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric
> In addition: Warning message:
> results may be meaningless because input data have negative entries
> in: vegdist(t2, method = "jaccard", binary = FALSE, diag = FALSE,
>
> Do you have any idea what can be the problem? I have only 0 and 1 in
> the dataset.
>
Eszter,
An old truth is that The Computer is always right and you are wrong
when The Computer says that you have non-numeric data. Check your data
first. An obvious way of checking this is to repeat the command that
found the problem: rowSums(t2). After that (probably) reports the same
error, you can check your data with, e.g., str(t2), which shows you
the variables in a very compact form.
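A minimal sketch of that check, using a small made-up data frame in
place of your t2 (the column names and values are invented for
illustration only):

```r
# Hypothetical stand-in for t2: the first column holds sample names,
# so it is not numeric.
t2 <- data.frame(plot = c("s1", "s2", "s3"),
                 sp1  = c(0, 1, 1),
                 sp2  = c(1, 0, 1))

# Repeat the command that found the problem; try() keeps the error
# from stopping the script.
res <- try(rowSums(t2), silent = TRUE)
inherits(res, "try-error")

# Compact overview of every variable and its type
str(t2)

# Names of the columns that are not numeric
bad <- names(t2)[!sapply(t2, is.numeric)]
bad
```

Any column listed in bad is a candidate for the culprit.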
Now some wild speculation. When you read your data as a comma separated
file, very often the row names are taken as the first variable. Check
this and remove the first column if needed. This is so common that I've
even thought that I perhaps need to write a sanitizing function for csv
files to do the following:
rownames(x) <- x[,1]
x <- x[,-1]
that is, to take the first column as rownames and then remove the
non-numeric first column.
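Such a sanitizing function might look roughly like this. It is only a
sketch: the function name is made up, and it assumes that the entries
of the first column are unique, because rownames must be:

```r
# Hypothetical helper: if the first column is not numeric, use it as
# rownames and drop it; otherwise return the data unchanged.
first_col_to_rownames <- function(x) {
  if (!is.numeric(x[[1]])) {
    rownames(x) <- as.character(x[[1]])
    x <- x[, -1, drop = FALSE]   # drop = FALSE keeps a data frame
  }
  x
}

# Made-up example data with sample names in the first column
x <- data.frame(plot = c("s1", "s2", "s3"),
                sp1  = c(0, 1, 1),
                sp2  = c(1, 0, 1))
clean <- first_col_to_rownames(x)
rowSums(clean)
```

After this, rowSums() works, and so should functions that call it.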
A pertinent problem in R communication with csv files is that R assumes
that with header=TRUE the header line has one entry less than data rows.
However, popular software (read Excel) refuses to write data so: even if
you make the first column empty in your header line, the popular
software adds a comma before the first intended entry, and so you have
the same number of entries in the header line and in the data. The
result is that the column that was intended as rownames is taken as a
non-numeric variable.
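The difference between the two header styles can be seen with a small
made-up example (the data are invented; read.csv() and its row.names
argument are standard R). Giving row.names = 1 is one way of telling R
that the first column holds the rownames:

```r
# Header with one entry less than the data rows: R takes the first
# column as rownames automatically.
r_style <- read.csv(text = "sp1,sp2\ns1,0,1\ns2,1,0\ns3,1,1")

# Header with a leading comma, as written by spreadsheet software:
# same number of entries in the header and in the data rows.
csv_text <- ",sp1,sp2\ns1,0,1\ns2,1,0\ns3,1,1"

# Read naively, the first column becomes a non-numeric variable.
raw <- read.csv(text = csv_text)
str(raw)

# Read with row.names = 1, the first column becomes the rownames.
fixed <- read.csv(text = csv_text, row.names = 1)
rowSums(fixed)
```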
This is such a common problem that an innocent user (not me: I'm no
longer innocent) would expect R to cope with that kind of input format.
cheers, jari oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/