[R] Problem with data conversion
arinbasu@softhome.net
Sun Dec 14 13:19:47 CET 2003
Hi All:
I came across the following problem while working with a dataset, and
wondered whether anyone here could suggest a solution.
My dataset consists of information on 402 individuals with the following five
variables: age, sex, status (a binary variable with levels "case" or
"control"), mma, and dma.
During a data check, I found that in the raw data the data entry operator had
mistakenly entered "0" for one participant's status, so the levels now show:
> levels(status)
[1] "0" "control" "case"
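The stray "0" level could also be cleaned up without leaving R. Below is a minimal sketch that uses a hypothetical four-element status factor in place of the real column. It assumes the "0" should simply become missing. Correcting it to "case" or "control" from the source records may be the better choice, and that is a judgment for whoever owns the data.

```r
# Hypothetical stand-in for the real status column, including the stray "0"
status <- factor(c("case", "control", "0", "case"),
                 levels = c("0", "control", "case"))

# Treat the mistaken "0" entry as missing
status[status == "0"] <- NA

# Rebuild the factor with only the valid levels; listing "control" first
# makes it the reference level in a later glm(..., family = binomial)
status <- factor(status, levels = c("control", "case"))

levels(status)   # "control" "case"
table(status, useNA = "ifany")
```

With "control" as the first level, a binomial glm() would model the probability of being a case.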
The variables mma and dma are actually numeric, but in the data frame they
are stored as characters. I tried to change their type (from character to
numeric) using the edit function (bringing up the data grid and making the
changes there), but the changes were not saved. I also tried
mma1 <- as.numeric(mma)
but I was not successful in converting mma from a character variable to a
numeric variable.
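One possible explanation, offered as an assumption because the output of str(workingdat) is not shown: mma may be stored as a factor rather than as character. read.table() converted text columns to factors by default, and for a factor, as.numeric() returns the internal level codes instead of the values. A second issue is that mma1 <- as.numeric(mma) creates a separate vector and leaves the data frame unchanged. The data frame below is a hypothetical stand-in for workingdat.

```r
# Hypothetical stand-in for workingdat: mma/dma held as factors, as
# read.table() produced by default for columns containing text
workingdat <- data.frame(
  mma = factor(c("2.5", "10.1", "3.7")),
  dma = factor(c("0.8", "1.2", "0.5"))
)

str(workingdat)                 # check the real storage type first

# Pitfall: on a factor, as.numeric() returns the level codes (here 2 1 3),
# not the measured values
as.numeric(workingdat$mma)

# Convert via the level labels and store the result back in the data frame,
# so the change is kept (a free-standing mma1 leaves workingdat unchanged)
workingdat$mma <- as.numeric(as.character(workingdat$mma))
workingdat$dma <- as.numeric(as.character(workingdat$dma))

str(workingdat)
```

If the columns really are character vectors, as.numeric() alone is enough, and any entries that are not numbers become NA with a warning. Also note that edit() returns an edited copy rather than modifying its argument. Its result therefore has to be assigned (workingdat <- edit(workingdat)), or fix(workingdat) used instead. This may be why the changes made in the data grid were not saved.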
So, to edit and "clean" the data, I exported the dataset as a tab-delimited
text file for use in Epi Info 2002 (version 2, Windows), using the following code:
mysubset <- subset(workingdat, select = c(age,sex,status, mma, dma))
write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA)
After making changes to the variables in Epi Info (I created a new
variable called "statusrec" containing the values "case" and "control"), I
exported the file as a ".rec" file ("mydata.rec") and read it into R with the
following code:
require(foreign)
myData <- read.epiinfo("mydata.rec", read.deleted=NA)
The problem is that when I try to run a logistic regression, R returns the
following error message:
> glm(statusrec~mma, family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames, :
invalid variable type
I cannot figure out a solution. I want to run a logistic regression with
statusrec (a binary variable containing the values "case" and "control") as
the outcome and another variable (say mma, which is now numeric) as the
predictor. What does the above error message mean, and what could be a
possible solution?
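A likely cause is that statusrec came back from read.epiinfo() as a character vector, or mma as something other than numeric. This cannot be confirmed without the output of str(myData). model.frame() in R versions of that time did not accept character variables in a formula and reported them as an "invalid variable type". The glm() call above also has no data= argument, so it uses whatever statusrec and mma objects are visible in the workspace. Below is a minimal sketch of a fix, using a small hypothetical myData in place of the file read from "mydata.rec".

```r
# Hypothetical stand-in for myData as returned by read.epiinfo()
myData <- data.frame(
  statusrec = c("case", "control", "case", "control",
                "case", "control", "control", "case"),
  mma = c("12.1", "3.4", "9.8", "4.0", "2.2", "5.6", "11.0", "8.7"),
  stringsAsFactors = FALSE
)

# glm() with a binomial family needs a factor (or 0/1) response and a
# numeric predictor; "control" first makes "case" the modelled outcome
myData$statusrec <- factor(myData$statusrec, levels = c("control", "case"))
myData$mma <- as.numeric(myData$mma)

str(myData)    # confirm the types before fitting

# Pass data = myData so glm() takes the columns from the data frame
fit <- glm(statusrec ~ mma, family = binomial(link = "logit"), data = myData)
summary(fit)
```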
Would greatly appreciate your insights and wisdom.
-Arin Basu
More information about the R-help mailing list