[R] Problem with data conversion

arinbasu at softhome.net
Sun Dec 14 13:19:47 CET 2003


Hi All: 

I came across the following problem while working with a dataset and 
wondered whether someone here could suggest a solution. 


My dataset contains information on 402 individuals, with the following five 
variables: age, sex, status (a binary variable with levels "case" or 
"control"), mma, and dma. 

During a data check, I found that the data entry operator had mistakenly 
entered a "0" for one participant in the raw data, so the levels now show 

> levels(status) 
[1] "0" "control" "case" 
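
A minimal sketch of one way such a stray level could be cleaned up within R, using made-up values rather than the real data. It assumes that the "0" entry should be treated as missing; whether that participant is really a case or a control would have to be checked against the raw records.

```r
# Made-up example data; the real 'status' has 402 values.
status <- factor(c("case", "control", "0", "control", "case"),
                 levels = c("0", "control", "case"))

# Treat the mistaken "0" entry as missing (an assumption, see above).
status[status == "0"] <- NA

# Rebuild the factor so the now-unused "0" level is dropped.
status <- factor(status)

levels(status)
table(status, useNA = "ifany")
```

Rebuilding the factor keeps the remaining levels in their existing order, so "control" stays ahead of "case".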

The variables mma and dma are actually numeric, but in the data frame 
they are stored as "character". I tried to change their type (from 
character to numeric) using the edit function (bringing up the data grid 
and making the changes there), but the changes were not saved. I also 
tried 

mma1 <- as.numeric(mma) 

but I was not successful in converting mma from a character variable to a 
numeric variable. 
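
A minimal sketch of a conversion that is often used in this situation, with made-up values. It assumes that mma might be stored as a factor rather than as plain character strings (not confirmed for this dataset); in that case as.numeric() returns the internal level codes instead of the printed values.

```r
# Made-up values; one entry is deliberately non-numeric.
mma <- factor(c("1.2", "3.4", "n/a", "5.6"))

# as.numeric() on a factor returns the internal level codes,
# not the values printed on screen.
codes <- as.numeric(mma)

# Converting to character first recovers the printed values;
# entries that are not numbers become NA (normally with a warning,
# suppressed here).
mma1 <- suppressWarnings(as.numeric(as.character(mma)))

# Locate entries that failed to convert, to check them in the raw data.
bad <- which(is.na(mma1) & !is.na(mma))
```

Checking the positions in bad against the raw file shows which entries prevent a clean conversion.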

So, to edit and "clean" the data, I exported the dataset as a text file to 
Epi Info 2002 (version 2, Windows). I used the following code: 

mysubset <- subset(workingdat, select = c(age,sex,status, mma, dma))
write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA) 

After I made changes in the variables using Epi Info (I created a new 
variable called "statusrec" containing values "case" and "control"), I 
exported the file as a ".rec" file (filename "mydata.rec"). I used the 
following code to read the file in R: 

require(foreign)
myData <- read.epiinfo("mydata.rec", read.deleted=NA) 

The problem is that when I try to run a logistic regression, R 
returns the following error message: 

> glm(statusrec~mma, family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras, 
extranames,  :
       invalid variable type 


I cannot figure out the solution. I want to run a logistic regression 
with statusrec (a binary variable containing the values "case" and 
"control") as the response and another variable (say mma, which is now 
numeric) as the predictor. What does the above error message mean, and 
what could be a possible solution? 
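
A minimal sketch of how such a model could be set up, using a made-up data frame in place of the imported file. It assumes that the error comes from how the columns are stored after the import, which has not been confirmed here; the column names follow the post, and the values are invented.

```r
# Made-up data standing in for the imported file.
myData <- data.frame(
  statusrec = rep(c("case", "control"), each = 10),
  mma = c(1.4, 2.2, 1.2, 3.6, 2.3, 1.2, 2.5, 2.7, 2.6, 1.7,
          2.5, 1.4, 0.4, 0.3, 2.1, 1.0, 1.0, 1.9, 1.8, 1.6),
  stringsAsFactors = FALSE
)

# Inspect how each column is stored before modelling.
str(myData)

# Make the response a factor; the first level ("control") is the reference,
# so the model describes the probability of "case".
myData$statusrec <- factor(myData$statusrec, levels = c("control", "case"))

# Pass the data frame explicitly so the formula finds the intended columns.
fit <- glm(statusrec ~ mma, family = binomial(link = "logit"), data = myData)
summary(fit)
```

Passing data = myData avoids picking up an older object of the same name from the workspace, and str() shows at a glance whether a column has an unexpected type.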

I would greatly appreciate your insights and wisdom. 

 -Arin Basu



