[R] Problem with data conversion

Paul E. Johnson pauljohn at ku.edu
Sun Dec 14 10:29:39 CET 2003


I sympathize with your trouble bringing in data, but you need to catch 
your breath and figure out what you really have.  I think when you get a 
bit more R practice, you will be able to manage what you bring in 
without going back to that editor so much.

I feel certain your data is not what you think it is.  Here's an example 
where a factor DOES work on the left-hand side (lhs) of a glm formula:

 > y <- factor(c("S","N","S","N","S","N","S","N"))
 > x <- rnorm(8)
 > glm(y~x,family=binomial(link=logit))

Look here: the system knows y is a factor:
 > attributes(y)
$levels
[1] "N" "S"

$class
[1] "factor"
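
When you have a whole data frame rather than one vector, it is quicker to ask for the class of every column at once. Here is a minimal sketch, assuming a small made-up data frame `dat` that stands in for your data. The column names follow your message, but the values are invented for illustration:

```r
# Hypothetical stand-in for the imported data: the status column came in
# as character, not as a factor.
dat <- data.frame(status = c("case", "control", "case", "control"),
                  mma    = c(1.2, 3.4, 2.2, 0.9),
                  stringsAsFactors = FALSE)

# One class per column; "character" is the warning sign for glm().
sapply(dat, class)

# str() reports the same information plus the first few values.
str(dat)

is.factor(dat$status)   # FALSE: it is a plain character vector
```
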

My guess is that your variables are not really factors, but rather 
character vectors.  You have to convert them into factors.
Watch: the error I get is the same one you got.

 > y <- c("S","N","S","N","S","N","S","N")
 > glm(y~x,family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras, 
extranames,  :
        invalid variable type

Note the system doesn't know y is "supposed" to be a factor. It just 
sees characters.

 > y
[1] "S" "N" "S" "N" "S" "N" "S" "N"
 > levels(y)
NULL
 > attributes(y)
NULL

but look:
 > glm(as.factor(y)~x,family=binomial(link=logit))
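
Rather than wrapping the variable in as.factor() inside every formula, you can convert the column once in the data frame and hand the data frame to glm() through its data argument. Below is a minimal sketch on invented data. The column names statusrec and mma echo your message, but the values are hypothetical. For a binomial glm with a factor response, the first level counts as "failure" and every other level as "success", so relevel() is used here to make "control" the reference level and the model describes the probability of being a case:

```r
set.seed(1)  # reproducible fake data

# Hypothetical data frame: statusrec read in as character, mma numeric.
myData <- data.frame(statusrec = rep(c("case", "control"), 10),
                     mma       = rnorm(20),
                     stringsAsFactors = FALSE)

# Convert once, in the data frame itself.
myData$statusrec <- factor(myData$statusrec)

# Put "control" first so that "case" is modelled as the success.
myData$statusrec <- relevel(myData$statusrec, ref = "control")
levels(myData$statusrec)

fit <- glm(statusrec ~ mma, family = binomial(link = logit), data = myData)
summary(fit)
```
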



arinbasu at softhome.net wrote:

> Hi All:
> I came across the following problem while working with a dataset, and 
> wondered if there could be a solution I sought here.
>
> My dataset consists of information on 402 individuals with the 
> following five variables (age, sex, status = a binary variable with 
> levels "case" or "control", mma, dma).
> During data check, I found that in the raw data, the data entry 
> operator had mistakenly put a "0" for one participant, so now, the 
> levels show
>
>> levels(status) 
>
> [1] "0" "control" "case"
> The variables mma and dma are actually numerical variables but in the 
> dataframe, they are represented as "characters". I tried to change the 
> type of the variables (from character to numeric) using the edit 
> function (and bringing up the data grid where then I made changes), 
> but the changes were not saved. I tried
> mma1 <- as.numeric(mma)
> but I was not successful in converting mma from a character variable 
> to a numeric variable.
> So, to edit and "clean" the data, I exported the dataset as a text 
> file to Epi Info 2002 (version 2, Windows). I used the following code:
> mysubset <- subset(workingdat, select = c(age,sex,status, mma, dma))
> write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA)
> After I made changes in the variables using Epi Info (I created a new 
> variable called "statusrec" containing values "case" and "control"), I 
> exported the file as a ".rec" file (filename "mydata.rec"). I used the 
> following code to read the file in R:
> require(foreign)
> myData <- read.epiinfo("mydata.rec", read.deleted=NA)
> Now, the problem is this, when I want to run a logistic regression, R 
> returns the following error message:
>
>> glm(statusrec~mma, family=binomial(link=logit))
>
> Error in model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
>       invalid variable type
>
> I cannot figure out the solution. I want to run a logistic regression 
> now with the variable statusrec (which is a binary variable containing 
> values "case" and "control"), and another
> variable (say mma, which is now a numeric variable). What does the 
> above error message mean and what could be a possible solution?
> Would greatly appreciate your insights and wisdom.
> -Arin Basu
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
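
Both clean-up jobs in the question (the stray "0" in status, and mma/dma stored as text) can also be done inside R, without the round trip through Epi Info. Here is a minimal sketch on a made-up data frame named workingdat after the question, with invented values. One possible reason as.numeric() seemed not to work is that the column was actually a factor: as.numeric() on a factor returns the internal level codes, not the numbers you see printed, so going through as.character() first is the usual remedy:

```r
# Hypothetical stand-in for the raw data: one status mistakenly entered
# as "0", mma stored as a factor of number-like strings, dma as character.
workingdat <- data.frame(
  status = factor(c("case", "control", "0", "case")),
  mma    = factor(c("1.5", "10", "2.25", "0.5")),
  dma    = c("3.1", "4.0", "2.2", "5.9"),
  stringsAsFactors = FALSE
)

# Treat the bad "0" as missing, then rebuild the factor to drop the empty level.
workingdat$status[workingdat$status == "0"] <- NA
workingdat$status <- factor(workingdat$status)
levels(workingdat$status)   # "case" "control"

# Wrong for a factor: this gives the level codes, not the values.
as.numeric(workingdat$mma)

# Right: convert to character first (harmless on character columns too).
workingdat$mma <- as.numeric(as.character(workingdat$mma))
workingdat$dma <- as.numeric(workingdat$dma)
str(workingdat)
```
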



-- 
Paul E. Johnson                       email: pauljohn at ukans.edu
Dept. of Political Science            http://lark.cc.ukans.edu/~pauljohn
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66045                FAX: (785) 864-5700



