[R] Problem with data conversion
Paul E. Johnson
pauljohn at ku.edu
Sun Dec 14 10:29:39 CET 2003
I sympathize with your trouble importing data, but you need to catch
your breath and figure out what you really have. I think that with a
bit more R practice, you will be able to clean up what you bring in
without going back to that editor so much.
I feel certain your data is not what you think it is. Here's an example
where a factor DOES work on the left-hand side (lhs) of a glm() formula:
> y <- factor(c("S","N","S","N","S","N","S","N"))
> x <- rnorm(8)
> glm(y~x,family=binomial(link=logit))
Look here: the system knows y is a factor:
> attributes(y)
$levels
[1] "N" "S"
$class
[1] "factor"
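One quick way to "figure out what you really have" is to ask R for the class of every column at once. The data frame mydat below is a hypothetical stand-in for your data, with status and mma deliberately stored as character vectors, which is roughly the situation I suspect you are in:

```r
# Hypothetical stand-in for the poster's data: 'status' was read in as
# character, and 'mma' as character even though it holds numbers.
mydat <- data.frame(
  status = c("case", "control", "case", "control"),
  mma    = c("1.2", "3.4", "0.7", "2.2"),
  stringsAsFactors = FALSE
)

# Report the class R actually holds for each column
column_classes <- sapply(mydat, class)
print(column_classes)

# str() shows the same information plus the first few values of each column
str(mydat)
```

If status shows up as "character" rather than "factor", that is the likely source of the "invalid variable type" error.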
My guess is that your variables are not really factors, but rather
character vectors. You have to convert them into factors.
Watch: the error I get is the same one you got.
> y <- c("S","N","S","N","S","N","S","N")
> glm(y~x,family=binomial(link=logit))
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames, :
invalid variable type
Note the system doesn't know y is "supposed" to be a factor. It just
sees characters.
> y
[1] "S" "N" "S" "N" "S" "N" "S" "N"
> levels(y)
NULL
> attributes(y)
NULL
But look, converting it on the fly works:
> glm(as.factor(y)~x,family=binomial(link=logit))
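Rather than wrapping the variable in as.factor() inside every formula, you can convert it once in the data frame. This is a minimal sketch, assuming a data frame with a character status column that contains the stray "0" described in the question below; the data frame mydat and its values are invented for illustration. Listing the levels explicitly turns that "0" into NA and makes "control" the reference level, so glm() models the probability of "case":

```r
# Hypothetical data: 'status' is character and contains one stray "0"
set.seed(1)
mydat <- data.frame(
  status = c("case", "control", "0", "case", "control", "case",
             "control", "case", "control", "control"),
  x      = rnorm(10),
  stringsAsFactors = FALSE
)

# Explicit levels: the first level ("control") becomes the reference
# category; any value not listed (here the "0") becomes NA.
mydat$status <- factor(mydat$status, levels = c("control", "case"))

print(levels(mydat$status))
print(sum(is.na(mydat$status)))

# Under the default na.action (na.omit) glm() drops the NA row and
# models the probability of the second level, "case".
fit <- glm(status ~ x, family = binomial(link = logit), data = mydat)
print(coef(fit))
```

Listing the levels by hand is the choice that matters here: as.factor() would sort the levels alphabetically ("0", "case", "control"), and glm() treats the first level of a factor response as failure and all other levels as success, so the stray "0" would silently become the only "failure".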
arinbasu at softhome.net wrote:
> Hi All:
> I came across the following problem while working with a dataset, and
> wondered if someone here could suggest a solution.
>
> My dataset consists of information on 402 individuals with the
> following five variables (age, sex, status = a binary variable with
> levels "case" or "control", mma, dma).
> During a data check, I found that in the raw data, the data entry
> operator had mistakenly put a "0" for one participant, so now the
> levels show:
>
>> levels(status)
>
> [1] "0" "control" "case"
> The variables mma and dma are actually numerical variables, but in the
> data frame they are represented as "characters". I tried to change the
> type of the variables (from character to numeric) using the edit
> function (bringing up the data grid and making the changes there),
> but the changes were not saved. I tried
> mma1 <- as.numeric(mma)
> but I was not successful in converting mma from a character variable
> to a numeric variable.
> So, to edit and "clean" the data, I exported the dataset as a text
> file to Epi Info 2002 (version 2, Windows). I used the following code:
> mysubset <- subset(workingdat, select = c(age,sex,status, mma, dma))
> write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA)
> After I made changes in the variables using Epi Info (I created a new
> variable called "statusrec" containing values "case" and "control"), I
> exported the file as a ".rec" file (filename "mydata.rec"). I used the
> following code to read the file in R:
> require(foreign)
> myData <- read.epiinfo("mydata.rec", read.deleted=NA)
> Now, the problem is this: when I want to run a logistic regression, R
> returns the following error message:
>
>> glm(statusrec~mma, family=binomial(link=logit))
>
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
> invalid variable type
>
> I cannot figure out the solution. I want to run a logistic regression
> with the variable statusrec (a binary variable containing the values
> "case" and "control") as the outcome and another variable (say mma,
> which is now a numeric variable) as the predictor. What does the
> above error message mean and what could be a possible solution?
> Would greatly appreciate your insights and wisdom.
> -Arin Basu
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
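On the as.numeric(mma) problem in the question: if mma had been read in as a factor (which read.table() did by default for text columns at the time this was written), as.numeric() returns the internal level codes, not the numbers printed. The usual idiom, covered in the R FAQ, is to go through as.character() first. The vector below is hypothetical:

```r
# Hypothetical: numbers that arrived as a factor
mma <- factor(c("2.5", "10.1", "2.5", "7.3"))

# Wrong: returns the internal integer codes of the levels, not the values
codes <- as.numeric(mma)

# Right: convert to character first, then to numeric
values <- as.numeric(as.character(mma))

# Equivalent, and cheaper for long vectors: convert only the levels,
# then index them by the factor's codes
values2 <- as.numeric(levels(mma))[mma]

print(codes)
print(values)
```

If mma is a character vector rather than a factor, as.numeric(mma) alone does work; any entry that is not a valid number (a stray letter, say) comes back as NA with a warning, and which(is.na(...)) on the result points to the bad cells.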
--
Paul E. Johnson email: pauljohn at ukans.edu
Dept. of Political Science http://lark.cc.ukans.edu/~pauljohn
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66045 FAX: (785) 864-5700