[R] troubles with logistic regression

Bill.Venables at csiro.au
Mon Mar 14 03:26:07 CET 2011


It means you have selected a response variable from one data frame (unmarried.male) and a predictor from another data frame (fieder.male) and they have different lengths.  
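A minimal sketch of how this mismatch arises, using a small synthetic data frame. The column names 'sex' and 'never.married' are hypothetical stand-ins, since the real column names of the posted file are not shown in the message.

```r
# Synthetic stand-in for the posted data; column names are assumptions.
d <- data.frame(sex           = rep(c(0, 1), each = 10),
                never.married = rep(c(0, 1), times = 10))

# Two subsets of different sizes, as in the original code.
males           <- subset(d, sex == 1)
unmarried.males <- subset(d, sex == 1 & never.married == 1)

n.males     <- nrow(males)            # 10 rows
n.unmarried <- nrow(unmarried.males)  # 5 rows

# Mixing a response from one subset with a predictor from the other
# makes model.frame() fail; capture the message instead of stopping.
msg <- tryCatch(
  {
    glm(unmarried.males$never.married ~ males$sex, family = binomial)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # the message reports that the variable lengths differ
```

Comparing nrow() of the two data frames before fitting is a quick way to spot this kind of problem.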

You might be better off using the column names in the data frame rather than selecting columns with a form such as 'some.data.frame[, 3]'. Positional indexing just confuses the issue and makes it very easy to make mistakes, as indeed you have done.
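As a rough illustration of referring to variables by name, the sketch below fits a model through the 'data' argument of glm. The data are simulated and the names 'age', 'income' and 'never.married' are assumptions, not the actual column names in the posted file.

```r
# Simulated data; column names are hypothetical.
set.seed(42)
n <- 200
d <- data.frame(
  age           = sample(18:70, n, replace = TRUE),
  income        = runif(n, 0, 90000),
  never.married = rbinom(n, 1, 0.4)
)

# All variables are looked up by name in 'd', so they necessarily
# have the same length; transformations go directly in the formula.
fit <- glm(never.married ~ age + I(age^2) + sqrt(income),
           family = binomial(link = "logit"), data = d)

n.fitted <- length(fitted(fit))  # one fitted value per row of d
```

Writing the squared age and square-root income terms inside the formula also avoids creating separate vectors that can drift out of step with the data frame.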

Also, to fit models on subsets of the data, you do not have to create separate data frames.  See the 'subset' argument of glm, which is standard for most fitting functions.  This is also a way to avoid problems and would have helped you here as well.
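A short sketch of the 'subset' argument, again on simulated data with hypothetical column names ('sex', 'age', 'income', 'never.married'). It is only meant to show the mechanism, not to reproduce the analysis in the question.

```r
# Simulated data; column names are assumptions.
set.seed(42)
n <- 400
d <- data.frame(
  sex           = rbinom(n, 1, 0.5),
  age           = sample(18:70, n, replace = TRUE),
  income        = runif(n, 0, 90000),
  never.married = rbinom(n, 1, 0.4)
)

# Fit on males only, without building a separate data frame.
# The 'subset' expression is evaluated within 'd'.
fit.male <- glm(never.married ~ age + I(age^2) + sqrt(income),
                family = binomial, data = d, subset = sex == 1)

n.used <- nobs(fit.male)  # number of rows actually used in the fit
```

Because the response and all predictors come from the same rows of 'd', they cannot end up with different lengths.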

Bill Venables.
 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of gked
Sent: Monday, 14 March 2011 4:33 AM
To: r-help at r-project.org
Subject: [R] troubles with logistic regression

Hello everyone,
I am working on a dataset for my class project and got stuck trying to
run a logistic regression. Here is my code:
data <- read.csv(file="C:/Users/fieder.data.2000.csv")

# creating subset of men 
fieder.male<-subset(data,data[,8]==1)
unmarried.male<-subset(data,data[,8]==1&data[,6]==1)

# glm fit
agesq.male<-(unmarried.male[,5])^2
male.sqrtincome<-sqrt(unmarried.male[,9])

fieder.male.mar.glm<-glm(as.factor(unmarried.male[,6])~
 factor(fieder.male[,7])+fieder.male[,5]+agesq.male+
  male.sqrtincome,binomial(link="logit") )
par(mfrow=c(1,1))
plot(c(0,300),c(0,1),pch=" ",
   xlab="sqrt income, truncated at 90000",
   ylab="modeled probability of being never-married")
junk<- lowess(male.sqrtincome,
  log(fieder.male.mar.glm$fitted.values/
  (1-fieder.male.mar.glm$fitted.values)))
  lines(junk$x,exp(junk$y)/(1+exp(junk$y)))
title(main="probability of never marrying\n males, by sqrt(income)")
points(male.sqrtincome[unmarried.male==0],
  fieder.male.mar.glm$fitted.values[unmarried.male==0],pch=16)
points(male.sqrtincome[unmarried.male==1],
  fieder.male.mar.glm$fitted.values[unmarried.male==1],pch=1)

The error says: 
Error in model.frame.default(formula = as.factor(unmarried.male[, 6]) ~  : 
  variable lengths differ (found for 'factor(fieder.male[, 7])')
 
What does it mean? Where am I making a mistake?
Thank you
P.S. I am also attaching the data file in .csv format:
http://r.789695.n4.nabble.com/file/n3352356/fieder.data.2000.csv
fieder.data.2000.csv 

