[R] mlogit and weights

Thu Jun 3 12:54:54 CEST 2010

On Wed, 2 Jun 2010, Misha Spisok wrote:

> Hello,
>
> I can't figure out why using and not using weights in mlogit yields
> identical results.  My motivation is for the case when an
> "observation" or "individual" represents a number of individuals.  For
> example,
>
> library(mlogit)
> library(AER)
> data("TravelMode", package = "AER")
> TM <- mlogit.data(TravelMode, choice = "choice", shape = "long",
>                 alt.levels = c("air", "train", "bus", "car"))
> myweight = rep(floor(1000*runif(nrow(TravelMode)/4)), each = 4)
>
> summary(mlogit(choice ~ wait + vcost + travel + gcost, data=TM))
> summary(mlogit(choice ~ wait + vcost + travel + gcost, weights=income, data=TM))
> summary(mlogit(choice ~ wait + vcost + travel + gcost,
> weights=myweight, data=TM))
>
> Each gives the same result.

I can't replicate that. For me all three give different results. For 
example, the first two (which do not contain random elements) are

    alttrain      altbus      altcar        wait       vcost      travel
-0.84413818 -1.44150828 -5.20474275 -0.10364955 -0.08493182 -0.01333220
       gcost
  0.06929537

and

    alttrain      altbus      altcar        wait       vcost      travel
-1.56910793 -1.67020936 -5.44725428 -0.11157800 -0.08866886 -0.01435371
       gcost
  0.08087749

respectively. I'm using the current "mlogit" version from CRAN: 0.1-7.

> Am I specifying "weights" incorrectly?

No, I don't think so.

> Is there a better way to do what I want to do?  That is, if "myweight"
> contains the number of observations represented by an "observation,"
> is this the correct approach?

You will get the correct parameter estimates but not the correct 
inference. Following most of the basic model fitting functions (such as 
lm() or glm()), the weights are _not_ interpreted as case weights. I.e., 
the function treats
   sum(weights > 0)
as the number of observations and not
   sum(weights)
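
To make the distinction concrete, here is a minimal sketch using lm() with 
made-up data (the weight vector w and the names n_used and n_case are only 
for illustration). An observation with weight zero is dropped from the 
count, while an observation with weight 2 still counts only once:

```r
# Hypothetical data: five observations, one of them with weight zero
x <- 1:5
y <- c(0, 2, 1, 4, 5)
w <- c(2, 2, 0, 2, 2)

fm <- lm(y ~ x, weights = w)

n_used <- sum(w > 0)  # what lm() treats as the number of observations
n_case <- sum(w)      # what a case-weight interpretation would use

# Residual degrees of freedom are based on n_used, not n_case
df.residual(fm) == n_used - length(coef(fm))
nobs(fm) == n_used
```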

A simple example using lm():

   x <- 1:5
   y <- c(0, 2, 1, 4, 5)
   w <- rep(2, 5)
   xx <- c(x, x)
   yy <- c(y, y)

Then you can fit both models

   fm1 <- lm(y ~ x, weights = w)
   fm2 <- lm(yy ~ xx)

and you get the same coefficients

   all.equal(coef(fm1), coef(fm2))

(which only mentions that the strings 'xx' and 'x' are different.) But fm1 
thinks 2 parameters have been estimated from 5 observations while fm2 
thinks 2 parameters have been estimated from 10 observations. Hence the 
residual degrees of freedom and the covariance matrices differ, both by a 
factor of 3/8:

   df.residual(fm1) / df.residual(fm2)
   vcov(fm2) / vcov(fm1)
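
If the weights really are case weights, one possible workaround for lm() 
is sketched below: either replicate each row according to its weight, or 
rescale the covariance matrix by the ratio of the residual degrees of 
freedom. This is only a sketch, assuming positive integer weights; the data 
frame d and the object names are hypothetical, and whether the same 
rescaling carries over to mlogit() is not checked here.

```r
# Same toy data as above, stored in a (hypothetical) data frame
d <- data.frame(x = 1:5, y = c(0, 2, 1, 4, 5), w = rep(2, 5))

# Weighted fit: counts 5 observations
fm_w <- lm(y ~ x, data = d, weights = w)

# Option 1: expand the data so that each row appears w times
d_long <- d[rep(seq_len(nrow(d)), times = d$w), ]
fm_long <- lm(y ~ x, data = d_long)

# Option 2: rescale the covariance matrix of the weighted fit so that
# the residual degrees of freedom are based on sum(w) instead of sum(w > 0)
k <- length(coef(fm_w))
vc_case <- vcov(fm_w) * df.residual(fm_w) / (sum(d$w) - k)

# Both options should agree
all.equal(coef(fm_w), coef(fm_long))
all.equal(vc_case, vcov(fm_long))
```

Standard errors under the case-weight interpretation can then be taken 
from sqrt(diag(vc_case)).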

Hope that helps,
Z

> If so, what am I doing wrong?  If not,
> what suggestions are there?
>
> Thank you for your time.
>
> Best,
>
> Misha