[R] Case weighting

Thu Feb 23 21:48:04 CET 2012

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Hed Bar-Nissan
> Sent: Thursday, February 23, 2012 12:27 PM
> To: David Winsemius
> Cc: r-help at r-project.org
> Subject: Re: [R] Case weighting
> 
> It really is weighting; my simplified example was just too simplified.
> Here is my real weight vector:
> > sc$W_FSCHWT
>   [1]  14.8579  61.9528   3.0420   2.9929   5.1239  14.7507   2.7535
>         2.2693   3.6658   8.6179   2.5926   2.5390   1.7354   2.9767   9.0477
>         2.6589   3.4040   3.0519
> ....
> 
> 
> Still, I need some way to set the case weight.
> I could multiply all the weights by 10000 and then maybe use your method,
> but that would create a hugely bloated data frame.
> 
> Working with numeric variables only, I could probably compute weighted means.
> 
> But something as simple as WEIGHT BY would be nice.
> 
> tnx
> Hed
> 
> 
> 
> 
> 
> On Thu, Feb 23, 2012 at 7:43 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
> 
> >
> > On Feb 23, 2012, at 10:49 AM, Hed Bar-Nissan wrote:
> >
> >> The need comes from the PISA data (http://www.pisa.oecd.org).
> >>
> >> The data contain many cases, and each case carries a numeric
> >> variable that gives its weight.
> >> In SPSS the command would be "WEIGHT BY".
> >>
> >> In simpler words, here is an R sample (what I get vs. what I want to
> >> get):
> >>
> >>
> >> > data.recieved <- data.frame(
> >> + kindergarten_attendance = factor(c(2,1,1,1), labels = c("Yes", "No")),
> >> + weight=c(10, 1, 1, 1)
> >> + );
> >>
> >> > data.recieved;
> >>  kindergarten_attendance weight
> >> 1                      No     10
> >> 2                     Yes      1
> >> 3                     Yes      1
> >> 4                     Yes      1
> >>
> >> > data.weighted <- data.frame(
> >> + kindergarten_attendance = factor(c(2,2,2,2,2,2,2,2,2,2,1,1,1),
> >> + labels = c("Yes", "No")) );
> >>
> >
> > You want "case repetition", not case weighting; I would reserve the
> > latter term for estimation problems:
> >
> > > ( data.weighted <- unlist(sapply(1:NROW(data.recieved), function(x)
> >     rep(data.recieved[x,1], times=data.recieved[x,2] )) ) )
> >  [1] No  No  No  No  No  No  No  No  No  No  Yes Yes Yes
> > Levels: Yes No
> >
> >
> >
> >> > par(mfrow=c(1,2));
> >> > plot(data.recieved$kindergarten_attendance,main="What i get");
> >> > plot(data.weighted$kindergarten_attendance,main="What i want to get");
> >>
> > This seems to work with the factor vector. I didn't replicate data frame
> > rows, but I guess you could.
> >
> >
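
For the table and the plot alone, you should not need to replicate rows at all, so fractional weights like those in W_FSCHWT are not an obstacle. Here is a minimal sketch using base R's xtabs() on the toy data from above. The data frame name `toy` and the object names are mine, chosen only for illustration, and nothing here is specific to PISA.

```r
# Toy data from the example above; `toy` is an illustrative name.
toy <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Unweighted counts: every row counts once.
unweighted <- table(toy$kindergarten_attendance)

# Weighted counts: xtabs() sums the left-hand side within each factor level,
# so non-integer weights work without replicating any rows.
weighted <- xtabs(weight ~ kindergarten_attendance, data = toy)

# Weighted proportions.
weighted_prop <- prop.table(weighted)

print(unweighted)
print(weighted)
print(weighted_prop)

# Side-by-side bar plots of the two tabulations.
par(mfrow = c(1, 2))
barplot(unweighted, main = "What I get")
barplot(weighted, main = "What I want to get")
```

This only gives weighted counts and proportions. It says nothing about their precision.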

Are these survey sampling weights?  If so, then you need to use procedures that take the sampling design into account.  Otherwise the weighted point estimates may look reasonable, but your variance estimates (and hence standard errors and tests) will be wrong.
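
As a rough sketch of what I mean, the survey package lets you declare the weights once and then ask for weighted tables and means together with standard errors. The design below (ids = ~1, weights only, on the toy data) is an assumption made purely for illustration. It is not the actual PISA design, which you would have to take from the PISA documentation. The names `toy` and `des` are mine.

```r
# Assumes the 'survey' package is installed; the block is skipped otherwise.
if (requireNamespace("survey", quietly = TRUE)) {
  suppressPackageStartupMessages(library(survey))

  # Toy data from the example above; `toy` is an illustrative name.
  toy <- data.frame(
    kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
    weight = c(10, 1, 1, 1)
  )

  # Illustrative design only: no clusters or strata, just sampling weights.
  # A real analysis must describe the real design here.
  des <- svydesign(ids = ~1, weights = ~weight, data = toy)

  # Weighted counts per level.
  wt_tab <- svytable(~kindergarten_attendance, design = des)

  # Weighted proportions with design-based standard errors.
  wt_mean <- svymean(~kindergarten_attendance, design = des)

  print(wt_tab)
  print(wt_mean)
}
```

Once the design object exists, the same `des` can be passed to the other svy* functions, which is about as close to a one-time WEIGHT BY as I know of in R.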

Dan

Daniel Nordlund
Bothell, WA USA