# [R] Case weighting

David Winsemius dwinsemius at comcast.net
Thu Feb 23 21:40:04 CET 2012

```
On Feb 23, 2012, at 3:27 PM, Hed Bar-Nissan wrote:

> It's really weighting - it's just that my simplified example was too
> simplified
> Here is my real weight vector:
> > sc$W_FSCHWT
>   [1]  14.8579  61.9528   3.0420   2.9929   5.1239  14.7507
> 2.7535   2.2693   3.6658   8.6179   2.5926   2.5390   1.7354
> 2.9767   9.0477   2.6589   3.4040   3.0519
> ....

You should always convey the necessary complexity of the problem.
>
>
> And still it should somehow set the case weight.
> I could multiply everything by 10000 and maybe use your method, but it
> would create such a bloated data frame.
>
> Working with numeric variables only, I could probably compute weighted means.
>
> But something as simple as WEIGHT BY would be nice.

The survey package by Thomas Lumley provides for a wide variety of
weighted analyses.

--
David.
>
> tnx
> Hed
>
>
>
>
>
> On Thu, Feb 23, 2012 at 7:43 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Feb 23, 2012, at 10:49 AM, Hed Bar-Nissan wrote:
>
> The need comes from the PISA data (http://www.pisa.oecd.org).
>
> The data contain many cases, and each of them carries a numeric
> variable that signifies its weight.
> In SPSS the command would be "WEIGHT BY".
>
> In simpler words, here is an R sample (what I get vs. what I want
> to get):
>
>
> data.recieved <- data.frame(
>   kindergarten_attendance = factor(c(2,1,1,1), labels = c("Yes", "No")),
>   weight = c(10, 1, 1, 1)
> )
> data.recieved
>  kindergarten_attendance weight
> 1                      No     10
> 2                     Yes      1
> 3                     Yes      1
> 4                     Yes      1
>
>
>
> data.weighted <- data.frame(
>   kindergarten_attendance = factor(c(2,2,2,2,2,2,2,2,2,2,1,1,1),
>                                    labels = c("Yes", "No")))
>
> You want "case repetition", not case weighting, a term I would
> reserve for estimation problems:
>
> > ( data.weighted <- unlist(sapply(1:NROW(data.recieved),
>     function(x) rep(data.recieved[x,1], times=data.recieved[x,2]))) )
>  [1] No  No  No  No  No  No  No  No  No  No  Yes Yes Yes
> Levels: Yes No
>
>
>
>
> par(mfrow=c(1,2));
> plot(data.recieved$kindergarten_attendance,main="What i get");
> plot(data.weighted$kindergarten_attendance,main="What i want to get");
>
> Seems to work with the factor vector. I didn't replicate
> data frame rows, but I guess you could.
>
>
>
> Hed
>
> David Winsemius, MD
> West Hartford, CT
>
>

David Winsemius, MD
West Hartford, CT

```
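The two approaches discussed in the thread can be sketched in base R. This is only an illustration under stated assumptions, not the posters' code: it reuses the toy `data.recieved` data frame from the original question (object name spelled as in the post), and the names `weighted.counts`, `weighted.props`, `data.repeated` and `share.no` are introduced here for illustration. Summing the weights with `xtabs()` also works for non-integer weights such as the PISA weights, while row repetition only makes sense for integer weights.

```r
# Toy data from the thread: one "No" case with weight 10, three "Yes" cases with weight 1.
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Weighted counts: sum the weights within each factor level instead of counting rows.
# This does not require integer weights.
weighted.counts <- xtabs(weight ~ kindergarten_attendance, data = data.recieved)

# Weighted proportions derived from the weighted counts.
weighted.props <- prop.table(weighted.counts)

# Case repetition (integer weights only): repeat each row index `weight` times,
# which replicates whole data frame rows rather than a single factor vector.
data.repeated <- data.recieved[rep(seq_len(nrow(data.recieved)),
                                   times = data.recieved$weight), ,
                               drop = FALSE]

# Weighted mean of a 0/1 indicator: the weighted share of "No" answers.
share.no <- weighted.mean(data.recieved$kindergarten_attendance == "No",
                          w = data.recieved$weight)

print(weighted.counts)
print(weighted.props)
print(table(data.repeated$kindergarten_attendance))
print(share.no)
```

Calling `barplot(weighted.counts)` should give the kind of plot the original poster asked for without building a bloated data frame. With the survey package mentioned above, the analogous route would presumably be a design object from `svydesign(ids = ~1, weights = ~weight, data = data.recieved)` passed to `svytable(~kindergarten_attendance, design)`. Whether a design without clusters or strata is appropriate for the real PISA data is an assumption that should be checked against the PISA documentation.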