[R] Index and dummy

Douglas Bates bates at stat.wisc.edu
Mon Jul 20 09:20:12 CEST 2009


On Sun, Jul 19, 2009 at 11:32 PM, Marujo A. <A.Marujo at soton.ac.uk> wrote:
> Dear R-helpers

> I have 2 variables
> x1=rgamma(6000, 2, 1) and x2=rgamma(6000, 3, 2). I have to sort each one in descending order and split it into groups. After this, each pair of groups must be merged into one until the whole population becomes one group. A dummy vector must be created for each group (in the 8-, 4-, 2- and 1-group splits), equal to 1 if individual i belongs to the group and 0 otherwise.

If I understand correctly, you want to create one factor with 8 levels,
one factor with 4 levels and one factor with 2 levels, based on equal
divisions of the sorted x1 values.  If so, it is advantageous to use
the "whole object" approach in R.  I would suggest creating a data
frame with the values of x1 and x2, sorting the rows in descending
order of x1, and then adding the factors, which can easily be defined
with the gl() function.  On a small example it looks like this:

> df <- data.frame(x1 = rgamma(20, 2, 1), x2 = rgamma(20, 3, 2))
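> df <- df[order(df$x1, decreasing = TRUE), ]   # rows in descending order of x1
> df$g4 <- gl(4, 5)    # 4 groups of 5 consecutive rows
> df$g2 <- gl(2, 10)   # 2 groups of 10 consecutive rows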
> df
          x1        x2 g4 g2
17 3.2050060 1.1395147  1  1
14 2.8422283 2.4612637  1  1
2  2.4286087 2.1572067  1  1
16 2.4108377 1.1360309  1  1
20 2.0954746 1.2974074  1  1
12 2.0641932 1.2820681  2  1
18 1.9857902 1.9888521  2  1
1  1.9394710 1.7363564  2  1
7  1.8907038 1.6302374  2  1
10 1.6421862 1.7538054  2  1
11 1.3926248 1.3363230  3  2
13 1.3590006 0.4226191  3  2
6  1.3172306 2.8610896  3  2
4  1.2888751 2.0672638  3  2
5  1.1358279 1.5365895  3  2
15 1.1017541 2.3689916  4  2
19 0.7358496 1.6427665  4  2
9  0.5669082 0.2964689  4  2
3  0.5657076 0.9320564  4  2
8  0.3211136 0.5938290  4  2
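The same idea scales directly to the sizes in the question. The sketch below is an illustration added here, not part of the original reply: the explicit `id` column, the fixed seed and the one-level `g1` factor (mirroring the "whole population as one group" stage) are assumptions. Note that the row names already carry the original index after sorting, as in the output above; `id` just makes that explicit. The `stopifnot()` call checks that each coarser factor is the pairwise merge of the next finer one.

```r
# Sketch at the question's scale: 6000 draws, split by descending x1
# into 8, 4, 2 and 1 equal groups. Seed and 'id' column are assumptions.
set.seed(1)                                    # arbitrary seed, for reproducibility
n  <- 6000
df <- data.frame(id = seq_len(n),              # original index i = 1, ..., 6000
                 x1 = rgamma(n, 2, 1),
                 x2 = rgamma(n, 3, 2))
df <- df[order(df$x1, decreasing = TRUE), ]    # sort rows by descending x1

# gl(k, n / k) gives k consecutive blocks of n / k rows each
df$g8 <- gl(8, n / 8)     # 8 groups of 750
df$g4 <- gl(4, n / 4)     # 4 groups of 1500
df$g2 <- gl(2, n / 2)     # 2 groups of 3000
df$g1 <- gl(1, n)         # the whole population as a single group

# Each coarser grouping merges consecutive pairs of the finer one
stopifnot(all(as.integer(df$g4) == ceiling(as.integer(df$g8) / 2)),
          all(as.integer(df$g2) == ceiling(as.integer(df$g4) / 2)))

table(df$g8)              # every level should contain 750 rows
```

Because no rows are dropped, all 6000 individuals stay in one data frame; a group is selected with, for example, `subset(df, g8 == 3)` rather than by copying slices into separate vectors.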


> What I have done was:
> id=(6000)
> x1sort=sort(x1, decreasing=TRUE)
> x1g8_1=x1sort[1:750]
> x1g8_2=x1sort[751:1500]
> x1g8_3=x1sort[1501:2250]
> x1g8_4=x1sort[2251:3000]
> x1g8_5=x1sort[3001:3750]
> x1g8_6=x1sort[3751:4500]
> x1g8_7=x1sort[4501:5250]
> x1g8_8=x1sort[5251:6000]
>
> x1g4_1=c(x1g8_1, x1g8_2)
> x1g4_2=c(x1g8_3, x1g8_4)
> x1g4_3=c(x1g8_5, x1g8_6)
> x1g4_4=c(x1g8_7, x1g8_8)
>
> x1g2_1=c(x1g4_1, x1g4_2)
> x1g2_2=c(x1g4_3, x1g4_4)
>
> x1ng=c(x1g2_1, x1g2_2)
>
> After this I did the dummy vector (the example is for group4)
>
> dum= replace(matrix(0, 4, 1), cbind(4, 1), 0)                           # matrix of zeros
> dummy=lapply(1:4, function(i) replace(dum, cbind(i), 1))        # 4 dummy vectors
> s=split(dummy, 1:4)
> ss=rename.vars(s, c("1", "2", "3", "4"), c("dx14_1", "dx14_2", "dx14_3", "dx14_4"))
>
> The problem is that when I split into groups, each group only identifies 750 individuals (in the case of x1g8, for instance): the index only runs over i=1, ..., 750, whereas I need to keep i=1, ..., 6000. Also, my approach to the dummy vectors doesn't seem to work, because I get 4 vectors, each with the number one (1) in a different variable, instead of only one.
>
> So I need some help on how to keep i=1, ..., 6000 and how to create a dummy vector that takes the value one (1) only when individual i belongs to a given group.
>
> Thank you so much.
> Ana
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
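On the dummy vectors: in the code quoted above, `matrix(0, 4, 1)` has one entry per group rather than one per individual, so each "dummy" ends up with length 4 instead of 6000. The sketch below is one possible approach, not necessarily the one intended in this thread. It keeps the data in the original order (so row i is always individual i), writes the group numbers back at the original positions via `order()`, and lets `model.matrix()` build one 0/1 column per group. The helper `make_groups()` is a hypothetical name introduced here, and the `dx14_` column names are borrowed from the question.

```r
# Sketch, assuming the set-up from the question: 6000 gamma draws,
# groups formed from x1 sorted in descending order.
set.seed(1)                                   # arbitrary seed, for reproducibility
n  <- 6000
df <- data.frame(x1 = rgamma(n, 2, 1), x2 = rgamma(n, 3, 2))

# ord[k] is the original index i of the k-th largest x1
ord <- order(df$x1, decreasing = TRUE)

# Hypothetical helper: group numbers are assigned in sorted order but
# stored at the original positions, so row i still refers to individual i.
make_groups <- function(k) {
  g <- integer(n)
  g[ord] <- rep(seq_len(k), each = n / k)
  factor(g, levels = seq_len(k))
}
df$g8 <- make_groups(8)
df$g4 <- make_groups(4)
df$g2 <- make_groups(2)

# One 0/1 column per group: dx14_j is 1 exactly when individual i belongs
# to group j of the 4-group split, and 0 otherwise ("- 1" keeps all levels).
d4 <- model.matrix(~ g4 - 1, data = df)
colnames(d4) <- paste0("dx14_", 1:4)
df <- cbind(df, d4)

head(df)
```

Each row has exactly one 1 among the `dx14_` columns, and the same call with `g8` or `g2` gives the dummies for the other splits. In many modelling functions the factor itself can be used directly, since R builds these indicator columns internally.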



