[R] Multiple-Response Analysis: Cleaning of Duplicate Codes

Tue Apr 25 18:36:31 CEST 2017

Hi All,

In my current project I am working with multiple-response questions
(MRSets):

-- Coding --
100 Main Code 1
110 Sub Code 1.1
120 Sub Code 1.2
130 Sub Code 1.3

200 Main Code 2
210 Sub Code 2.1
220 Sub Code 2.2
230 Sub Code 2.3

300 Main Code 3
310 Sub Code 3.1
320 Sub Code 3.2

The coding of the variables is too detailed. Therefore I have recoded all
sub codes to their respective main code, e.g. all 110, 120 and 130 to 100,
all 210, 220 and 230 to 200 and all 310 and 320 to 300.

As a result, some respondents now carry the same main code several times.
If respondent 1 was coded with 120 and 130, the values after recoding are
100 and 100. Counting these as they are would weight this respondent's
answer by a factor of 2. This is not my aim. I would like to count the 100
only once for the respective respondent.
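
To make the double counting concrete, here is a minimal base R sketch. The
toy data frame `recoded` and its column names `r1` and `r2` are assumptions
made up for this illustration; they are not part of the project data.

```r
# Hypothetical toy data: two respondents, already recoded to main codes.
# Respondent 1 has 100 twice, respondent 2 has 100 and 200.
recoded <- data.frame(
  r1 = c(100, 100),
  r2 = c(100, 200)
)

# Naive count: every cell is counted, so respondent 1 contributes 100 twice.
naive_counts <- table(unlist(recoded))

# Per-respondent count: each main code is counted at most once per row.
codes_per_row <- lapply(seq_len(nrow(recoded)), function(i) {
  unique(na.omit(unlist(recoded[i, ])))
})
dedup_counts <- table(unlist(codes_per_row))

print(naive_counts)
print(dedup_counts)
```

In the naive count the code 100 appears three times although only two
respondents mentioned it; the per-respondent count gives two.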

Here is my script so far:

# -- cut --

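library(dplyr)  # select(), starts_with()
library(expss)  # recode(), %thru%; attached last so its recode() masks dplyr's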

d_sample <-
  structure(
    list(
      c05_01 = c(
        110,
        110,
        130,
        110,
        110,
        110,
        110,
        110,
        110,
        110,
        110,
        999,
        110,
        495,
        160,
        110,
        410
      ),
      c05_02 = c(NA,
                 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
                 170, NA, 130),
      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
                 NA, NA, NA, NA, NA, NA, NA),
      c05_04 = c(
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_
      ),
      c05_05 = c(
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_
      )
    ),
    .Names = c("c05_01",
               "c05_02", "c05_03", "c05_04", "c05_05"),
    row.names = c(
      "1",
      "2",
      "3",
      "4",
      "5",
      "10",
      "11",
      "12",
      "13",
      "14",
      "15",
      "20",
      "21",
      "22",
      "23",
      "24",
      "25"
    ),
    class = "data.frame"
  )

c05_xx_r01 <- d_sample %>%
  select(starts_with("c05_")) %>%
  recode(c(
    110 %thru% 195 ~ 100,
    210 %thru% 295 ~ 200,
    310 %thru% 395 ~ 300,
    410 %thru% 495 ~ 400,
    510 %thru% 595 ~ 500,
    810 %thru% 895 ~ 800,
    910 %thru% 999 ~ 900))
names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")
d_sample <- cbind(d_sample, c05_xx_r01)

# -- cut --

I would like to eliminate all duplicate codes, e.g. reduce 100 and 100 to a
single 100 for the respondents in rows 3, 6, 9, 10 and 15 (row names "3",
"10", "13", "14" and "23"), so that the result looks like this:

# -- cut --
d_sample_1 <-
  structure(
    list(
      c05_01 = c(
        110,
        110,
        130,
        110,
        110,
        110,
        110,
        110,
        110,
        110,
        110,
        999,
        110,
        495,
        160,
        110,
        410
      ),
      c05_02 = c(NA,
                 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
                 170, NA, 130),
      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
                 NA, NA, NA, NA, NA, NA, NA),
      c05_04 = c(
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_
      ),
      c05_05 = c(
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_,
        NA_real_
      ),
      c05_01_r01 = c(
        100,
        100,
        100,
        100,
        100,
        100,
        100,
        100,
        100,
        100,
        100,
        900,
        100,
        400,
        100,
        100,
        400
      ),
      c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, 100),
      c05_03_r01 = c(NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, NA,
                     NA, NA),
      c05_04_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA),
      c05_05_r01 = c(NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
    ),
    .Names = c(
      "c05_01",
      "c05_02",
      "c05_03",
      "c05_04",
      "c05_05",
      "c05_01_r01",
      "c05_02_r01",
      "c05_03_r01",
      "c05_04_r01",
      "c05_05_r01"
    ),
    row.names = c(
      "1",
      "2",
      "3",
      "4",
      "5",
      "10",
      "11",
      "12",
      "13",
      "14",
      "15",
      "20",
      "21",
      "22",
      "23",
      "24",
      "25"
    ),
    class = "data.frame"
  )

# -- cut --

How could I achieve this?
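
One possible direction, sketched in base R only: within each row, keep the
first occurrence of a code and set later repeats to NA. The helper name
`drop_row_duplicates` is hypothetical and not an expss function, and the
three rows below are a hand-picked excerpt of the recoded sample data, so
this is an illustration under those assumptions rather than a tested
solution for the full data set.

```r
# Three recoded respondents taken from the sample data above
# (row names "3", "14" and "25"), first three columns only.
recoded <- data.frame(
  c05_01_r01 = c(100, 100, 400),
  c05_02_r01 = c(100, 100, 100),
  c05_03_r01 = c(NA, 400, NA),
  row.names = c("3", "14", "25")
)

# Hypothetical helper, not part of expss: within each row, keep the first
# occurrence of a code and set every later repeat to NA.
# Assumes all columns are numeric and that there are at least two columns.
drop_row_duplicates <- function(df) {
  m <- as.matrix(df)
  # duplicated() flags the second and later occurrence within one row.
  # apply() returns one column per input row, hence the t().
  is_repeat <- t(apply(m, 1, function(x) duplicated(x) & !is.na(x)))
  m[is_repeat] <- NA
  as.data.frame(m)
}

cleaned <- drop_row_duplicates(recoded)
print(cleaned)
```

For these three rows the second 100 of respondents "3" and "14" becomes NA,
while respondent "25" keeps both 400 and 100 because they are different
main codes.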

Kind regards

Georg