[R] the "union" of several data frame rows

Scot W. McNary smcnary at charm.net
Fri Feb 8 23:23:06 CET 2008


Hi,

Thanks to Henrique Dallazuanna, Erik Iverson, Mark Leeds, and J. Scott 
Olson for pointing me down the path of joy.  I finally figured out a 
solution to the problem:

Given the following list of partially overlapping test keys, a data 
frame called keys1:

   ID   X1   X2   X3   X4   X5   X6   X7   X8   X9  X10  X11  X12  X13  X14  X15
A KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
B KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
C KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
D KEY    D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
E KEY    D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
F KEY    D    C    D <NA>    B    D <NA> <NA> <NA>    D <NA> <NA> <NA> <NA> <NA>
G KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
H KEY    D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
I KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
J KEY    D    C    D    A    B <NA> <NA> <NA> <NA> <NA>    D    D    A    C    C
K KEY    D    C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
L KEY    D    C    D <NA>    B    D <NA> <NA> <NA>    D <NA> <NA> <NA> <NA> <NA>
M KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
N KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>

The goal was to wind up with a common test key:

Common Key  D  C  D  A  B  D  D  D  A  D  D  D  A  C  C

What worked was the following:

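# Fill the gaps in row 1 from each later row in turn; row 1 becomes the common key.
for (i in 2:nrow(keys1)) {keys1[1, is.na(keys1[1,])] <- keys1[i, is.na(keys1[1,])]}
ck <- keys1[1, ]   # for() itself returns NULL, so take row 1 after the loop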
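
For anyone who wants to try the idea without the real data, here is a minimal sketch of the same fill-in loop. It assumes a made-up data frame (`toy` and its values are invented for illustration, they are not the real keys) and works on a character matrix rather than the data frame itself, which is my simplification and not a requirement:

```r
# Hypothetical toy data: three partially overlapping keys for four items.
toy <- data.frame(ID = "KEY",
                  X1 = c("D", "D", NA),
                  X2 = c(NA, "C", "C"),
                  X3 = c("D", NA, "D"),
                  X4 = c(NA, NA, "A"),
                  row.names = c("A", "B", "C"),
                  stringsAsFactors = FALSE)

# Drop the ID column and work on a plain character matrix of answers.
m <- as.matrix(toy[-1])

# Start from the first key and fill its remaining gaps from each later key.
ck <- m[1, ]
for (i in seq_len(nrow(m))[-1]) {
  gaps <- is.na(ck)        # items still missing from the common key
  ck[gaps] <- m[i, gaps]   # take whatever row i has for those items
}

print(ck)
```

With these toy values the loop ends with D, C, D, A for X1 to X4. Note that the first non-missing answer wins, so the loop will not notice if two keys disagree on an item.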

I neglected to mention in my first example that there were <NA> 
observations, which may have affected the kinds of solutions that were 
suggested.  Chalk up another testimonial in favor of providing a small 
workable example when asking for help.
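
To illustrate how much the <NA> values matter, here is a rough sketch of a column-wise approach in the spirit of the apply() suggestion quoted below, adapted to drop NA rather than blanks. It is my adaptation and not the code that was posted, and the data frame `toy` is again invented for illustration:

```r
# Hypothetical toy keys, with NA for items that a test does not include.
toy <- data.frame(X1 = c("D", "D", NA),
                  X2 = c(NA, "C", "C"),
                  X3 = c("D", NA, "D"),
                  stringsAsFactors = FALSE)

# For each item (column), keep the distinct non-missing answers.
answers <- lapply(toy, function(x) unique(x[!is.na(x)]))

# If the keys agree, every item has exactly one answer; stop otherwise.
stopifnot(all(sapply(answers, length) == 1))

ck <- unlist(answers)
print(ck)
```

Unlike a fill-in loop, this version stops with an error if two keys give different answers for the same item, or if an item is missing from every key.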

Thanks very much,

Scot


Henrique Dallazuanna wrote:
> Perhaps:
>
> data <- data.frame(key, row.names=1)
> names(data) <- paste("q", 1:6, sep="")
> apply(data, 2, function(x)unique(x)[unique(x) != " "])
>
>
> On 01/02/2008, Scot W. McNary <smcnary at charm.net> wrote:
>   
>> Hi,
>>
>> I have a question about how to obtain the union of several data frame
>> rows.  I'm trying to create a common key for several tests composed of
>> different items.   Here is a small scale version of the problem.  These
>> are keys for 4 different tests, not all mutually exclusive:
>>
>> id q1 q2 q3 q4 q5 q6
>> 1  A  C
>> 2              B  D
>> 3  A     D  B
>> 4     C  D     B  D
>>
>> I would like to create a single key for all test versions, the "union" of
>> the above:
>>
>> id   q1 q2 q3 q4 q5 q6
>> key  A  C  D  B  B  D
>>
>>
>> Here is what I have (unsuccessfully) tried so far:
>>
>>  > key <-
>> +   matrix(c("1", "A", "C", " ", " ", " ", " ",
>> +          "2", " ", " ", " ", " ", "B", "D",
>> +          "3", "A", " ", "D", "B", " ", " ",
>> +          "4", " ", "C", "D", " ", "B", "D"),
>> +        byrow=TRUE, ncol = 7)
>>  >
>>  > k1 <- key[1, 2:7]
>>  > k2 <- key[2, 2:7]
>>  > k3 <- key[3, 2:7]
>>  > k4 <- key[4, 2:7]
>>  >
>>  > itemid <- c("q1", "q2", "q3", "q4", "q5", "q6")
>>  >
>>  > k1 <- cbind(itemid, k1)
>>  > k2 <- cbind(itemid, k2)
>>  > k3 <- cbind(itemid, k3)
>>  > k4 <- cbind(itemid, k4)
>>  >
>>  > tmp <- merge(k1, k2, by = "itemid")
>>  > tmp <- merge(tmp, k3, by = "itemid")
>>  > tmp <- merge(tmp, k4, by = "itemid")
>>  >
>>  > t(tmp)
>>        [,1] [,2] [,3] [,4] [,5] [,6]
>> itemid "q1" "q2" "q3" "q4" "q5" "q6"
>> k1     "A"  "C"  " "  " "  " "  " "
>> k2     " "  " "  " "  " "  "B"  "D"
>> k3     "A"  " "  "D"  "B"  " "  " "
>> k4     " "  "C"  "D"  " "  "B"  "D"
>>
>> The actual problem involves 300 or so items instead of 6 and 10
>> different keys instead of four.  Any suggestions welcome.
>>
>> Thanks in advance,
>>
>> Scot McNary
>>
>>  > version
>>                _
>> platform       i386-pc-mingw32
>> arch           i386
>> os             mingw32
>> system         i386, mingw32
>> status
>> major          2
>> minor          6.1
>> year           2007
>> month          11
>> day            26
>> svn rev        43537
>> language       R
>> version.string R version 2.6.1 (2007-11-26)
>>
>>
>> --
>> Scot McNary
>> smcnary at charm dot net
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>     
>
>
>   

-- 
Scot McNary
smcnary at charm dot net


