[R] subsetting problem with multiple criteria: Works in some but not all cases.

John Kane jrkrideau at yahoo.ca
Thu Nov 1 17:44:07 CET 2007


I am trying to compare some word lists, each of which has an
associated set of numbers. I want to compare word list
aa with bb and find only those words which are
unique to bb, then compare bb with cc, and so on.
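The chain of comparisons described above (aa with bb, bb with cc, ...) can be sketched in base R by pairing each list with the one before it. This is a minimal illustration that assumes the word lists are plain character vectors. Only aa, bb and cc from the example data are used, and `lists` and `new.words` are illustrative names that do not appear in the original code.

```r
# Three of the word lists from the example data, as plain character vectors
lists <- list(aa = c("cat", "dog", "horse", "cow"),
              bb = c("mouse", "dog", "cow", "pigeon"),
              cc = c("emu", "rat", "crow", "cow"))

# setdiff(x, y) keeps the elements of x that do not occur in y.
# Map() pairs bb with aa and cc with bb; the result is named bb, cc.
new.words <- Map(setdiff, lists[-1], lists[-length(lists)])

new.words$bb  # words unique to bb: c("mouse", "pigeon")
new.words$cc  # words unique to cc: c("emu", "rat", "crow")
```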

I thought that I should be able to do this by using
setdiff to get the unique words and then subsetting the
data frame to get those words and their corresponding
numbers, but I am misunderstanding something.
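The two steps described here (setdiff for the words, then a subset for the matching rows) can be sketched on a single pair of columns. This assumes a small hypothetical data frame holding only bb and bnn. It uses `%in%` for the row test, which checks whether each word occurs anywhere in the setdiff result rather than comparing the two vectors position by position.

```r
# Hypothetical two-column slice of the example data: words and their numbers
bb.df <- data.frame(bb  = c("mouse", "dog", "cow", "pigeon"),
                    bnn = c(5, 6, 7, 8),
                    stringsAsFactors = FALSE)
aa <- c("cat", "dog", "horse", "cow")

# Step 1: words in bb that are not in aa
lone.word <- setdiff(bb.df$bb, aa)

# Step 2: keep the rows whose word is one of those lone words
matching <- bb.df[bb.df$bb %in% lone.word, ]
matching  # rows for "mouse" (5) and "pigeon" (8)
```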

When I run the code below, a) I get lots of warnings and
b) I get the correct results for 4 of the 5
comparisons. However, the comparison of column three with
column four (cc, dd) gives me an empty subset.
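The cc/dd case can be reproduced with plain vectors. The sketch below assumes the lone words are compared with `==`, as in the loop below, and contrasts that with `%in%`. `by.position` and `by.membership` are illustrative names.

```r
cc <- c("emu", "rat", "crow", "cow")
dd <- c("cow", "camel", "manatee", "parrot")

# Words in dd that are not in cc: "camel", "manatee", "parrot"
lone.word <- setdiff(dd, cc)

# '==' compares position by position and recycles the shorter vector,
# so dd[4] is compared with lone.word[1]. R warns because 4 is not a
# multiple of 3; suppressWarnings() only hides that warning here.
by.position <- suppressWarnings(dd == lone.word)
# by.position is all FALSE: "cow" vs "camel", "camel" vs "manatee", ...

# '%in%' asks whether each element of dd occurs anywhere in lone.word
by.membership <- dd %in% lone.word
# by.membership is c(FALSE, TRUE, TRUE, TRUE)
dd[by.membership]
```

When the lengths happen to line up, `==` can look correct, which is why some of the comparisons appear to work.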

Can anyone point out my error or suggest a better way
to do this?
Thanks

============================================================================

library(Hmisc)  # for Cs(): Cs(cat, dog) gives c("cat", "dog")

# Columns 1-6 hold the word lists, columns 7-12 the matching numbers
mydata <- data.frame(aa = Cs(cat, dog, horse, cow),
                     bb = c("mouse", "dog", "cow", "pigeon"),
                     cc = c("emu", "rat", "crow", "cow"),
                     dd = c("cow", "camel", "manatee", "parrot"),
                     ee = c("coat", "hat", "dog", "camel"),
                     ff = c("knife", "dog", "cow", "pigeon"),
                     ann = c(1, 2, 3, 4),
                     bnn = c(5, 6, 7, 8),
                     cnn = c(9, 10, 11, 12),
                     dnn = c(13, 14, 15, 16),
                     enn = c(17, 18, 19, 20),
                     fnn = c(21, 22, 23, 24))

wordnames <- c("word", "number")
word.list <- rep(vector("list", 1), 5)

for (j in 1:5) {
  # words in list j+1 that do not occur in list j
  lone.word <- setdiff(mydata[, j + 1], mydata[, j])
  # keep word column j+1 and its number column j+7
  matching <- subset(mydata[, c(j + 1, j + 7)],
                     mydata[, j + 1] == lone.word)
  word.list[[j]] <- matching
  names(word.list[[j]]) <- wordnames
}
word.list

=============================================================================
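One possible rewrite of the loop, offered as a sketch rather than a definitive fix. It swaps the `==` test for `%in%`, builds the data frame with plain `c()` so Hmisc is not needed, and sets `stringsAsFactors = FALSE` so the word columns stay character vectors (R before 4.0.0 converts them to factors by default).

```r
# Same example data; columns 1-6 are words, columns 7-12 their numbers
mydata <- data.frame(aa = c("cat", "dog", "horse", "cow"),
                     bb = c("mouse", "dog", "cow", "pigeon"),
                     cc = c("emu", "rat", "crow", "cow"),
                     dd = c("cow", "camel", "manatee", "parrot"),
                     ee = c("coat", "hat", "dog", "camel"),
                     ff = c("knife", "dog", "cow", "pigeon"),
                     ann = c(1, 2, 3, 4),
                     bnn = c(5, 6, 7, 8),
                     cnn = c(9, 10, 11, 12),
                     dnn = c(13, 14, 15, 16),
                     enn = c(17, 18, 19, 20),
                     fnn = c(21, 22, 23, 24),
                     stringsAsFactors = FALSE)

word.list <- vector("list", 5)
for (j in 1:5) {
  # words in list j+1 that do not occur in list j
  lone.word <- setdiff(mydata[, j + 1], mydata[, j])
  # membership test: TRUE for every row whose word is a lone word
  keep <- mydata[, j + 1] %in% lone.word
  # word column j+1 and its number column j+7, matching rows only
  matching <- mydata[keep, c(j + 1, j + 7)]
  names(matching) <- c("word", "number")
  word.list[[j]] <- matching
}
word.list
```

Applied to the example data, the dd versus cc element should hold camel, manatee and parrot with 14, 15 and 16. By the same recycling logic, the original `==` test keeps only knife for ff versus ee, whereas `%in%` also keeps cow and pigeon.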
R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_Canada.1252;LC_CTYPE=English_Canada.1252;LC_MONETARY=English_Canada.1252;LC_NUMERIC=C;LC_TIME=English_Canada.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Hmisc_3.4-2 gdata_2.3.1

loaded via a namespace (and not attached):
[1] cluster_1.11.9 grid_2.6.0     gtools_2.4.0   lattice_0.17-1




