[R] Correcting for missing data combinations

Greg Hirson ghirson at ucdavis.edu
Fri Dec 11 22:42:20 CET 2009


One approach would be to use expand.grid to generate all combinations 
and then match against what you have.

A short example:

#generate data - two factors - 4 levels in factor1, 26 levels in factor2
df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace=TRUE),
     factor2 = sample(letters, 100, replace=TRUE), value = runif(100))

#generate possible combinations
poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)

#find matches
present <- paste(poss.comb$factor1, poss.comb$factor2) %in% 
    paste(df$factor1, df$factor2)

#find possible combinations not in the data
poss.comb[!present, ]

#add 0 as value
zerodata <- cbind(poss.comb[!present, ], value=0)

#and append to data
rbind(df, zerodata)

In place of letters and LETTERS, you could use unique(Factor1) and 
unique(Factor2) from your own data in creating the poss.comb list.
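
As a rough sketch of that substitution, here is the same approach applied 
to the four rows from the original question. The names dat and full are 
made up for illustration, and Factor2 is kept as plain character strings 
rather than real dates. Sorting those strings only gives a sensible order 
here because all dates share a month and year; for real data, converting 
with as.Date(Factor2, format = "%m/%d/%Y") first is probably safer.

```r
# Data from the original question (Factor2 kept as character strings)
dat <- data.frame(
  Factor1 = c("A", "A", "B", "B"),
  Factor2 = c("11/11/2009", "11/12/2009", "11/11/2009", "11/13/2009"),
  Value   = c(5, 4, 7, 8),
  stringsAsFactors = FALSE
)

# All combinations of the values actually observed in the data
poss.comb <- expand.grid(
  Factor1 = unique(dat$Factor1),
  Factor2 = unique(dat$Factor2),
  stringsAsFactors = FALSE
)

# TRUE where a combination already occurs in the data
present <- paste(poss.comb$Factor1, poss.comb$Factor2) %in%
  paste(dat$Factor1, dat$Factor2)

# Append the missing combinations with Value = 0, then sort for readability
full <- rbind(dat, cbind(poss.comb[!present, ], Value = 0))
full <- full[order(full$Factor1, full$Factor2), ]
rownames(full) <- NULL
full
```

The result has one row for each of the six Factor1/Factor2 combinations, 
with Value set to 0 for the two that were absent from the input.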

HTH,

Greg


On 12/11/09 10:19 AM, GL wrote:
> I can think of many brute-force ways to do this outside of R, but was
> wondering if there was a simple/elegant solution within R instead.
>
> I have a table that looks something like the following:
>
> Factor1	Factor2		Value
> A	11/11/2009	5
> A	11/12/2009	4
> B	11/11/2009 	7
> B	11/13/2009	8
>
> From that I need to generate all permutations of Factor1 and Factor2 and
> force a 0 for any combination that doesn’t exist in the actual data table.
> By way of example, I’d like the output for above to end up as:
>
>   Factor1	Factor2		Value
> A	11/11/2009	5
> A	11/12/2009	4
> A	11/13/2009	0
> B	11/11/2009 	7
> B	11/12/2009	0
> B	11/13/2009	8
>
> Truly appreciate any thoughts.
>
>    

-- 
Greg Hirson
ghirson at ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616



