[R] Correcting for missing data combinations
Greg Hirson
ghirson at ucdavis.edu
Fri Dec 11 22:42:20 CET 2009
One approach would be to use expand.grid to generate all combinations
and then match against what you have.
A short example:
#generate data - two factors - 4 levels in factor1, 26 levels in factor2
df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace=TRUE),
                 factor2 = sample(letters, 100, replace=TRUE),
                 value = runif(100))
#generate possible combinations
poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)
#find matches
present <- paste(poss.comb$factor1, poss.comb$factor2) %in%
    paste(df$factor1, df$factor2)
#find possible combinations not in the data
poss.comb[!present, ]
#add 0 as value
zerodata <- cbind(poss.comb[!present, ], value=0)
#and append to data
rbind(df, zerodata)
In place of letters and LETTERS, you could use unique(Factor1) and
unique(Factor2) from your own data when creating the poss.comb data frame.
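
As a rough sketch of that substitution, here is the same approach applied to the four rows from the question quoted below. The names dat and full are made up for illustration, and the final sort works only because these particular date strings happen to sort correctly as text; real dates would probably be better converted with as.Date first.

```r
# Example data copied from the question (dat is a hypothetical name)
dat <- data.frame(
  Factor1 = c("A", "A", "B", "B"),
  Factor2 = c("11/11/2009", "11/12/2009", "11/11/2009", "11/13/2009"),
  Value   = c(5, 4, 7, 8),
  stringsAsFactors = FALSE
)

# All combinations of the values actually observed in each column
poss.comb <- expand.grid(Factor1 = unique(dat$Factor1),
                         Factor2 = unique(dat$Factor2),
                         stringsAsFactors = FALSE)

# TRUE where a combination already exists in the data
present <- paste(poss.comb$Factor1, poss.comb$Factor2) %in%
  paste(dat$Factor1, dat$Factor2)

# Append the missing combinations with Value 0
full <- rbind(dat, cbind(poss.comb[!present, ], Value = 0))

# Sort by Factor1, then Factor2 (text sort; see caveat above)
full <- full[order(full$Factor1, full$Factor2), ]
rownames(full) <- NULL
full
```

With this input, full should have six rows, with a 0 for A on 11/13/2009 and for B on 11/12/2009.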
HTH,
Greg
On 12/11/09 10:19 AM, GL wrote:
> I can think of many brute-force ways to do this outside of R, but was
> wondering if there was a simple/elegant solution within R instead.
>
> I have a table that looks something like the following:
>
> Factor1 Factor2 Value
> A 11/11/2009 5
> A 11/12/2009 4
> B 11/11/2009 7
> B 11/13/2009 8
>
> From that I need to generate all permutations of Factor1 and Factor2 and
> force a 0 for any combination that doesn’t exist in the actual data table.
> By way of example, I’d like the output for above to end up as:
>
> Factor1 Factor2 Value
> A 11/11/2009 5
> A 11/12/2009 4
> A 11/13/2009 0
> B 11/11/2009 7
> B 11/12/2009 0
> B 11/13/2009 8
>
> Truly appreciate any thoughts.
>
>
--
Greg Hirson
ghirson at ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616