[R] Reshaping dataframes

Wed Aug 22 23:29:31 CEST 2012

Hello,

Your function doesn't seem to be very difficult to generalize.

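# stringsAsFactors=TRUE is needed in R >= 4.0, where it is no longer the default
d <- read.table(text="
    trg_type child_type_1
1 Scientists NA
2        of         used
", header=TRUE, stringsAsFactors=TRUE)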
str(d)

subs_na <- function(tok, na_factor_level = "NOT_REALIZED", na_num = 99999) {
     ifac <- which(sapply(tok, is.factor))
     inum <- which(sapply(tok, is.numeric))
     for(i in ifac) {
         levels(tok[, i]) <- c(levels(tok[, i]), na_factor_level)
         tok[is.na(tok[, i]), i] <- as.factor(na_factor_level)
     }
     for(i in inum)
         tok[is.na(tok[, i]), i] <- na_num
     tok
}

r1 <- substitute_na(d)  # substitute_na() is your function from the original post, quoted below
r2 <- subs_na(d)
str(r1)
str(r2)
identical(r1, r2)  # TRUE

You could use the same approach for character columns, Dates, etc.
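
As a rough sketch of what that could look like (not tested against your real data): the function name subs_na_all and the arguments na_chr and na_date, along with their default values, are made up here for illustration. Dates are checked before numerics, and logical or other column types are left untouched.

```r
# Hypothetical generalisation of subs_na(); na_chr and na_date (and their
# defaults) are illustrative assumptions, not part of the original function.
subs_na_all <- function(tok, na_factor_level = "NOT_REALIZED", na_num = 99999,
                        na_chr = "NOT_REALIZED",
                        na_date = as.Date("1900-01-01")) {
    for (i in seq_along(tok)) {
        x <- tok[[i]]
        miss <- is.na(x)
        if (is.factor(x)) {
            # add the placeholder level (once), then assign it to the NAs
            levels(x) <- union(levels(x), na_factor_level)
            x[miss] <- na_factor_level
        } else if (inherits(x, "Date")) {
            x[miss] <- na_date
        } else if (is.numeric(x)) {
            # note: an integer column becomes double if na_num is a double
            x[miss] <- na_num
        } else if (is.character(x)) {
            x[miss] <- na_chr
        }
        tok[[i]] <- x
    }
    tok
}

# Small mixed-type example
d2 <- data.frame(
    f = factor(c("a", NA)),
    n = c(1.5, NA),
    s = c(NA, "x"),
    when = as.Date(c("2012-08-22", NA)),
    stringsAsFactors = FALSE
)
r <- subs_na_all(d2)
str(r)
any(is.na(r))  # check that no NA is left
```

Using tok[[i]] instead of tok[, i] also avoids surprises if the data frame ever has a single column.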

Hope this helps,

Rui Barradas

On 22-08-2012 20:16, Ingmar Schuster wrote:
> Hi,
>
> I have a data set with variables that are _not_ missing at random. Now I
> use a package for learning a Bayesian Network which won't accept NA as a
> value. From a database I query data.frames with k,k+n,k+2n, ... variables
> (there are always at least k variables as leftmost columns). Using
> rbind.fill from the reshape package on two data frames I would get a data
> frame like
>
>     trg_type child_type_1
> 1 Scientists NA
> 2        of         used
>
> Now to get rid of NA values I use the following function, which works for
> data frames with only factor values:
>
>    substitute_na <- function(tok, na_factor_level = "NOT_REALIZED") {
>      for (i in 1:length(tok)) {
>        levels(tok[,i]) <- c(levels(tok[,i]), na_factor_level)
>      }
>      tok[is.na(tok)] <- as.factor(na_factor_level)
>      return(tok)
>    }
>
> Is there a better/faster way to do it? It would also be great to be able to
> distinguish factor columns from numeric columns and use a special numeric
> value there. The current version of rbind.fill makes no direct reference to
> the fill value, so I cannot simply change its implementation for my purpose.
>
>
> Thanks!
>
> Ingmar
>