[R] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

Gorjanc Gregor Gregor.Gorjanc at bfro.uni-lj.si
Sun Feb 13 03:04:18 CET 2005


Hello R users!

I have written a function (included at the end) to ease my work 
when analysing data in another programme, which sometimes needs a 
special data structure. However, I have encountered several
problems with the data frame it creates.

---------------------------------------------------------------

The data frame (produced by the example at the end) looks the way 
I want:

   c1 c2 f2 f1       y1.A       y2.A       y1.B        y2.B
1   1  2  M  A -1.2776840 -1.4695219         NA          NA
3   3  6  M  A  0.1593941  0.7581128         NA          NA
5   5 10  M  A  1.1085950  0.8556062         NA          NA
7   7 14  F  A -1.8259281  3.0675536         NA          NA
9   9 18  F  A  0.8017311 -0.1056571         NA          NA
2   2  4  M  B       <NA>       <NA>  0.3577166  0.27310051
4   4  8  M  B       <NA>       <NA> -0.8021399 -1.10060507
6   6 12  F  B       <NA>       <NA> -0.4912098  0.04526153
8   8 16  F  B       <NA>       <NA> -1.2522998 -1.03796810
10 10 20  F  B       <NA>       <NA> -0.3446779  0.53854276
Warning message: 
corrupt data frame: columns will be truncated or padded with 
NAs in: format.data.frame(x, digits = digits)

Is this data frame really corrupt, as R claims?
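
The warning text suggests that the columns of the object no longer all have the same length. A minimal sketch of how this could be checked follows; `column_lengths` and `is_consistent` are hypothetical helper names, not part of the original function, and the `bad` object is built by hand purely for illustration.

```r
# Hypothetical helper: length (or number of rows) of every column
column_lengths <- function(df) {
    vapply(df, NROW, integer(1))
}

# Hypothetical helper: TRUE if every column is as long as nrow(df)
is_consistent <- function(df) {
    all(column_lengths(df) == nrow(df))
}

# A well-formed data frame
ok <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))

# A deliberately malformed object, constructed by hand for illustration only:
# column b is shorter than the declared number of rows
bad <- structure(list(a = 1:3, b = 1:2),
                 class = "data.frame", row.names = 1:3)

column_lengths(ok)
column_lengths(bad)
is_consistent(ok)
is_consistent(bad)
```

Applied to the result of mt.by.factor(), a FALSE from `is_consistent` would indicate that the object really is malformed rather than just oddly printed.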

---------------------------------------------------------------

The function also has a problem when a factor column is among
the columns that are being "divided" according to levels. For
example, this call

mt.by.factor(x=data, factor="f1", common=c("c1", "c2"))

gives me:

   c1 c2 f1        y1.A        y2.A f2.A         y1.B       y2.B f2.B
1   1  2  A -0.02040825 -0.28686293    2           NA         NA   NA
3   3  6  A -0.60497978  0.84527030    2           NA         NA   NA
5   5 10  A -0.74968516 -0.01094755    2           NA         NA   NA
7   7 14  A  0.07658122 -0.30101228    1           NA         NA   NA
9   9 18  A -0.68788670 -0.02177379    1           NA         NA   NA
2   2  4  B        <NA>        <NA> <NA>  0.003037107  0.4067418    2
4   4  8  B        <NA>        <NA> <NA> -0.035371363 -1.9397670    2
6   6 12  B        <NA>        <NA> <NA>  0.970424682 -1.3881620    1
8   8 16  B        <NA>        <NA> <NA> -1.169746470  0.7670071    1
10 10 20  B        <NA>        <NA> <NA>  1.238606959 -0.1831825    1
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs in: 
format.data.frame(x, digits = digits)

Why are the factor columns 'f2.A' and 'f2.B' now represented as 
integers? It looks as if the factor class was lost somewhere, but I 
do not know why. It must have happened in this part of the function 
(the whole function is at the end). Can anyone help me with this?

    # - add all other columns but as a set for each level of a factor
    levels <- unique(X[factor])
    for (level in 1:length(unlist(levels))) {
        X[x[factor] == as.character(levels[level, ]),     
          paste(other, as.character(levels[level, ]), sep=".")] <- 
            x[x[factor] == as.character(levels[level, ]), other]    
    }
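
The values 1 and 2 shown for 'f2.A' and 'f2.B' are consistent with the internal integer codes of a factor whose levels are sorted as "F", "M". The sketch below uses a small hand-made vector rather than the data from the post. It shows how those codes appear when the factor class is dropped, and one possible way to map them back. Whether this is exactly what happens inside the subassignment above is an assumption.

```r
# Hand-made factor mirroring the first five f2 values (illustration only)
f2 <- factor(c("M", "M", "M", "F", "F"))

levels(f2)               # levels are sorted alphabetically

# Dropping the class leaves only the integer codes
codes <- as.integer(f2)
codes

# One possible way to rebuild the factor from codes and known levels
restored <- factor(levels(f2)[codes], levels = levels(f2))
class(restored)

# Inspecting the class of every column of a data frame
sapply(data.frame(num = 1:5, fac = f2), class)
```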

---------------------------------------------------------------

Another issue is the NAs. If I compute means, I get:

> mean(data1$y1.A)
[1] -0.2067784
> mean(data1$y1.A, na.rm=T)
[1] -0.2067784
> mean(data1$y1.B)
[1] NA
> mean(data1$y1.B, na.rm=T)
[1] -0.5065222

So <NA> and NA do not behave the same way. Is this OK? It does
not really bother me; I am just curious.
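
For reference, mean() returns NA whenever its argument contains a missing value, unless na.rm = TRUE is given. The sketch below uses a made-up vector `v` (not the data above) to show this. It also shows two checks, length() and sum(is.na()), that could reveal whether a column actually stores missing values or is simply shorter than the others.

```r
# Made-up vector with one stored missing value (illustration only)
v <- c(1.5, NA, 2.5)

mean(v)                 # NA, because a missing value is present
mean(v, na.rm = TRUE)   # 2, the mean of 1.5 and 2.5

length(v)               # 3, the NA occupies a position
sum(is.na(v))           # 1, number of stored missing values
```

If mean(data1$y1.A) is not NA without na.rm, the column presumably holds no stored missing values, so the <NA> entries would come only from padding during printing. Comparing length(data1$y1.A) with nrow(data1) would confirm or refute this.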

---------------------------------------------------------------

Below are the description of the function, the function itself,
and an example.

Thanks in advance.

# mt.by.factor.R
#-------------------------------------------------------------------------
# What: Create multiple trait data frame by given factor
# Time-stamp: <2005-02-12 02:28:00 ggorjan>
#-------------------------------------------------------------------------
# Quite often one wants to treat a trait at different levels of a factor,
# e.g. sex, breed, ..., as different traits. This function eases the
# preparation of data for such an analysis.
#
# The input data frame is expanded so that the output is a data frame with
# c + l + n * v columns, where c is the number of columns common to all
# levels of the factor, l = 1 is the factor column itself, n is the number
# of levels of the factor and v is the number of variables that are
# repeated for each level. The number of rows stays the same.
#
#-------------------------------------------------------------------------

# Example (run after the function below has been defined)
n <- 10
(data <- data.frame(y1=rnorm(n=n),
                   y2=rnorm(n=n),
                   f1=factor(rep(c("A", "B"), n/2)),
                   f2=factor(c(rep(c("M"), n/2), rep(c("F"), n/2))),
                   c1=1:n,
                   c2=2*(1:n)))

(data1 <- mt.by.factor(x=data, factor="f1", common=c("c1", "c2", "f2")))
(data1 <- mt.by.factor(x=data, factor="f1", common=c("c1", "c2")))

#
x <- data
factor <- "f1"
common <- c("c1", "c2")

# Function
mt.by.factor <- function(x, factor, common, sort=TRUE) {
    # Checks
    if (!is.data.frame(x)) {
        stop("`x' must be a data frame")
    }
    if (!is.factor(x[[factor]])) {                                  
        stop("`factor' must be a factor")
    }    
    # Sort
    if (sort) {
        x <- x[order(x[, factor]),]                        
    }                                                    
    # New data frame
    X <- x[common] # Common columns
    X[factor] <- x[factor] # Factor column
    # Other columns
    # - remove common and factor
    other <- names(x)
    for (i in 1:length(names(x[common]))) {
        other <- other[other != common[i]]
    }
    for (i in 1:length(names(x[factor]))) {
        other <- other[other != factor[i]]
    }
    # - add all other columns but as a set for each level of a factor
    levels <- unique(X[factor])
    for (level in 1:length(unlist(levels))) {
        X[x[factor] == as.character(levels[level, ]),     
          paste(other, as.character(levels[level, ]), sep=".")] <- 
            x[x[factor] == as.character(levels[level, ]), other]    
    }
    return(X)
}

#-------------------------------------------------------------------------
# mt.by.factor.R ends here
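
One possible way to build the same layout while keeping every column at full length and preserving column classes is sketched below. It is an alternative under stated assumptions, not a fix verified against the R version used above. `mt_by_factor_sketch` is a hypothetical name. Instead of assigning into a subset of rows, it copies each column and sets the rows of the other levels to NA. It also loops over levels() of the factor rather than over the unique values, so unused levels would still get (all-NA) columns. The last lines check the c + l + n * v column count from the description.

```r
# Hypothetical alternative to mt.by.factor (a sketch, not the original code)
mt_by_factor_sketch <- function(x, factor, common, sort = TRUE) {
    stopifnot(is.data.frame(x), is.factor(x[[factor]]))
    if (sort) {
        x <- x[order(x[[factor]]), ]
    }
    # Columns that are neither common nor the factor itself
    other <- setdiff(names(x), c(common, factor))
    out <- x[c(common, factor)]
    for (lev in levels(x[[factor]])) {
        in_level <- x[[factor]] %in% lev
        for (col in other) {
            values <- x[[col]]          # full-length copy keeps the class
            values[!in_level] <- NA     # blank rows belonging to other levels
            out[[paste(col, lev, sep = ".")]] <- values
        }
    }
    out
}

# Same layout as the example data in the post (random values differ)
n <- 10
d <- data.frame(y1 = rnorm(n), y2 = rnorm(n),
                f1 = factor(rep(c("A", "B"), n / 2)),
                f2 = factor(c(rep("M", n / 2), rep("F", n / 2))),
                c1 = 1:n, c2 = 2 * (1:n))

res <- mt_by_factor_sketch(d, factor = "f1", common = c("c1", "c2"))

# c = 2 common columns, l = 1 factor column, n = 2 levels, v = 3 variables
ncol(res) == 2 + 1 + 2 * 3
sapply(res, class)
```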


--
Lep pozdrav / With regards,
    Gregor GORJANC

---------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty       URI: http://www.bfro.uni-lj.si
Zootechnical Department    email: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                  tel: +386 (0)1 72 17 861
SI-1230 Domzale            fax: +386 (0)1 72 17 888
Slovenia



