[R] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)
Gorjanc Gregor
Gregor.Gorjanc at bfro.uni-lj.si
Sun Feb 13 03:04:18 CET 2005
Hello R users!
I have written a function (see the end of this message) that eases
preparing data for analysis in another program, which sometimes
needs a special data structure. However, I have run into several
problems with the data frame it creates.
---------------------------------------------------------------
The data frame produced by the example at the end looks the way I
want:
   c1 c2 f2 f1       y1.A       y2.A       y1.B        y2.B
1   1  2  M  A -1.2776840 -1.4695219         NA          NA
3   3  6  M  A  0.1593941  0.7581128         NA          NA
5   5 10  M  A  1.1085950  0.8556062         NA          NA
7   7 14  F  A -1.8259281  3.0675536         NA          NA
9   9 18  F  A  0.8017311 -0.1056571         NA          NA
2   2  4  M  B       <NA>       <NA>  0.3577166  0.27310051
4   4  8  M  B       <NA>       <NA> -0.8021399 -1.10060507
6   6 12  F  B       <NA>       <NA> -0.4912098  0.04526153
8   8 16  F  B       <NA>       <NA> -1.2522998 -1.03796810
10 10 20  F  B       <NA>       <NA> -0.3446779  0.53854276
Warning message:
corrupt data frame: columns will be truncated or padded with
NAs in: format.data.frame(x, digits = digits)
Is this data frame really corrupt, as R claims?
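One way to look into this, sketched under assumptions: a data frame is a list of columns that should all have as many entries as there are row names, so comparing those lengths shows whether printing would need to pad anything. `column.lengths` is a hypothetical helper name, and the `bad` object below is built by hand only to imitate the situation; it is not the output of mt.by.factor.

```r
# Hypothetical helper: compare each column's length with the number of rows.
column.lengths <- function(df) {
  rows <- length(attr(df, "row.names"))
  lens <- sapply(unclass(df), NROW)   # NROW also copes with matrix columns
  list(rows = rows, columns = lens, consistent = all(lens == rows))
}

# A well-formed data frame: both columns have four entries.
good <- data.frame(a = 1:4, b = c(NA, NA, 0.3, 0.5))

# A hand-made inconsistent one (assumption: this imitates the problem).
# Working on the underlying list bypasses the usual data frame checks.
bad <- unclass(good)
bad$b <- bad$b[3:4]          # column b now has only two entries
class(bad) <- "data.frame"

column.lengths(good)$consistent
column.lengths(bad)$columns
column.lengths(bad)$consistent
```

Applied to the result of mt.by.factor, the `columns` element would show directly whether some of the new columns are shorter than the others.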
---------------------------------------------------------------
I also have a problem with this function when one of the other
columns, i.e. the columns that are "divided" according to the
levels, is itself a factor. For example, this call
mt.by.factor(x=data, factor="f1", common=c("c1", "c2"))
gives me:
   c1 c2 f1        y1.A        y2.A f2.A         y1.B       y2.B f2.B
1   1  2  A -0.02040825 -0.28686293    2           NA         NA   NA
3   3  6  A -0.60497978  0.84527030    2           NA         NA   NA
5   5 10  A -0.74968516 -0.01094755    2           NA         NA   NA
7   7 14  A  0.07658122 -0.30101228    1           NA         NA   NA
9   9 18  A -0.68788670 -0.02177379    1           NA         NA   NA
2   2  4  B        <NA>        <NA> <NA>  0.003037107  0.4067418    2
4   4  8  B        <NA>        <NA> <NA> -0.035371363 -1.9397670    2
6   6 12  B        <NA>        <NA> <NA>  0.970424682 -1.3881620    1
8   8 16  B        <NA>        <NA> <NA> -1.169746470  0.7670071    1
10 10 20  B        <NA>        <NA> <NA>  1.238606959 -0.1831825    1
Warning message:
corrupt data frame: columns will be truncated or padded with NAs in:
format.data.frame(x, digits = digits)
Why are the factor columns 'f2.A' and 'f2.B' now represented as
integers? It looks as if I lost the factor class somewhere, but I do
not know why. It must have happened in this part of the function (the
whole function is at the end). Can anyone help me with this?
  # - add all other columns but as a set for each level of a factor
  levels <- unique(X[factor])
  for (level in 1:length(unlist(levels))) {
    X[x[factor] == as.character(levels[level, ]),
      paste(other, as.character(levels[level, ]), sep=".")] <-
        x[x[factor] == as.character(levels[level, ]), other]
  }
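A minimal sketch of an alternative that avoids assigning several new columns to only some of the rows at once: take a full-length copy of each column, set the rows of the other levels to NA, and attach it with `[[<-`. The toy data and the names `lev` and `column` are illustrative assumptions, not part of the function above.

```r
# Toy data (assumption): one grouping factor f1 and one factor column f2.
x <- data.frame(f1 = factor(c("A", "B", "A", "B")),
                f2 = factor(c("M", "M", "F", "F")))

X <- x["f1"]
for (lev in levels(x$f1)) {
  column <- x$f2                # full-length copy, still a factor
  column[x$f1 != lev] <- NA     # blank out rows that belong to other levels
  X[[paste("f2", lev, sep = ".")]] <- column
}

str(X)                          # f2.A and f2.B are reported as factors
```

Because every new column has one entry per row and is never rebuilt from its integer codes, the factor class and the column lengths are both kept.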
---------------------------------------------------------------
Another question concerns NAs. If I compute means, I get:
> mean(data1$y1.A)
[1] -0.2067784
> mean(data1$y1.A, na.rm=T)
[1] -0.2067784
> mean(data1$y1.B)
[1] NA
> mean(data1$y1.B, na.rm=T)
[1] -0.5065222
So <NA> and NA do not behave the same. Is this OK? It really
does not bother me, but I am just curious.
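One possible reading of this output, offered as an assumption rather than a diagnosis: y1.A may hold only its five non-missing values, so na.rm has nothing to drop, while y1.B holds ten values including real NAs. When a data frame is printed, NA in factor and character columns is shown as <NA> and NA in numeric columns as NA, so the two spellings are a display convention. The vectors `short` and `padded` below are made up to show the same pattern.

```r
short  <- c(-1.5, 0.5)             # two values, nothing missing
padded <- c(NA, NA, -1.5, 0.5)     # the same values plus real missing values

mean(short)                        # na.rm makes no difference here
mean(short, na.rm = TRUE)

mean(padded)                       # NA: missing values propagate
mean(padded, na.rm = TRUE)         # mean of the two non-missing values

length(short)                      # the lengths tell the two cases apart
length(padded)
```

Checking `length(data1$y1.A)` against `length(data1$y1.B)` would confirm or rule out this reading.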
---------------------------------------------------------------
Below are the description of the function, the function itself and
an example.
Thanks in advance.
# mt.by.factor.R
#-------------------------------------------------------------------------
# What: Create multiple trait data frame by given factor
# Time-stamp: <2005-02-12 02:28:00 ggorjan>
#-------------------------------------------------------------------------
# Quite often one wants to treat a trait measured at different levels of a
# factor (e.g. sex, breed, ...) as a different trait for each level. This
# function eases preparation of data for such an analysis.
#
# The input data frame is expanded so that the output has c + l + n * v
# columns, where c is the number of columns common to all levels of the
# factor, l = 1 is the factor column itself, n is the number of levels of
# the factor and v is the number of variables repeated for each level. The
# number of rows stays the same.
#
#-------------------------------------------------------------------------
# Example (run after defining mt.by.factor below)
n <- 10
(data <- data.frame(y1=rnorm(n=n),
                    y2=rnorm(n=n),
                    f1=factor(rep(c("A", "B"), n/2)),
                    f2=factor(c(rep(c("M"), n/2), rep(c("F"), n/2))),
                    c1=1:n,
                    c2=2*(1:n)))
(data1 <- mt.by.factor(x=data, factor="f1", common=c("c1", "c2", "f2")))
(data1 <- mt.by.factor(x=data, factor="f1", common=c("c1", "c2")))
#
x <- data
factor <- "f1"
common <- c("c1", "c2")
# Function
mt.by.factor <- function(x, factor, common, sort=TRUE) {
  # Checks
  if (!is.data.frame(x)) {
    stop("`x' must be a data frame")
  }
  if (!is.factor(x[[factor]])) {
    stop("`factor' must be a factor")
  }
  # Sort
  if (sort) {
    x <- x[order(x[, factor]),]
  }
  # New data frame
  X <- x[common]          # Common columns
  X[factor] <- x[factor]  # Factor column
  # Other columns
  # - remove common and factor
  other <- names(x)
  for (i in 1:length(names(x[common]))) {
    other <- other[other != common[i]]
  }
  for (i in 1:length(names(x[factor]))) {
    other <- other[other != factor[i]]
  }
  # - add all other columns but as a set for each level of a factor
  levels <- unique(X[factor])
  for (level in 1:length(unlist(levels))) {
    X[x[factor] == as.character(levels[level, ]),
      paste(other, as.character(levels[level, ]), sep=".")] <-
        x[x[factor] == as.character(levels[level, ]), other]
  }
  return(X)
}
#-------------------------------------------------------------------------
# mt.by.factor.R ends here
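For comparison, a hedged sketch of the whole function rewritten so that each new column is built at full length before it is attached. `mt.by.factor2` is a hypothetical name, the `sort` argument is left out for brevity, and this is one possible approach under those assumptions rather than a fix confirmed for the function above.

```r
# Hypothetical variant: builds one full-length column per (variable, level).
mt.by.factor2 <- function(x, factor, common) {
  stopifnot(is.data.frame(x), is.factor(x[[factor]]))
  other <- setdiff(names(x), c(common, factor))   # variables to split
  X <- x[c(common, factor)]                       # common + factor columns
  for (lev in levels(x[[factor]])) {
    rows <- !is.na(x[[factor]]) & x[[factor]] == lev
    for (col in other) {
      value <- x[[col]]          # keeps the class of the column (e.g. factor)
      value[!rows] <- NA         # rows of other levels become missing
      X[[paste(col, lev, sep = ".")]] <- value
    }
  }
  X
}

# Same layout as the example data in this message.
n <- 10
data <- data.frame(y1 = rnorm(n), y2 = rnorm(n),
                   f1 = factor(rep(c("A", "B"), n / 2)),
                   f2 = factor(rep(c("M", "F"), each = n / 2)),
                   c1 = 1:n, c2 = 2 * (1:n))
data2 <- mt.by.factor2(data, factor = "f1", common = c("c1", "c2"))

# Column count from the description: c + l + n * v with c = 2, l = 1,
# n = 2 levels and v = 3 variables (y1, y2, f2).
ncol(data2) == 2 + 1 + 2 * 3
```

Building each column separately costs a nested loop, but it keeps every column the same length as the data frame and leaves factor columns as factors.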
--
Lep pozdrav / With regards,
Gregor GORJANC
---------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty URI: http://www.bfro.uni-lj.si
Zootechnical Department email: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3 tel: +386 (0)1 72 17 861
SI-1230 Domzale fax: +386 (0)1 72 17 888
Slovenia