Hi:
It's hard to diagnose the problem without an illustrative example. Perhaps
the following might help:
(1) When writing a function to use in ddply(), make a generic data frame the
input argument to the function and refer to the variables within the
function either with the $ notation or inside a call to
with(dataframe, ...). This is because you want to apply the function to
each sub-data frame indexed by combinations of the grouping factors.
(2) The function in (1) should return either a scalar quantity or a data
frame.
(3) If you're computing groupwise scalar summaries, make sure the third
argument of ddply() is summarise, as in
    ddply(mydf, .(grp1, grp2), summarise, mean = mean(y, na.rm = TRUE),
          sd = sd(y, na.rm = TRUE))
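As a small self-contained sketch of point (3): the data frame toy and the
output names y_mean and y_sd below are made up for illustration, and the
values are simulated, so treat this as one way it could look rather than
anything specific to your data.

```r
library(plyr)

# Hypothetical toy data: two grouping factors and one response with an NA
set.seed(42)
toy <- data.frame(grp1 = rep(c("A", "B"), each = 6),
                  grp2 = rep(rep(c("a", "b"), each = 3), 2),
                  y    = c(rnorm(11), NA))

# One row per (grp1, grp2) combination, one column per scalar summary
out <- ddply(toy, .(grp1, grp2), summarise,
             y_mean = mean(y, na.rm = TRUE),
             y_sd   = sd(y, na.rm = TRUE))
out
```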
I don't think as.data.frame.function(f) ... is going to work. Data frames
and functions are two quite different types of objects. If you're trying to
write a function that returns a data frame, then see point (2) above.
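Since I can't see your data, the following is only a guess at what might be
going on: if some plateid/cytokine combination contains observations from
only one level of Self_T1D, t.test() will complain about the grouping factor
inside that sub-data frame even though the full data frame has both levels.
Here is a minimal sketch of a function that follows points (1) and (2) and
guards against that case. The column names come from your error message; the
data frame contents, the function name tt and the output column names are
invented for illustration.

```r
library(plyr)

# Hypothetical stand-in for the poster's data: column names are taken from
# the quoted error message, the values are simulated.
set.seed(1)
xxm <- data.frame(plateid  = rep(c("P1", "P2"), each = 20),
                  cytokine = rep(rep(c("IL2", "IL6"), each = 10), 2),
                  Self_T1D = rep(c("N", "Y"), 20),
                  conc     = rnorm(40, mean = 10))
# Force one plate/cytokine cell to contain a single level of Self_T1D
xxm$Self_T1D[xxm$plateid == "P2" & xxm$cytokine == "IL6"] <- "N"

# Takes a generic sub-data frame and always returns a one-row data frame
tt <- function(df) {
    ok <- !is.na(df$conc) & !is.na(df$Self_T1D)
    counts <- table(as.character(df$Self_T1D[ok]))
    # Skip the test unless both levels are present with at least 2 values each
    if (length(counts) != 2 || any(counts < 2)) {
        return(data.frame(n_levels = length(counts),
                          t = NA_real_, p = NA_real_))
    }
    res <- t.test(conc ~ Self_T1D, data = df)
    data.frame(n_levels = 2L, t = unname(res$statistic), p = res$p.value)
}

u2 <- ddply(xxm, .(plateid, cytokine), tt)
u2
```

Rows with n_levels less than 2 point to the groups where the t-test could
not be run, which may help locate the source of the error in the real data.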
Here's an example with a few different versions of what is basically the
same function. Observe how they are handled in ddply().
mydf <- data.frame(grp1 = rep(LETTERS[1:3], each = 20),
                   grp2 = rep(rep(letters[1:2], each = 10), 3),
                   w = rpois(60, 10),
                   x = rpois(60, 5),
                   y = rbinom(60, 1, 0.5))
# One can use either with() to evaluate an expression inside a data frame
# or the $ notation to refer to components of a data frame. Either works,
# as shown below.
f <- function(df) {
    u <- with(df, (w + x)/2 + y)
    v <- df$x + df$w * df$y
    data.frame(u = u, v = v)
}
# In this function, the argument df is never used: w, x and y are
# referenced without df$ or with().
h <- function(df) {
    u <- (w + x)/2 + y
    v <- x + w * y
    data.frame(u = u, v = v)
}
# This returns both the original and newly created variables
g <- function(df) {
    df <- transform(df,
                    u = (w + x)/2 + y,
                    v = x + w * y)
    df
}
# Returns only the variables u and v + grouping variables; the originals
# w, x, y are gone
ddply(mydf, .(grp1, grp2), f)
# Returns the original data frame; the new variables u and v are not added.
# In this case, ddply silently ignores the function f
ddply(mydf, .(grp1, grp2), transform, f)
# This gets it right
ddply(mydf, .(grp1, grp2), g)
# What happens when you use variable names without accessing the referent
# data frame
ddply(mydf, .(grp1, grp2), h)
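
If you'd like to see what the last call does without stopping your session,
one option is to wrap it in tryCatch() and capture the message. This is just
a sketch; it redefines mydf and h so that it runs on its own, and it assumes
there are no objects named w, x or y in your workspace (if there are, h()
will quietly use those instead, which is arguably worse).

```r
library(plyr)

mydf <- data.frame(grp1 = rep(LETTERS[1:3], each = 20),
                   grp2 = rep(rep(letters[1:2], each = 10), 3),
                   w = rpois(60, 10),
                   x = rpois(60, 5),
                   y = rbinom(60, 1, 0.5))

# Same h() as above: df is passed in but never referenced
h <- function(df) {
    u <- (w + x)/2 + y
    v <- x + w * y
    data.frame(u = u, v = v)
}

# Assumes no global w, x, y exist; captures the error message if the call fails
msg <- tryCatch({
    ddply(mydf, .(grp1, grp2), h)
    "no error"
}, error = function(e) conditionMessage(e))
msg
```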
HTH,
Dennis
On Wed, Apr 13, 2011 at 12:40 PM, 1Rnwb wrote:
> Hello all,
>
> I have arranged my data as per Dennis's suggestion in this post
> http://www.mail-archive.com/r-help@r-project.org/msg107156.html.
> the posted code works fine but when I try to apply it to my data, I get
> "> u2 <- ddply(xxm, .(plateid, cytokine), as.data.frame.function(f))
> Error in t.test.formula(conc ~ Self_T1D, data = df, na.rm = T) :
> grouping factor must have exactly 2 levels".
> Self_T1D has two levels "N" and "Y"
>
> I have used the ddply function to do the mean and sd for the same dataframe
> without any issues.
> I would appreciate help to solve this.
> Thanks
> Sharad
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/error-for-ttest-tp3448056p3448056.html
> Sent from the R help mailing list archive at Nabble.com.