[R] Strange t-test error: "grouping factor must have exactly 2 levels" while it does...
Petr PIKAL
petr.pikal at precheza.cz
Fri Jul 10 12:00:53 CEST 2009
Hi,
you have to look at your data.
When I applied your function to some artificial data, I got the expected result:
> myfun(visko,"konc")
Levels = 2
[[1]]
[1] NA
[[2]]
Welch Two Sample t-test
data: data[[nam[v]]] by data[[g]]
t = -1.7778, df = 4.541, p-value = 0.1415
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.861362 2.535362
sample estimates:
mean in group 1 mean in group 2
6.685 11.848
[[3]]
Welch Two Sample t-test
data: data[[nam[v]]] by data[[g]]
t = -2.6074, df = 3.263, p-value = 0.07327
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.070027 0.775027
sample estimates:
mean in group 1 mean in group 2
2.3275 6.9750
Try
debug(myfun)
and see at which column it gives the error and what all the values look like
immediately before the error.
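As an alternative to stepping through with debug(), a sketch like the one below reports, for every column, how many groups are left once the rows with a missing value are dropped. The helper name check_groups and the toy data frame are made up for illustration; they are not taken from the original post, and the real data may of course fail for a different reason.

```r
# Hypothetical helper (not from the original post): for each column other
# than the grouping column g, count the groups that remain after rows with
# NA in either that column or g are dropped.
check_groups <- function(data, g) {
  data <- as.data.frame(data)
  vars <- setdiff(names(data), g)
  sapply(vars, function(v) {
    ok <- complete.cases(data[[v]], data[[g]])
    nlevels(factor(data[[g]][ok]))
  })
}

# Toy data (assumed for illustration): column b is only observed in group 1
toy <- data.frame(
  grp = rep(1:2, each = 4),
  a   = c(5.1, 4.8, 6.0, 5.5, 7.2, 6.9, 8.1, 7.7),
  b   = c(1.2, 0.9, 1.5, 1.1, NA, NA, NA, NA)
)

groups_left <- check_groups(toy, "grp")
print(groups_left)

# Columns whose usable rows do not contain exactly two groups; these are
# the ones where t.test() would complain about the grouping factor.
bad_cols <- names(groups_left)[groups_left != 2]
print(bad_cols)
```

Any column listed in bad_cols is a candidate for the step at which the loop in myfun stops, even though the grouping variable as a whole has two levels.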
Regards
Petr
r-help-bounces at r-project.org wrote on 10.07.2009 11:40:30:
> Thanks for your hints, but I'm still stuck... In the dataset I mentioned
> (N=134) there are only 3 NA's in the variable, and a 41% : 59% split
> between the two values. It doesn't look like the data are the cause...
>
> I changed and simplified my function, now it prints levels before
> doing the rest. Here's a "funny" error result:
>
> > myfun(data, 'varname')
>
> Levels = 2
>
> Error in t.test.formula(data[[nam[v]]] ~ data[[g]]) :
> grouping factor must have exactly 2 levels
>
> ...
>
> I'll paste simplified code, maybe it'd give someone a clue what is going
> wrong:
>
> myfun <- function(data, g) {
>
>   require(stats)
>
>   data <- as.data.frame(data)
>   nam <- names(data)
>   res <- matrix(NA, ncol(data))
>
>   cat("\n Levels =", nlevels(factor(data[[g]])), "\n\n")
>
>   for (v in 1:ncol(data)) {
>     if (nam[v] != g) {
>       res[v] <- list(t.test(data[[nam[v]]] ~ data[[g]]))
>     }
>   }
>   res
> }
>
> What is going wrong here?
>
> Greetz,
> Timo
>
>
> 2009/7/10 Marc Schwartz <marc_schwartz at me.com>:
> > On Jul 9, 2009, at 5:04 PM, Tymek W wrote:
> >
> >> Hi,
> >>
> >> Could anyone tell me what is wrong:
> >>
> >>> length(unique(mydata$myvariable))
> >>
> >> [1] 2
> >>>
> >>
> >> and in t-test:
> >>
> >> (...)
> >> Error in t.test.formula(othervariable ~ myvariable, mydata) :
> >> grouping factor must have exactly 2 levels
> >>>
> >>
> >> I re-checked the code and still don't get what is wrong.
> >>
> >> Moreover, there is some strange behavior:
> >>
> >> /1 It seems that the error is sensitive to NA's, because it affects
> >> some variables in the data set with NA's and doesn't affect the same
> >> ones in the dataset with NA's removed.
> >>
> >> /2 It seems it works differently with different ways of using
> >> variables in t.test:
> >>
> >> e.g. it happens here: t.test(x~y, dataset) and does not here:
> >> t.test(dataset[['x']]~dataset[['y']])
> >>
> >> Does anyone have any ideas?
> >>
> >> Greetz,
> >> Timo
> >
> >
> > Check the output of:
> >
> > na.omit(cbind(mydata$othervariable, mydata$myvariable))
> >
> > which will give you some insight into what data is actually available
> > to be used in the t test. This will remove any rows that have missing
> > data. Your first test above, checking the number of levels, is before
> > missing data is removed.
> >
> > The likelihood is that once missing values have been removed, you are
> > only left with one unique grouping value in mydata$myvariable.
> >
> > For your note number 2, it should be the same for both examples, as in
> > both cases, the same basic approach is used. For example:
> >
> > DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))
> >
> >> DF
> >    x y
> > 1  1 1
> > 2  2 1
> > 3  3 1
> > 4 NA 2
> > 5 NA 2
> > 6 NA 2
> >
> > # Remove missing data
> >> na.omit(DF)
> >   x y
> > 1 1 1
> > 2 2 1
> > 3 3 1
> >
> >> t.test(x ~ y, data = DF)
> > Error in t.test.formula(x ~ y, data = DF) :
> > grouping factor must have exactly 2 levels
> >
> >> t.test(DF$x ~ DF$y)
> > Error in t.test.formula(DF$x ~ DF$y) :
> > grouping factor must have exactly 2 levels
> >
> >
> > If you have a small reproducible example where the two function calls
> > behave differently, please post back with it.
> >
> > HTH,
> >
> > Marc Schwartz
> >
> >
>
>
>
> --
> pozdrawiam,
> Tymek W
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
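The point made in the quoted replies, that the number of levels has to be counted after rows with missing values are dropped, can be sketched with the DF example from the thread. The variable names n_all, n_used, n_unique and msg are made up for illustration. Note also that unique() treats NA as a value of its own, so length(unique(...)) can report 2 for a variable that has only one non-missing value.

```r
# Toy data mirroring the DF example quoted in the thread
DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))

# Level count on the full grouping column
n_all <- nlevels(factor(DF$y))

# Level count on the rows that remain once missing values are dropped
used   <- na.omit(DF[, c("x", "y")])
n_used <- nlevels(factor(used$y))

# unique() counts NA as a distinct value
n_unique <- length(unique(c(1, 1, NA)))

cat("levels before NA removal:", n_all, "\n")
cat("levels after NA removal: ", n_used, "\n")
cat("length(unique(c(1, 1, NA))):", n_unique, "\n")

# Capture the error message from t.test() instead of stopping
msg <- tryCatch(
  {
    t.test(x ~ y, data = DF)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running the same two counts on the real grouping variable, once per tested column, should show whether a check made before NA removal is hiding a column with a single remaining group.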