[R] Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

Tymek W es_uomikim at op.pl
Fri Jul 10 11:40:30 CEST 2009


Thanks for your hints, but I'm still stuck... In the dataset I mentioned
(N = 134) there are only 3 NAs in that variable, and its two values are
split 41% : 59%. It doesn't look like the problem comes from the data...

I changed and simplified my function; it now prints the number of levels
before doing anything else. Here's the "funny" result:

> myfun(data, 'varname')

 Levels = 2

Error in t.test.formula(data[[nam[v]]] ~ data[[g]]) :
  grouping factor must have exactly 2 levels

...

Here is the simplified code; maybe it will give someone a clue about what is going wrong:

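myfun <- function(data, g) {
	require(stats)

	data <- as.data.frame(data)
	nam <- names(data)
	res <- matrix(NA, ncol(data))  # one result slot per column

	# number of distinct non-missing values in the whole grouping column
	cat("\n Levels =", nlevels(factor(data[[g]])), "\n\n")

	# t-test every other column against the grouping column
	for (v in 1:ncol(data)) {
		if (nam[v] != g) {
			res[v] <- list(t.test(data[[nam[v]]] ~ data[[g]]))
		}
	}
	res
}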

What is going wrong here?
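A minimal diagnostic sketch (the helper name `groups_after_na` and the toy data are made up for illustration): the `Levels` line counts the distinct values of the grouping column over the whole data set, but the formula method of `t.test()` first drops every row where either the response or the grouping value is missing (the default `na.action` is `na.omit`). The count that matters is therefore the one per response column.

```r
# Hypothetical helper (not from the thread): for each column other than the
# grouping column g, count the groups left after dropping rows where either
# that column or the grouping variable is missing.
groups_after_na <- function(data, g) {
  data <- as.data.frame(data)
  others <- setdiff(names(data), g)
  sapply(others, function(v) {
    ok <- complete.cases(data[[v]], data[[g]])
    length(unique(data[[g]][ok]))
  })
}

# Made-up example: y is missing for every row of group 2
toy <- data.frame(grp = rep(1:2, each = 3),
                  x   = c(1, 2, 3, 4, 5, 6),
                  y   = c(1, 2, 3, NA, NA, NA))
groups_after_na(toy, "grp")  # returns c(x = 2, y = 1)
```

Any column reported with 1 here would produce the "exactly 2 levels" error even though `Levels = 2` is printed for the grouping column as a whole.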

Greetz,
Timo


2009/7/10 Marc Schwartz <marc_schwartz at me.com>:
> On Jul 9, 2009, at 5:04 PM, Tymek W wrote:
>
>> Hi,
>>
>> Could anyone tell me what is wrong:
>>
>>> length(unique(mydata$myvariable))
>>
>> [1] 2
>>>
>>
>> and in t-test:
>>
>> (...)
>> Error in t.test.formula(othervariable ~ myvariable, mydata) :
>>  grouping factor must have exactly 2 levels
>>>
>>
>> I re-checked the code and still don't get what is wrong.
>>
>> Moreover, there is some strange behavior:
>>
>> /1 It seems that the error is sensitive to NAs: it affects some variables
>> in the data set with NAs, but not the same variables in the data set
>> with NAs removed.
>>
>> /2 It seems to behave differently depending on how the variables are
>> passed to t.test:
>>
>> e.g. it happens here: t.test(x~y, dataset) but not here:
>> t.test(dataset[['x']]~dataset[['y']])
>>
>> Does anyone have any ideas?
>>
>> Greetz,
>> Timo
>
>
> Check the output of:
>
>  na.omit(cbind(mydata$othervariable, mydata$myvariable))
>
> which will give you some insight into what data are actually available for
> the t test. This removes any rows that have missing data. Your first check
> above, counting the unique values, is done before missing data are
> removed.
>
> Most likely, once missing values have been removed, you are left with only
> one unique grouping value in mydata$myvariable.
>
> As for your note number 2, both calls should behave the same, since the
> same basic approach is used in both cases. For example:
>
> DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))
>
>> DF
>    x y
> 1  1 1
> 2  2 1
> 3  3 1
> 4 NA 2
> 5 NA 2
> 6 NA 2
>
> # Remove missing data
>> na.omit(DF)
>   x y
> 1 1 1
> 2 2 1
> 3 3 1
>
>> t.test(x ~ y, data = DF)
> Error in t.test.formula(x ~ y, data = DF) :
>  grouping factor must have exactly 2 levels
>
>> t.test(DF$x ~ DF$y)
> Error in t.test.formula(DF$x ~ DF$y) :
>  grouping factor must have exactly 2 levels
>
>
> If you have a small reproducible example where the two function calls behave
> differently, please post back with it.
>
> HTH,
>
> Marc Schwartz
>
>
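Applying the explanation in the quoted reply inside the loop, here is a sketch of a variant of `myfun` (`myfun2` and the toy data are hypothetical) that stores results in a named list and skips columns that no longer have exactly two groups once missing values are removed. Other failures, such as a group with fewer than two observations, are not handled here.

```r
# Hypothetical variant of myfun: one t-test per non-grouping column,
# skipping columns that do not have exactly two groups after NA removal.
myfun2 <- function(data, g) {
  data <- as.data.frame(data)
  others <- setdiff(names(data), g)
  res <- vector("list", length(others))  # one slot per tested column
  names(res) <- others
  for (v in others) {
    # rows usable for this column: response and group both non-missing
    ok <- complete.cases(data[[v]], data[[g]])
    n_groups <- length(unique(data[[g]][ok]))
    if (n_groups == 2) {
      res[[v]] <- t.test(data[[v]][ok] ~ data[[g]][ok])
    } else {
      message("Skipping ", v, ": ", n_groups,
              " group(s) left after removing NAs")
    }
  }
  res  # skipped columns stay NULL
}

# Made-up example: y is missing for every row of group 2
toy <- data.frame(grp = rep(1:2, each = 3),
                  x   = c(1, 2, 3, 4, 5, 6),
                  y   = c(1, 2, 3, NA, NA, NA))
out <- myfun2(toy, "grp")  # prints a message that y is skipped
```

Using a list from the start avoids relying on the matrix in `myfun` being silently converted when a test result is stored in it.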



-- 
Regards,
Tymek W



