[R] incorrect number of levels

David Winsemius dwinsemius at comcast.net
Fri Oct 8 22:51:04 CEST 2010


On Oct 8, 2010, at 4:37 PM, David Winsemius wrote:

>
> On Oct 8, 2010, at 3:04 PM, Chagaris, Dave wrote:
>
>> I have a data set 382 rows and  63 columns.  One of the columns is  
>> bay, and there are 6 bays.  But, the number of levels for this  
>> factor is 7 when it should be six because there is some 'blank'  
>> level "".  When I subset for the blank level "", I get 0 rows.
>
> How did you do the subset?
>
>> What in my data could be causing this?  Thanks.
>>
>>>  dim(datmtx)
>> [1] 382  63
>>
>>
>>>    datmtx$bay
>> [1] TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB  
>> TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB  
>> TB TB TB TB TB TB TB
>> [51] TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB  
>> TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB  
>> TB TB TB TB TB TB TB
>> [101] TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB  
>> TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB TB  
>> TB TB TB HI TB HI TB TB
>> [151] TB TB TB TB TB HI TB HI HI HI TB HI HI HI TB HI HI HI HI HI  
>> HI HI HI TB TB TB TB CH CH TB CH CH CH CH CH CH CH CH CH CH TB TB  
>> CH CH CH CH CH CH CH CH
>> [201] CH CH CH CH CH CH CH CH CH CH CH CH CH CH CH CH CH CH CH CH  
>> TB HI HI HI TB HI HI TB TB TB TB TB TB TB TB TB TB HI TB TB TB TB  
>> TB TB TB TB TB TB TB TB
>> [251] TB HI HI HI CH CH CH CH CH CH CH CH CH CH HI HI CH CH CH CH  
>> CH CH CH CH CH CH CH CH TB TB TB TB TB TB TB TB TB TB CH CH AP AP  
>> AP AP AP AP HI HI HI CH
>> [301] CH CH CH AP AP TB TB AP AP AP AP AP AP SA BB BB TB TB TB TB  
>> AP HI AP SA AP HI AP AP HI HI TB HI AP SA AP AP AP AP AP AP AP AP  
>> SA AP AP SA AP AP AP SA
>> [351] SA SA AP AP AP CH CH CH CH CH AP BB BB BB BB BB TB CH CH CH  
>> CH CH CH CH CH CH CH CH CH CH CH CH
>> Levels:  AP BB CH HI SA TB
>>
>>>   levels(datmtx$bay)
>> [1] ""   "AP" "BB" "CH" "HI" "SA" "TB"
>
> What do you get with:
>
> which(!datmtx$bay %in% c("AP", "BB", "CH", "HI", "SA", "TB"))

It occurs to me that you should also report:

  which(!levels(datmtx$bay) %in% c("AP", "BB", "CH", "HI", "SA", "TB"))

That is because the levels of a factor are not necessarily all  
represented by existing instances. If you created the factor and then  
filled in a blank instance, the earlier blank level would persist. If  
you want to drop the unused levels from your factor so that every  
remaining level is represented, you can do:

datmtx$bay <- factor(datmtx$bay)
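
To illustrate how an unused level can persist, here is a minimal sketch on made-up data. The vector `bay` and its values are hypothetical stand-ins for datmtx$bay, and I am only assuming the blank was filled in after the factor was created; your data may have arrived at this state some other way.

```r
# Hypothetical toy data standing in for datmtx$bay (not the poster's data)
bay <- factor(c("", "TB", "HI", "CH"))

# Fill in the blank instance after the factor has been created
bay[bay == ""] <- "TB"

# The "" level is still there even though no row uses it
n_before <- nlevels(bay)
unused <- levels(bay)[table(bay) == 0]

# Re-creating the factor keeps only the levels that actually occur
bay <- factor(bay)
n_after <- nlevels(bay)
```

Printing `unused` shows which levels have no instances, and comparing `n_before` with `n_after` shows how many were dropped. In recent versions of R, droplevels(bay) should give the same result as factor(bay).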


>
> -- 
> David.
>>
>>>   nlevels(datmtx$bay)
>> [1] 7
>>
>> David Chagaris
>> Associate Research Scientist
>> Florida Fish and Wildlife Conservation Commission
>> Florida Fish and Wildlife Research Institute
>> 100 8th Ave SE
>> St. Petersburg, FL  33701
>> (727) 896-8626 ext. 4305
>> (727) 893-1374 fax
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT


