[R] Counting non-empty levels of a factor

David Winsemius dwinsemius at comcast.net
Sun Nov 8 15:25:41 CET 2009


On Nov 8, 2009, at 9:11 AM, David Winsemius wrote:

>
> On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:
>
>> Hi everyone,
>>
>> I've been struggling with a small problem for a while, and I'm wondering
>> if anyone could help...
>>
>> I have a dataset (from the retail industry) that indicates which brands
>> are present in a panel of 500 stores:
>>
>> store , brand
>> 1 , B1
>> 1 , B2
>> 1 , B3
>> 2 , B1
>> 2 , B3
>> 3 , B2
>> 3 , B3
>> 3 , B4
>>
>> I would like to know how many brands are present in each store,
>>
>> I tried:
>> result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
>>
>> but I got:
>> Group.1 x
>> 1 , 4
>> 2 , 4
>> 3 , 4
>>
>> which is not the result I expected.
>> I would like to get something like:
>> Group.1 x
>> 1 , 3
>> 2 , 2
>> 3 , 3
>
> Try:
>
> result <- aggregate(MyData$brand , by=list(MyData$store) , length)
>
> Quick, easy, and it generalizes to other situations. The factor levels
> are carried along unchanged into each subset, which is why nlevels
> always returned 4, but length counts the elements in each store's subset.

Which may not have been what you asked for, as the following demonstrates:
length counts rows, so a brand that appears more than once in a store is
counted each time. You probably want the second solution:
 > mydata2 <- rbind(MyData, MyData)
 > result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
 > result
   Group.1 x
1       1 6
2       2 4
3       3 6

 > result <- aggregate(mydata2$brand , by=list(mydata2$store) ,  
function(x) nlevels(factor(x)))
 > result
   Group.1 x
1       1 3
2       2 2
3       3 3
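
For anyone who wants to reproduce the two results above, here is a minimal, self-contained sketch. The data frame is rebuilt by hand from the sample in the original post (the poster's real MyData is not available), and the names per_store_rows, per_store_brands and per_store_unique are only illustrative. The last call, using length(unique(x)), is an alternative that was not part of the thread and does not depend on brand being a factor.

```r
# Sample data rebuilt from the original post; brand is made a factor explicitly
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Duplicate every row to mimic repeated observations of the same brand
mydata2 <- rbind(MyData, MyData)

# length() counts rows per store, so duplicated brands inflate the count
per_store_rows <- aggregate(mydata2$brand, by = list(mydata2$store), length)

# Re-factoring each subset drops the unused levels before counting them
per_store_brands <- aggregate(mydata2$brand, by = list(mydata2$store),
                              function(x) nlevels(factor(x)))

# Alternative (not from the thread): count distinct values directly
per_store_unique <- aggregate(mydata2$brand, by = list(mydata2$store),
                              function(x) length(unique(x)))
```

Only the last two calls are unaffected by the duplicated rows.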

>>
>> Looking around, I found I can delete empty levels of factor using:
>> problem.factor <- problem.factor[,drop=TRUE]
>
> Calling factor() again on a subset drops the unused levels in the same
> way. So you could have done this:
>
> > result <- aggregate(MyData$brand , by=list(MyData$store) ,  
> function(x) nlevels(factor(x)))
> > result
>  Group.1 x
> 1       1 3
> 2       2 2
> 3       3 3
>
>
>
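
A small sketch of why the original nlevels call returned 4 for every store, assuming the sample data from the post: subsetting a factor keeps the full set of levels unless they are dropped explicitly. The vector names below are made up for illustration. Newer versions of R also provide droplevels(), which did not exist when this thread was written, so it is shown only as an option.

```r
# Sample vectors rebuilt from the original post
brand <- factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
store <- c(1, 1, 1, 2, 2, 3, 3, 3)

# Brands observed in store 2 only
store2_brands <- brand[store == 2]

# The subset still carries all of the original levels
n_all <- nlevels(store2_brands)

# Two ways from the thread to drop the unused levels before counting
n_refactored <- nlevels(factor(store2_brands))
n_dropped    <- nlevels(store2_brands[, drop = TRUE])

# Option in newer R versions (not available in 2009)
n_droplevels <- nlevels(droplevels(store2_brands))
```

Here n_all counts every brand in the data, while the other three count only the brands present in store 2.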
>> But this solution isn't practical for me, as I have many stores and would
>> have to subset my data for each store before dropping the empty levels.
>>
>> I also can't simply count the rows for each store (N), because the same
>> brand can appear several times in each store (several products for the
>> same brand, and/or several weeks of observation).
>>
>> I used to do this calculation using SAS with:
>> proc freq data = MyData noprint ; by store ;
>> tables  brand / out = result ;
>> run ;
>> (the cool thing was I got a database I can merge with MyData)
>>
>> Any idea how to do that in R?
>>
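
One possible way to get something close to the PROC FREQ output data set, and to merge a per-store count back onto the data, is sketched below. It is an assumption about what is wanted rather than an exact SAS equivalent: table() gives the store-by-brand frequencies, and the column name n_brands is invented for the example. The data frame is again rebuilt from the sample in the post.

```r
# Sample data rebuilt from the original post
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Frequency of each brand within each store, as a data frame
freq <- as.data.frame(table(store = MyData$store, brand = MyData$brand))
freq <- freq[freq$Freq > 0, ]   # keep only brands actually present in a store

# Number of distinct brands per store, with a readable column name
n_brands <- aggregate(MyData$brand, by = list(store = MyData$store),
                      function(x) nlevels(factor(x)))
names(n_brands)[2] <- "n_brands"

# Attach the per-store count to every row of the original data
merged <- merge(MyData, n_brands, by = "store")
```

Because aggregate returns an ordinary data frame, the merge step works the same way for any other per-store summary.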
>> Thanks in advance,
>>
>> Kind regards,
>>
>> Sylvain Willart,
>> PhD Marketing,
>> IAE Lille, France
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



