[R] Counting non-empty levels of a factor
David Winsemius
dwinsemius at comcast.net
Sun Nov 8 15:11:12 CET 2009
On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:
> Hi everyone,
>
> I've been struggling with a little problem for a while, and I'm wondering if
> anyone could help...
>
> I have a dataset (from the retail industry) that indicates which brands
> are present in a panel of 500 stores:
>
> store , brand
> 1 , B1
> 1 , B2
> 1 , B3
> 2 , B1
> 2 , B3
> 3 , B2
> 3 , B3
> 3 , B4
>
> I would like to know how many brands are present in each store.
>
> I tried:
> result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
>
> but I got:
> Group.1 x
> 1 , 4
> 2 , 4
> 3 , 4
>
> which is not the result I expected.
> I would like to get something like:
> Group.1 x
> 1 , 3
> 2 , 2
> 3 , 3
Try:
result <- aggregate(MyData$brand , by=list(MyData$store) , length)
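Quick and easy, and it generalizes to other situations. Subsetting a factor keeps
all of its original levels, which is why nlevels returned 4 for every store; length
instead counts the elements in each store's subset. Note that this is the number of
rows per store, so it equals the number of distinct brands only when no brand is
repeated within a store.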
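A minimal sketch of why nlevels returned 4, assuming the sample data from the question is entered as a data frame named MyData with brand stored as a factor (this code is illustrative and not part of the original exchange):

```r
# Sample data from the question; brand is made an explicit factor so the
# example behaves the same whatever the stringsAsFactors default is.
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Subsetting a factor keeps every level of the original factor...
b2 <- MyData$brand[MyData$store == 2]
print(levels(b2))   # "B1" "B2" "B3" "B4"
print(nlevels(b2))  # 4, although only B1 and B3 occur in store 2

# ...while length() counts the elements actually present in the subset.
print(length(b2))   # 2

# The same contrast, applied to every store by aggregate():
by_nlevels <- aggregate(MyData$brand, by = list(MyData$store), nlevels)
by_length  <- aggregate(MyData$brand, by = list(MyData$store), length)
print(by_nlevels)   # x is 4 for every store
print(by_length)    # x is 3, 2, 3
```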
>
> Looking around, I found I can drop the empty levels of a factor using:
> problem.factor <- problem.factor[,drop=TRUE]
Reapplying factor() to the subset also drops the unused levels, giving the same
result, so you could have done this:
> result <- aggregate(MyData$brand, by=list(MyData$store),
+                     function(x) nlevels(factor(x)))
> result
  Group.1 x
1       1 3
2       2 2
3       3 3
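The drop = TRUE idiom from the question should also work inside the aggregate() call, so no per-store subsetting is needed. A minimal sketch, again assuming the sample data as MyData; it also checks that factor() and [, drop = TRUE] remove the same unused levels:

```r
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# On a single store's subset, both idioms drop the unused levels B2 and B4.
b2 <- MyData$brand[MyData$store == 2]
print(levels(factor(b2)))         # "B1" "B3"
print(levels(b2[, drop = TRUE]))  # "B1" "B3"

# Used as the aggregate() function, either one counts brands per store.
n_factor <- aggregate(MyData$brand, by = list(MyData$store),
                      function(x) nlevels(factor(x)))
n_drop   <- aggregate(MyData$brand, by = list(MyData$store),
                      function(x) nlevels(x[, drop = TRUE]))
print(n_drop)  # x is 3, 2, 3
```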
> But this solution isn't handy for me, as I have many stores and would have
> to subset my data for each store before dropping the empty levels.
>
> Nor can I simply count the lines for each store (N), because the same
> brand can appear several times in each store (several products for the
> same brand, and/or several weeks of observation).
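With such repeated rows, counting rows (as length does) overstates the number of brands, whereas counting distinct values does not. A minimal sketch on a hypothetical version of the data in which some brands appear more than once per store (the extra rows are made up for illustration):

```r
# Hypothetical data: B1 appears twice in store 1 and B3 twice in store 2,
# e.g. two products of the same brand or two weeks of observation.
MyData <- data.frame(
  store = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B1", "B2", "B3", "B1", "B3", "B3", "B2", "B3", "B4"))
)

# Rows per store: each repeated brand is counted again.
n_rows <- aggregate(MyData$brand, by = list(MyData$store), length)

# Distinct brands per store, two equivalent ways.
n_levels <- aggregate(MyData$brand, by = list(MyData$store),
                      function(x) nlevels(factor(x)))
n_unique <- aggregate(MyData$brand, by = list(MyData$store),
                      function(x) length(unique(x)))

print(n_rows)    # x is 4, 3, 3
print(n_unique)  # x is 3, 2, 3
```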
>
> I used to do this calculation using SAS with:
> proc freq data = MyData noprint ; by store ;
> tables brand / out = result ;
> run ;
> (the nice thing was that I got a dataset I could merge with MyData)
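A rough R counterpart, sketched under the assumption that MyData holds the sample data: table() cross-tabulates store by brand, loosely like the OUT= dataset of the PROC FREQ step, and merge() attaches a per-store brand count back onto MyData. The names freq, counts and n_brands are made up for this example.

```r
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Store-by-brand frequency table as a data frame, one row per store/brand
# pair; zero cells (brand absent from the store) are dropped.
freq <- as.data.frame(table(store = MyData$store, brand = MyData$brand))
freq <- freq[freq$Freq > 0, ]

# Number of distinct brands per store, merged back onto every row of MyData.
counts <- aggregate(brand ~ store, data = MyData,
                    FUN = function(x) length(unique(x)))
names(counts)[names(counts) == "brand"] <- "n_brands"
MyData <- merge(MyData, counts, by = "store")
print(MyData)
```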
>
> Any idea how to do that in R?
>
> Thanks in advance,
>
> Kind Regards,
>
> Sylvain Willart,
> PhD Marketing,
> IAE Lille, France
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT