[R] Counting things
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Aug 5 11:49:33 CEST 2009
Try this using the built-in data frame iris:
> length(subset(iris, Sepal.Length >= 7, Sepal.Width)[[1]])
[1] 13
> length(subset(iris, Sepal.Length >= 7 & Species == 'virginica', Sepal.Width)[[1]])
[1] 12
> # or the following (note that dot in Sepal.Length is automatically
> # converted to _ because dot has special meaning in sql)
> library(sqldf)
> sqldf("select count(*) from iris where Sepal_Length >= 7")
  count(*)
1       13
> sqldf("select count(*) from iris where Sepal_Length >= 7 and Species = 'virginica'")
  count(*)
1       12
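
The same counts can also be had in base R without subset or sqldf, since summing a logical vector counts its TRUE values. A minimal sketch on the same iris data (the names n_long and n_long_virginica are only illustrative):

```r
# A comparison yields a logical vector; sum() counts the TRUE values.
n_long <- sum(iris$Sepal.Length >= 7)

# Combine conditions with & for the "and" case.
n_long_virginica <- with(iris, sum(Sepal.Length >= 7 & Species == "virginica"))

n_long            # 13, matching the subset/sqldf result above
n_long_virginica  # 12
```

If the column can contain NA, sum(..., na.rm = TRUE) is probably what you want.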
For the second part, use cut to create a factor with the levels you
want:
iris$Sepal.Length.factor <- cut(iris$Sepal.Length, 4:8)
and then summarize as desired using sql such as:
> sqldf("select Sepal_Length_factor, avg(Sepal_Length), count(Sepal_Length) from iris group by Sepal_Length_factor")
  Sepal_Length_factor avg(Sepal_Length) count(Sepal_Length)
1               (4,5]          4.787500                  32
2               (5,6]          5.550877                  57
3               (6,7]          6.473469                  49
4               (7,8]          7.475000                  12
or use summaryBy in the doBy package.
See ?cut, ?subset, and in doBy see ?summaryBy. Also see
http://sqldf.googlecode.com
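
For reference, here is a rough base R sketch of the binning table (count, mean and fraction "true" per bin) using tapply instead of sql. It is only an illustration on iris: "is virginica" stands in for the true/false label, and the names bins, is_virginica and binned are made up here. With the poster's data one would presumably cut the probability column and average something like label == 'T' instead.

```r
# Bin Sepal.Length into (4,5], (5,6], (6,7], (7,8], as with cut() above.
bins <- cut(iris$Sepal.Length, 4:8)

# Stand-in for a 0/1 "label is true" column (an assumption for illustration).
is_virginica <- iris$Species == "virginica"

# One row per bin: how many items, their mean, and the fraction that are "true".
binned <- data.frame(
  count    = as.vector(table(bins)),
  mean_len = as.vector(tapply(iris$Sepal.Length, bins, mean)),
  frac_vir = as.vector(tapply(is_virginica, bins, mean)),
  row.names = levels(bins)
)
binned

# A simple graph: per-bin fraction "true" against the per-bin mean.
plot(binned$mean_len, binned$frac_vir, type = "b",
     xlab = "mean Sepal.Length in bin", ylab = "fraction virginica")
```

The mean of a logical vector is the proportion of TRUE values, which is what gives the per-bin "accuracy" column.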
On Tue, Aug 4, 2009 at 11:40 PM, Noah Silverman<noah at smartmediacorp.com> wrote:
> I've completed an experiment and want to summarize the results.
>
> There are two things I'd like to create.
>
> 1) A simple count of things from the data.frame with predictions
> 1a) Number of predictions with probability greater than x
> 1b) Number of predictions with probability greater than x that are really
> true
>
> In SQL, this would be,
> "Select count(predictions) from data.frame where probability > x"
> "Select count(predictions) from data.frame where probability > x and label
> ='T' "
>
> How can I do this one in R?
>
>
> 2) I'd like to create what we call "binning". It is a simple list of
> probability ranges and how accurate our model is. The idea is to see how
> "true" our probabilities are.
> for example
>
> range     number of items   mean(probability)   true_accuracy
> 100-90%   20                .924                .90
> 90-80%    50                .825                .84
> 80-70%    214               .75                 .71
> etc...
>
> It would be really great if I could also graph this!
>
> Is there any kind of package or way to do this in R?
>
> Thanks!
>
> -N