Tue Nov 16 09:32:24 CET 2021

I have a large data frame with a column containing a factor:
> str(df)
'data.frame': 5000000 obs. of  4 variables:
 $ MR   : num  0.000809 0.001236 0.001663 0.002089 0.002516 ...
 $ FCN  : num  2 2 2 2 2 2 2 2 2 2 ...
 $ Class: Factor w/ 3 levels "negative","positive",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Set  : int  1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : Named int [1:2] 1000 1000
  .. ..- attr(*, "names")= chr [1:2] "X1" "X2"
  ..$ dimnames:List of 2
  .. ..$ X1: chr [1:1000] "X1=0.0008094667" "X1=0.0012360955" "X1=0.0016627243" "X1=0.0020893531" ...
  .. ..$ X2: chr [1:1000] "X2= 2.000000" "X2= 2.048048" "X2= 2.096096" "X2= 2.144144" ...
I would like to run prop.test on df$Class, but:
> prop.test(x=point$Class, n=length(point$Class),
+ conf.level=.95, correct=FALSE)
Error in prop.test(x = point$Class, n = length(point$Class), conf.level = 0.95,  :
  'x' and 'n' must have the same length
According to the documentation, `x` is "a vector of counts of successes, a
one-dimensional table with two entries, or a two-dimensional table (or
matrix) with 2 columns, giving the counts of successes and failures,
respectively." I therefore passed point$Class as `x` and the total number
of observations, length(point$Class), as `n`.
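My guess at the cause: `x` has to be a count of successes, not the raw factor, so a factor with 5000000 entries cannot be paired with an `n` of length 1. A minimal sketch on toy data, assuming "positive" is the level treated as a success (the vector `cls` and its values are hypothetical):

```r
# Toy factor standing in for the real Class column (hypothetical values)
cls <- factor(c(rep("negative", 60), rep("positive", 30), rep("uncertain", 10)))

# prop.test() wants counts: number of successes and number of trials
x <- sum(cls == "positive")   # count of the level treated as "success"
n <- length(cls)              # total number of observations

res <- prop.test(x = x, n = n, conf.level = 0.95, correct = FALSE)
res$estimate   # observed proportion of "positive"
res$conf.int   # 95% confidence interval for that proportion
```

With three levels this tests one level against the other two combined, which may or may not be what is wanted here.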
There are three levels:
> unique(df$Class)
 negative  positive  uncertain
Levels: negative positive uncertain
I tried to remove the levels to check if the levels were interfering
with the test:
> df$Class = levels(droplevels(df$Class))
Error in `$<-.data.frame`(`*tmp*`, Class, value = c("negative", "positive",  :
  replacement has 3 rows, data has 5000000
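As far as I can tell, this second error arises because levels() returns one string per level rather than one value per row. A small sketch on a toy factor (the name `f` is hypothetical), assuming the aim was to convert the factor to plain character values:

```r
# Toy factor with three levels and four observations (hypothetical values)
f <- factor(c("negative", "positive", "negative", "uncertain"))

lev <- levels(droplevels(f))  # the level names only: one entry per level
chr <- as.character(f)        # the per-row values: one entry per observation

length(lev)
length(chr)
```

Only the second form has the same length as the data, so only it could be assigned back to a column.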
What would be the correct syntax for this test? The idea is to find the
most common Class value within each value of df$Set, with prop.test also
providing the 95% CI for that estimate.
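To make the goal concrete, here is a rough sketch of what I am after, run on toy data. The data frame `toy`, its values, and the names `ci_by_set` and `result` are all hypothetical, and I am assuming that the count of the most frequent level can be passed to prop.test as the number of successes:

```r
# Toy data standing in for the real data frame (hypothetical values)
set.seed(1)
toy <- data.frame(
  Class = factor(sample(c("negative", "positive", "uncertain"), 300,
                        replace = TRUE, prob = c(0.6, 0.3, 0.1))),
  Set = rep(1:3, each = 100)
)

# For each Set: find the most common Class, then get its proportion and CI
ci_by_set <- lapply(split(toy$Class, toy$Set), function(cl) {
  counts <- table(cl)                       # counts per level within this Set
  top <- names(counts)[which.max(counts)]   # most common level (first one on ties)
  pt <- prop.test(x = max(counts), n = length(cl),
                  conf.level = 0.95, correct = FALSE)
  data.frame(most_common = top,
             proportion = unname(pt$estimate),
             lower = pt$conf.int[1],
             upper = pt$conf.int[2])
})

result <- do.call(rbind, ci_by_set)   # one row per Set
result
```

I am not sure whether picking the most common level first and then computing its interval is statistically sound, so this is only meant to show the shape of the output I would like.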
