[R] How to generate SE for the proportion value using a randomization process in R?

Sat Jan 23 09:36:31 CET 2021

Yes Rui, I can see we don't need to divide by square root of sample size.
The example is great to understand it.
Thank you.
Marna

On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:

> Hello,
>
> Inline.
>
> Às 07:47 de 23/01/21, Marna Wagley escreveu:
> > Dear Rui,
> > I was wondering whether we have to square root of SD to find SE, right?
>
> No, we don't. var already divides by n, don't divide again.
> This is the code, that can be seen by running the function name at a
> command line.
>
>
> sd
> #function (x, na.rm = FALSE)
> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> #    na.rm = na.rm))
> #<bytecode: 0x55f3ce900848>
> #<environment: namespace:stats>
>
>
>
> >
> > bootprop <- function(data, index){
> >     d <- data[index, ]
> >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 1e3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0     # original
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > sd(b$t)/sqrt(1000)
> > pandit*(1-pandit)
> >
> > hist(b$t, freq = FALSE)
>
>
> Try plotting the normal densities for both cases, the red line is
> clearly wrong.
>
>
> f <- function(x, xbar, s){
>    dnorm(x, mean = xbar, sd = s)
> }
>
> hist(b$t, freq = FALSE)
> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
> add = TRUE)
> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
> "red", add = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> >
> >
> >
> >
> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas using sapo.pt
> > <mailto:ruipbarradas using sapo.pt>> wrote:
> >
> >     Hello,
> >
> >     Something like this, using base package boot?
> >
> >
> >     library(boot)
> >
> >     bootprop <- function(data, index){
> >         d <- data[index, ]
> >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
> TRUE)
> >     }
> >
> >     R <- 1e3
> >     set.seed(2020)
> >     b <- boot(daT, bootprop, R)
> >     b
> >     b$t0     # original
> >     sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> >     hist(b$t, freq = FALSE)
> >
> >
> >     Hope this helps,
> >
> >     Rui Barradas
> >
> >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
> >      > Hi All,
> >      > I was trying to estimate standard error (SE) for the proportion
> >     value using
> >      > some kind of randomization process (bootstrapping or jackknifing)
> >     in R, but
> >      > I could not figure it out.
> >      >
> >      > Is there any way to generate SE for the proportion?
> >      >
> >      > The example of the data and the code I am using is attached for
> your
> >      > reference. I would like to generate the value of proportion with
> >     a SE using
> >      > a 1000 times randomization.
> >      >
> >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
> 16L,
> >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
> >     = c("id1",
> >      > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> >      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> >      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
> >     "data.frame",
> >      > row.names = c(NA,
> >      > -19L))
> >      > daT<-data.frame(dat %>%
> >      >    mutate(Time1.but.not.in.Time2 = case_when(
> >      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> >      > Time2.but.not.in.Time1 = case_when(
> >      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >      >   BothTimes = case_when(
> >      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> >      >   daT
> >      >   summary(daT)
> >      >
> >      > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >      > "BothTimes")
> >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> >      > summary(daT)
> >      > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
> na.rm=T)
> >      > ProportionValue
> >      > standard error??
> >      >
> >      >       [[alternative HTML version deleted]]
> >      >
> >      > ______________________________________________
> >      > R-help using r-project.org <mailto:R-help using r-project.org> mailing list
> >     -- To UNSUBSCRIBE and more, see
> >      > https://stat.ethz.ch/mailman/listinfo/r-help
> >      > PLEASE do read the posting guide
> >     http://www.R-project.org/posting-guide.html
> >      > and provide commented, minimal, self-contained, reproducible code.
> >      >
> >
>

	[[alternative HTML version deleted]]