[R] How to generate SE for the proportion value using a randomization process in R?
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Thu Jan 28 23:47:07 CET 2021
Hello,
Yes, sorry for my previous post, I had forgotten about boot.array.
That's a much better solution for your problem.
Rui Barradas
Às 20:29 de 28/01/21, Marna Wagley escreveu:
> Thank you Rui,
> This is great. How about the following?
>
> SimilatedData<-boot.array(b, indices=T)
>
> seems it is giving the rows ID which are used in the calculation, isn't it?
>
>
>
>
> On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <ruipbarradas using sapo.pt
> <mailto:ruipbarradas using sapo.pt>> wrote:
>
> Hello,
>
> I don't know why you would need to see the indices but rewrite the
> function bootprop as
>
> bootprop_ind <- function(data, index){
> d <- data[index, ]
> #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> index
> }
>
>
> and call in the same way. It will now return a matrix of indices with R
> = 1000 rows and 19 columns.
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 19:29 de 28/01/21, Marna Wagley escreveu:
> > Hi Rui,
> > I am sorry for asking you several questions.
> >
> > In the given example, randomizations (reshuffle) were done 1000
> times,
> > and its 1000 proportion values (results) are stored and it can be
> seen
> > using b$t; but I was wondering how the table was randomized
> (which rows
> > have been missed/or repeated in each randomizing procedure?).
> >
> > Is there any way we can see the randomized table and its associated
> > results? Here in this example, I randomized (or bootstrapped) the
> table
> > into three times (R=3) so I would like to store these three
> tables and
> > look at them later to know which rows were repeated/missed. Is
> there any
> > possibility?
> > The example data and the code is given below.
> >
> > Thank you for your help.
> >
> > ####
> > library(boot)
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
> = c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
> "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> > mutate(Time1.but.not.in.Time2 = case_when(
> > Time1 %in% "1" & Time2 %in% "0" ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> > Time1 %in% "0" & Time2 %in% "1" ~ "1"),
> > BothTimes = case_when(
> > Time1 %in% "1" & Time2 %in% "1" ~ "1")))
> > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> >
> > bootprop <- function(data, index){
> > d <- data[index, ]
> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
> TRUE)
> > }
> >
> > R <- 3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0 # original
> > b$t
> > sd(b$t) # bootstrapped estimate of the SE of the sample prop.
> > hist(b$t, freq = FALSE)
> >
> > str(b)
> > b$data
> > b$seed
> > b$sim
> > b$strata
> > ################
> >
> >
> > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley
> <marna.wagley using gmail.com <mailto:marna.wagley using gmail.com>
> > <mailto:marna.wagley using gmail.com <mailto:marna.wagley using gmail.com>>>
> wrote:
> >
> > Yes Rui, I can see we don't need to divide by square root of
> sample
> > size. The example is great to understand it.
> > Thank you.
> > Marna
> >
> >
> > On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas
> <ruipbarradas using sapo.pt <mailto:ruipbarradas using sapo.pt>
> > <mailto:ruipbarradas using sapo.pt <mailto:ruipbarradas using sapo.pt>>>
> wrote:
> >
> > Hello,
> >
> > Inline.
> >
> > Às 07:47 de 23/01/21, Marna Wagley escreveu:
> > > Dear Rui,
> > > I was wondering whether we have to square root of SD
> to find
> > SE, right?
> >
> > No, we don't. var already divides by n, don't divide again.
> > This is the code, that can be seen by running the
> function name
> > at a
> > command line.
> >
> >
> > sd
> > #function (x, na.rm = FALSE)
> > #sqrt(var(if (is.vector(x) || is.factor(x)) x else
> as.double(x),
> > # na.rm = na.rm))
> > #<bytecode: 0x55f3ce900848>
> > #<environment: namespace:stats>
> >
> >
> >
> > >
> > > bootprop <- function(data, index){
> > > d <- data[index, ]
> > > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
> > na.rm = TRUE)
> > > }
> > >
> > > R <- 1e3
> > > set.seed(2020)
> > > b <- boot(daT, bootprop, R)
> > > b
> > > b$t0 # original
> > > sd(b$t) # bootstrapped estimate of the SE of the
> sample prop.
> > > sd(b$t)/sqrt(1000)
> > > pandit*(1-pandit)
> > >
> > > hist(b$t, freq = FALSE)
> >
> >
> > Try plotting the normal densities for both cases, the red
> line is
> > clearly wrong.
> >
> >
> > f <- function(x, xbar, s){
> > dnorm(x, mean = xbar, sd = s)
> > }
> >
> > hist(b$t, freq = FALSE)
> > curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
> > "blue",
> > add = TRUE)
> > curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0,
> to = 1,
> > col =
> > "red", add = TRUE)
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > >
> > >
> > >
> > >
> > > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
> > <ruipbarradas using sapo.pt <mailto:ruipbarradas using sapo.pt>
> <mailto:ruipbarradas using sapo.pt <mailto:ruipbarradas using sapo.pt>>
> > > <mailto:ruipbarradas using sapo.pt
> <mailto:ruipbarradas using sapo.pt> <mailto:ruipbarradas using sapo.pt
> <mailto:ruipbarradas using sapo.pt>>>>
> > wrote:
> > >
> > > Hello,
> > >
> > > Something like this, using base package boot?
> > >
> > >
> > > library(boot)
> > >
> > > bootprop <- function(data, index){
> > > d <- data[index, ]
> > > sum(d[["BothTimes"]], na.rm =
> TRUE)/sum(d[["Time1"]],
> > na.rm = TRUE)
> > > }
> > >
> > > R <- 1e3
> > > set.seed(2020)
> > > b <- boot(daT, bootprop, R)
> > > b
> > > b$t0 # original
> > > sd(b$t) # bootstrapped estimate of the SE of the
> sample
> > prop.
> > > hist(b$t, freq = FALSE)
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > Às 21:57 de 22/01/21, Marna Wagley escreveu:
> > > > Hi All,
> > > > I was trying to estimate standard error (SE)
> for the
> > proportion
> > > value using
> > > > some kind of randomization process
> (bootstrapping or
> > jackknifing)
> > > in R, but
> > > > I could not figure it out.
> > > >
> > > > Is there any way to generate SE for the proportion?
> > > >
> > > > The example of the data and the code I am using is
> > attached for your
> > > > reference. I would like to generate the value of
> > proportion with
> > > a SE using
> > > > a 1000 times randomization.
> > > >
> > > > dat<-structure(list(Sample = structure(c(1L,
> 12L, 13L,
> > 14L, 15L, 16L,
> > > > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
> > 11L), .Label
> > > = c("id1",
> > > > "id10", "id11", "id12", "id13", "id14", "id15",
> > "id16", "id17",
> > > > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
> > "id7", "id8",
> > > > "id9"), class = "factor"), Time1 = c(0L, 1L,
> 1L, 1L,
> > 0L, 0L,
> > > > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
> 0L),
> > Time2 = c(1L,
> > > > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
> > 0L, 1L, 0L,
> > > > 1L, 1L)), .Names = c("Sample", "Time1",
> "Time2"), class =
> > > "data.frame",
> > > > row.names = c(NA,
> > > > -19L))
> > > > daT<-data.frame(dat %>%
> > > > mutate(Time1.but.not.in.Time2 = case_when(
> > > > Time1 %in% "1" & Time2 %in% "0" ~
> "1"),
> > > > Time2.but.not.in.Time1 = case_when(
> > > > Time1 %in% "0" & Time2 %in% "1" ~
> "1"),
> > > > BothTimes = case_when(
> > > > Time1 %in% "1" & Time2 %in% "1" ~
> "1")))
> > > > daT
> > > > summary(daT)
> > > >
> > > > cols.num <-
> > c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > > > "BothTimes")
> > > > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > > > summary(daT)
> > > > ProportionValue<-sum(daT$BothTimes,
> > na.rm=T)/sum(daT$Time1, na.rm=T)
> > > > ProportionValue
> > > > standard error??
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help using r-project.org
> <mailto:R-help using r-project.org> <mailto:R-help using r-project.org
> <mailto:R-help using r-project.org>>
> > <mailto:R-help using r-project.org
> <mailto:R-help using r-project.org> <mailto:R-help using r-project.org
> <mailto:R-help using r-project.org>>>
> > mailing list
> > > -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained,
> > reproducible code.
> > > >
> > >
> >
>
More information about the R-help
mailing list