[R] generate distribution based on summary data and add random noise

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Thu Feb 3 17:09:43 CET 2022


If I understand correctly: to generate a sample of total size N, draw a
uniform sample of size p*N within each bin that has proportion p?
See ?runif
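
A minimal sketch of that idea, assuming the data from the quoted message below. The names lower, upper, p, counts and bin are illustrative, and treating the first bin as (0, 10] is an assumption, since the summary data do not give a lower limit:

```r
# Cumulative summary data from the quoted message
size    <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
             57, 65, 72, 76, 83, 95, 98, 100, 100)

set.seed(1)   # arbitrary seed, only for reproducibility
N <- 1000

# Bin i spans lower[i] to upper[i]; the lower limit 0 is an assumption
lower <- c(0, head(size, -1))
upper <- size

# Proportion of the sample that falls into each bin
p <- diff(c(0, percent)) / 100

# Number of draws per bin (p * N), then one uniform draw per element
counts <- round(p * N)
bin    <- rep(seq_along(p), times = counts)
result <- runif(sum(counts), min = lower[bin], max = upper[bin])
```

Because each value is spread uniformly over its own bin, the spread grows with the bin width (10 units for the narrow bins, 100 for the widest), without a separate rnorm() term. plot(ecdf(result)) can be used to compare the outcome with the original percentages.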

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Feb 3, 2022 at 7:52 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:

> Hello all
>
> I have summary data with size bins and percentage below that size.
>
> dat <- structure(list(size = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
> 90L, 100L, 110L, 120L, 130L, 140L, 150L, 160L, 170L, 180L, 190L,
> 200L, 250L, 300L, 400L, 500L), percent = c(0L, 0L, 0L, 1L, 1L,
> 2L, 4L, 8L, 13L, 18L, 24L, 31L, 38L, 44L, 50L, 57L, 65L, 72L,
> 76L, 83L, 95L, 98L, 100L, 100L)), class = "data.frame", row.names = c(NA,
> -24L))
>
> # I want to regenerate the original distribution (I know it is better not
> # to, but I have no other choice), so I calculated the midpoints of those bins.
>
> xd <- dat$size - c(5, diff(dat$size) / 2)
> xd <- xd[-1]
>
> # I can sample the bin midpoints with probability given by the percent increments.
> Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent) / 100)
> plot(ecdf(Result))
>
> # I can also add some noise to it, which is satisfactory for the lower size
> # bins but not enough for the higher size bins.
>
> Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent) / 100) +
>   rnorm(1000, mean = 0, sd = 5)
> plot(ecdf(Result))
>
> I can increase sd to suit the bigger bins, but in that case the noise is too
> big for the smaller bins.
>
> I would like to add smaller random noise to the lower size bins and bigger
> random noise to the higher size bins. This seems to be an easy task, but I am
> stuck on how to do it. The noise should be somehow proportional to the size
> value. The only way forward I see is to sort the generated result and to use
> something like
>
> + rnorm(1000, mean=xd, sd=xd/10)
>
> but that is not correct.
>
> I'd appreciate any hint on how to add random noise to values in an ordered
> manner.
>
> Best regards.
> Petr
>
