[R] generate distribution based on summary data and add random noise

PIKAL Petr
Thu Feb 3 16:52:12 CET 2022


Hello all,

I have summary data with size bins and the cumulative percentage below each size.

dat <- structure(list(size = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
90L, 100L, 110L, 120L, 130L, 140L, 150L, 160L, 170L, 180L, 190L,
200L, 250L, 300L, 400L, 500L), percent = c(0L, 0L, 0L, 1L, 1L,
2L, 4L, 8L, 13L, 18L, 24L, 31L, 38L, 44L, 50L, 57L, 65L, 72L,
76L, 83L, 95L, 98L, 100L, 100L)), class = "data.frame", row.names = c(NA,
-24L))

# I want to reconstruct the original distribution (I know it is better not to,
# but I have no other choice), so I calculated the midpoints of those bins.

xd <- dat$size - c(5, diff(dat$size) / 2)
xd <- xd[-1]  # drop the first midpoint so xd lines up with diff(dat$percent)

# I can sample the bin midpoints with probability given by the differences in percent.
Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent) / 100)
plot(ecdf(Result))

# I can also add some noise to it, which is satisfactory for the lower size bins
# but not enough for the higher size bins.

Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent) / 100) + rnorm(1000, mean = 0, sd = 5)
plot(ecdf(Result))

I can increase sd to suit the bigger bin sizes, but then the noise is too big for the smaller bin sizes.

I would like to add smaller random noise to the lower size bins and bigger random noise to the higher size bins. This seems to be an easy task, but I am stuck on how to do it. The noise should be somehow proportional to the size value.
The only way forward I see is to sort the generated result and use something like

+ rnorm(1000, mean=xd, sd=xd/10)

but that is not correct.
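
One reason the line above cannot work as written is that xd holds only 23 midpoints, so rnorm() recycles it across the 1000 draws without any link to the value that was actually sampled. A possible sketch of proportional noise is to draw the midpoints first and then derive sd from each drawn value. The names mids and picked and the 10 % factor are assumptions made for illustration only, and the data are retyped from dat so the block runs on its own:

```r
set.seed(1)  # only so that the sketch is reproducible

# same values as dat$size and dat$percent
size    <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
             57, 65, 72, 76, 83, 95, 98, 100, 100)

# bin midpoints; the first one is dropped to line up with diff(percent)
mids <- (size - c(5, diff(size) / 2))[-1]

# step 1: draw the midpoints
picked <- sample(mids, 1000, replace = TRUE, prob = diff(percent) / 100)

# step 2: add noise whose sd depends on each drawn value
# (assumed factor: 10 % of the midpoint)
Result <- picked + rnorm(length(picked), mean = 0, sd = picked / 10)

plot(ecdf(Result))
```

Because sd is taken from picked, every draw gets noise scaled to its own size, and no sorting is needed.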

I'd appreciate any hint on how to add random noise to the values in an ordered manner.
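
If "ordered" is taken to mean that the noise should never push a value out of its own bin, another possible sketch is to sample the bin itself and then draw uniformly between its edges, so the spread follows the bin width automatically. This assumes sizes are spread evenly within each bin, which the summary data cannot confirm. The names lower, upper and bin are illustrative, and the data are again retyped from dat:

```r
set.seed(2)  # only so that the sketch is reproducible

# same values as dat$size and dat$percent
size    <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
             57, 65, 72, 76, 83, 95, 98, 100, 100)

# edges of the 23 bins that lie between consecutive sizes
lower <- size[-length(size)]
upper <- size[-1]

# sample a bin index with probability equal to the share of that bin
bin <- sample(seq_along(upper), 1000, replace = TRUE, prob = diff(percent) / 100)

# draw each value uniformly inside its own bin (assumption: flat within a bin)
Result <- runif(length(bin), min = lower[bin], max = upper[bin])

plot(ecdf(Result))
points(size, percent / 100, col = "red", pch = 16)  # tabulated values for comparison
```

The red points show the tabulated cumulative percentages, so the ECDF of Result can be compared with the summary data at the bin edges.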

Best regards.
Petr
