# [R] how to create a plot of permutation of 30 random values and show proportion of values

Ana Marija <sokovic.anamarija using gmail.com>
Fri Jan 24 05:05:12 CET 2020
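
The thread below asks how to draw random subsets of 30 SNPs, compute an FDR within each subset, and plot the proportion of SNPs that fall under 0.05. A minimal sketch of those mechanics follows. It assumes a data frame `a` with columns `rs` and `pvalue` as in the original post; the simulated data, `n_perm`, `subset_size` and `prop_sig` are hypothetical names and values chosen for illustration, and Benjamini-Hochberg via `p.adjust()` is an assumed choice of FDR method, since the thread does not say which one was intended. It shows only how such a plot could be produced and does not address the statistical concerns raised in the reply.

```r
# Hypothetical stand-in for the real 3,859,763-row data frame `a`.
set.seed(1)
a <- data.frame(rs = paste0("rs", seq_len(10000)),
                pvalue = runif(10000),
                stringsAsFactors = FALSE)

n_perm <- 1000      # number of random subsets (assumed; not stated in the thread)
subset_size <- 30   # subset size taken from the question

# For each random subset, BH-adjust the 30 p-values within that subset and
# record the proportion of SNPs whose adjusted p-value is below 0.05.
prop_sig <- replicate(n_perm, {
  idx <- sample(nrow(a), subset_size)
  fdr <- p.adjust(a$pvalue[idx], method = "BH")
  mean(fdr < 0.05)
})

# Histogram of the per-subset proportions.
hist(prop_sig, breaks = seq(0, 1, by = 0.05),
     xlab = "Proportion of SNPs with FDR < 0.05",
     main = "Random subsets of 30 SNPs")
```

With uniformly distributed p-values, as simulated here, nearly every subset should give a proportion of zero, which is consistent with the caution expressed in the reply.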

```Hi Jim,

thanks for getting back to me.
Can you please confirm whether you can see the attached plot?

Thanks
Ana

On Thu, Jan 23, 2020 at 8:06 PM Jim Lemon <drjimlemon using gmail.com> wrote:
>
> Hi Ana,
> You seem to be working on an identification or classification problem.
> Your sample plot didn't come through, perhaps try converting it to a
> PDF or PNG.
> I may be missing something, but I can't see how randomly selecting 30
> values from almost 4 million is going to mean anything in terms of
> statistical significance. I hope you will pardon me for saying that it
> looks like a "p-trawl". It is easy to select cases where the p-value
> is less than 0.05:
>
> a[a$pvalue < 0.05,]
>
> Maybe what you want to do is display this subset of your data as
> candidates for a match among the very large number of non-matches.
> Let's do a bit of damage to your sample data and add the proportions:
>
> # column names (rs, pvalue, pSNP) are inferred from their use below
> a<-read.table(text="rs pvalue pSNP
>  rs185642176 0.0267407 0.6
>  rs184120752 0.0787681 0.3
>  rs10904045 0.0508162 0.4
>  rs35849539 0.0875910 0.2
>  rs141633513 0.0787759 0.2
>  rs4468273 0.0542171 0.4
>  rs4567378 0.0539484 0.4
>  rs7084251 0.0126445 0.7
>  rs181605000 0.0787838 0.35
>  rs12255619 0.0192719 0.61
>  rs140367257 0.0788008 0.25
>  rs10904178 0.0969814 0.16
>  rs7918960 0.0436341 0.45
>  rs61688896 0.0526256 0.39
>  rs151283848 0.0787284 0.34
>  rs140174295 0.0989107 0.11
>  rs145945079 0.0787015 0.23
>  rs4881370 0.0455089 0.51
>  rs183895035 0.0787015 0.22
>  rs181749526 0.0787015 0.22",
>  header=TRUE,stringsAsFactors=FALSE)
> alt05<-a[a$pvalue < 0.05,]
> library(plotrix)
> segmat<-matrix(c(alt05$pSNP,alt05$pSNP-0.1,alt05$pSNP+0.1,rep(1,5)),
>  nrow=4,byrow=TRUE)
> rownames(segmat)<-c("prop","lower","upper","N")
> centipede.plot(segmat,mar=c(4,6,3,4),
>  main="Proportion of SNPs",
>  left.labels=alt05$rs,right.labels=rep("",5))
>
> This is probably not what you want, but it is a start.
>
> Jim
>
> On Fri, Jan 24, 2020 at 7:08 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
> >
> > Hello,
> >
> > I have a data frame which looks like this:
> >
> >              rs   pvalue
> >  1: rs185642176 0.267407
> >  2: rs184120752 0.787681
> >  3:  rs10904045 0.508162
> >  4:  rs35849539 0.875910
> >  5: rs141633513 0.787759
> >  6:   rs4468273 0.542171
> >  7:   rs4567378 0.539484
> >  8:   rs7084251 0.126445
> >  9: rs181605000 0.787838
> > 10:  rs12255619 0.192719
> > 11: rs140367257 0.788008
> > 12:  rs10904178 0.969814
> > 13:   rs7918960 0.436341
> > 14:  rs61688896 0.526256
> > 15: rs151283848 0.787284
> > 16: rs140174295 0.989107
> > 17: rs145945079 0.787015
> > 18:   rs4881370 0.455089
> > 19: rs183895035 0.787015
> > 20: rs181749526 0.787015
> > > dim(a)
> > [1] 3859763       2
> >
> > What I would like to do is take random subsets of 30 of those rs IDs
> > from across the data frame and find out which of the generated
> > subsets have an FDR value < 0.05.
> >
> > I guess I would calculate the FDR with:
> >
> > but I also guess I would be calculating the FDR only for a particular
> > subset of 30 randomly chosen rs IDs, not for the whole data set.
> >
> > I would like to present the result as in the attached plot. The
> > x-axis shows the proportion of SNPs; in my case a SNP is equivalent
> > to an rs ID.
> >
> >
> > Thanks
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help