[R] qqnorm & huge datasets

Bert Gunter gunter.berton at gene.com
Thu Dec 22 20:56:43 CET 2011


Chuck:

A bad idea, I think: rounding to unique values loses data density
(how many points fall at each plotted location is discarded), while
sampling preserves it (to display resolution -- also a form of
rounding).
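
A minimal sketch of the sampling alternative, for illustration only. It
is not code from this thread: n.keep is an arbitrary, assumed point
budget, and the data are the same toy rexp() draw used in the quoted
example below. The idea is to compute the full set of Q-Q pairs once and
then plot a uniform random subset of them, so that crowded regions stay
proportionally crowded.

```r
set.seed(1)                        # only to make the illustration reproducible
xx <- rexp(1e5)                    # toy data, as in the quoted example
qq <- qqnorm(xx, plot.it = FALSE)  # list with x (theoretical) and y (sample) quantiles

n.keep <- 5000                     # assumed point budget; adjust to the display
idx <- sort(sample.int(length(qq$x), n.keep))   # uniform random subset of indices
qq.samp <- data.frame(x = qq$x[idx], y = qq$y[idx])

plot(qq.samp, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
```

Note that a uniform subsample can drop the few most extreme points, so
if the tails matter it may be worth adding those points back by hand.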

-- Bert

On Thu, Dec 22, 2011 at 11:10 AM,  <cberry at tajo.ucsd.edu> wrote:
> Sam Steingold <sds at gnu.org> writes:
>
>> Hi,
>> When I run qqnorm on a vector of length 10M+, I get a huge PDF file that
>> cannot be loaded by acroread or evince.
>> Any suggestions? (apart from sampling the data).
>> Thanks.
>
> Among the other suggestions, I did not see mention of another
> trick for slimming down graphs of many points, viz.:
>
> Do not plot points that substantially overlap:
>
>> xx <- rexp(1e05)
>> qq.results <- qqnorm(xx, plot.it=FALSE)
>> qq.slim <- unique(round(as.data.frame(qq.results),3))
>> dim(qq.slim)
> [1] 10233     2
>> plot(qq.slim)
>
> Choose the digits argument of round() to be large enough that points
> which do not overlap can still be told apart, and small enough to slim
> down the number of plotted points. In the example above, 10233 points
> are plotted instead of 100000.
>
> HTH,
>
> Chuck
>
> --
> Charles C. Berry                            Dept of Family/Preventive Medicine
> cberry at ucsd edu                          UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


