[R] qqnorm & huge datasets

R. Michael Weylandt michael.weylandt at gmail.com
Thu Dec 22 01:11:49 CET 2011


I'd second Peter's suggestion, but if you need every data point for
whatever reason, you might also try passing pch = "." to qqnorm. On a
test with 1e7 data points, it more than halved the resulting file
size, and with that many points there's no loss in clarity from the
smaller marker.
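
A minimal sketch of that kind of comparison, assuming a simulated
normal sample and the pdf() device. The sample size n is far smaller
than 1e7 so it runs quickly, the file names are temporary
placeholders, and the actual size ratio will depend on the R version
and device settings:

```r
# Sketch only: compare PDF sizes for the default marker and pch = ".".
# x is simulated data standing in for the real vector (an assumption).
set.seed(1)
n <- 1e5
x <- rnorm(n)

# Temporary output files (placeholder names)
f_default <- tempfile(fileext = ".pdf")
f_dot     <- tempfile(fileext = ".pdf")

# Default plotting symbol (open circle)
pdf(f_default)
qqnorm(x)
invisible(dev.off())

# Single-pixel-style marker
pdf(f_dot)
qqnorm(x, pch = ".")
invisible(dev.off())

# Report both file sizes in bytes
sizes <- file.size(c(f_default, f_dot))
names(sizes) <- c("default", "pch_dot")
print(sizes)
```

Every point is still written to the file either way, so this only
shrinks the cost per point; it does not reduce the number of points.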

Michael

On Wed, Dec 21, 2011 at 4:59 PM, peter dalgaard <pdalgd at gmail.com> wrote:
>
> On Dec 21, 2011, at 23:10 , Sam Steingold wrote:
>
>> Hi,
>> When I run qqnorm on a vector of length 10M+ I get a huge pdf file which
>> cannot be loaded by acroread or evince.
>> Any suggestions? (apart from sampling the data).
>
> Sample intelligently? Things like
>
>> qq <- seq(-4,4,,10001)
>> qqplot(qq,quantile(x,pnorm(qq)),type="l")
>
> or maybe
>
>> qqnorm(sort(x)[seq_along(x)%%100==50], type="l")
>
> (Those can likely be improved upon, but you get the picture.)
>
>
>> Thanks.
>> --
>> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
>> http://mideasttruth.com http://honestreporting.com http://camera.org
>> http://openvotingconsortium.org http://pmw.org.il http://thereligionofpeace.com
>> A person without flaws probably lacks strengths either.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
