[R] Difficulty with qqline in logarithmic context
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Feb 1 17:06:16 CET 2006
Is there a good reason to use qqnorm in a single-log context? Should one
not rather use
> qqnorm(log(freq))
> qqline(log(freq))
since you are (I guess) looking at log-normality of freq? Another way
to look at that is
> qqplot(qlnorm(ppoints(length(freq))), freq, log="xy")
It is the same plot on different scales. (I believe a QQ plot should always have
comparable scales on the two axes.)
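The claim that these are the same plot can be checked numerically. qqnorm() places the sorted data against qnorm(ppoints(n)), qlnorm(p) equals exp(qnorm(p)), and exp() preserves order. Exponentiating both coordinates of qqnorm(log(freq)) therefore gives exactly the points of the qqplot() call. Below is a minimal R sketch of that check. It uses only the first eleven values of the posted freq data, a subset chosen here purely for brevity:

```r
# First eleven values of the freq data posted in the original message
freq <- c(33, 79, 21, 436, 58, 18, 1106, 498, 1567, 393, 2)
p <- ppoints(length(freq))  # plotting positions, as used by qqnorm()

# Points of qqnorm(log(freq)), with the data sorted as qqplot() would sort it
x_norm <- qnorm(p)
y_norm <- sort(log(freq))

# Points of qqplot(qlnorm(p), freq, log = "xy")
x_lnorm <- sort(qlnorm(p))
y_lnorm <- sort(freq)

# Exponentiating the normal-scale coordinates recovers the log-normal ones
stopifnot(isTRUE(all.equal(exp(x_norm), x_lnorm)),
          isTRUE(all.equal(exp(y_norm), y_lnorm)))
```

Drawing both plots side by side, for instance after par(mfrow = c(1, 2)), should show the same configuration of points with relabelled axes.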
The point is that qqline is tied to normality, not to log-normality.
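Ripley's point can be made concrete by writing out the rule qqline() applies. It draws the straight line through the pairs of first and third quartiles, and it takes the x-values from qnorm(). The helper below, qq_quartile_line(), is a hypothetical name of our own. It is a minimal sketch of that rule, assuming qqline()'s default probabilities c(0.25, 0.75) and quantile()'s default type. The datax branch mirrors qqnorm(y, datax = TRUE), where the data sit on the x axis. Nothing in the rule refers to log-normality, so on a log-scaled plot the line has to be computed from the logged data. This is what Pinard's qqline(log10(freq)) workaround does.

```r
# Hypothetical helper: intercept and slope of the line qqline() would draw,
# i.e. the line through the first and third quartiles of y plotted against
# the matching quartiles of the standard normal distribution.
qq_quartile_line <- function(y, datax = FALSE) {
  yq <- quantile(y[!is.na(y)], c(0.25, 0.75), names = FALSE)
  xq <- qnorm(c(0.25, 0.75))
  if (datax) {
    # Data on the x axis, theoretical quantiles on the y axis
    slope <- diff(xq) / diff(yq)
    c(intercept = xq[1] - slope * yq[1], slope = slope)
  } else {
    # Theoretical quantiles on the x axis, data on the y axis
    slope <- diff(yq) / diff(xq)
    c(intercept = yq[1] - slope * xq[1], slope = slope)
  }
}

# First eleven values of the posted freq data, for illustration only
freq <- c(33, 79, 21, 436, 58, 18, 1106, 498, 1567, 393, 2)

raw  <- qq_quartile_line(freq)         # line fitted to freq itself
logd <- qq_quartile_line(log10(freq))  # line fitted to log10(freq)
print(rbind(raw, logd))
```

Comparing the two rows shows why the line looked random. abline(), which qqline() calls, draws a line in the transformed coordinates by default when an axis is logarithmic (see its untf argument). Coefficients fitted to freq itself are therefore read as log10 units. Using log10() matches the log = 'y' axis of Pinard's plot. With Ripley's qqnorm(log(freq)), the natural-log version, qq_quartile_line(log(freq)), would apply instead.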
On Wed, 1 Feb 2006, François Pinard wrote:
> Hi, R friends. I had some difficulty with the following code:
>
> qqnorm(freq, log='y')
> qqline(freq)
>
> as the line drawn was seemingly random. The exact data I used appears
> below. After wandering a bit within the source code for "abline",
> I figured out I should rather write:
>
> qqnorm(freq, log='y')
> par(ylog=FALSE)
> qqline(log10(freq))
> par(ylog=TRUE)
>
> I'm proposing that this little stunt rather be hidden and
> automatically effected within "qqline" proper, whenever par('ylog') is
> TRUE. I thought about providing a patch, as "qqline" is so small. Yet
> it would be more noise than help, as I'm not familiar with the usage of
> the "datax" argument, which should probably be addressed as well.
>
>
>
> Here is the data, in case useful:
>
> freq <-
> as.integer(c(33, 79, 21, 436, 58, 18, 1106, 498, 1567, 393, 2,
> 104, 50, 67, 113, 76, 327, 331, 196, 145, 86, 59, 12, 215, 293,
> 154, 500, 314, 246, 587, 85, 23, 323, 3, 13, 576, 29, 37, 24,
> 21, 1230, 137, 13, 93, 3, 101, 72, 218, 59, 17, 2, 8, 86, 143,
> 150, 22, 19, 234, 119, 157, 4, 255, 146, 126, 76, 15, 271, 170,
> 4, 6, 16, 3048, 2175, 3350, 5017, 5706, 1610, 665, 322, 1, 16,
> 47, 51, 168, 94, 66, 154, 99, 11, 547, 953, 1, 1071, 80, 184,
> 168, 52, 187, 103, 187, 361, 46, 85, 135, 597, 121, 283, 26,
> 12, 20, 169, 9, 79, 15, 114, 75, 30, 111, 556, 173, 32, 99, 438,
> 2, 2, 1, 117, 5, 3, 51, 8, 41, 12, 23, 2, 13, 5, 1, 9, 4, 1,
> 7, 15, 5, 48, 16, 112, 6, 1, 39, 60, 5, 23, 5, 19, 1, 8, 32,
> 4, 13, 1, 14, 71, 5, 1, 35, 30, 100, 389, 22, 8, 1, 192, 40,
> 6, 3, 17, 2, 14, 71, 14, 1, 5, 4, 32, 21, 18, 13, 2, 2, 45, 342,
> 46, 144, 18, 131, 188, 112, 37, 85, 90, 8, 195, 173, 5, 53, 96,
> 37, 16, 16, 281, 64, 50, 92, 336, 31, 744, 4, 134, 74, 1, 227,
> 6, 48, 418, 64, 66, 59, 20, 45, 20, 370, 148, 22, 7, 30, 601,
> 29, 82, 113, 938, 252, 65, 137, 72, 22, 98, 12, 152, 212, 13,
> 8, 35, 3, 77))
>
> Yet this really is the value of "courriel$freq" after "data(courriel)",
> with a file ".../R/data/courriel.R" here, holding:
>
> courriel <- read.table(pipe('grep -c \'^From \' ../courriel/*'),
> sep=':', as.is=T, row.names=1,
> col.names=c('fichier', 'freq'))
>
> My goal, which is nothing serious, was merely to toy with the number of
> messages per folder, for folders massaged out of R archives.
>
>
>
> Version:
> platform = i686-pc-linux-gnu
> arch = i686
> os = linux-gnu
> system = i686, linux-gnu
> status =
> major = 2
> minor = 2.1
> year = 2005
> month = 12
> day = 20
> svn rev = 36812
> language = R
>
> Locale:
> LC_CTYPE=fr_CA.UTF-8;LC_NUMERIC=C;LC_TIME=fr_CA.UTF-8;LC_COLLATE=fr_CA.UTF-8;LC_MONETARY=fr_CA.UTF-8;LC_MESSAGES=fr_CA.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
>
> Search Path:
> .GlobalEnv, package:methods, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, fp.etc, Autoloads, package:base
>
>
> --
> François Pinard http://pinard.progiciels-bpi.ca
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595