[R] understanding patterns in categorical vs. continuous data
Liaw, Andy
andy_liaw at merck.com
Fri Jan 27 04:07:42 CET 2006
From: Dave Roberts
>
> You might prefer boxplot(insolation~veg_type) as a graphic. That will
> give you quantiles. To get the actual numeric values you could
>
> for (i in levels(veg_type)) {
>   print(i)
>   # values are not auto-printed inside a loop, so print() explicitly
>   print(quantile(insolation[veg_type==i]))
> }
>
> see ?quantile for more help.
If you want the five-number summaries plotted in the boxplots, just look at
the stats component of the object returned by boxplot():
> g <- factor(rep(1:3, 10))
> y <- rnorm(30)
> res <- boxplot(y ~ g)
> str(res)
List of 6
$ stats: num [1:5, 1:3] -1.135 -0.757 -0.536 0.499 0.996 ...
$ n : num [1:3] 10 10 10
$ conf : num [1:2, 1:3] -1.1639 0.0918 -0.5208 1.6546 -1.2487 ...
$ out : num(0)
$ group: num(0)
$ names: chr [1:3] "1" "2" "3"
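As a rough sketch of how that returned object might be used: the stats
component holds one column per group, and boxplot() accepts plot = FALSE to
skip the drawing. The seed, the variable name box_stats and the row labels
below are illustrative additions, not part of the session above.

```r
# Sketch: extract the statistics that boxplot() would draw, without plotting.
# set.seed(), box_stats and the row labels are illustrative assumptions.
set.seed(1)
g <- factor(rep(1:3, 10))
y <- rnorm(30)

res <- boxplot(y ~ g, plot = FALSE)   # compute the statistics only

# One column per group, one row per plotted statistic.
box_stats <- res$stats
dimnames(box_stats) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  res$names
)
print(box_stats)
```

Note that the first and last rows are the whisker ends, which differ from the
minimum and maximum whenever a group has points flagged in res$out.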
If you just want to compute the summaries without the boxplots, use
fivenum():
> tapply(y, g, fivenum)
$"1"
[1] -1.1352456 -0.7571895 -0.5360496 0.4994445 0.9956749
$"2"
[1] -1.1408493 -0.3751730 0.5668747 1.8018146 2.0019303
$"3"
[1] -2.2309983 -0.9333305 -0.3402786 0.8849042 0.9833057
... and if you really want the quantiles, you can do that, too:
> tapply(y, g, quantile)
$"1"
0% 25% 50% 75% 100%
-1.1352456 -0.7391977 -0.5360496 0.3378861 0.9956749
$"2"
0% 25% 50% 75% 100%
-1.1408493 -0.3039648 0.5668747 1.6669879 2.0019303
$"3"
0% 25% 50% 75% 100%
-2.2309983 -0.8389260 -0.3402786 0.6746950 0.9833057
... but note how the quartiles and hinges are not necessarily the same.
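A small sketch of that difference, using a made-up vector x rather than the
data above: fivenum() returns hinges (roughly, the medians of the lower and
upper halves), while quantile() by default interpolates between order
statistics (type = 7).

```r
# Sketch: hinges (fivenum) versus quartiles (quantile) on a small vector.
# The vector x is illustrative only.
x <- 1:10

hinges    <- fivenum(x)    # min, lower hinge, median, upper hinge, max
quartiles <- quantile(x)   # default type = 7 interpolation

print(hinges)
print(quartiles)

# Zero where the two summaries agree, nonzero where they differ.
print(hinges - unname(quartiles))
```

For this x the hinges are 3 and 8, whereas the default quartiles are 3.25
and 7.75; the minimum, median and maximum agree.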
Andy
> Dylan Beaudette wrote:
> > Greetings,
> >
> > I have a set of bivariate data: one variable (vegetation type) which is
> > categorical, and one (computed annual insolation) which is continuous.
> > Plotting veg_type ~ insolation produces a nice overview of the patterns
> > that I can see in the source data. However, due to the large number of
> > samples (1,000), and the apparent "spread" in the distribution of a
> > single vegetation type over a range of insolation values, I am having a
> > hard time quantitatively describing the relationship between the two
> > variables.
> >
> > Here is a link to a sample graph:
> > http://casoilresource.lawr.ucdavis.edu/drupal/node/162
> >
> > Since the data along each vegetation type "line" is not a distribution
> > in the traditional sense, I am having problems applying descriptive
> > statistical methods. Conceptually, I would like to somehow describe the
> > variation with insolation, along each vegetation type "line".
> >
> > Any guidance, or suggested reading material would be greatly appreciated.
> >
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> David W. Roberts              office 406-994-4548
> Professor and Head            FAX    406-994-3190
> Department of Ecology         email  droberts at montana.edu
> Montana State University
> Bozeman, MT 59717-3460