[R] lattice::xyplot/ggplot2: plotting weighted data frames with lmline and smooth

Michael Friendly friendly at yorku.ca
Fri Oct 21 17:22:53 CEST 2011


In the HistData package, I have a data frame, PearsonLee, containing 
observations on heights of parent and child, in weighted form:

library(HistData)

 > str(PearsonLee)
'data.frame':   746 obs. of  6 variables:
  $ child    : num  59.5 59.5 59.5 60.5 60.5 61.5 61.5 61.5 61.5 61.5 ...
  $ parent   : num  62.5 63.5 64.5 62.5 66.5 59.5 60.5 62.5 63.5 64.5 ...
  $ frequency: num  0.5 0.5 1 0.5 1 0.25 0.25 0.5 1 0.25 ...
  $ gp       : Factor w/ 4 levels "fd","fs","md",..: 2 2 2 2 2 2 2 2 2 2 ...
  $ par      : Factor w/ 2 levels "Father","Mother": 1 1 1 1 1 1 1 1 1 1 ...
  $ chl      : Factor w/ 2 levels "Daughter","Son": 2 2 2 2 2 2 2 2 2 2 ...

I want to make a 2x2 set of plots of child ~ parent | par+chl, with regression lines and loess smooths that incorporate weights=frequency. The "frequencies" are not integers, so I can't simply expand the data frame.

I'd also like to use different colors for the regression and smoothed lines. Here's what I've tried using xyplot, all unsuccessful. I suppose I could also use ggplot2, if it can do what I want.

xyplot(child ~ parent | par + chl, data = PearsonLee, weights = frequency,
       type = c("p", "r", "smooth"))
xyplot(child ~ parent | par + chl, data = PearsonLee, type = c("p", "r", "smooth"))

panel.lmline and panel.smooth don't have a weights= argument, though lm() and loess() do.
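Since lm() does accept weights, the weighted and unweighted regression lines for the four par x chl panels can at least be compared numerically outside lattice. A minimal sketch using only HistData and base R (cells and coefs are illustrative names, not part of any package):

```r
library(HistData)  # provides the PearsonLee data frame

# One data frame per panel: Father/Mother x Daughter/Son
cells <- split(PearsonLee, list(PearsonLee$par, PearsonLee$chl))

# Intercept and slope for each panel, unweighted vs. weighted by frequency
coefs <- lapply(cells, function(d)
    rbind(unweighted = coef(lm(child ~ parent, data = d)),
          weighted   = coef(lm(child ~ parent, data = d, weights = frequency))))
print(coefs)
```

Inside lm(), weights = frequency is looked up in data first, so each call uses the frequency column of that panel's rows.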

# Try to control line colors (unsuccessful: only one value of col.line is used)
xyplot(child ~ parent | par + chl, data = PearsonLee, type = c("p", "r", "smooth"),
       col.line = c("red", "blue"))

## Try to use panel functions: unsuccessful
xyplot(child ~ parent|par+chl, data=PearsonLee, type="p",
        panel = function(x, y, ...) {
            panel.xyplot(x, y, ...)
            panel.lmline(x, y, col="blue", ...)
            panel.smooth(x, y, col="red", ...)
            }
)
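One possible workaround is to skip type = "r"/"smooth" and draw the fits directly from lm() and loess() inside a custom panel function. This sketch relies on lattice passing unrecognised xyplot() arguments through to the panel function, and on it supplying subscripts (the row indices of the current panel) when the panel function has a subscripts argument. The argument name wts is hypothetical; any name xyplot() does not already use should work. As far as I know, extra arguments (unlike groups and subset) are not looked up in data, so the weight column is passed explicitly. Note also that panel.smooth() is the base-graphics smoother from the graphics package; the lattice smoother used by type = "smooth" is panel.loess().

```r
library(lattice)
library(HistData)  # provides the PearsonLee data frame

# 'wts' is an illustrative argument name: xyplot() passes it on to the panel
# function, and 'subscripts' selects the rows that fall in each panel.
p <- xyplot(child ~ parent | par + chl, data = PearsonLee,
            wts = PearsonLee$frequency,
            panel = function(x, y, wts, subscripts, ...) {
                wt <- wts[subscripts]          # weights for this panel only
                panel.xyplot(x, y, ...)
                # weighted least-squares line
                panel.abline(lm(y ~ x, weights = wt), col = "blue", lwd = 2)
                # weighted loess smooth, evaluated on a grid over this panel's x range
                fit <- loess(y ~ x, weights = wt)
                xv <- seq(min(x), max(x), length.out = 50)
                panel.lines(xv, predict(fit, data.frame(x = xv)),
                            col = "red", lwd = 2)
            })
print(p)  # needed when run from a script rather than interactively
```

Because each line is drawn by its own call, each gets its own col, which also sidesteps the single col.line value used by type = c("r", "smooth").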

The following, using base graphics, illustrates the difference between the weighted and unweighted lines for the whole data frame (pooled over par and chl):

with(PearsonLee, {
    lim <- c(55, 80)
    xv  <- seq(55, 80, 0.5)
    sunflowerplot(parent, child, number = frequency, xlim = lim, ylim = lim,
                  seg.col = "gray", size = 0.1)
    # unweighted
    abline(lm(child ~ parent), col = "green", lwd = 2)
    lines(xv, predict(loess(child ~ parent), data.frame(parent = xv)),
          col = "green", lwd = 2)
    # weighted
    abline(lm(child ~ parent, weights = frequency), col = "blue", lwd = 2)
    lines(xv, predict(loess(child ~ parent, weights = frequency),
                      data.frame(parent = xv)),
          col = "blue", lwd = 2)
})
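For the ggplot2 route, stat_smooth() understands a weight aesthetic and uses it when fitting with lm() or loess(). A sketch of the same 2x2 layout, assuming a reasonably recent ggplot2 (the colours and point styling are illustrative choices):

```r
library(ggplot2)
library(HistData)  # provides the PearsonLee data frame

p <- ggplot(PearsonLee, aes(x = parent, y = child)) +
    # point size increases with the (non-integer) frequency
    geom_point(aes(size = frequency), shape = 1, colour = "gray40") +
    # weighted least-squares line
    geom_smooth(aes(weight = frequency), method = "lm", formula = y ~ x,
                se = FALSE, colour = "blue") +
    # weighted loess smooth
    geom_smooth(aes(weight = frequency), method = "loess", formula = y ~ x,
                se = FALSE, colour = "red") +
    facet_grid(chl ~ par)  # rows: Daughter/Son, columns: Father/Mother
print(p)
```

The weight aesthetic is set only on the smoothing layers, so the point layer does not receive an aesthetic it does not use.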

thanks,
-Michael



-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



More information about the R-help mailing list