[R] Problem with by statement for spaghetti plots

Dennis Murphy djmuser at gmail.com
Wed Sep 7 23:07:21 CEST 2011


Your code doesn't work, but if you want to do spaghetti plots, they're
rather easy to do in ggplot2 or lattice.

The first problem you have is that one group in your test data has all
NA responses, so by() or any other groupwise summarization function is
going to choke unless you give it a way to bypass that group. In other
words, you need to write a function that checks for bad cases and then
'skips over' them if found. I needed some practice in using try(), so
I gave this a shot.

id <- c(230017,230017,230017,230018,230018,230018,230019,230019,230019,230020,230020,
age <- rep(c(30,36,42), 10)
outcome <- c(12,17,10,5,5,2,NA,NA,NA,8,6,5,11,13,10,15,11,15,13,NA,9,0,0,0,20,
mydata <- data.frame(id,age,outcome)

# Checks to make sure that each group has at least two non-NA observations,
# otherwise it can't fit a line. The code assumes that the x-variable
has no NAs;
# if this is not the case in your real data, then you need to check
that at least
# two observations (i.e., (x, y) pairs) have no NAs in each subgroup.
checklm <- function(d) {
    if(sum(is.na(d$outcome)) > 1) stop('not enough values to fit a line')
    lm(outcome ~ age, data = d)

# Now use the plyr package to run the models; the function to be applied
# checks for fealty before fitting the model; if the error message is thrown,
# it returns an error message with class 'try-error' as the output for that list
# component.

# takes a data frame as input, a list of model objects as output
# id is the grouping variable. In the function, d represents the
# sub-data frame associated with a particular id
u <- dlply(mydata, .(id), function(d) try(checklm(d), TRUE))
# remove the 'bad' components
v <- u[!sapply(u, function(x) class(x) == 'try-error')]
# return the age, outcome and fitted values for each group
w <- ldply(v, function(d) data.frame(d$model, yhat = fitted(d)))

# Use ggplot2 and lattice to do the spaghetti plots:
# the group variable allows individual line plots by id

ggplot(w, aes(x = age, y = outcome, group = .id)) + geom_line()

xyplot(outcome ~ age, data = w, group = .id, type = 'l', col = 1)

You can substitute yhat for outcome if you wish.


On Wed, Sep 7, 2011 at 12:34 PM, dadrivr <dadrivr at gmail.com> wrote:
> Sorry, I thought the link would work for people because it is a public link
> and it works for me when I run it in R.  Anyways, here is an example set of
> data that I am having trouble with:
> /id <-
> c(230017,230017,230017,230018,230018,230018,230019,230019,230019,230020,230020,
> 230020,230021,230021,230021,230022,230022,230022,230023,230023,230023,230024,
>        230024,230024,230025,230025,230025,230026,230026,230026)
> age <- rep(c(30,36,42),10)
> outcome <-
> c(12,17,10,5,5,2,NA,NA,NA,8,6,5,11,13,10,15,11,15,13,NA,9,0,0,0,20,
>             14,16,1,2,2)
> mydata <- as.data.frame(cbind(id,age,outcome))
> fit <- by(mydata, mydata$id, function(x) fitted.values(lm(outcome ~ age,
> data=x)))
> fit1 <- unlist(fit)
> names(fit1) <- NULL
> interaction.plot(mydata$age, mydata$id, fit1,legend=F)/
> Note that the following works fine:
> /lm(outcome ~ age, data=mydata)/
> Any help would be greatly appreciated.  Thanks!
> --
> View this message in context: http://r.789695.n4.nabble.com/Problem-with-by-statement-for-spaghetti-plots-tp3788536p3797025.html
> Sent from the R help mailing list archive at Nabble.com.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list