[R] Problem with by statement for spaghetti plots

Dennis Murphy djmuser at gmail.com
Wed Sep 7 23:32:15 CEST 2011


Hi:

Having just seen your follow-up question, here's how it could be done
in ggplot2 or lattice:

# ggplot2 version:
ggplot(w, aes(x = age, y = outcome)) + geom_line(aes(group = .id)) +
   stat_summary(fun.y = 'mean', geom = 'line', color = 'blue', size = 2)

# lattice version:
xyplot(outcome ~ age, data = w, group = .id,
         panel = function(x, y, groups, ...) {
                  panel.xyplot(x, y, groups = groups, type = 'l', col = 1, ...)
                  panel.lmline(x, y, col = 'blue', lwd = 2)
                 } )

Dennis


On Wed, Sep 7, 2011 at 2:07 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> Your code doesn't work, but if you want to do spaghetti plots, they're
> rather easy to do in ggplot2 or lattice.
>
> The first problem you have is that one group in your test data has all
> NA responses, so by() or any other groupwise summarization function is
> going to choke unless you give it a way to bypass that group. In other
> words, you need to write a function that checks for bad cases and then
> 'skips over' them if found. I needed some practice in using try(), so
> I gave this a shot.
>
> id <- c(230017,230017,230017,230018,230018,230018,230019,230019,230019,230020,230020,
>        230020,230021,230021,230021,230022,230022,230022,230023,230023,230023,230024,
>        230024,230024,230025,230025,230025,230026,230026,230026)
> age <- rep(c(30,36,42), 10)
> outcome <- c(12,17,10,5,5,2,NA,NA,NA,8,6,5,11,13,10,15,11,15,13,NA,9,0,0,0,20,
>             14,16,1,2,2)
> mydata <- data.frame(id,age,outcome)
>
> # Checks to make sure that each group has at least two non-NA observations,
> # otherwise it can't fit a line. The code assumes that the x-variable
> has no NAs;
> # if this is not the case in your real data, then you need to check
> that at least
> # two observations (i.e., (x, y) pairs) have no NAs in each subgroup.
> checklm <- function(d) {
>    if(sum(is.na(d$outcome)) > 1) stop('not enough values to fit a line')
>    lm(outcome ~ age, data = d)
>   }
>
> # Now use the plyr package to run the models; the function to be applied
> # checks for fealty before fitting the model; if the error message is thrown,
> # it returns an error message with class 'try-error' as the output for that list
> # component.
>
> library('plyr')
> # takes a data frame as input, a list of model objects as output
> # id is the grouping variable. In the function, d represents the
> # sub-data frame associated with a particular id
> u <- dlply(mydata, .(id), function(d) try(checklm(d), TRUE))
> # remove the 'bad' components
> v <- u[!sapply(u, function(x) class(x) == 'try-error')]
> # return the age, outcome and fitted values for each group
> w <- ldply(v, function(d) data.frame(d$model, yhat = fitted(d)))
> head(w)
>
> # Use ggplot2 and lattice to do the spaghetti plots:
> # the group variable allows individual line plots by id
>
> library('ggplot2')
> ggplot(w, aes(x = age, y = outcome, group = .id)) + geom_line()
>
> library('lattice')
> xyplot(outcome ~ age, data = w, group = .id, type = 'l', col = 1)
>
> You can substitute yhat for outcome if you wish.
>
> HTH,
> Dennis
>
> On Wed, Sep 7, 2011 at 12:34 PM, dadrivr <dadrivr at gmail.com> wrote:
>> Sorry, I thought the link would work for people because it is a public link
>> and it works for me when I run it in R.  Anyways, here is an example set of
>> data that I am having trouble with:
>>
>> /id <-
>> c(230017,230017,230017,230018,230018,230018,230019,230019,230019,230020,230020,
>>
>> 230020,230021,230021,230021,230022,230022,230022,230023,230023,230023,230024,
>>        230024,230024,230025,230025,230025,230026,230026,230026)
>> age <- rep(c(30,36,42),10)
>> outcome <-
>> c(12,17,10,5,5,2,NA,NA,NA,8,6,5,11,13,10,15,11,15,13,NA,9,0,0,0,20,
>>             14,16,1,2,2)
>> mydata <- as.data.frame(cbind(id,age,outcome))
>>
>> fit <- by(mydata, mydata$id, function(x) fitted.values(lm(outcome ~ age,
>> data=x)))
>> fit1 <- unlist(fit)
>> names(fit1) <- NULL
>> interaction.plot(mydata$age, mydata$id, fit1,legend=F)/
>>
>> Note that the following works fine:
>> /lm(outcome ~ age, data=mydata)/
>>
>> Any help would be greatly appreciated.  Thanks!
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Problem-with-by-statement-for-spaghetti-plots-tp3788536p3797025.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



More information about the R-help mailing list