[R] Help with recast() syntax
jdnewmil
jdnewmil at dcn.org
Tue Nov 29 07:25:41 CET 2011
Inline below...
On Mon, 28 Nov 2011 21:32:21 -0800 (PST), Chris Conner
<connerpharmd at yahoo.com> wrote:
> Dear Help-Rs,
>
> I have data similar to the following:
>
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO =
> c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> Currently there are 2 observations for each month (one for negative
> and one for positive test results). What I need is to create a data
> set that looks like the following, with positive and negative test
> results in the same row, organized by month:
>
> DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
> YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
> 201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
> 98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
> ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
> 383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
> "POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
> -11L))
Thanks for the sample data.
> As this is something that I understand Hadley Wickham's Reshape
> package is ideally suited for, I tried using the following reshape
> command:
>
> ReshapeDF <- recast(DF, YR_MO~variable)
>
> I get the following error message:
>
> Using RESULT as id variables
> Error: Casting formula contains variables not found in molten data:
> YR_MO
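I don't think you need to melt the data first: it is already in long
form, with one row per YR_MO/RESULT combination. recast() melts and then
casts, and its default melt step is what loses YR_MO as a column, so you
don't need the recast function. Use dcast() directly.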
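In case it helps to see where the error comes from, here is a minimal sketch on a made-up data frame (toy and its values are invented for illustration, they are not your data). As far as I can tell, with no id.var supplied melt() uses the factor column as the id and stacks every integer column, so YR_MO ends up as a level of "variable" instead of a column. The last call shows that recast() should work once the id and measured columns are named, assuming reshape2 is installed:

```r
library(reshape2)

# toy data, invented for illustration (same shape as DF, fewer rows)
toy <- data.frame(
  RESULT    = factor(c("POS", "NEG", "POS", "NEG")),
  YR_MO     = c(201011L, 201011L, 201012L, 201012L),
  TOT_TESTS = c(66L, 349L, 98L, 393L)
)

# default melt: the factor RESULT becomes the id, the integer columns are stacked
molten_default <- melt(toy)
names(molten_default)            # YR_MO is not among the column names
levels(molten_default$variable)  # it is a level of 'variable' instead

# naming the id and measured columns keeps YR_MO as a column, so the
# casting formula has something to match
wide <- recast(toy, YR_MO ~ RESULT,
               id.var = c("YR_MO", "RESULT"), measure.var = "TOT_TESTS")
```

Since the data are already long, this only confirms the diagnosis; casting directly is still the shorter route.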
# reshape2 is faster than reshape, but slightly syntactically different
library(reshape2)
# rename the RESULT levels
DF0 <- DF
levels( DF0$RESULT ) <- c( "NEG_TOTAL", "POS_TOTAL" )
# cast to data frame, use sum if more than one row for a given YR_MO
DF0 <- dcast( DF0, YR_MO~RESULT, sum, value.var="TOT_TESTS" )
# The rest of this is to make the data frame look like your result, which
# seems unnecessary to me, but perhaps there is a good reason for keeping
# X and RESULT
DF1 <- merge( DF[ DF$RESULT=="POS", c( "X", "RESULT", "YR_MO" ) ], DF0 )
DF2 <- DF1[,c("X", "RESULT", "YR_MO", "POS_TOTAL", "NEG_TOTAL" ) ]
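If you would rather not depend on a package, the same wide layout can be sketched in base R with xtabs(), which sums the value within each YR_MO/RESULT cell much like dcast(..., sum) above. This is only an illustration under assumptions: toy and its values are invented, and the POS_TOTAL/NEG_TOTAL names simply follow the ones used above.

```r
# toy data, invented for illustration (same shape as DF, fewer rows)
toy <- data.frame(
  RESULT    = factor(c("POS", "NEG", "POS", "NEG")),
  YR_MO     = c(201011L, 201011L, 201012L, 201012L),
  TOT_TESTS = c(66L, 349L, 98L, 393L)
)

# sum TOT_TESTS within each YR_MO/RESULT cell
tab <- xtabs(TOT_TESTS ~ YR_MO + RESULT, data = toy)

# convert the contingency table to a plain data frame, one row per month
wide <- data.frame(
  YR_MO     = as.integer(rownames(tab)),
  POS_TOTAL = as.vector(tab[, "POS"]),
  NEG_TOTAL = as.vector(tab[, "NEG"])
)
```

A month that has no rows for one of the results shows up as 0 in that column here, so it is worth checking the row counts per RESULT before relying on the totals.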
> I have a work around that allows me to get to my desired endpoint
> that involves splitting the data.frame into two (by test result),
> then
> using the YR_MO as the by.x/by.y in a merge, but I think this task
> would be handled more efficiently using reshape? Can anyone help me
> to see where I'm going wrong? Thanks in advance!
>
> [[alternative HTML version deleted]]
(Please remember that this is a plain text email list.)
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil_at_dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)