[R] Help with recast() syntax

David Winsemius dwinsemius at comcast.net
Tue Nov 29 07:25:30 CET 2011


On Nov 29, 2011, at 12:32 AM, Chris Conner wrote:

> Dear Help-Rs,
>
> I have data similar to the following:
>
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

As posted, this part of the structure gives 201109 two NEGs and no POS: row 11 (201109, 120 tests) is coded NEG where it should be POS.

> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> Currently there are 2 observations for each month (one for negative
> and one for positive test results). What I need is to create a data
> set that looks like the following, with positive and negative test
> results in the same row, organized by month:

After fixing the POS/NEG discrepancy, this works with dcast() from the reshape2 package:

> dcast(DF, YR_MO ~ RESULT, value_var="TOT_TESTS")
    YR_MO NEG POS
1  201011 349  66
2  201012 393  98
3  201101 376 109
4  201102 371 122
5  201103 396 113
6  201104 367 111
7  201105 406 113
8  201106 383 146
9  201107 394 124
10 201108 412 130
11 201109 379 120
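
One way to locate and repair the miscoded row before casting is sketched below. It rests on a few assumptions: reshape2 is installed, DF is rebuilt in a shorter form than the dput() output in the question, and the value argument is spelled `value.var`, which is what current reshape2 releases appear to expect in place of `value_var`. The names `check`, `bad_months` and `wide` are only illustrative.

```r
library(reshape2)  # assumed installed; provides dcast()

# Compact rebuild of the posted DF, including the miscoded row 11
DF <- data.frame(
  X = 1:22,
  RESULT = factor(rep(c("POS", "NEG"), times = c(10, 12)),
                  levels = c("NEG", "POS")),
  YR_MO = rep(c(201011L, 201012L, 201101:201109), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L,
                120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
                394L, 412L, 379L)
)

# Count rows per month and result; every cell should be 1
check <- with(DF, table(YR_MO, RESULT))
bad_months <- rownames(check)[check[, "NEG"] != 1 | check[, "POS"] != 1]

# Row 11 (201109, 120 tests) is coded NEG but belongs to POS
DF$RESULT[DF$X == 11] <- "POS"

# One row per month, one column per test result
wide <- dcast(DF, YR_MO ~ RESULT, value.var = "TOT_TESTS")
wide
```

Before the fix, the cross-tabulation shows a count of 2 under NEG and 0 under POS for 201109; after it, every month has exactly one row per result, so no aggregation function is needed in the cast.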

-- 
David.
>
> DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
>     YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
>     201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
>     98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
>     ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
>     383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
> "POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
> -11L))
>
> As I understand it, Hadley Wickham's reshape package is ideally
> suited to this, so I tried the following reshape command:
>
> ReshapeDF <- recast(DF, YR_MO~variable)
>
> I get the following error message:
>
> Using RESULT as id variables
> Error: Casting formula contains variables not found in molten data: YR_MO
>
> I have a workaround that gets me to my desired endpoint: split the
> data.frame in two (by test result), then merge using YR_MO as
> by.x/by.y. I think reshape would handle this task more efficiently,
> though. Can anyone help me see where I'm going wrong? Thanks in advance!
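
On the recast() error itself: the message "Using RESULT as id variables" suggests that, with no id variables named, the melt step kept only the factor column RESULT as an id and stacked X, YR_MO and TOT_TESTS into the variable/value columns, so YR_MO was no longer available to the casting formula. A minimal sketch of a recast() call that names the id and measured variables explicitly follows. It assumes reshape2 is installed and uses a compact rebuild of DF with row 11 already corrected to POS; the final renaming step is only there to match the DF2 layout.

```r
library(reshape2)  # assumed installed; provides recast()

# Compact rebuild of DF with row 11 already recoded to POS
DF <- data.frame(
  X = 1:22,
  RESULT = factor(rep(c("POS", "NEG"), each = 11)),
  YR_MO = rep(c(201011L, 201012L, 201101:201109), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L,
                120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
                394L, 412L, 379L)
)

# Name the id and measured variables so that YR_MO stays an id column
# after melting and can appear in the casting formula
ReshapeDF <- recast(DF, YR_MO ~ RESULT,
                    id.var = c("YR_MO", "RESULT"),
                    measure.var = "TOT_TESTS")

# Rename the result columns to match the DF2 layout
names(ReshapeDF)[names(ReshapeDF) == "NEG"] <- "NEG_TESTS"
names(ReshapeDF)[names(ReshapeDF) == "POS"] <- "POS_TESTS"
ReshapeDF
```

Since only one variable is measured here, recast() offers little over calling dcast() directly; it is more useful when several measured columns need to be melted and cast in one step.
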
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


