[R] Restructuring data - unstack, reshape?
Dennis Murphy
djmuser at gmail.com
Mon Sep 26 10:05:49 CEST 2011
Hi:
Here's one approach using the reshape() function in base R:
# Read in your data:
d <- read.table(textConnection("
Candidate.ID Specialty Office Score
110002 C London 47
110002 C East 48
110003 RM West 45
110003 RM Southwest 39
110003 C Southwest 38
110004 H South 42
110006 G East 47
110006 G London 45"), header = TRUE)
closeAllConnections()
# Create a variable to distinguish positions
d$Position <- c(1, 2, 1, 2, 1, 1, 1, 2)
reshape(d, idvar = 'Candidate.ID', timevar = 'Position',
v.names = c('Specialty', 'Office', 'Score'),
direction = 'wide')
HTH,
Dennis
On Sun, Sep 25, 2011 at 6:47 PM, Jen M <jmstatshelp at gmail.com> wrote:
> Hi all,
>
> I'm having a problem restructuring my data the way I'd like it. I have data
> that look like this:
>
> Candidate.ID Specialty Office Score
> 110002 C London 47
> 110002 C East 48
> 110003 RM West 45
> 110003 RM Southwest 39
> 110003 C Southwest 38
> 110004 H South 42
> 110006 G East 47
> 110006 G London 45
>
> Candidates can apply for the same job specialty in up to 2 offices (never
> more). They can apply for different specialties in further centres. I
> would like to look at score differences when candidates apply for the same
> specialty in two different offices. With the help of the archives I have
> tried various stack/unstack and reshape/melt/cast combinations, and I've
> managed to get a huge matrix where the columns are all possible combinations
> of Specialties & Offices - and there are many. This leaves a very sparse
> matrix with mainly null values, and this is not what I want. I'd like the
> scores from the two attempts in two columns so I can do scatterplots,
> calculate differences by specialty etc. In SPSS I'd use 'restructure' to get
> what I want. I'm working to order with specific requests here so I have to
> do it this way (as opposed to a modelling approach).
>
> I would like it restructured to look something like this:
>
> Candidate.ID Specialty Office.1 Score.1 Office.2
> Score.2
> 110002 C London 47
> East 48
> 110003 RM West 45
> Southwest 39
> 110003 C Southwest 38
> 110004 H South 42
> 110006 G East 47
> London 45
> 110006 G London 45
>
> So one row per candidate/specialty combination, with 2 sets of
> offices/scores and null values in the second set if they've only applied
> once for that specialty. Can anyone help me out with this? Is it possible
> using stack or reshape?
>
> Many thanks for reading,
> Jan
>
> PS Closest I've come to what I need is the sparse matrix produced by this:
>
> recast(spec.scores, Candidate.ID ~ Specialty + Office, measure.var="Score")
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list