[R] Restructuring data - unstack, reshape?

Mon Sep 26 10:05:49 CEST 2011

Hi:

Here's one approach using the reshape() function in base R:

# Read in your data:
d <- read.table(textConnection("
Candidate.ID        Specialty       Office         Score
      110002         C                  London           47
      110002         C                  East              48
      110003         RM               West              45
      110003         RM               Southwest      39
      110003         C                  Southwest      38
      110004         H                  South             42
      110006         G                  East               47
      110006         G                  London           45"), header = TRUE)
closeAllConnections()

# Create a variable to distinguish positions
d$Position <- c(1, 2, 1, 2, 1, 1, 1, 2)

reshape(d, idvar = 'Candidate.ID', timevar = 'Position',
           v.names = c('Specialty', 'Office', 'Score'),
           direction = 'wide')

HTH,
Dennis

On Sun, Sep 25, 2011 at 6:47 PM, Jen M <jmstatshelp at gmail.com> wrote:
> Hi all,
>
> I'm having a problem restructuring my data the way I'd like it.  I have data
> that look like this:
>
>  Candidate.ID        Specialty       Office         Score
>       110002         C                  London           47
>       110002         C                  East              48
>       110003         RM               West              45
>       110003         RM               Southwest      39
>       110003         C                  Southwest      38
>       110004         H                  South             42
>       110006         G                  East               47
>       110006         G                  London           45
>
> Candidates can apply for the same job specialty in up to 2 offices (never
> more).  They can apply for different specialties in further centres.  I
> would like to look at score differences when candidates apply for the same
> specialty in two different offices.  With the help of the archives I have
> tried various stack/unstack and reshape/melt/cast combinations, and I've
> managed to get a huge matrix where the columns are all possible combinations
> of Specialties & Offices - and there are many.  This leaves a very sparse
> matrix  with mainly null values, and this is not what I want.  I'd like the
> scores from the two attempts in two columns so I can do scatterplots,
> calculate differences by specialty etc. In SPSS I'd use 'restructure' to get
> what I want.  I'm working to order with specific requests here so I have to
> do it this way (as opposed to a modelling approach).
>
> I would like it restructured to look something like this:
>
>  Candidate.ID        Specialty       Office.1       Score.1      Office.2
> Score.2
>       110002         C                  London           47
>  East          48
>       110003         RM               West              45
>  Southwest  39
>       110003         C                  Southwest      38
>       110004         H                  South             42
>       110006         G                  East               47
> London      45
>       110006         G                  London           45
>
> So one row per candidate/specialty combination, with 2 sets of
> offices/scores and null values in the second set if they've only applied
> once for that specialty.  Can anyone help me out with this? Is it possible
> using stack or reshape?
>
> Many thanks for reading,
> Jan
>
> PS Closest I've come to what I need is the sparse matrix produced by this:
>
> recast(spec.scores, Candidate.ID ~ Specialty + Office, measure.var="Score")
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>