[Rd] Re: desiderata for data manipulation

A.J. Rossini rossini@blindglobe.net
01 Nov 2000 05:42:15 -0800


>>>>> "MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:

    MM> [diverted from R-help to R-devel]
>>>>> "tony" == A J Rossini <rossini@blindglobe.net> writes:

>>>>> "ZC" == Zsombor Cseres-Gergely <z.cseres-gergely@ucl.ac.uk> writes:

    ZC> .........  But how do you exchange datafiles? Dump to ascii
    ZC> and infile?

    tony> Required helpful R-related content:

    tony> Much, much easier.  Look for the stataread package at CRAN.
    tony> (at least until R gets a nice set of longitudinal macros,
    tony> I'll continue to enjoy Stata's "reshape" command for data
    tony> manipulation)

    MM> Dear Tony, can you elaborate what you are missing here for R?

What I'm missing in R is a simple command to reshape a data frame from
one observation per individual (or other unit with repeated
measurements) to multiple lines (one per measurement), and back again.
Quite often, in building models, I'll switch between wide/long versions
of the data set to build useful variables for modeling.


Here's the help file of my favorite (missing from R) data manipulation
command:

-------------------------------------------------------------------------------
help for reshape                                         (manual:  [R] reshape)
-------------------------------------------------------------------------------

Convert data from wide to long and vice versa
---------------------------------------------

Basic syntax:

        reshape wide varnames, i(varlist) [ j(varname [values]) ... ]
        reshape long varnames, i(varlist) [ j(varname [values]) ... ]

        reshape wide
        reshape long

        reshape error

        where
                values is           #[-#] [#[-#] ...]
                ... is              string atwl(chars)

        Both are seldom specified.
        
        
Advanced syntax:
        
        reshape i   varlist
        reshape j   varname [#[-#] [#[-#] ...]] [, string]
        reshape xij fvarnames [, atwl(chars)]
        reshape xi  [varlist]
        
        reshape [query]
        
        reshape wide
        reshape long
        
        reshape error
        
        reshape clear
        
        where fvarnames are either varnames, varnames with @ characters, or a
                mix of the two.  The @ character denotes where the # (j) suffix
                appears.
                
Description
-----------
        
reshape converts data from wide to long form and vice versa.  Think of the data
as a collection of observations x_ij.  One such collection might be
        
                 (wide form)                          (long form)
        
        -i-       ------- x_ij --------         -i-  -j-         -x_ij-
        id  sex   inc80   inc81   inc82         id   year   sex    inc
        -------------------------------         ----------------------
         1    0    5000    5500    6000          1     80     0   5000
         2    1    2000    2200    3300          1     81     0   5500
         3    0    3000    2000    1000          1     82     0   6000
                                                 2     80     1   2000
                                                 2     81     1   2200
                                                 2     82     1   3300
                                                 3     80     0   3000
                                                 3     81     0   2000
                                                 3     82     0   1000
        
reshape converts data from one form to the other:
        
        . reshape long inc, i(id) j(year)      (goes from left-form to right)
        . reshape wide inc, i(id) j(year)      (goes from right-form to left)
        
See [R] reshape for a detailed discussion and examples for both the basic and
advanced syntax.
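
For anyone without Stata at hand, the wide-to-long conversion pictured
above can be sketched in a few lines.  This is only an illustration in
plain Python, not R code and not a description of how Stata implements
reshape.  The function name reshape_long and its arguments are made up
here (loosely mirroring i() and j()), and the sketch assumes every
repeated column is named as the stub followed by a numeric j value.

```python
def reshape_long(rows, stub, i, j):
    """Wide -> long: one output row per (i, j) pair.

    rows: list of dicts, one per logical observation (wide form).
    stub: prefix of the repeated columns, e.g. "inc" for inc80, inc81, ...
    i:    name of the id column; j: name of the new sub-observation column.
    Assumption: only the repeated columns start with `stub`, and what
    follows the stub is an integer.
    """
    long_rows = []
    for row in rows:
        # Columns without the stub are constant within i (e.g. id, sex).
        fixed = {k: v for k, v in row.items() if not k.startswith(stub)}
        for k, v in row.items():
            if k.startswith(stub):
                out = dict(fixed)
                out[j] = int(k[len(stub):])  # suffix after the stub is the j value
                out[stub] = v
                long_rows.append(out)
    # Order by i, then j, as in the long-form table.
    long_rows.sort(key=lambda r: (r[i], r[j]))
    return long_rows


wide = [
    {"id": 1, "sex": 0, "inc80": 5000, "inc81": 5500, "inc82": 6000},
    {"id": 2, "sex": 1, "inc80": 2000, "inc81": 2200, "inc82": 3300},
    {"id": 3, "sex": 0, "inc80": 3000, "inc81": 2000, "inc82": 1000},
]
long_form = reshape_long(wide, stub="inc", i="id", j="year")
```

With the three-person example, long_form holds nine rows, one per
(id, year) pair, and sex is simply repeated on each of them.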
        
        
Options 
------- 
        
i(varlist) specifies the variable(s) whose unique values denote a logical
    observation.
        
j(varname [values]) specifies the variable whose unique values denote a
    sub-observation.  values lists the unique values to be used from varname and
    is typically not stated explicitly, since reshape will determine them
    automatically from the data.
        
string specifies that the j() may contain string values.
       
atwl(chars) specifies that chars should be substituted for the @ character when
    converting the data to the long form.
        
        
Examples
--------
        
 . reshape long inc ue, i(id) j(year)             converts from wide to long
 . reshape wide                                   converts back to wide
        
 . reshape ..., i(id)                             one i() variable
 . reshape ..., i(hid pid)                        two i() variables
        
 . reshape long inc, i(id) j(year 80-82 85)       specifying j() values
        
 . reshape long inc, i(id) j(sex) string          allow string var. in j()
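
The reverse direction (long back to wide) can be sketched the same way.
Again this is plain Python for illustration only: reshape_wide and its
parameters are hypothetical names, and the optional `values` argument is
my own simplification that merely imitates the idea of listing j()
values.  It makes no claim about what Stata does with values left out.

```python
def reshape_wide(rows, stub, i, j, values=None):
    """Long -> wide: collapse all rows sharing the same i into one row.

    rows:   list of dicts, one per sub-observation (long form).
    stub:   name of the measured column, e.g. "inc".
    i, j:   names of the id and sub-observation columns.
    values: optional collection of j values to keep (hypothetical option).
    """
    wide = {}
    for row in rows:
        if values is not None and row[j] not in values:
            continue  # keep only the requested j values
        # Start the wide row from the columns that are constant within i.
        target = wide.setdefault(
            row[i], {k: v for k, v in row.items() if k not in (j, stub)}
        )
        target[f"{stub}{row[j]}"] = row[stub]  # e.g. inc + 80 -> inc80
    return [wide[key] for key in sorted(wide)]


long_form = [
    {"id": 1, "year": 80, "sex": 0, "inc": 5000},
    {"id": 1, "year": 81, "sex": 0, "inc": 5500},
    {"id": 1, "year": 82, "sex": 0, "inc": 6000},
    {"id": 2, "year": 80, "sex": 1, "inc": 2000},
    {"id": 2, "year": 81, "sex": 1, "inc": 2200},
    {"id": 2, "year": 82, "sex": 1, "inc": 3300},
]
wide_form = reshape_wide(long_form, stub="inc", i="id", j="year")
subset = reshape_wide(long_form, stub="inc", i="id", j="year", values={80, 82})
```

Here wide_form has one row per id with columns inc80, inc81 and inc82,
while subset carries only inc80 and inc82.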
        
        


-- 
A.J. Rossini				Rsrch. Asst. Prof. of Biostatistics
BlindGlobe Networks (home/default)	rossini@blindglobe.net	
UW Biostat/Center for AIDS Research	rossini@u.washington.edu	
FHCRC/SCHARP/HIV Vaccine Trials Net	rossini@scharp.org

FHCRC: M/Tu: 206-667-7025 (fax=4812) | Voicemail is pretty sketchy
CFAR:   W/F: 206-731-3647 (fax=3694) | Email is far better than phone
UW:    Th/F: 206-543-1044 (fax=3286) | Change last 4 digits of phone for fax