[R] Help with Data Transformation

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Tue Jan 11 09:48:34 CET 2011


Dear Guy,

Have a look at cast() from the reshape package. You'll need something
like

cast(fldsampleid ~ Analysis, value = "Result", data = your.data.frame)

Best regards,

Thierry

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx op inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -----Oorspronkelijk bericht-----
> Van: r-help-bounces op r-project.org 
> [mailto:r-help-bounces op r-project.org] Namens Guy Jett
> Verzonden: maandag 10 januari 2011 22:00
> Aan: r-help op r-project.org
> Onderwerp: [R] Help with Data Transformation
> 
> Greetings,
> I am new to R and am having trouble with parsing a file with 
> the following characteristics:
> 
> *         Individual results for a single sample are written 
> to multiple lines.
> 
> *         First 16 columns are constant from sample to sample.
> 
> *         Remaining 10 need to be matched up (cross-tabbed?)
> 
> o   (the exact contents for the remaining 10 vary from sample 
> to sample, as indicated in the extract below)
> 
> *         Ultimate goal is to run various comparisons between 
> the variable columns, compare samples from separate 
> populations, and graph samples from the separate populations.
> 
> *         (An extract is provided below)
> 
> The data is initially extracted from an SQL database into 
> Excel, then saved as a tab-delimited text file for use in R.
> I have been successful in using subset() to extract specific 
> sample types, but have not yet been able to transform the 
> data so that all the data needed is on a single line.  I have 
> looked at several R manuals, read through 'R in a Nutshell', 
> prowled the help resources (R Site Search and the Google 
> link), tried stack(), subset(), reshape(), and several other 
> functions, to no avail.
> 
> Thank you very much for your help.  This seems like a 
> wonderful community,
> Guy Jett, R.G.
> Project Geologist
> gjett op itsi.com<mailto:gjett op itsi.com>
> 
> Example Data Input (subset):
>                 fldsampid            CLP_ID  sacode  matrix   
> etc...      prccode                Lab         EXMCODE        
>    Analysis                PARLABEL                PARVQ Result
> 2268       LHR020GW-01E2                               N      
>        WG                        INO        BRLS      NONE    
> E300       CL           =             23590.9
> 2269       LHR020GW-01E2                               N      
>        WG                        INO        BRLS      NONE    
> E300       PO4        ND          50
> 2270       LHR020GW-01E2                               N      
>        WG                        INO        BRLS      NONE    
> E300       SO4        =             22460
> 2272       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1631    HG          =             0.00171
> 2273       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1638    AG          =             2.57
> 2274       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1638    AL           =             122
> 2275       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1638    AS           =             317
> 2276       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1638    B             =             9970
> 2289       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1638    V             =             131
> 2290       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      FLDFLT  
> E1638    Zn           =             1.76
> 2291       LHR020GW-01E2                               N      
>        WG                        MET       BRLS      METHOD   
>            E1638    PB           ND                0.008
> 2292       LHR020GW-01E2                               N      
>        WG                        MI          BRLS      NONE   
>  A2320    ALK        =             807000
> 2293       LHR020GW-01E2                               N      
>        WG                        MI          BRLS      NONE   
>  A2320    ALKB      =             807000
> 2294       LHR020GW-01E2                               N      
>        WG                        MI          BRLS      NONE   
>  A2320    ALKC      ND          2500
> 2295       LHR020GW-01E2                               N      
>        WG                        ORG       BRLS      NONE    
> A5310B DOC       =             49330
> 2296       LHR020GW-01E2                               N      
>        WG                        SN          BRLS      NONE   
>  E300       NO3       =             792
> 2326       LHR020SD-00E2                 N             SE     
>                       MET       BRLS      METHOD              
> E1630    MEHG   =             4.28
> 2327       LHR020SD-00E2                 N             SE     
>                       MI          BRLS      METHOD            
>   E160.3   SOLID    =             48.45
> 2328       LHR020SD-00E2                 N             SE     
>                       ORG       BRLS      NONE    SW9060      
>           TOC        =             4.823
> 2329       LHR020SD-00E2 MY77J8 N             SE              
>              MET       A4SW    METHOD              C245.5   
> HG          =             5100
> 2330       LHR020SD-00E2 MY77J8 N             SE              
>              MET       A4SW    METHOD              E200.8   
> AG          ND          1050
> 2331       LHR020SD-00E2 MY77J8 N             SE              
>              MET       A4SW    METHOD              E200.8   
> AS           =             5500
> 2332       LHR020SD-00E2 MY77J8 N             SE              
>              MET       A4SW    METHOD              E200.8   B 
>             =             11400
> 2346       LHR020SD-00E2 MY77J8 N             SE              
>              MET       A4SW    SW3050B             SW6010B    
>          V             =                56900
> 2349       LHR020SD-00E2 MY77J8 N             SE              
>              MI          A4SW    METHOD              A2540G   
>              SOLID    =             47.7
> 
> Desired output:
>                 fldsampid            CLP_ID  sacode  matrix   
> etc...      CL           PO4        SO4        AG          AL 
>           AS           B             V             Zn         
>        etc...      ALK        ALKB      ALKC      SOLID    
> DOC       TOC        NO3
>                 LHR020GW-01E2                               N 
>             WG                        <value>                
> <value>                <value>                <value>         
>        <value>                <value>                <value>  
>               <value>                <value>                
> <value>                <value>                <value>         
>        <value>                <value>                <value>  
>               <value>                <value>
>                 LHR020SD-00E2 MY77J8 N             SE         
>                   NA          NA          NA          <value> 
>                <value>                <value>                
> <value>                <value>                NA          
> <value>                NA          NA          NA          
> <value>                NA          NA          NA
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list