[R] Help with Data Transformation
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Tue Jan 11 09:48:34 CET 2011
Dear Guy,
Have a look at cast() from the reshape package. You'll need something
like
cast(fldsampid ~ PARLABEL, value = "Result", data = your.data.frame)
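To illustrate the kind of cross-tabulation meant here, below is a minimal sketch in base R that uses tapply() instead of cast(), so it runs without extra packages. It assumes the sample id is in a column called fldsampid, the parameter names in PARLABEL and the measurements in Result, as in the extract quoted below. The toy data frame is made up from a few of those rows, and the names wide and wide.df are only for illustration.

```r
# Toy data in long format: one row per sample/parameter pair.
# Values are copied from a few rows of the quoted extract.
your.data.frame <- data.frame(
  fldsampid = c("LHR020GW-01E2", "LHR020GW-01E2", "LHR020GW-01E2",
                "LHR020SD-00E2", "LHR020SD-00E2"),
  PARLABEL  = c("CL", "PO4", "AG", "AG", "TOC"),
  Result    = c(23590.9, 50, 2.57, 1050, 4.823),
  stringsAsFactors = FALSE
)

# One row per sample, one column per parameter. Combinations that do not
# occur in the data come out as NA. mean() only matters when the same
# sample/parameter pair occurs more than once.
wide <- tapply(your.data.frame$Result,
               list(fldsampid = your.data.frame$fldsampid,
                    PARLABEL  = your.data.frame$PARLABEL),
               mean)

# Turn the matrix into a data frame with the sample id as a regular column.
wide.df <- data.frame(fldsampid = rownames(wide), wide,
                      check.names = FALSE, stringsAsFactors = FALSE)
rownames(wide.df) <- NULL
print(wide.df)
```

In the real data some sample/parameter pairs occur more than once (SOLID is reported twice for LHR020SD-00E2 in the extract), so you will have to decide whether averaging is appropriate or whether the lab or method should become part of the row or column key.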
Best regards,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
> -----Original message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On behalf of Guy Jett
> Sent: Monday 10 January 2011 22:00
> To: r-help at r-project.org
> Subject: [R] Help with Data Transformation
>
> Greetings,
> I am new to R and am having trouble with parsing a file with
> the following characteristics:
>
> * Individual results for a single sample are written to multiple
>   lines.
> * First 16 columns are constant from sample to sample.
> * Remaining 10 need to be matched up (cross-tabbed?)
>   o (the exact contents for the remaining 10 vary from sample to
>     sample, as indicated in the extract below)
> * Ultimate goal is to run various comparisons between the variable
>   columns, compare samples from separate populations, and graph
>   samples from the separate populations.
> * (An extract is provided below)
>
> The data is initially extracted from an SQL database into
> Excel, then saved as a tab-delimited text file for use in R.
> I have been successful in using subset() to extract specific
> sample types, but have not yet been able to transform the
> data so that all the data needed is on a single line. I have
> looked at several R manuals, read through 'R in a Nutshell',
> prowled the help resources (R Site Search and the Google
> link), tried stack(), subset(), reshape(), and several other
> functions, to no avail.
>
> Thank you very much for your help. This seems like a
> wonderful community,
> Guy Jett, R.G.
> Project Geologist
> gjett at itsi.com
>
> Example Data Input (subset):
>      fldsampid     CLP_ID sacode matrix etc... prccode Lab  EXMCODE Analysis PARLABEL PARVQ Result
> 2268 LHR020GW-01E2        N      WG            INO     BRLS NONE    E300     CL       =     23590.9
> 2269 LHR020GW-01E2        N      WG            INO     BRLS NONE    E300     PO4      ND    50
> 2270 LHR020GW-01E2        N      WG            INO     BRLS NONE    E300     SO4      =     22460
> 2272 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1631    HG       =     0.00171
> 2273 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1638    AG       =     2.57
> 2274 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1638    AL       =     122
> 2275 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1638    AS       =     317
> 2276 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1638    B        =     9970
> 2289 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1638    V        =     131
> 2290 LHR020GW-01E2        N      WG            MET     BRLS FLDFLT  E1638    Zn       =     1.76
> 2291 LHR020GW-01E2        N      WG            MET     BRLS METHOD  E1638    PB       ND    0.008
> 2292 LHR020GW-01E2        N      WG            MI      BRLS NONE    A2320    ALK      =     807000
> 2293 LHR020GW-01E2        N      WG            MI      BRLS NONE    A2320    ALKB     =     807000
> 2294 LHR020GW-01E2        N      WG            MI      BRLS NONE    A2320    ALKC     ND    2500
> 2295 LHR020GW-01E2        N      WG            ORG     BRLS NONE    A5310B   DOC      =     49330
> 2296 LHR020GW-01E2        N      WG            SN      BRLS NONE    E300     NO3      =     792
> 2326 LHR020SD-00E2        N      SE            MET     BRLS METHOD  E1630    MEHG     =     4.28
> 2327 LHR020SD-00E2        N      SE            MI      BRLS METHOD  E160.3   SOLID    =     48.45
> 2328 LHR020SD-00E2        N      SE            ORG     BRLS NONE    SW9060   TOC      =     4.823
> 2329 LHR020SD-00E2 MY77J8 N      SE            MET     A4SW METHOD  C245.5   HG       =     5100
> 2330 LHR020SD-00E2 MY77J8 N      SE            MET     A4SW METHOD  E200.8   AG       ND    1050
> 2331 LHR020SD-00E2 MY77J8 N      SE            MET     A4SW METHOD  E200.8   AS       =     5500
> 2332 LHR020SD-00E2 MY77J8 N      SE            MET     A4SW METHOD  E200.8   B        =     11400
> 2346 LHR020SD-00E2 MY77J8 N      SE            MET     A4SW SW3050B SW6010B  V        =     56900
> 2349 LHR020SD-00E2 MY77J8 N      SE            MI      A4SW METHOD  A2540G   SOLID    =     47.7
>
> Desired output:
> fldsampid     CLP_ID sacode matrix etc... CL      PO4     SO4     AG      AL      AS      B       V       Zn      etc...  ALK     ALKB    ALKC    SOLID   DOC     TOC     NO3
> LHR020GW-01E2        N      WG            <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value>
> LHR020SD-00E2 MY77J8 N      SE            NA      NA      NA      <value> <value> <value> <value> <value> NA      <value> NA      NA      NA      <value> NA      NA      NA