[R] Help with Data Transformation
John Kane
jrkrideau at yahoo.ca
Mon Jan 10 22:43:15 CET 2011
That sample data set is really hard to read. Could you resent it after having used dput on it?
A file output with dput is easily read into R and makes seeing what you need much easier. BTW what are the = doing?
Thanks
--- On Mon, 1/10/11, Guy Jett <GJett at itsi.com> wrote:
> From: Guy Jett <GJett at itsi.com>
> Subject: [R] Help with Data Transformation
> To: "r-help at r-project.org" <r-help at r-project.org>
> Received: Monday, January 10, 2011, 3:59 PM
> Greetings,
> I am new to R and am having trouble with parsing a file
> with the following characteristics:
>
> * Individual results
> for a single sample are written to multiple lines.
>
> * First 16 columns
> are constant from sample to sample.
>
> * Remaining 10 need
> to be matched up (cross-tabbed?)
>
> o (the exact contents for the remaining 10
> vary from sample to sample, as indicated in the extract
> below)
>
> * Ultimate goal is to
> run various comparisons between the variable columns,
> compare samples from separate populations, and graph samples
> from the separate populations.
>
> * (An extract is
> provided below)
>
> The data is initially extracted from an SQL database into
> Excel, then saved as a tab-delimited text file for use in
> R.
> I have been successful in using subset() to extract
> specific sample types, but have not yet been able to
> transform the data so that all the data needed is on a
> single line. I have looked at several R manuals, read
> through 'R in a Nutshell', prowled the help resources (R
> Site Search and the Google link), tried stack(), subset(),
> reshape(), and several other functions, to no avail.
>
> Thank you very much for your help. This seems like a
> wonderful community,
> Guy Jett, R.G.
> Project Geologist
> gjett at itsi.com<mailto:gjett at itsi.com>
>
> Example Data Input (subset):
>
> fldsampid
> CLP_ID sacode
> matrix etc...
> prccode
> Lab
> EXMCODE
> Analysis
> PARLABEL
> PARVQ Result
> 2268 LHR020GW-01E2
>
> N
> WG
>
> INO
> BRLS NONE E300
> CL
> =
> 23590.9
> 2269 LHR020GW-01E2
>
> N
> WG
>
> INO
> BRLS NONE E300
> PO4 ND
> 50
> 2270 LHR020GW-01E2
>
> N
> WG
>
> INO
> BRLS NONE E300
> SO4 =
> 22460
> 2272 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1631 HG
> =
> 0.00171
> 2273 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1638 AG
> = 2.57
> 2274 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1638 AL
> =
> 122
> 2275 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1638 AS
> =
> 317
> 2276 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1638 B
> =
> 9970
> 2289 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1638 V
> =
> 131
> 2290 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS FLDFLT
> E1638 Zn
> =
> 1.76
> 2291 LHR020GW-01E2
>
> N
> WG
>
> MET
> BRLS METHOD
> E1638
> PB ND
> 0.008
> 2292 LHR020GW-01E2
>
> N
> WG
>
> MI
> BRLS NONE A2320
> ALK =
> 807000
> 2293 LHR020GW-01E2
>
> N
> WG
>
> MI
> BRLS NONE A2320
> ALKB =
> 807000
> 2294 LHR020GW-01E2
>
> N
> WG
>
> MI
> BRLS NONE A2320
> ALKC ND
> 2500
> 2295 LHR020GW-01E2
>
> N
> WG
>
> ORG
> BRLS NONE
> A5310B DOC =
> 49330
> 2296 LHR020GW-01E2
>
> N
> WG
>
> SN
> BRLS NONE E300
> NO3 =
> 792
> 2326 LHR020SD-00E2
>
> N
> SE
>
> MET
> BRLS METHOD
> E1630
> MEHG =
> 4.28
> 2327 LHR020SD-00E2
>
> N
> SE
>
> MI
> BRLS METHOD
> E160.3 SOLID
> =
> 48.45
> 2328 LHR020SD-00E2
>
> N
> SE
>
> ORG
> BRLS NONE
> SW9060
> TOC =
> 4.823
> 2329 LHR020SD-00E2 MY77J8
> N
> SE
>
> MET
> A4SW METHOD
>
> C245.5 HG
> = 5100
> 2330 LHR020SD-00E2 MY77J8
> N
> SE
>
> MET
> A4SW METHOD
>
> E200.8 AG
> ND 1050
> 2331 LHR020SD-00E2 MY77J8
> N
> SE
>
> MET
> A4SW METHOD
>
> E200.8 AS
> =
> 5500
> 2332 LHR020SD-00E2 MY77J8
> N
> SE
>
> MET
> A4SW METHOD
>
> E200.8 B
> =
> 11400
> 2346 LHR020SD-00E2 MY77J8
> N
> SE
>
> MET
> A4SW SW3050B
> SW6010B
> V
> =
> 56900
> 2349 LHR020SD-00E2 MY77J8
> N
> SE
>
> MI
> A4SW METHOD
> A2540G
> SOLID =
> 47.7
>
> Desired output:
>
> fldsampid
> CLP_ID sacode
> matrix etc... CL
> PO4
> SO4 AG
> AL
> AS
> B
> V
> Zn
> etc... ALK
> ALKB ALKC
> SOLID DOC
> TOC NO3
>
> LHR020GW-01E2
>
> N
> WG
>
> <value>
> <value>
> <value>
>
> <value>
> <value>
> <value>
>
> <value>
> <value>
> <value>
>
> <value>
> <value>
> <value>
>
> <value>
> <value>
> <value>
>
> <value>
> <value>
>
> LHR020SD-00E2 MY77J8 N
> SE
>
> NA
> NA NA
> <value>
> <value>
>
> <value>
> <value>
> <value>
> NA
> <value>
> NA
> NA NA
> <value>
> NA
> NA
> NA
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>
More information about the R-help
mailing list