[R] reshape
jim holtman
jholtman at gmail.com
Thu Jun 21 01:58:44 CEST 2012
Something got lost in the transmission since the '\\' were replaced
with '\' I will try gain with the data: (also attached as a text
file)
require(reshape2)
toy <- data.frame(year = rep(2007:2008,each = 20), month =
rep(1:5,each = 4, length = 40),
day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length =
40), plot = rep(1,40),
trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2=
rnorm(40),
Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))
x <- melt(toy, id = c('year', 'month', 'day', 'hhmm', 'plot', 'trt'))
# parse the type, group and number
x$type <- sub("\\..*$", "", x$variable)
x$group <- sub(".*\\.(.).*", "\\1", x$variable)
x$number <- sub(".*\\..", "", x$variable)
head(x)
On Wed, Jun 20, 2012 at 4:08 PM, nrm2010 <nrm2010 at zoho.com> wrote:
> Still working on it...to get your original suggestion to run,
> I "escaped" (doubled) the backslashes, so instead of
> x$type <- sub("\..*$", "", x$variable)
> which generated the error message
> Error: '\.' is an unrecognized escape in character string starting "\."
> I tried
> x$type <- sub("\\..*$", "", x$variable)
>
> That did not generate the error message and allowed me to
> run your second suggestion:
> dcast(x, year+month+day+hhmm+group+number ~ type, mean)
> The Tair and VWC columns appeared but the groups all came out as \001, instead of B or T.
> year month day hhmm group number Tair VWC
> 1 2007 1 1 1100 \001 1 1.553844979 0.325459714
> 2 2007 1 1 1100 \001 10 NaN -0.693606661
>
>
> I'm also not trying to take the mean, but only to reformat from wide to long,
> so I tried it without the "mean" at the end of the dcast statement:
> dcast(x, year+month+day+hhmm+group+number ~ type)
> That resulted in a loss of the measurement data -
> the VWC and Tair columns are still there but they contain only 0's, 1's, or 2's.
>
> year month day hhmm group number Tair VWC
> 1 2007 1 1 1100 \001 1 2 2
> 2 2007 1 1 1100 \001 10 0 1
> 3 2007 1 1 1100 \001 2 2 2
>
> So, I'm still working on it...
>
> Thanks,
>
> Toby
>
> R v. 2.15.0
> Windows OS
>
>
>
>
>
>
> ---- On Wed, 20 Jun 2012 09:51:45 -0700 jim holtman <jholtman at gmail.com> wrote ----
> > Is this closer using the same data:
> >
> > > dcast(x, year+month+day+hhmm+group+number ~ type, mean)
> > year month day hhmm group number Tair VWC
> > 1 2007 1 1 1100 B 1 0.285404913 1.4337114478
> > 2 2007 1 1 1100 B 2 -0.899893189 1.1004122756
> > 3 2007 1 1 1100 T 1 -0.054938988 0.0565570314
> > 4 2007 1 1 1100 T 10 NaN -0.4645357924
> > 5 2007 1 1 1100 T 2 0.235724005 0.1355795323
> > 6 2007 1 1 1101 B 1 0.008141599 0.2010562273
> > 7 2007 1 1 1101 B 2 -0.094439009 0.7713080532
> > 8 2007 1 1 1101 T 1 1.111981883 -0.5022737361
> > 9 2007 1 1 1101 T 10 NaN 0.3061178598
> > 10 2007 1 1 1101 T 2 0.101879382 -0.1886613029
> > 11 2007 1 2 1100 B 1 -2.189530590 0.0741563102
> > 12 2007 1 2 1100 B 2 0.290204966 1.4341878509
> > 13 2007 1 2 1100 T 1 0.788716089 1.9156394486
> > 14 2007 1 2 1100 T 10 NaN 1.9814202666
> > 15 2007 1 2 1100 T 2 0.255978008 0.0604626563
> > 16 2007 1 2 1101 B 1 -0.479254067 0.7770438119
> > 17 2007 1 2 1101 B 2 0.526466448 -0.1691557783
> > 18 2007 1 2 1101 T 1 2.070198634 0.1129769772
> > 19 2007 1 2 1101 T 10 NaN 1.5263077972
> > 20 2007 1 2 1101 T 2 -0.883571883 -0.5357702486
> >
> >
> > On Wed, Jun 20, 2012 at 12:33 PM, nrm2010 <nrm2010 at zoho.com> wrote:
> > >
> > > Thank you, but it doesn't quite. It still needs a new column for each measurement type,
> > > i.e., a VWC column that contains the VWC measurements for each subject within
> > > group at each time point and a Tair column that contains the
> > > Tair measurements for each subject within group at each time point.
> > >
> > > Toby
> > >
> > >
> > >
> > >
> > > ---- On Wed, 20 Jun 2012 09:25:08 -0700 jim holtman wrote ----
> > >
> > >>will this do it for you:
> > >>
> > >>
> > >>> require(reshape2)
> > >>> toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
> > >>+ day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length =
> > >>40), plot = rep(1,40),
> > >>+ trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
> > >>+ VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2
> > >>= rnorm(40),
> > >>+ Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))
> > >>>
> > >>> x <- melt(toy, id = c('year', 'month', 'day', 'hhmm', 'plot', 'trt'))
> > >>>
> > >>> # parse the type, group and number
> > >>> x$type <- sub("\..*$", "", x$variable)
> > >>> x$group <- sub(".*\.(.).*", "\1", x$variable)
> > >>> x$number <- sub(".*\..", "", x$variable)
> > >>>
> > >>> head(x)
> > >> year month day hhmm plot trt variable value type group number
> > >>1 2007 1 1 1100 1 4 VWC.B1 0.35680152 VWC B 1
> > >>2 2007 1 1 1101 1 4 VWC.B1 -0.01787670 VWC B 1
> > >>3 2007 1 2 1100 1 4 VWC.B1 -0.07361414 VWC B 1
> > >>4 2007 1 2 1101 1 4 VWC.B1 0.90117558 VWC B 1
> > >>5 2007 2 1 1100 1 4 VWC.B1 0.64824677 VWC B 1
> > >>6 2007 2 1 1101 1 4 VWC.B1 0.26769079 VWC B 1
> > >>>
> > >>
> > >>
> > >>On Wed, Jun 20, 2012 at 12:08 PM, nrm2010 wrote:
> > >>> Hello, helpeRs,
> > >>>
> > >>> I am attempting to reshape (either base R or package reshape) multiple .csv
> > >>> spreadsheets from a very unfortunate wide format to long format. Each spreadsheet
> > >>> looks something like this, after being read in to R:
> > >>>
> > >>> toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
> > >>> day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length = 40), plot = rep(1,40),
> > >>> trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
> > >>> VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2 = rnorm(40),
> > >>> Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))
> > >>>
> > >>> Taken together, the columns year, month, day, and hhmm define a timepoint
> > >>> at which at a measurement was taken. Plot and treatment do not change within
> > >>> any one spreadsheet. Parsing the column names, VWC and Tair are 2 different
> > >>> types of measurements; in the full dataset, there are more than 2 types of
> > >>> measurements. The B and T in the column names are grouping variables. The
> > >>> numbers following the B or T are subjects within groups. So, if a column name
> > >>> is VWC.B1, it is the VWC measurement taken on subject 1 within group B.
> > >>>
> > >>> I am trying to reshape to long format. The long format should have one new column
> > >>> for each measurement type (VWC and Tair in the toy example), a new column
> > >>> indicating the grouping variable (B or T), and a new column identifyng the
> > >>> subject (the number that follow the B or T). I've been unable to come up with
> > >>> a solution that produces the group and subject within group columns as well
> > >>> as both types of measurements.
> > >>>
> > >>> Any assistance would be greatly appreciated.
> > >>>
> > >>> Thank you.
> > >>>
> > >>> Toby
> > >>>
> > >>> Toby Gass, Ph.D.
> > >>> Visiting Lecturer, Forestry and Wildland Resources
> > >>> Humboldt State University
> > >>> email: toby.gass humboldt edu
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >>
> > >>
> > >>--
> > >>Jim Holtman
> > >>Data Munger Guru
> > >>
> > >>What is the problem that you are trying to solve?
> > >>Tell me what you want to do, not how you want to do it.
> > >>
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Data Munger Guru
> >
> > What is the problem that you are trying to solve?
> > Tell me what you want to do, not how you want to do it.
> >
> >
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
-------------- next part --------------
require(reshape2)
toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length = 40), plot = rep(1,40),
trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2= rnorm(40),
Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))
x <- melt(toy, id = c('year', 'month', 'day', 'hhmm', 'plot', 'trt'))
# parse the type, group and number
x$type <- sub("\\..*$", "", x$variable)
x$group <- sub(".*\\.(.).*", "\\1", x$variable)
x$number <- sub(".*\\..", "", x$variable)
head(x)
More information about the R-help
mailing list