[R] reshape

nrm2010 nrm2010 at zoho.com
Wed Jun 20 22:08:31 CEST 2012


Still working on it...to get your original suggestion to run, 
I "escaped" (doubled) the backslashes, so instead of 
x$type <- sub("\..*$", "", x$variable)
which generated the error message
Error: '\.' is an unrecognized escape in character string starting "\."
I tried
x$type <- sub("\\..*$", "", x$variable)

That did not generate the error message and allowed me to 
run your second suggestion:
dcast(x, year+month+day+hhmm+group+number ~ type, mean)
The Tair and VWC columns appeared but the groups all came out as \001, instead of B or T. 
    year month day hhmm group number         Tair          VWC
1   2007     1   1 1100  \001      1  1.553844979  0.325459714
2   2007     1   1 1100  \001     10          NaN -0.693606661

 
I'm also not trying to take the mean, but only to reformat from wide to long, 
so I tried it without the "mean" at the end of the dcast statement:
dcast(x, year+month+day+hhmm+group+number ~ type) 
That resulted in a loss of the measurement data - 
the VWC and Tair columns are still there but they contain only 0's, 1's, or 2's.

   year month day hhmm group number Tair VWC
1   2007     1   1 1100  \001      1    2   2
2   2007     1   1 1100  \001     10    0   1
3   2007     1   1 1100  \001      2    2   2

So, I'm still working on it...

Thanks,

Toby

R v. 2.15.0
Windows OS






---- On Wed, 20 Jun 2012 09:51:45 -0700 jim holtman <jholtman at gmail.com> wrote ---- 
 > Is this closer using the same data:
 > 
 > > dcast(x, year+month+day+hhmm+group+number ~ type, mean)
 >     year month day hhmm group number         Tair           VWC
 > 1   2007     1   1 1100     B      1  0.285404913  1.4337114478
 > 2   2007     1   1 1100     B      2 -0.899893189  1.1004122756
 > 3   2007     1   1 1100     T      1 -0.054938988  0.0565570314
 > 4   2007     1   1 1100     T     10          NaN -0.4645357924
 > 5   2007     1   1 1100     T      2  0.235724005  0.1355795323
 > 6   2007     1   1 1101     B      1  0.008141599  0.2010562273
 > 7   2007     1   1 1101     B      2 -0.094439009  0.7713080532
 > 8   2007     1   1 1101     T      1  1.111981883 -0.5022737361
 > 9   2007     1   1 1101     T     10          NaN  0.3061178598
 > 10  2007     1   1 1101     T      2  0.101879382 -0.1886613029
 > 11  2007     1   2 1100     B      1 -2.189530590  0.0741563102
 > 12  2007     1   2 1100     B      2  0.290204966  1.4341878509
 > 13  2007     1   2 1100     T      1  0.788716089  1.9156394486
 > 14  2007     1   2 1100     T     10          NaN  1.9814202666
 > 15  2007     1   2 1100     T      2  0.255978008  0.0604626563
 > 16  2007     1   2 1101     B      1 -0.479254067  0.7770438119
 > 17  2007     1   2 1101     B      2  0.526466448 -0.1691557783
 > 18  2007     1   2 1101     T      1  2.070198634  0.1129769772
 > 19  2007     1   2 1101     T     10          NaN  1.5263077972
 > 20  2007     1   2 1101     T      2 -0.883571883 -0.5357702486
 > 
 > 
 > On Wed, Jun 20, 2012 at 12:33 PM, nrm2010 <nrm2010 at zoho.com> wrote:
 > >
 > > Thank you, but it doesn't quite.  It still needs a new column for each measurement type,
 > > i.e., a VWC column that contains the VWC measurements for each subject within
 > > group at each time point and a Tair column that contains the
 > > Tair measurements for each subject within group at each time point.
 > >
 > > Toby
 > >
 > >
 > >
 > >
 > > ---- On Wed, 20 Jun 2012 09:25:08 -0700 jim holtman  wrote ----
 > >
 > >>will this do it for you:
 > >>
 > >>
 > >>> require(reshape2)
 > >>> toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
 > >>+ day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length =
 > >>40), plot = rep(1,40),
 > >>+ trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
 > >>+ VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2
 > >>= rnorm(40),
 > >>+ Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))
 > >>>
 > >>> x <- melt(toy, id = c('year', 'month', 'day', 'hhmm', 'plot', 'trt'))
 > >>>
 > >>> # parse the type, group and number
 > >>> x$type <- sub("\..*$", "", x$variable)
 > >>> x$group <- sub(".*\.(.).*", "\1", x$variable)
 > >>> x$number <- sub(".*\..", "", x$variable)
 > >>>
 > >>> head(x)
 > >> year month day hhmm plot trt variable value type group number
 > >>1 2007 1 1 1100 1 4 VWC.B1 0.35680152 VWC B 1
 > >>2 2007 1 1 1101 1 4 VWC.B1 -0.01787670 VWC B 1
 > >>3 2007 1 2 1100 1 4 VWC.B1 -0.07361414 VWC B 1
 > >>4 2007 1 2 1101 1 4 VWC.B1 0.90117558 VWC B 1
 > >>5 2007 2 1 1100 1 4 VWC.B1 0.64824677 VWC B 1
 > >>6 2007 2 1 1101 1 4 VWC.B1 0.26769079 VWC B 1
 > >>>
 > >>
 > >>
 > >>On Wed, Jun 20, 2012 at 12:08 PM, nrm2010  wrote:
 > >>> Hello, helpeRs,
 > >>>
 > >>> I am attempting to reshape (either base R or package reshape) multiple .csv
 > >>> spreadsheets from a very unfortunate wide format to long format.  Each spreadsheet
 > >>> looks something like this, after being read in to R:
 > >>>
 > >>> toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
 > >>> day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length = 40), plot = rep(1,40),
 > >>> trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
 > >>> VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2 = rnorm(40),
 > >>> Tair.T1 =  rnorm(40), Tair.T2 = rnorm(40))
 > >>>
 > >>> Taken together, the columns year, month, day, and hhmm define a timepoint
 > >>> at which at a measurement was taken.  Plot and treatment do not change within
 > >>> any one spreadsheet.  Parsing the column names, VWC and Tair are 2 different
 > >>> types of measurements; in the full dataset, there are more than 2 types of
 > >>> measurements.  The B and T in the column names are grouping variables.  The
 > >>> numbers following the B or T are subjects within groups.  So, if a column name
 > >>> is VWC.B1, it is the VWC measurement taken on subject 1 within group B.
 > >>>
 > >>> I am trying to reshape to long format. The long format should have one new column
 > >>> for each measurement type (VWC and Tair in the toy example), a new column
 > >>> indicating the grouping variable (B or T), and a new column identifyng the
 > >>> subject (the number that follow the B or T). I've been unable to come up with
 > >>> a solution that produces the group and subject within group columns as well
 > >>> as both types of measurements.
 > >>>
 > >>> Any assistance would be greatly appreciated.
 > >>>
 > >>> Thank you.
 > >>>
 > >>> Toby
 > >>>
 > >>> Toby Gass, Ph.D.
 > >>> Visiting Lecturer, Forestry and Wildland Resources
 > >>> Humboldt State University
 > >>> email: toby.gass  humboldt  edu
 > >>>
 > >>> ______________________________________________
 > >>> R-help at r-project.org mailing list
 > >>> https://stat.ethz.ch/mailman/listinfo/r-help
 > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > >>> and provide commented, minimal, self-contained, reproducible code.
 > >>
 > >>
 > >>
 > >>--
 > >>Jim Holtman
 > >>Data Munger Guru
 > >>
 > >>What is the problem that you are trying to solve?
 > >>Tell me what you want to do, not how you want to do it.
 > >>
 > >
 > 
 > 
 > 
 > -- 
 > Jim Holtman
 > Data Munger Guru
 > 
 > What is the problem that you are trying to solve?
 > Tell me what you want to do, not how you want to do it.
 > 
 >



More information about the R-help mailing list