[R] Data frame manipulation - newbie question
jim holtman
jholtman at gmail.com
Mon Jan 7 02:41:18 CET 2008
You will need several different manipulations to do what you want with
your data, and the techniques are worth learning. Here, I think, is the
set of steps that you want.
> x <- read.table(textConnection("row k.idx step.forwd pt.num model prev value abs.error
+ 1 200 0 1 lm 09 10.5 1.5
+ 2 200 0 2 lm 11 10.5 1.5
+ 3 201 1 1 lm 10 12 2.0
+ 4 201 1 2 lm 12 12 2.0
+ 5 202 2 1 lm 12 12.1 0.1
+ 6 202 2 2 lm 12 12.1 0.1
+ 7 200 0 1 rlm 10.1 10.5 0.4
+ 8 200 0 2 rlm 10.3 10.5 0.2
+ 9 201 1 1 rlm 11.6 12 0.4
+ 10 201 1 2 rlm 11.4 12 0.6
+ 11 202 2 1 rlm 11.8 12.1 0.1
+ 12 202 2 2 rlm 11.9 12.1 0.2"), header=TRUE)
> closeAllConnections()
>
> # split the data by the grouping factors
> x.split <- split(x, list(x$k.idx, x$step.forwd, x$model), drop=TRUE)
> x.split
$`200.0.lm`
row k.idx step.forwd pt.num model prev value abs.error
1 1 200 0 1 lm 9 10.5 1.5
2 2 200 0 2 lm 11 10.5 1.5
$`201.1.lm`
row k.idx step.forwd pt.num model prev value abs.error
3 3 201 1 1 lm 10 12 2
4 4 201 1 2 lm 12 12 2
$`202.2.lm`
row k.idx step.forwd pt.num model prev value abs.error
5 5 202 2 1 lm 12 12.1 0.1
6 6 202 2 2 lm 12 12.1 0.1
$`200.0.rlm`
row k.idx step.forwd pt.num model prev value abs.error
7 7 200 0 1 rlm 10.1 10.5 0.4
8 8 200 0 2 rlm 10.3 10.5 0.2
$`201.1.rlm`
row k.idx step.forwd pt.num model prev value abs.error
9 9 201 1 1 rlm 11.6 12 0.4
10 10 201 1 2 rlm 11.4 12 0.6
$`202.2.rlm`
row k.idx step.forwd pt.num model prev value abs.error
11 11 202 2 1 rlm 11.8 12.1 0.1
12 12 202 2 2 rlm 11.9 12.1 0.2
>
> # now take the means of given columns
> x.mean <- lapply(x.split, function(.grp) colMeans(.grp[, c('prev', 'value', 'abs.error')]))
>
> # put back into a matrix
> (x.mean <- do.call(rbind, x.mean))
prev value abs.error
200.0.lm 10.00 10.5 1.50
201.1.lm 11.00 12.0 2.00
202.2.lm 12.00 12.1 0.10
200.0.rlm 10.20 10.5 0.30
201.1.rlm 11.50 12.0 0.50
202.2.rlm 11.85 12.1 0.15
>
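For comparison, the split / colMeans / rbind sequence above can usually be
collapsed into a single aggregate() call. This is only a minimal sketch of
that alternative, not part of the session above: the data frame is rebuilt
with data.frame() so that the block runs on its own, and x.agg is a name
chosen here for illustration.

```r
# Rebuild the example data so this block is self-contained.
x <- data.frame(
  k.idx      = rep(rep(200:202, each = 2), times = 2),
  step.forwd = rep(rep(0:2, each = 2), times = 2),
  pt.num     = rep(1:2, times = 6),
  model      = rep(c("lm", "rlm"), each = 6),
  prev       = c(9, 11, 10, 12, 12, 12, 10.1, 10.3, 11.6, 11.4, 11.8, 11.9),
  value      = rep(rep(c(10.5, 12, 12.1), each = 2), times = 2),
  abs.error  = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# Mean of the three numeric columns for every k.idx / step.forwd / model
# combination that occurs in the data (x.agg is an illustrative name).
x.agg <- aggregate(cbind(prev, value, abs.error) ~ k.idx + step.forwd + model,
                   data = x, FUN = mean)
print(x.agg)
```

Unlike the matrix built with do.call(rbind, ...), the result is a data
frame that keeps the grouping variables as ordinary columns, which tends to
be easier to subset or plot later.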
> #boxplot
> boxplot(abs.error ~ k.idx, data=x)
>
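The original request asked for the errors of each model within each k.idx,
whereas the call above pools both models. One possible reading, sketched
here as an assumption about what is wanted, is to put both grouping
variables on the right-hand side of the formula so that boxplot() draws one
box per model / k.idx combination. The data frame is rebuilt so the block
runs on its own, and bp is a name introduced for illustration.

```r
# Rebuild the example data so this block is self-contained.
x <- data.frame(
  k.idx      = rep(rep(200:202, each = 2), times = 2),
  step.forwd = rep(rep(0:2, each = 2), times = 2),
  pt.num     = rep(1:2, times = 6),
  model      = rep(c("lm", "rlm"), each = 6),
  prev       = c(9, 11, 10, 12, 12, 12, 10.1, 10.3, 11.6, 11.4, 11.8, 11.9),
  value      = rep(rep(c(10.5, 12, 12.1), each = 2), times = 2),
  abs.error  = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# One box per model within each k.idx; boxplot() invisibly returns the
# statistics it drew, which are kept here for inspection.
bp <- boxplot(abs.error ~ model + k.idx, data = x,
              ylab = "abs.error", las = 2)
bp$names   # labels of the groups that were plotted
bp$n       # number of observations behind each box
```

With only two points per group in this small example the boxes carry little
information; the layout matters more once each group has many rows.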
> # create a table with average of the abs.error for each 'model'
> cbind(x, abs.error.mean=ave(x$abs.error, x$model))
row k.idx step.forwd pt.num model prev value abs.error abs.error.mean
1 1 200 0 1 lm 9.0 10.5 1.5 1.2000000
2 2 200 0 2 lm 11.0 10.5 1.5 1.2000000
3 3 201 1 1 lm 10.0 12.0 2.0 1.2000000
4 4 201 1 2 lm 12.0 12.0 2.0 1.2000000
5 5 202 2 1 lm 12.0 12.1 0.1 1.2000000
6 6 202 2 2 lm 12.0 12.1 0.1 1.2000000
7 7 200 0 1 rlm 10.1 10.5 0.4 0.3166667
8 8 200 0 2 rlm 10.3 10.5 0.2 0.3166667
9 9 201 1 1 rlm 11.6 12.0 0.4 0.3166667
10 10 201 1 2 rlm 11.4 12.0 0.6 0.3166667
11 11 202 2 1 rlm 11.8 12.1 0.1 0.3166667
12 12 202 2 2 rlm 11.9 12.1 0.2 0.3166667
>
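The request also asks to compare the two models at each step and overall.
The tapply() function mentioned in the quoted message below is one way to
get both tables. This is a hedged sketch rather than part of the session
above; err.by.step and err.overall are names made up for illustration, and
the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained.
x <- data.frame(
  k.idx      = rep(rep(200:202, each = 2), times = 2),
  step.forwd = rep(rep(0:2, each = 2), times = 2),
  pt.num     = rep(1:2, times = 6),
  model      = rep(c("lm", "rlm"), each = 6),
  prev       = c(9, 11, 10, 12, 12, 12, 10.1, 10.3, 11.6, 11.4, 11.8, 11.9),
  value      = rep(rep(c(10.5, 12, 12.1), each = 2), times = 2),
  abs.error  = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# Mean abs.error in a model-by-step table: rows are models, columns are steps.
err.by.step <- with(x, tapply(abs.error, list(model, step.forwd), mean))
print(err.by.step)

# Overall mean abs.error per model (the same values that ave() repeats per row).
err.overall <- with(x, tapply(abs.error, model, mean))
print(err.overall)
```

Where ave() returns one value per original row, tapply() returns one value
per group, so it suits summary tables better than added columns.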
On Jan 6, 2008 10:50 AM, Rense Nieuwenhuis <rense.nieuwenhuis at gmail.com> wrote:
> Hi,
>
> you may want to use the apply / tapply functions. Some find them a bit
> hard to grasp at first, but they will help you many times in many
> situations once you get the hang of them.
>
> Maybe you can get some information on my site:
> http://www.rensenieuwenhuis.nl/r-project/manual/basics/tables/
>
>
> Hope this helps,
>
> Rense Nieuwenhuis
>
>
>
> On Jan 3, 2008, at 11:53 , José Augusto M. de Andrade Junior wrote:
>
> > Hi all,
> >
> > Could someone please explain how I can efficiently query a data frame
> > with several factors, as shown below:
> >
> > ----------------------------------------------------------------------
> > Data frame: pt.knn
> > ----------------------------------------------------------------------
> > row | k.idx | step.forwd | pt.num | model | prev | value | abs.error
> >   1 |   200 |          0 |      1 | lm    |   09 |  10.5 |       1.5
> >   2 |   200 |          0 |      2 | lm    |   11 |  10.5 |       1.5
> >   3 |   201 |          1 |      1 | lm    |   10 |    12 |       2.0
> >   4 |   201 |          1 |      2 | lm    |   12 |    12 |       2.0
> >   5 |   202 |          2 |      1 | lm    |   12 |  12.1 |       0.1
> >   6 |   202 |          2 |      2 | lm    |   12 |  12.1 |       0.1
> >   7 |   200 |          0 |      1 | rlm   | 10.1 |  10.5 |       0.4
> >   8 |   200 |          0 |      2 | rlm   | 10.3 |  10.5 |       0.2
> >   9 |   201 |          1 |      1 | rlm   | 11.6 |    12 |       0.4
> >  10 |   201 |          1 |      2 | rlm   | 11.4 |    12 |       0.6
> >  11 |   202 |          2 |      1 | rlm   | 11.8 |  12.1 |       0.1
> >  12 |   202 |          2 |      2 | rlm   | 11.9 |  12.1 |       0.2
> > ----------------------------------------------------------------------
> >
> > k.idx, step.forwd, pt.num and model columns are FACTORS.
> > prev, value, abs.error are numeric
> >
> > I need to take the mean value of the numeric columns (prev, value and
> > abs.error) for each k.idx and step.forwd and model. So: rows 1 and 2,
> > 3 and 4, 5 and 6, 7 and 8, 9 and 10, 11 and 12 must be grouped
> > together.
> >
> > Next, I need to plot a boxplot of the mean(abs.error) of each model
> > for each k.idx.
> > I need to compare the abs.error of the two models for each step and
> > the mean overall abs.error of each model. And so on.
> >
> > I read the manuals, but the examples there are too simple. I know how
> > to do this manipulation in a "brute force" manner, but I wish to learn
> > how to work the right way with R.
> >
> > Could someone help me?
> > Thanks in advance.
> >
> > José Augusto
> > Undergraduate student
> > University of São Paulo
> > Business Administration Faculty
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?