[R] Table Transformation
Paul Johnson
pauljohn32 at gmail.com
Wed Mar 4 20:14:35 CET 2009
On Wed, Mar 4, 2009 at 11:58 AM, Christian Pilger
<christian.pilger at gmx.net> wrote:
>
> Dear R-experts,
>
> recently, I started to discover the world of R. I came across a problem,
> that I was unable to solve by myself (including searches in R-help, etc.)
>
> I have a flat table similar to
>
> key1 key2 value1
>
> abcd_1 BP 10
> abcd_1 BSMP 1A
> abcd_1 PD 25
> abcd_2 BP 20
> abcd_3 BP 80
> abcd_4 IA 30
> abcd_4 PD 70
> abcd_4 PS N
>
> I wish to transform this table to obtain the following result:
>
> key2
> key1 BP BSMP IA PD PS
> abcd_1 "10" "1A" "" "25" ""
> abcd_2 "20" "" "" "" ""
> abcd_3 "80" "" "" "" ""
> abcd_4 "" "" "30" "70" "N"
>
I think we would say that a dataframe of the first type is in the
"long" format, while the other one you want is in the "wide" format.
I've done changes like that with the "reshape" function that is in the
stats package.
This example you propose is like making one column for each "country"
where key 1 is like the "year" in which the observation is made.
Right?
You don't have an easily cut-and-pasteable code example, so I've
generated a little working example. Here, x1 is key 1 and x2 is key 2.
> x1 <- gl(4,5, labels=c("c1","c2","c3","c4"))
> x1
[1] c1 c1 c1 c1 c1 c2 c2 c2 c2 c2 c3 c3 c3 c3 c3 c4 c4 c4 c4 c4
Levels: c1 c2 c3 c4
> x2 <- rep(1:5,4)
> df <- data.frame(x1, x2, y=rnorm(20))
> df
x1 x2 y
1 c1 1 0.02095747
2 c1 2 0.05926233
3 c1 3 -0.07561916
4 c1 4 -1.06272710
5 c1 5 -1.89202032
6 c2 1 -0.04549782
7 c2 2 -0.68333187
8 c2 3 -0.99151410
9 c2 4 -0.29070280
10 c2 5 -0.97655024
11 c3 1 0.33411223
12 c3 2 -0.24907340
13 c3 3 -0.25469819
14 c3 4 1.23956157
15 c3 5 -1.38162430
16 c4 1 0.50343661
17 c4 2 -0.58126964
18 c4 3 0.24256348
19 c4 4 -0.39398578
20 c4 5 0.01664450
> reshape(df, direction="wide", timevar="x2", idvar="x1")
x1 y.1 y.2 y.3 y.4 y.5
1 c1 0.02095747 0.05926233 -0.07561916 -1.0627271 -1.8920203
6 c2 -0.04549782 -0.68333187 -0.99151410 -0.2907028 -0.9765502
11 c3 0.33411223 -0.24907340 -0.25469819 1.2395616 -1.3816243
16 c4 0.50343661 -0.58126964 0.24256348 -0.3939858 0.0166445
>
Your case will have many missings, but I think the idea is the same.
HTH
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
More information about the R-help
mailing list