[R] Reformatting a table

Thu Nov 20 23:01:14 CET 2008

on 11/20/2008 02:28 PM Tul Gan wrote:
> Hi !
> ���������� I am new to R. Can somebody help me in reformatting�huge output files ,i.e, rearranging sets of columns in specific order.
> For example: I have data for three compunds 1, 2 and 3
> file1:
> ID CA1 CA3 CA2 MA2 MA1 MA3 
> 1 14 15 13 7 12 3 
> 2 19 7 12 10 14 5 
> 3 21 12 19 6 8 9 
> �
> to 
> File 2:
> �
> ID CA1 CA2 CA3 MA1 MA2 MA3 
> 1 14 13 15 12 7 3 
> 2 19 12 7 14 10 5 
> 3 21 19 12 8 6 9 
> �
> or File3:
> ID CA1 MA1 CA2 MA2 CA3 MA3 
> 1 14 12 13 7 15 3 
> 2 19 14 12 10 7 5 
> 3 21 8 19 6 12 9 
> �
> Thanks for your help,
> Tul Gan

A general approach to the first case, where 'DF' is the initial data frame:

> DF[, c("ID", sort(names(DF)[-1]))]
  ID CA1 CA2 CA3 MA1 MA2 MA3
1  1  14  13  15  12   7   3
2  2  19  12   7  14  10   5
3  3  21  19  12   8   6   9

Essentially, you are sorting the column names less "ID", then reordering
the columns of 'DF' using indexing.

> names(DF)[-1]
[1] "CA1" "CA3" "CA2" "MA2" "MA1" "MA3"

> sort(names(DF)[-1])
[1] "CA1" "CA2" "CA3" "MA1" "MA2" "MA3"

The second case is a little more complicated, since you are
independently sorting on both alpha and numeric values, rather than just
 alpha.

If there is a predictable sequence of names, you could do something like:

> paste(c("CA", "MA"), rep(1:3, each = 2), sep = "")
[1] "CA1" "MA1" "CA2" "MA2" "CA3" "MA3"

Thus:

> DF[, c("ID", paste(c("CA", "MA"), rep(1:3, each = 2), sep = ""))]
  ID CA1 MA1 CA2 MA2 CA3 MA3
1  1  14  12  13   7  15   3
2  2  19  14  12  10   7   5
3  3  21   8  19   6  12   9

If the sequence is not predictable, such that you would generate
non-existing column names with the above, then we need to split the
existing column names and sort them separately. Something along the
lines of the following:

# use gsub() to split out the alpha and numeric values. Note that we
# need to create a data frame to handle an alpha and a numeric column

names.df <- data.frame(Alpha = gsub("[[:digit:]]", "",
                                    names(DF)[-1]),
                       Digits = as.numeric(gsub("[[:alpha:]]", "",
                                                names(DF)[-1])))

> names.df
  Alpha Digits
1    CA      1
2    CA      3
3    CA      2
4    MA      2
5    MA      1
6    MA      3

Now, get the indices for the sorted rows, based first upon the digits
value, subsorted by the Alpha value:

> order(names.df$Digits, names.df$Alpha)
[1] 1 5 3 4 2 6

Now use those values to index the columns in 'DF'. Remember that we need
to add 1 to each index, to account for the "ID" column, which is not
included in 'names.mat':

> c(1, order(names.df$Digits, names.df$Alpha) + 1)
[1] 1 2 6 4 5 3 7

Thus:

> DF[, c(1, order(names.df$Digits, names.df$Alpha) + 1)]
  ID CA1 MA1 CA2 MA2 CA3 MA3
1  1  14  12  13   7  15   3
2  2  19  14  12  10   7   5
3  3  21   8  19   6  12   9

Note, that we could also solve the first case, by reversing the sorting
levels from names.df:

> DF[, c(1, order(names.df$Alpha, names.df$Digits) + 1)]
  ID CA1 CA2 CA3 MA1 MA2 MA3
1  1  14  13  15  12   7   3
2  2  19  12   7  14  10   5
3  3  21  19  12   8   6   9

See ?gsub, ?sort and ?order

HTH,

Marc Schwartz