[R] data shape

Thu Dec 20 17:04:18 CET 2007

HI Tom,
On Dec 20, 2007, at 9:06 AM, Tom Sgouros wrote:

>
> Hello:
>
> I have been give a spreadsheet to work with formed as one big table.
> What it consists of is a 10-row-by-40-column table for each of  
> about 70
> different locations.  In other words, the table row names are repeated
> 70 times, once for each of the locations (whose names also appear  
> in the
> same column, where it's talking about the totals for that  
> location), e.g.:
>             A    B    C
>  Location1 15   73  123   <-  this row is the sum of the following 3
>  Under 10   6   42   23
>  10 - 25    4   15   23
>  Over 25    5   16   77
>  Location2 18   75  113   <- same here
>  Under 10   7   45   13
>  10 - 25    5   18   44
>  Over 25    6   12   56
>
> I want to get this into R as a collection of data frames, one for each
> of my locations.  My questions:
>
>   1. There is a way to handle a collection of data frames, isn't  
> there?
>      No doubt there are plenty, but what's the easiest way, so that I
>      can address them collectively, allowing me to ask such  
> questions as
>      what's the max of the over 25's in column C?

A list is the best way for that. Then you can use things like lapply  
and sapply, as I do towards the end of the script that follows.

>   2. What's the easiest way to read such a data array from a text  
> file?
>      I can do some editing of a csv file produced from the  
> spreadsheet,
>      but don't really know what to aim for.

Here is the code I used to read your example, which I saved as a  
comma-separated file, with the only addition that I added the name  
"Names" to the first column. You will probably need to adjust  
filename, nlocations and rows.per.location.

filename <- "~/Desktop/rows.txt"
nlocations <- 2
rows.per.location <- 3
data <- read.csv(filename)
data$Names <- gsub("\\s","", data$Names, perl=TRUE)  # Trim off  
whitespace from first column
totals <- data[4*seq_len(nlocations)-3,]             # Pick up the  
rows with the totals
actual.data <- data[-(4*seq_len(nlocations)-3),]     # Pick up the rest
location.names <-  totals[,1]                        # The location  
names are now the first column of totals
data.by.location <- split(actual.data, rep(location.names,  
each=rows.per.location)) # this is the "workhorse"
data.by.location <- lapply(data.by.location, function(x) {
   data.frame(x[,-1], row.names=x[,1])                   # Converting  
each list item to a better form
})
totals2 <- sapply(data.by.location, function(x) sapply(x,sum))
all(totals2 == t(totals[,-1]))   # Should return true if the totals  
add up

>   3. Is there some shortcut that would allow me to read this directly
>      from a spreadsheet?

Have a look at the xlsReadWrite package.

> Many thanks,
>
>  -tom
>

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College