[R] data shape
Charilaos Skiadas
cskiadas at gmail.com
Thu Dec 20 17:04:18 CET 2007
HI Tom,
On Dec 20, 2007, at 9:06 AM, Tom Sgouros wrote:
>
> Hello:
>
> I have been give a spreadsheet to work with formed as one big table.
> What it consists of is a 10-row-by-40-column table for each of
> about 70
> different locations. In other words, the table row names are repeated
> 70 times, once for each of the locations (whose names also appear
> in the
> same column, where it's talking about the totals for that
> location), e.g.:
> A B C
> Location1 15 73 123 <- this row is the sum of the following 3
> Under 10 6 42 23
> 10 - 25 4 15 23
> Over 25 5 16 77
> Location2 18 75 113 <- same here
> Under 10 7 45 13
> 10 - 25 5 18 44
> Over 25 6 12 56
>
> I want to get this into R as a collection of data frames, one for each
> of my locations. My questions:
>
> 1. There is a way to handle a collection of data frames, isn't
> there?
> No doubt there are plenty, but what's the easiest way, so that I
> can address them collectively, allowing me to ask such
> questions as
> what's the max of the over 25's in column C?
A list is the best way for that. Then you can use things like lapply
and sapply, as I do towards the end of the script that follows.
> 2. What's the easiest way to read such a data array from a text
> file?
> I can do some editing of a csv file produced from the
> spreadsheet,
> but don't really know what to aim for.
Here is the code I used to read your example, which I saved as a
comma-separated file, with the only addition that I added the name
"Names" to the first column. You will probably need to adjust
filename, nlocations and rows.per.location.
filename <- "~/Desktop/rows.txt"
nlocations <- 2
rows.per.location <- 3
data <- read.csv(filename)
data$Names <- gsub("\\s","", data$Names, perl=TRUE) # Trim off
whitespace from first column
totals <- data[4*seq_len(nlocations)-3,] # Pick up the
rows with the totals
actual.data <- data[-(4*seq_len(nlocations)-3),] # Pick up the rest
location.names <- totals[,1] # The location
names are now the first column of totals
data.by.location <- split(actual.data, rep(location.names,
each=rows.per.location)) # this is the "workhorse"
data.by.location <- lapply(data.by.location, function(x) {
data.frame(x[,-1], row.names=x[,1]) # Converting
each list item to a better form
})
totals2 <- sapply(data.by.location, function(x) sapply(x,sum))
all(totals2 == t(totals[,-1])) # Should return true if the totals
add up
> 3. Is there some shortcut that would allow me to read this directly
> from a spreadsheet?
Have a look at the xlsReadWrite package.
> Many thanks,
>
> -tom
>
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
More information about the R-help
mailing list