[R] [newbie] aggregating table() results and simplifying code with loop

Fri Sep 14 01:25:21 CEST 2012

Hello,

You would get more and better help if you broke your problem
into smaller sub-problems.
I am not sure this is exactly what you want, but here it goes.

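# For each of the six 5-year windows, flag the rows of 'logcover' where the
# cover is present at least once and some run of absences lasts exactly nYears.
# 'DF' is not used; it is kept so the calls below do not change.
inxList <- function(DF, logcover, nYears){
     f <- function(x, n){
         if(any(x)){
             r <- rle(x)
             # lengths of the runs of FALSE (cover absent)
             any(r$lengths[!r$values] == n)
         }else FALSE
     }
     # column positions of the windows: 1:5, 2:6, ..., 6:10
     yrs <- lapply(0:5, `+`, seq_len(5))
     inx.list <- lapply(yrs, function(i) apply(logcover[, i], 1, f, nYears))
     inx.list
}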
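
To see what the inner helper f() tests, here is a small sketch on made-up logical vectors. The wrapper name has_gap and the vector x are only for illustration; they are not part of the code above. rle() splits the vector into runs, and the test asks whether any run of FALSE has exactly n elements.

```r
# Hypothetical helper mirroring f() above: TRUE if the cover occurs at least
# once and some run of absences (FALSE) has length exactly n.
has_gap <- function(x, n) {
    if (any(x)) {
        r <- rle(x)
        any(r$lengths[!r$values] == n)
    } else FALSE
}

# Made-up row: 2Ma, no2Ma, no2Ma, no2Ma, no2Ma
x <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
r <- rle(x)
r$lengths                     # 1 4
r$values                      # TRUE FALSE

has_gap(x, 4)                 # TRUE: one run of four absences
has_gap(x, 3)                 # FALSE: no run of exactly three absences
has_gap(rep(FALSE, 5), 4)     # FALSE: the cover never occurs in this row
```

Note that the comparison is with == n, so a run longer than n does not count as a match.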

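# Counts per WS of the rows flagged by inxList(), one row per window.
# Uses the global objects T80, ycols and ws created below.
makeCovers <- function(Cover, nYears){
     lcover <- T80[, ycols] == Cover
     inx.list <- inxList(T80[, ycols], lcover, nYears)
     # template with one slot per WS, so every window gives the same columns
     tmp <- rep(NA, length(ws))
     names(tmp) <- ws
     xtb <- do.call(rbind, lapply(seq_along(inx.list), function(ix){
         # columns are picked by position: column 2 (taken to be WS) plus
         # the five year columns of window ix (taken to start at column 3)
         xt <- xtabs(~ WS, T80[ inx.list[[ix]], c(2, 1 + ix + 1:5) ])
         tmp[names(xt)] <- xt; tmp}))
     colnames(xtb) <- paste("WS", seq_along(ws), sep = "")
     data.frame(xtb, Cover = Cover, Period = seq_along(inx.list))
}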
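
The part of makeCovers() that deals with tables of different lengths is the template vector tmp. Here is a minimal sketch of that idea on made-up data; the objects first and second and the function count_ws are hypothetical and only illustrate the technique.

```r
# Made-up data: three spatial entities, but "C" is absent from 'second'
ws     <- c("A", "B", "C")
first  <- data.frame(WS = c("A", "B", "B", "C"), stringsAsFactors = FALSE)
second <- data.frame(WS = c("A", "A", "B"), stringsAsFactors = FALSE)

# Hypothetical helper: counts per entity, always returned with length(ws) slots
count_ws <- function(d) {
    tmp <- rep(NA, length(ws))   # template with one slot per entity
    names(tmp) <- ws
    xt <- xtabs(~ WS, d)         # counts only the entities present in d
    tmp[names(xt)] <- xt         # fill by name; absent entities stay NA
    tmp
}

# Every vector now has the same length, so rbind() no longer warns
xtb <- do.call(rbind, lapply(list(first, second), count_ws))
xtb
```

The second row has NA in column C. If a count of zero is more meaningful for your analysis than a missing value, start the template with rep(0, length(ws)) instead.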

T80 <- read.table("sample.txt", header=TRUE, sep = ";")

ycols <- grep("y", names(T80))
ws <- unique(T80$WS)
covers <- as.character(unique(unlist(T80[ycols])))

result <- lapply( 4:2, function(.n)
         do.call(rbind, lapply(covers, makeCovers, .n)) )
names(result) <- paste("nYears", 4:2, sep = ".")

str(result)
# See the results for 4 years
result[[ "nYears.4" ]]  # or any other number of years
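
The six windows are hard-coded in inxList() for 10 year columns. As a sketch only, this is how the window positions are built and how they could be generalised to your other datasets. The function windows and the names in ynames are my assumptions, not part of the code above.

```r
# Offsets 0..5 added to 1:5 give the six 5-year windows over 10 year columns
yrs <- lapply(0:5, `+`, seq_len(5))
length(yrs)   # 6
yrs[[1]]      # 1 2 3 4 5
yrs[[6]]      # 6 7 8 9 10

# Hypothetical generalisation: all windows of 'width' columns out of 'ncols'
windows <- function(ncols, width = 5) {
    lapply(0:(ncols - width), `+`, seq_len(width))
}
length(windows(10))   # 6, same as above
length(windows(13))   # 9, e.g. a dataset running from year 12 to year 24

# Hypothetical column names, only to label the windows
ynames <- sprintf("y%02d", 1:10)
labels <- sapply(yrs, function(i) paste(ynames[i[1]], ynames[i[5]], sep = "-"))
labels
```

To use something like windows() you would have to replace the hard-coded 0:5 in inxList() and the 1 + ix + 1:5 in makeCovers() accordingly.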

Hope this helps,

Rui Barradas
On 13-09-2012 14:36, Davide Rizzo wrote:
> Dear all,
> I'm looking for basic help with aggregating table() results and with
> writing a loop (if useful).
>
> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
> representing a point in space whose land cover is known over
> 10 years (columns y01 to y10).
>
> I need to analyse it with a temporal sliding window of 5 years (y01 to
> y05, y02 to y06 and so forth)
> For each period I'm looking for specific sequences (e.g., Maize,
> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return time"
> of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
> I define the "return time" as the presence of a given land cover
> according to a given sequence. Hence, each return time could require
> the sum of different sequences (e.g., a return time of 5 years derives
> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
> I need to repeat the calculation for each land cover for each time
> window. In addition, I need to repeat the process over three datasets
> (the one I give is the first one, the second one is from year 12 to
> year 24, the third one from year 27 to year 31; these breaks in the
> monitoring of land cover prevent me from creating a continuous
> dataset). At the end I expect to aggregate the sum for each spatial
> entity (column WS).
>
> I've started writing the code for the first crop in the first 5yrs
> period (http://goo.gl/FhZNx) then copying and pasting it for each crop
> then for each time window...
> Moreover I do not know how to aggregate the results of table(). (NB
> sometimes I have a different number of WS per table because a given
> sequence could be absent in a given spatial entity... so I have the
> following warning msg: number of columns of result is not a multiple
> of vector length (arg 1)). Therefore, I'm "obliged" to copy&paste the
> table corresponding to each sequence....
>
> FIRST QUEST. How to aggregate the results of table() when the number
> of columns is different?
> Or the other way around: Is there a way to have a table where each row
> reports the number of points per time return per WS? something like
>
> WS1   WS2   WS3   WS4   ...   WS16   crop   period
> 23    15    18    43    ...   52     Ma5    01
> 18    11    25    84    ...   105    Ma2    01
> ...   ...   ...   ...   ...   ...    ...    ...
> ...   ...   ...   ...   ...   ...    Co5    01
> ...   ...   ...   ...   ...   ...    ...    ...
> ...   ...   ...   ...   ...   ...    Ma5    02
> ...   ...   ...   ...   ...   ...    ...    ...
> In this table each row should represent a return time for a given land
> cover in a given period (one of the 6 time windows of 5 years).
>
> SECOND QUEST. Could a loop (instead of modular copy/paste code)
> improve the speed/reliability of the calculation? If yes, could you
> please point me to some entry-level references on how to write it?
>
> I am aware these are newbie questions, but I have not been able to
> solve them using manuals and available sources.
> Thank you in advance for your help.
>
> Greetings,
> Dd
>
> PS
> R: version 2.14.2 (2012-02-29)
> OS: MS Windows XP Home 32-bit SP3
>
>
> *****************************
> Davide Rizzo
> post-doc researcher
> INRA UR055 SAD-ASTER
> website :: http://sites.google.com/site/ridavide/