[R] [newbie] aggregating table() results and simplifying code with loop
Rui Barradas
ruipbarradas at sapo.pt
Fri Sep 14 01:25:21 CEST 2012
Hello,
You would get more and better help if you were to break your problem
into smaller sub-problems.
I am not really sure if this is what you want but here it goes.
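# For each of the six 5-year windows, flag the rows of 'logcover' that
# contain at least one TRUE and a run of exactly 'nYears' consecutive
# FALSE values (years without the cover).
# Note: the 'DF' argument is not used in the body.
inxList <- function(DF, logcover, nYears){
    f <- function(x, n){
        if(any(x)){
            r <- rle(x)
            any(r$lengths[!r$values] == n)
        }else FALSE
    }
    # column indices of the sliding windows: 1:5, 2:6, ..., 6:10
    yrs <- lapply(0:5, `+`, seq_len(5))
    inx.list <- lapply(yrs, function(i){
        apply(logcover[, i], 1, f, nYears)
    })
    inx.list
}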
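
For readers unfamiliar with rle(), here is a minimal sketch of what the helper f() inside inxList does on a single 5-year window. The vector x and the names gaps and has_gap are made up for illustration and are not part of the original code.

```r
# Hypothetical window: TRUE where the cover of interest is present
x <- c(TRUE, FALSE, FALSE, FALSE, FALSE)

# run-length encoding: values TRUE, FALSE with lengths 1, 4
r <- rle(x)

# lengths of the runs of years without the cover
gaps <- r$lengths[!r$values]

# same test as f(x, 4): the cover occurs at least once and there is
# a run of exactly 4 consecutive years without it
has_gap <- any(x) && any(gaps == 4)
has_gap
```

The mirrored window c(FALSE, FALSE, FALSE, FALSE, TRUE) gives the same run lengths, so both sequences from the question are counted by the one test.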
# For one land cover and one run length, count the selected points per WS
# in each window. Uses the globals T80, ycols and ws defined below.
makeCovers <- function(Cover, nYears){
    lcover <- T80[, ycols] == Cover
    inx.list <- inxList(T80[, ycols], lcover, nYears)
    # template with one named slot per WS, so every row has the same columns
    tmp <- rep(NA, length(ws))
    names(tmp) <- ws
    xtb <- do.call(rbind, lapply(seq_along(inx.list), function(ix){
        xt <- xtabs(~ WS, T80[ inx.list[[ix]], c(2, 1 + ix + 1:5) ])
        tmp[names(xt)] <- xt
        tmp
    }))
    colnames(xtb) <- paste("WS", seq_along(ws), sep = "")
    data.frame(xtb, Cover = Cover, Period = seq_along(inx.list))
}
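
The tmp vector in makeCovers appears to address the first question: tables with different numbers of columns cannot be stacked cleanly with rbind. A minimal sketch of the same padding idea in isolation follows. The WS labels, the counts and the helper name pad are invented for illustration.

```r
# Hypothetical spatial entities and two tables covering different subsets
ws <- c("1", "2", "3")
t1 <- table(c("1", "1", "3"))        # no points in WS 2
t2 <- table(c("2", "3", "3", "3"))   # no points in WS 1

# Pad a table to the full set of WS names; absent entities stay NA
pad <- function(tab, all_names){
    out <- rep(NA, length(all_names))
    names(out) <- all_names
    out[names(tab)] <- tab
    out
}

# every padded vector has the same length, so rbind stacks them safely
agg <- do.call(rbind, lapply(list(t1, t2), pad, all_names = ws))
agg
```

An alternative is table(factor(x, levels = ws)), which reports zero rather than NA for entities with no points.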
T80 <- read.table("sample.txt", header=TRUE, sep = ";")
ycols <- grep("y", names(T80))
ws <- unique(T80$WS)
covers <- as.character(unique(unlist(T80[ycols])))

result <- lapply( 4:2, function(.n)
    do.call(rbind, lapply(covers, makeCovers, .n)) )
names(result) <- paste("nYears", 4:2, sep = ".")
str(result)

# See the results for 4 years
result[[ "nYears.4" ]] # or any other number of years
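
On the second question, the nested lapply() plus do.call(rbind, ...) pattern is what stands in for the copy-and-paste code. A minimal sketch of that pattern on its own, where toy() is a hypothetical stand-in for makeCovers() and the two cover codes are just examples:

```r
# Hypothetical stand-in for makeCovers(): one small data frame per call
toy <- function(cover, n){
    data.frame(Cover = cover, nYears = n, Period = 1:2)
}

covers <- c("2Ma", "2Co")

# outer loop over run lengths, inner loop over covers;
# do.call(rbind, ...) stacks the per-cover data frames into one
res <- lapply(4:2, function(.n)
    do.call(rbind, lapply(covers, toy, .n)))
names(res) <- paste("nYears", 4:2, sep = ".")

res[["nYears.4"]]
```

Each element of res is one data frame per run length, so a further cover or run length only changes the input vectors, not the code.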
Hope this helps,
Rui Barradas
On 13-09-2012 14:36, Davide Rizzo wrote:
> Dear all,
> I'm looking for basic help with aggregating table() results and with
> writing a loop (if useful).
>
> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
> representing a point in space for which we know the land cover over
> 10 years (columns y01 to y10).
>
> I need to analyse it with a temporal sliding window of 5 years (y01 to
> y05, y02 to y06 and so forth).
> For each period I'm looking for specific sequences (e.g., Maize,
> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return time"
> of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
> I define the "return time" as the presence of a given land cover
> according to a given sequence. Hence, each return time could require
> the sum of different sequences (e.g., a return time of 5 years derives
> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
> I need to repeat the calculation for each land cover and for each time
> window. In addition, I need to repeat the process over three datasets
> (the one I give is the first one, the second one is from year 12 to
> year 24, the third one from year 27 to year 31). So I have breaks in
> the monitoring of land cover that prevent me from creating a continuous
> dataset. At the end I expect to aggregate the sum for each spatial
> entity (column WS).
>
> I've started writing the code for the first crop in the first 5yrs
> period (http://goo.gl/FhZNx) then copying and pasting it for each crop
> then for each time window...
> Moreover I do not know how to aggregate the results of table(). (NB
> sometimes I have a different number of WS per table because a given
> sequence could be absent in a given spatial entity... so I have the
> following warning msg: number of columns of result is not a multiple
> of vector length (arg 1)). Therefore, I'm "obliged" to copy&paste the
> table corresponding to each sequence....
>
> FIRST QUEST. How to aggregate the results of table() when the number
> of columns is different?
> Or the other way around: Is there a way to have a table where each row
> reports the number of points per time return per WS? something like
>
> WS1  WS2  WS3  WS4  ...  WS16  crop  period
> 23   15   18   43   ...  52    Ma5   01
> 18   11   25   84   ...  105   Ma2   01
> ...  ...  ...  ...  ...  ...   ...   ...
> ...  ...  ...  ...  ...  ...   Co5   01
> ...  ...  ...  ...  ...  ...   ...   ...
> ...  ...  ...  ...  ...  ...   Ma5   02
> ...  ...  ...  ...  ...  ...   ...   ...
> In this table, each row should represent a return time for a given land
> cover in a given period (one of the 6 time windows of 5 years).
>
> SECOND QUEST. Could a loop (instead of modular copy/paste code)
> improve the speed/reliability of the calculation? If yes, could you
> please point me to some entry-level references on how to write one?
>
> I am aware these are newbie questions, but I have not been able to
> solve them using manuals and available sources.
> Thank you in advance for your help.
>
> Greetings,
> Dd
>
> PS
> R: version 2.14.2 (2012-02-29)
> OS: MS Windows XP Home 32-bit SP3
>
>
> *****************************
> Davide Rizzo
> post-doc researcher
> INRA UR055 SAD-ASTER
> website :: http://sites.google.com/site/ridavide/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.