[R] subsetting data by rows (every 69 rows)
arun
smartpink111 at yahoo.com
Tue Aug 27 03:12:49 CEST 2013
Hi R.L.
No problem.
Try this:
#slightly modified the example:
set.seed(24)
dat1<- as.data.frame(matrix(sample(c(1:10,Inf,-Inf),2000*3,replace=TRUE),ncol=3))
lst1<-split(dat1,((seq_len(nrow(dat1))-1)%/%69)+1)
lst2<-lapply(lst1,function(x) {colnames(x)<-letters[1:3];x})
lst3<-lapply(lst2,function(x) {x[x==Inf|x==-Inf]<-0;x})
tail(lst2[[29]],7)
# a b c
#1994 -Inf Inf 5
#1995 10 5 7
#1996 5 5 5
#1997 6 Inf 10
#1998 10 7 4
#1999 10 Inf 1
#2000 Inf 8 2
tail(lst3[[29]],7)
# a b c
#1994 0 0 5
#1995 10 5 7
#1996 5 5 5
#1997 6 0 10
#1998 10 7 4
#1999 10 0 1
#2000 0 8 2
A.K.
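A possible variation on the replacement step, offered only as a sketch: is.infinite() tests for both Inf and -Inf at once, and do.call(rbind, ...) is one way to stack list elements of unequal length into a single data frame, which as.data.frame() on the list cannot do. The helper replace_inf and the toy list lst_demo are hypothetical names used for illustration, not part of the thread.

```r
# Hypothetical helper (not from the thread): set Inf and -Inf to 0
# in every numeric column of one data frame.
replace_inf <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) col[is.infinite(col)] <- 0
    col
  })
  df
}

# Toy list of data frames standing in for lst2
lst_demo <- list(
  `1` = data.frame(a = c(1, Inf), b = c(-Inf, 4)),
  `2` = data.frame(a = c(5, 6),   b = c(Inf, 8))
)

# Apply the helper to each element; the result is still a list of data frames
lst_clean <- lapply(lst_demo, replace_inf)

# Stack all elements into one data frame, even if their row counts differ
all_rows <- do.call(rbind, lst_clean)
```

Writing list[list == Inf] fails because the comparison is attempted on the list itself; applying the replacement to each element with lapply() avoids that.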
Thank you, A.K.!
So the final result 'res' is a list of data frames.
My initial idea was to replace the infinite values in the list with zero,
which is why I wanted to convert the elements to data frames, so that I
could do the replacement with the following code:
list [ list == Inf ] = 0
list [ list == -Inf ] = 0
However, even though the elements of the list are now data frames, I still
received the error message: '(list) object cannot be coerced to type
'double''
Does this mean I need to convert the whole list to a data frame rather
than each element? If so, how could I do that? (I tried
'as.data.frame(list)', which didn't work because the last element has fewer
rows.)
Your contribution is highly appreciated!
Best regards,
R.L.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Monday, August 26, 2013 10:46 AM
Subject: Re: subsetting data by rows (every 69 rows)
Hi R.L.,
No problem.
You may try:
set.seed(24)
dat1<- as.data.frame(matrix(sample(1:10,2000*3,replace=TRUE),ncol=3))
lst1<-split(dat1,((seq_len(nrow(dat1))-1)%/%69)+1)
lst2<-lapply(lst1,function(x) {colnames(x)<-letters[1:3];x})
res<-lapply(lst2,function(x) {x$z<-with(x,(a-b)/c);x})
head(res[[1]],3)
# a b c z
#1 3 1 8 0.250000
#2 3 2 2 0.500000
#3 8 3 3 1.666667
A.K.
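The for loop in the quoted question below reassigns z on every pass, so only the values from the last list element remain. If a single vector with one z per original row is wanted instead of a new column, one possible sketch is to compute z per element and then flatten the result. The names dat_demo, lst_demo, z_list and z_all are illustrative, and the chunk size of 4 is just a small stand-in for 69.

```r
# Toy data: 10 rows split into chunks of 4 (the thread uses chunks of 69)
dat_demo <- data.frame(a = 1:10, b = 10:1, c = rep(2, 10))
lst_demo <- split(dat_demo, (seq_len(nrow(dat_demo)) - 1) %/% 4 + 1)

# One z vector per list element
z_list <- lapply(lst_demo, function(x) with(x, (a - b) / c))

# Flatten to a single vector, one value per original row, in row order
z_all <- unlist(z_list, use.names = FALSE)

# Check that no rows were lost
length(z_all) == nrow(dat_demo)
```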
Thank you very much for your help, A.K. The code works efficiently!
Just a follow-up question: do you happen to know how to
select certain columns in each element (since I need to apply a
calculation to multiple columns for each element of the list)?
For example, list[1] looks like:
$`1`
a b c
1 2.1 1.4 3.4
2 4.4 2.6 5.5
3 2.6 0.4 3.0
...
$`2`
a b c
70 5.1 4.9 5.1
71 4.4 7.6 8.5
72 2.8 3.5 6.8
...
what I wish to do is something like
z = (a-b) / c
for each element ($`1`,$`2`...)
I tried the following code:
for( i in 1:23) { ## there are 23 elements in the list (sorry, in fact I have 1566 rows in total in my sample)
z = (list[[i]]$a - list[[i]]$b) / list[[i]]$c
}
which gave me only 49 values, rather than 1566 values.
Thank you very much!
Kind regards,
R.L
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Sunday, August 25, 2013 1:28 PM
Subject: Re: subsetting data by rows (every 69 rows)
#or you could try:
lst2<- split(dat1,as.numeric(gl(69,69,2000)))
# identical(lst1,lst2)
#[1] TRUE
A.K.
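For comparison, the grouping index itself can be built in a few equivalent ways. The sketch below is not from the thread; g1, g2 and g3 are illustrative names. It checks that integer division, ceiling() and rep() all label the rows identically for 2000 rows in chunks of 69.

```r
n <- 2000

# Integer division, as used in the replies above
g1 <- (seq_len(n) - 1) %/% 69 + 1

# Same labels via ceiling()
g2 <- ceiling(seq_len(n) / 69)

# Same labels via rep(): each label repeated 69 times, truncated to n
g3 <- rep(seq_len(ceiling(n / 69)), each = 69, length.out = n)

# All three agree, so any of them can be passed to split()
all(g1 == g2) && all(g1 == g3)
```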
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Sunday, August 25, 2013 1:17 PM
Subject: Re: subsetting data by rows (every 69 rows)
Hi,
Try:
set.seed(24)
dat1<- as.data.frame(matrix(sample(1:400,2000*16,replace=TRUE),ncol=16))
lst1<-split(dat1,((seq_len(nrow(dat1))-1)%/%69)+1)
sapply(lst1,function(x) range(as.numeric(row.names(x))))
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#[1,] 1 70 139 208 277 346 415 484 553 622 691 760 829 898 967 1036 1105 1174
#[2,] 69 138 207 276 345 414 483 552 621 690 759 828 897 966 1035 1104 1173 1242
# 19 20 21 22 23 24 25 26 27 28 29
#[1,] 1243 1312 1381 1450 1519 1588 1657 1726 1795 1864 1933
#[2,] 1311 1380 1449 1518 1587 1656 1725 1794 1863 1932 2000
str(lst1[[1]])
#'data.frame': 69 obs. of 16 variables:
# $ V1 : int 118 90 282 208 266 369 112 306 321 102 ...
# $ V2 : int 6 50 115 247 355 109 39 297 35 209 ...
# $ V3 : int 313 67 102 298 367 23 376 91 5 38 ...
# $ V4 : int 207 351 212 342 255 399 239 57 234 79 ...
# $ V5 : int 74 80 289 165 231 193 310 255 98 218 ...
# $ V6 : int 99 91 325 143 398 66 201 337 66 382 ...
# $ V7 : int 339 327 325 274 22 105 106 75 400 167 ...
# $ V8 : int 135 233 91 306 230 140 233 166 210 351 ...
# $ V9 : int 204 203 256 337 25 295 214 288 63 388 ...
# $ V10: int 370 328 161 227 381 164 300 313 303 375 ...
# $ V11: int 171 373 133 345 60 119 215 48 55 367 ...
# $ V12: int 118 309 67 250 286 127 171 248 46 20 ...
# $ V13: int 385 15 282 276 130 166 160 214 58 74 ...
# $ V14: int 90 165 39 154 294 84 106 367 359 145 ...
# $ V15: int 392 290 103 14 111 148 200 331 302 88 ...
# $ V16: int 323 210 167 345 249 325 217 171 150 223 ...
sapply(lst1,nrow)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
#69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69
#27 28 29
#69 69 68
A.K.
Hi There,
It might be a simple problem but I didn't find a clear solution online.
The task is quite straightforward -- I have a large data frame with
more than 2000 rows and 16 columns. For further analysis, I need to
subset every 69 rows into some new data frames.
I tried to use a "for" loop (code shown below):
n = nrow(data)
w = 69
for(i in 1:(n-w)){
data= data[i:(i+w),]
}
But it only gave me a subset with the last 69 rows.
So my question is how to split the whole data frame into chunks of 69 rows (1st to 69th rows, 70th to 138th rows, etc.).
Any help will be appreciated.
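The loop above reassigns data on every iteration, so earlier slices are lost. If a loop is preferred over split(), a minimal sketch that keeps every chunk is to store each slice in a list. The names dat_demo, starts and chunks are illustrative, and the 200-row toy data frame stands in for the real data.

```r
# Toy data frame standing in for the real data
dat_demo <- data.frame(id = 1:200, v = (1:200)^2)
w <- 69

# First row of each chunk: 1, 70, 139
starts <- seq(1, nrow(dat_demo), by = w)

# Pre-allocate a list and fill one element per chunk
chunks <- vector("list", length(starts))
for (k in seq_along(starts)) {
  end <- min(starts[k] + w - 1, nrow(dat_demo))  # last chunk may be shorter
  chunks[[k]] <- dat_demo[starts[k]:end, ]
}

# Number of rows in each chunk
sapply(chunks, nrow)
```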