[R] Help processing large data
jim holtman
jholtman at gmail.com
Sat Nov 29 02:15:36 CET 2008
Is this what you want:
> x <- read.table(textConnection('"read" "no" "length"
+ 2 2 144
+ 7 7 47490
+ 9 9 310944
+ 11 11 10089
+ 14 14 13152
+ 17 17 27363 '), header=TRUE)
> closeAllConnections()
> result <- lapply(1:nrow(x), function(.indx){
+     data.frame(read=paste(x$read[.indx], seq(x$length[.indx] %/% 100 + 1), sep="_"),
+ no=rep(x$no[.indx], x$length[.indx] %/% 100 + 1),
+ length=c(rep(100, x$length[.indx] %/% 100), x$length[.indx] %% 100))
+ })
> result <- do.call(rbind, result)
>
> str(result)
'data.frame': 4094 obs. of 3 variables:
$ read : Factor w/ 4094 levels "2_1","2_2","7_1",..: 1 2 3 114 225 336 423 434 445 456 ...
$ no : int 2 2 7 7 7 7 7 7 7 7 ...
$ length: num 100 44 100 100 100 100 100 100 100 100 ...
> head(result)
  read no length
1  2_1  2    100
2  2_2  2     44
3  7_1  7    100
4  7_2  7    100
5  7_3  7    100
6  7_4  7    100
>
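For the full table of 130000 rows, building one data frame per input row and then calling rbind on all of them may be slow. A vectorized variant is sketched below as one possible alternative, not as the method used above. The names chunk, n, row, k, len, last and result2 are illustrative and do not come from the original post. The sketch uses ceiling() rather than %/% 100 + 1, on the assumption that a length that is an exact multiple of 100 should not produce a trailing piece of length 0.

```r
# Same sample input as above; the real table has about 130000 rows.
x <- data.frame(read   = c(2, 7, 9, 11, 14, 17),
                no     = c(2, 7, 9, 11, 14, 17),
                length = c(144, 47490, 310944, 10089, 13152, 27363))

chunk <- 100                       # piece length requested in the question

# Number of pieces per input row. ceiling() avoids a zero-length last
# piece when a length is an exact multiple of chunk (an assumption about
# the desired behaviour; the question does not cover that case).
n   <- ceiling(x$length / chunk)
row <- rep(seq_len(nrow(x)), n)    # source row of every output row
k   <- sequence(n)                 # 1..n[i] within each source row

# Every piece is chunk long except the last piece of each source row,
# which takes whatever is left over.
len  <- rep(chunk, length(row))
last <- k == n[row]
len[last] <- x$length[row][last] - chunk * (n[row][last] - 1)

result2 <- data.frame(read   = paste(x$read[row], k, sep = "_"),
                      no     = x$no[row],
                      length = len,
                      stringsAsFactors = FALSE)
```

On the sample data this gives the same 4094 rows as the str() output above, with read kept as character instead of a factor. Each column is built in one vectorized step, so no per-row data frames are created.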
On Thu, Nov 27, 2008 at 5:16 AM, mitras <suparna.mitra at gmail.com> wrote:
>
> Dear all,
> I have a problem handling a large dataset.
> It looks like:
> "read" "no" "length"
> 2 2 144
> 7 7 47490
> 9 9 310944
> 11 11 10089
> 14 14 13152
> 17 17 27363 and so on
> There are 130000 rows
>
> From this table I need to make a table like
> 2_1 2 100
> 2_2 2 44
> 7_1 7 100
> 7_2 7 100
> ...
> ...
> 7_474 7 100
> 7_475 7 90
> 9_1 9 100
> 9_2 9 100 and so on...
>
> In words: I want to split the 3rd column into pieces of length 100, adding as
> many rows as needed. The no stays the same for all the added rows, but the
> read changes to 2_1, 2_2 and so on.
> Please let me know if anyone can help.
> Thanks a lot in advance.
> Best,
> Mitra.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?