[R] generating multiple sequences in subsets of data
Jason Baucom
jason.baucom at ateb.com
Fri Sep 11 22:44:10 CEST 2009
A bit of debugging information
> merged_cut_col$pickseq <- ave(as.numeric(as.Date(merged_cut_col$pickts)),merged_cut_col$cpid,as.numeric(as.Date(merged_cut_col$pickts)) > as.numeric(as.Date("2008-12-01")),FUN=seq)
Error: cannot allocate vector of size 55 Kb
> memory.size()
[1] 1882.56
> object.size(merged_cut_col)
75250816 bytes
> gc()
            used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    226664   6.1    1423891   38.1   3463550   92.5
Vcells  19186778 146.4  156381436 1193.1 241372511 1841.6
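For reference, here is a minimal sketch of the same call on a toy data frame. It is not a diagnosis of the allocation error; it only shows the shape of the computation with the date conversion done once instead of twice. The `toy` data frame and its values are invented for illustration (only the column names `cpid` and `pickts` come from the call above), and `seq_along` is swapped in for `seq`.

```r
# Toy stand-in for merged_cut_col; the values are invented for illustration
toy <- data.frame(
  cpid   = c(1, 1, 1, 1, 2, 2),
  pickts = c("2008-11-01", "2008-11-15", "2008-12-05", "2008-12-20",
             "2008-11-30", "2008-12-02"),
  stringsAsFactors = FALSE
)

# Convert once and reuse; Date objects compare directly,
# so the as.numeric() wrapper is optional here
pickdate     <- as.Date(toy$pickts)
after_cutoff <- pickdate > as.Date("2008-12-01")

# Number the rows within each cpid / before-or-after-cutoff group.
# seq_len(nrow(toy)) is only a placeholder vector of the right length
toy$pickseq <- ave(seq_len(nrow(toy)), toy$cpid, after_cutoff, FUN = seq_along)
toy
```

Whether this form helps on the full data set is untested; it assumes the rows are already in the order in which the sequence should count.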
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, August 27, 2009 12:48 PM
To: Jason Baucom
Cc: Henrique Dallazuanna; r-help at r-project.org; Steven Few
Subject: Re: [R] generating multiple sequences in subsets of data
On Aug 27, 2009, at 11:58 AM, Jason Baucom wrote:
> I got this to work. Thanks for the insight! row7 is what I need.
>
>
>
>> checkLimit <-function(x) x<3
>
>> stuff$row6<-checkLimit(stuff$row1)
You don't actually need those intermediate steps:
> stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq))
> stuff
   row1 row2 row7
1     0    1    1
2     1    1    2
3     2    1    3
4     3    1    1
5     4    1    2
6     5    1    3
7     1    2    1
8     2    2    2
9     3    2    1
10    4    2    2
The expression row1 < 3 gets turned into a logical vector that ave()
is perfectly happy with.
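
To make the grouping explicit, here is a minimal sketch using the sample data from the original post. One caveat, offered as a suggestion rather than a correction: `seq(x)` on a one-element `x` expands to `1:x`, so a group with a single row may trigger a length-mismatch warning inside `ave()`. `seq_along` numbers the elements regardless of their values. The names `below_cutoff` and `row7_safe` are made up for illustration.

```r
# Sample data from the original post
stuff <- data.frame(row1 = c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4),
                    row2 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# The comparison yields one TRUE/FALSE per row; ave() treats it as a
# second grouping variable alongside row2
below_cutoff <- stuff$row1 < 3

# seq_along() numbers the elements of each group 1..n, whatever their values
stuff$row7_safe <- ave(stuff$row1, stuff$row2, below_cutoff, FUN = seq_along)

# Why seq() can surprise: on a single number it expands to 1:x
seq(5)        # 1 2 3 4 5
seq_along(5)  # 1
```

On this sample every group has at least two rows, so `seq` and `seq_along` give the same column.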
--
David Winsemius
>
>> stuff$row7 <- with(stuff, ave(row1,row2, row6, FUN = sequence))
>
>> stuff
>
>    row1 row2 row3 row4 row5  row6 row7
> 1     0    1    1    1    1  TRUE    1
> 2     1    1    2    2    2  TRUE    2
> 3     2    1    3    3    3  TRUE    3
> 4     3    1    4    1    4 FALSE    1
> 5     4    1    5    1    5 FALSE    2
> 6     5    1    6    1    6 FALSE    3
> 7     1    2    1    1    1  TRUE    1
> 8     2    2    2    2    2  TRUE    2
> 9     3    2    3    1    3 FALSE    1
> 10    4    2    4    1    4 FALSE    2
>
>
>
> Jason
>
>
>
> ________________________________
>
> From: Henrique Dallazuanna [mailto:wwwhsd at gmail.com]
> Sent: Thursday, August 27, 2009 11:02 AM
> To: Jason Baucom
> Cc: r-help at r-project.org; Steven Few
> Subject: Re: [R] generating multiple sequences in subsets of data
>
>
>
> Try this;
>
> stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq))
>
> I don't understand the fourth column
>
> On Thu, Aug 27, 2009 at 11:55 AM, Jason Baucom
> <jason.baucom at ateb.com> wrote:
>
> I'm running into a problem I can't seem to find a solution for. I'm
> attempting to add sequences into an existing data set based on subsets
> of the data. I've done this using a for loop with a small subset of
> data, but attempting the same process using real data (200k rows) is
> taking way too long.
>
>
>
> Here is some sample data and my ultimate goal
>
>> row1<-c(0,1,2,3,4,5,1,2,3,4)
>
>> row2<-c(1,1,1,1,1,1,2,2,2,2)
>
>> stuff<-data.frame(row1=row1,row2=row2)
>
>> stuff
>
>    row1 row2
> 1     0    1
> 2     1    1
> 3     2    1
> 4     3    1
> 5     4    1
> 6     5    1
> 7     1    2
> 8     2    2
> 9     3    2
> 10    4    2
>
>
>
>
>
> I need to derive 2 columns. I need a sequence for each unique row2, and
> then I need a sequence that restarts based on a cutoff value for row1
> and unique row2. The following table is what it -should- look like using
> a cutoff of 3 for row4:
>
>
>
>    row1 row2 row3 row4
> 1     0    1    1    1
> 2     1    1    2    2
> 3     2    1    3    3
> 4     3    1    4    1
> 5     4    1    5    2
> 6     5    1    6    3
> 7     1    2    1    1
> 8     2    2    2    2
> 9     3    2    3    1
> 10    4    2    4    2
>
>
>
> I need something like row3<-sequence(nrow(unique(stuff$row2))) that
> actually works :-) Here is the for loop that functions properly for
> row3:
>
>
>
> stuff$row3<-c(1)
>
> for (i in 2:nrow(stuff)) { if ( stuff$row2[i] == stuff$row2[i-1]) {
> stuff$row3[i] = stuff$row3[i-1]+1}}
>
> Thanks!
>
>
>
> Jason Baucom
>
> Ateb, Inc.
>
> 919.882.4992 O
>
> 919.872.1645 F
>
> www.ateb.com <http://www.ateb.com/>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT