[R] generating multiple sequences in subsets of data

Jason Baucom jason.baucom at ateb.com
Fri Sep 11 22:44:10 CEST 2009


A bit of debugging information

> merged_cut_col$pickseq <- ave(as.numeric(as.Date(merged_cut_col$pickts)),merged_cut_col$cpid,as.numeric(as.Date(merged_cut_col$pickts)) > as.numeric(as.Date("2008-12-01")),FUN=seq)
Error: cannot allocate vector of size 55 Kb
> memory.size()
[1] 1882.56
> object.size(merged_cut_col)
75250816 bytes
> gc()
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   226664   6.1    1423891   38.1   3463550   92.5
Vcells 19186778 146.4  156381436 1193.1 241372511 1841.6
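
One possible reading of this error, offered as a guess rather than a diagnosis: seq() applied to a vector of length one, such as seq(14214), returns 1:14214 rather than 1. Dates after 2008-12-01 convert to numbers above 14214, so any cpid/cutoff group holding a single row would make FUN=seq build a vector of roughly 14,000 integers (about 55 Kb) instead of a single 1. The sketch below uses seq_along(), which depends only on the length of each group, and converts the dates once. The data frame d is a made-up stand-in for merged_cut_col; only the column names cpid and pickts come from the call above.

```r
# Hypothetical stand-in for merged_cut_col (values invented for illustration).
d <- data.frame(
  cpid   = c(1, 1, 1, 2, 2),
  pickts = c("2008-11-01", "2008-12-15", "2009-01-02",
             "2008-10-10", "2009-02-01"),
  stringsAsFactors = FALSE
)

# seq() on a single number counts up to that number; seq_along() does not.
day_number    <- as.numeric(as.Date("2008-12-01"))  # days since 1970-01-01
len_seq       <- length(seq(day_number))
len_seq_along <- length(seq_along(day_number))

# Convert the timestamps once and reuse the result.
pickdate     <- as.Date(d$pickts)
after_cutoff <- pickdate > as.Date("2008-12-01")

# Number the rows within each cpid / cutoff group.
d$pickseq <- ave(seq_len(nrow(d)), d$cpid, after_cutoff, FUN = seq_along)
print(d)
```

Passing seq_len(nrow(d)) as the first argument is a design choice of this sketch: the values being replaced do not matter to seq_along(), so there is no need to hand ave() a second copy of the converted dates.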

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, August 27, 2009 12:48 PM
To: Jason Baucom
Cc: Henrique Dallazuanna; r-help at r-project.org; Steven Few
Subject: Re: [R] generating multiple sequences in subsets of data


On Aug 27, 2009, at 11:58 AM, Jason Baucom wrote:

> I got this to work. Thanks for the insight! row7 is what I need.
>
>
>
>> checkLimit <-function(x) x<3
>
>> stuff$row6<-checkLimit(stuff$row1)

You don't actually need those intermediate steps:

 > stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq))
 > stuff
    row1 row2 row7
1     0    1    1
2     1    1    2
3     2    1    3
4     3    1    1
5     4    1    2
6     5    1    3
7     1    2    1
8     2    2    2
9     3    2    1
10    4    2    2

The expression row1 < 3 evaluates to a logical vector, which ave()
accepts as a grouping variable just like any other.
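
As a small sketch of that point: ave() groups on every combination of the variables passed after its first argument, so a logical vector can serve as a grouping variable directly. The example rebuilds the sample data from the original post. It uses seq_along in place of seq, which is a substitution made for this sketch rather than what was posted, because seq_along depends only on the length of each group. The name below_cutoff is likewise introduced here for illustration.

```r
# Sample data from the original post.
stuff <- data.frame(row1 = c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4),
                    row2 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# The comparison yields one TRUE/FALSE value per row.
below_cutoff <- stuff$row1 < 3

# Sequence within row2 only (the row3 column asked for).
stuff$row3 <- ave(stuff$row1, stuff$row2, FUN = seq_along)

# Sequence within each row2 / below_cutoff combination (row7 above).
stuff$row7 <- ave(stuff$row1, stuff$row2, below_cutoff, FUN = seq_along)
print(stuff)
```

With the sample data every group has more than one row, so seq and seq_along give the same result here; the two differ only when a group contains a single value.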

-- 
David Winsemius

>
>> stuff$row7 <- with(stuff, ave(row1,row2, row6, FUN = sequence))
>> stuff
>    row1 row2 row3 row4 row5  row6 row7
> 1     0    1    1    1    1  TRUE    1
> 2     1    1    2    2    2  TRUE    2
> 3     2    1    3    3    3  TRUE    3
> 4     3    1    4    1    4 FALSE    1
> 5     4    1    5    1    5 FALSE    2
> 6     5    1    6    1    6 FALSE    3
> 7     1    2    1    1    1  TRUE    1
> 8     2    2    2    2    2  TRUE    2
> 9     3    2    3    1    3 FALSE    1
> 10    4    2    4    1    4 FALSE    2
>
>
>
> Jason
>
>
>
> ________________________________
>
> From: Henrique Dallazuanna [mailto:wwwhsd at gmail.com]
> Sent: Thursday, August 27, 2009 11:02 AM
> To: Jason Baucom
> Cc: r-help at r-project.org; Steven Few
> Subject: Re: [R] generating multiple sequences in subsets of data
>
>
>
> Try this;
>
> stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq))
>
> I don't understand the fourth column
>
> On Thu, Aug 27, 2009 at 11:55 AM, Jason Baucom  
> <jason.baucom at ateb.com> wrote:
>
> I'm running into a problem I can't seem to find a solution for. I'm
> attempting to add sequences into an existing data set based on subsets
> of the data.  I've done this using a for loop with a small subset of
> data, but attempting the same process using real data (200k rows) is
> taking way too long.
>
>
>
> Here is some sample data and my ultimate goal
>
>> row1<-c(0,1,2,3,4,5,1,2,3,4)
>> row2<-c(1,1,1,1,1,1,2,2,2,2)
>> stuff<-data.frame(row1=row1,row2=row2)
>> stuff
>    row1 row2
> 1     0    1
> 2     1    1
> 3     2    1
> 4     3    1
> 5     4    1
> 6     5    1
> 7     1    2
> 8     2    2
> 9     3    2
> 10    4    2
>
>
>
>
>
> I need to derive 2 columns. I need a sequence for each unique row2, and
> then I need a sequence that restarts based on a cutoff value for row1
> and unique row2. The following table is what it -should- look like using
> a cutoff of 3 for row4
>
>
>
>    row1 row2 row3 row4
> 1     0    1    1    1
> 2     1    1    2    2
> 3     2    1    3    3
> 4     3    1    4    1
> 5     4    1    5    2
> 6     5    1    6    3
> 7     1    2    1    1
> 8     2    2    2    2
> 9     3    2    3    1
> 10    4    2    4    2
>
>
>
> I need something like row3<-sequence(nrow(unique(stuff$row2))) that
> actually works :-) Here is the for loop that functions properly for
> row3:
>
>
>
> stuff$row3<-c(1)
>
> for (i in 2:nrow(stuff)) { if ( stuff$row2[i] == stuff$row2[i-1]) {
> stuff$row3[i] = stuff$row3[i-1]+1}}
>
> Thanks!
>
>
>
> Jason Baucom
>
> Ateb, Inc.
>
> 919.882.4992 O
>
> 919.872.1645 F
>
> www.ateb.com <http://www.ateb.com/>
>
>
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> -- 
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



