[R] problem applying the same function twice
Curtis Burkhalter
curtisburkhalter at gmail.com
Tue Mar 10 21:57:14 CET 2015
Sarah,
I have 669 sites and each site has 7 years of data, so if I'm thinking
correctly there should be 4683 possible combinations of site x year.
Each site x year combination needs 3 sampling periods, though (14049 rows
in all), so that the full table looks something like the following:
site 1 year1 sample 1
site 1 year1 sample 2
site 1 year1 sample 3
site 2 year1 sample 1
site 2 year1 sample 2
site 2 year1 sample 3
...
site 669 year7 sample 1
site 669 year7 sample 2
site 669 year7 sample 3
I have my max memory allocation set to the amount of RAM on my laptop
(8GB), but the operation still stalls because it runs out of memory.
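
For scale, the target table described above is small, so it may help to build the grid straight from the design and merge on explicit keys. The following is only a minimal sketch under stated assumptions: the column names `site`, `year`, `period` and `value` are placeholders rather than the real column names, and the toy `obs` data frame stands in for the real 14000 x 12 data. One thing that can make merge() balloon is duplicated key combinations on both sides, so the sketch checks for that first; this is a guess at a possible cause, not a diagnosis.

```r
# Placeholder design sizes and column names; not the real data.
n_site <- 669; n_year <- 7; n_period <- 3

# Full grid of site x year x sampling period: 669 * 7 * 3 rows.
grid <- expand.grid(site = seq_len(n_site),
                    year = seq_len(n_year),
                    period = seq_len(n_period))

# Toy "observed" data: keep 14000 grid rows to mimic missing samples.
set.seed(1)
obs <- grid[sort(sample(nrow(grid), 14000)), ]
obs$value <- rnorm(nrow(obs))

# Duplicated keys make merge() multiply rows, so check before joining.
keys <- c("site", "year", "period")
stopifnot(!anyDuplicated(obs[keys]))

# Left join on explicit keys; combinations with no data get NA in 'value'.
full <- merge(grid, obs, by = keys, all.x = TRUE)
full <- full[order(full$site, full$year, full$period), ]
```

Passing `by` explicitly keeps merge() from silently joining on every shared column name, which matters when the real data frame has 12 columns.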
On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:
> You said your data only had 14000 rows, which really isn't many.
>
> How many possible combinations do you have, and how many do you need to
> add?
>
> On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
> <curtisburkhalter at gmail.com> wrote:
> > Sarah,
> >
> > This strategy works great for this small dataset, but when I attempt your
> > method with my data set I reach the maximum allowable memory allocation
> > and the operation just stalls and then stops completely before it is
> > finished. Do you know of a way around this?
> >
> > Thanks
> >
> > On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee <sarah.goslee at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I didn't work through your code, because it looked overly complicated.
> >> Here's a more general approach that does what you appear to want:
> >>
> >> # use dput() to provide reproducible data please!
> >> comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird",
> >> "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
> >> "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
> >> 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
> >> 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
> >> )), .Names = c("animals", "animalYears", "animalMass"),
> >> class = "data.frame", row.names = c("1",
> >> "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
> >> "14", "15", "16"))
> >>
> >>
> >> # add reps to comAn
> >> # assumes comAn is already sorted on animals, animalYears
> >> comAn$reps <- unlist(sapply(rle(do.call("paste",
> >> comAn[,1:2]))$lengths, seq_len))
> >>
> >> # create full set of combinations
> >> outgrid <- expand.grid(animals=unique(comAn$animals),
> >> animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
> >> stringsAsFactors=FALSE)
> >>
> >> # combine with comAn
> >> comAn.full <- merge(outgrid, comAn, all.x=TRUE)
> >>
> >> > comAn.full
> >>    animals animalYears reps animalMass
> >> 1     bird           1    1         29
> >> 2     bird           1    2         48
> >> 3     bird           1    3         36
> >> 4     bird           2    1         20
> >> 5     bird           2    2         34
> >> 6     bird           2    3         34
> >> 7      cat           1    1         46
> >> 8      cat           1    2         33
> >> 9      cat           1    3         48
> >> 10     cat           2    1         21
> >> 11     cat           2    2         NA
> >> 12     cat           2    3         NA
> >> 13     dog           1    1         21
> >> 14     dog           1    2         28
> >> 15     dog           1    3         25
> >> 16     dog           2    1         35
> >> 17     dog           2    2         18
> >> 18     dog           2    3         11
> >> >
> >>
> >> On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
> >> <curtisburkhalter at gmail.com> wrote:
> >> > Hey everyone,
> >> >
> >> > I've written a function that adds NAs to a dataframe where data is
> >> > missing. It seems to work great if I only need to run it once, but if
> >> > I run it two times in a row I run into problems. I've created a
> >> > workable example to explain what I mean and why I would do this.
> >> >
> >> > In my dataframe there are places where I need to add two rows of NAs
> >> > (b/c I need 3 rows for each animal x year combo, and for cat in year 2
> >> > I only have one), so I thought that I'd just run my code twice using
> >> > the function in the code below. Everything works great when I run it
> >> > the first time, but when I run it again it says that the value returned
> >> > to the list 'x' is of length 0. I don't understand why the function
> >> > works the first time around and adds an NA to the 'animalMass' column,
> >> > but won't do it again. I've used print(str(dataframe)) to see if there
> >> > is a change in class or type when the function runs through the
> >> > original dataframe, and there is for 'animalYears', but I just convert
> >> > it back before rerunning the function for the second time.
> >> >
> >> > Any thoughts on this would be greatly appreciated b/c the actual
> >> > dataframe I have to input into WinBUGS is 14000x12, so it's not a
> >> > trivial thing to just add in an NA here or there.
> >> >
> >> > > comAn
> >> >    animals animalYears animalMass
> >> > 1     bird           1         29
> >> > 2     bird           1         48
> >> > 3     bird           1         36
> >> > 4     bird           2         20
> >> > 5     bird           2         34
> >> > 6     bird           2         34
> >> > 7      dog           1         21
> >> > 8      dog           1         28
> >> > 9      dog           1         25
> >> > 10     dog           2         35
> >> > 11     dog           2         18
> >> > 12     dog           2         11
> >> > 13     cat           1         46
> >> > 14     cat           1         33
> >> > 15     cat           1         48
> >> > 16     cat           2         21
> >> >
> >> > So every animal has 3 measurements per year, except for the cat in
> >> > year two, which has only 1. I run the code below and get:
> >> >
> >> > #combs defines the different combinations of
> >> > #animals and animalYears
> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> > #counts defines how many rows each combination has
> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> > #missing holds the combs that have fewer than 2 rows, together
> >> > #with their counts
> >> > missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
> >> >
> >> > genRows<-function(dat){
> >> >   vals<-strsplit(dat[1],':')[[1]]
> >> >   #not sure why dat[2] is being converted to a string
> >> >   newRows<-2-as.numeric(dat[2])
> >> >   newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >                     animalYears=rep(vals[2],newRows),
> >> >                     animalMass=rep(NA,newRows))
> >> >   return(newDf)
> >> > }
> >> >
> >> >
> >> > x<-apply(missing,1,genRows)
> >> > comAn=rbind(comAn,
> >> >             do.call(rbind,x))
> >> >
> >> > > comAn
> >> >    animals animalYears animalMass
> >> > 1     bird           1         29
> >> > 2     bird           1         48
> >> > 3     bird           1         36
> >> > 4     bird           2         20
> >> > 5     bird           2         34
> >> > 6     bird           2         34
> >> > 7      dog           1         21
> >> > 8      dog           1         28
> >> > 9      dog           1         25
> >> > 10     dog           2         35
> >> > 11     dog           2         18
> >> > 12     dog           2         11
> >> > 13     cat           1         46
> >> > 14     cat           1         33
> >> > 15     cat           1         48
> >> > 16     cat           2         21
> >> > 17     cat           2       <NA>
> >> >
> >> > So far so good, but then I adjust the code so that it reads (**notice
> >> > the change in the specification in 'missing' to counts<3**):
> >> >
> >> > #combs defines the different combinations of
> >> > #animals and animalYears
> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> > #counts defines how many rows each combination has
> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> > #missing holds the combs that have fewer than 3 rows, together
> >> > #with their counts
> >> > missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
> >> >
> >> > genRows<-function(dat){
> >> >   vals<-strsplit(dat[1],':')[[1]]
> >> >   #not sure why dat[2] is being converted to a string
> >> >   newRows<-2-as.numeric(dat[2])
> >> >   newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >                     animalYears=rep(vals[2],newRows),
> >> >                     animalMass=rep(NA,newRows))
> >> >   return(newDf)
> >> > }
> >> >
> >> >
> >> > x<-apply(missing,1,genRows)
> >> > comAn=rbind(comAn,
> >> >             do.call(rbind,x))
> >> >
> >> > The result for 'x' then reads:
> >> >
> >> >> x
> >> > [[1]]
> >> > [1] animals animalYears animalMass
> >> > <0 rows> (or 0-length row.names)
> >> >
> >> > Any thoughts on why it might be doing this instead of adding an
> >> > additional row to get the result:
> >> >
> >> > > comAn
> >> >    animals animalYears animalMass
> >> > 1     bird           1         29
> >> > 2     bird           1         48
> >> > 3     bird           1         36
> >> > 4     bird           2         20
> >> > 5     bird           2         34
> >> > 6     bird           2         34
> >> > 7      dog           1         21
> >> > 8      dog           1         28
> >> > 9      dog           1         25
> >> > 10     dog           2         35
> >> > 11     dog           2         18
> >> > 12     dog           2         11
> >> > 13     cat           1         46
> >> > 14     cat           1         33
> >> > 15     cat           1         48
> >> > 16     cat           2         21
> >> > 17     cat           2       <NA>
> >> > 18     cat           2       <NA>
> >> >
> >> > Thanks
> >> > --
> >> > Curtis Burkhalter
> >
> >
>
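
On the original question quoted above (why the second run of genRows returns zero-row data frames): after the first run, cat in year 2 has 2 rows, so `2-as.numeric(dat[2])` evaluates to 0 and `rep(..., 0)` produces nothing. The hard-coded 2 is the culprit, not the change to `counts<3`. The string conversion happens because apply() turns the data frame into a character matrix before calling the function. Below is a minimal sketch of a single-pass alternative that pads every combination up to a target count; `target`, `short`, `need`, `padding` and `padded` are names made up for this example, and the toy data is rebuilt with numeric columns for simplicity.

```r
# Toy data from the thread, rebuilt with numeric year and mass columns.
comAn <- data.frame(
  animals = rep(c("bird", "dog", "cat"), times = c(6, 6, 4)),
  animalYears = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2),
  animalMass = c(29, 48, 36, 20, 34, 34, 21, 28, 25, 35, 18, 11, 46, 33, 48, 21),
  stringsAsFactors = FALSE
)

target <- 3  # rows wanted per animal x year combination

# One count per unique combination (combinations with no rows get 0).
counts <- as.data.frame(table(animals = comAn$animals,
                              animalYears = comAn$animalYears),
                        stringsAsFactors = FALSE)
short <- counts[counts$Freq < target, ]

# Number of NA rows each short combination still needs.
need <- target - short$Freq
padding <- data.frame(animals = rep(short$animals, need),
                      animalYears = as.numeric(rep(short$animalYears, need)),
                      animalMass = rep(NA_real_, sum(need)))
padded <- rbind(comAn, padding)
```

Because the number of rows to add is computed as `target - count` per unique combination, one pass is enough and nothing needs to be rerun with a different threshold.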
--
Curtis Burkhalter
https://sites.google.com/site/curtisburkhalter/