[R] problem applying the same function twice
Sarah Goslee
sarah.goslee at gmail.com
Tue Mar 10 21:04:37 CET 2015
Hi,
I didn't work through your code, because it looked overly complicated.
Here's a more general approach that does what you appear to want:
# use dput() to provide reproducible data please!
comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird",
"bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
"cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
)), .Names = c("animals", "animalYears", "animalMass"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16"))
# add reps to comAn
# assumes comAn is already sorted on animals, animalYears
# lapply() rather than sapply(), so equal-sized groups are not simplified
# to a matrix
comAn$reps <- unlist(lapply(rle(do.call("paste",
comAn[,1:2]))$lengths, seq_len))
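If sorting the data first is not convenient, one option is to number the rows within each group with ave(), which does not depend on row order. A minimal sketch on a made-up, deliberately unsorted data frame (toy and its values are invented for illustration, they are not from the original post):

```r
# 'toy' is a hypothetical, unsorted stand-in for comAn
toy <- data.frame(animals = c("cat", "bird", "cat", "bird", "cat"),
                  animalYears = c(1L, 1L, 1L, 2L, 1L),
                  stringsAsFactors = FALSE)

# number the rows within each animals x animalYears group, in order of appearance
toy$reps <- ave(seq_len(nrow(toy)), toy$animals, toy$animalYears,
                FUN = seq_along)

print(toy$reps)
# reps: 1 1 2 1 3
```

The rle() version is fine whenever the data really are sorted; this variant just drops that assumption.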
# create full set of combinations
outgrid <- expand.grid(animals=unique(comAn$animals),
animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
stringsAsFactors=FALSE)
# combine with comAn
comAn.full <- merge(outgrid, comAn, all.x=TRUE)
> comAn.full
   animals animalYears reps animalMass
1     bird           1    1         29
2     bird           1    2         48
3     bird           1    3         36
4     bird           2    1         20
5     bird           2    2         34
6     bird           2    3         34
7      cat           1    1         46
8      cat           1    2         33
9      cat           1    3         48
10     cat           2    1         21
11     cat           2    2         NA
12     cat           2    3         NA
13     dog           1    1         21
14     dog           1    2         28
15     dog           1    3         25
16     dog           2    1         35
17     dog           2    2         18
18     dog           2    3         11
>
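For what it's worth, the zero-row result in the quoted code below appears to come from the hard-coded 2 in genRows(): it computes 2 - count, and after the first pass the cat/year 2 group already has 2 rows, so it asks for 0 new rows. The string conversion noted in the code comment happens because apply() turns the data frame into a character matrix before calling the function. A small sketch of both points, using a toy one-row stand-in for 'missing' (miss, target and padRows are names made up here, not from the original post):

```r
# Toy stand-in for 'missing' after the first pass: cat in year 2 now has 2 rows
miss <- data.frame(vals = "cat:2", count = 2L, stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix first; with a character column
# present, every cell becomes a string, including count
cellTypes <- apply(miss, 1, class)

# genRows() computes 2 - count, which is 0 here, so rep(..., 0) builds nothing
newRows <- 2 - as.numeric(miss$count[1])

# 'target' is a hypothetical name for the group size actually wanted
target <- 3
padRows <- target - as.numeric(miss$count[1])

print(c(newRows = newRows, padRows = padRows))
```

Note also that combs[counts<3] repeats a combination once per existing row, so it would probably need unique() before padding. The expand.grid()/merge() approach above avoids both issues by filling every group in one pass.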
On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
<curtisburkhalter at gmail.com> wrote:
> Hey everyone,
>
> I've written a function that adds NAs to a dataframe where data is missing
> and it seems to work great if I only need to run it once, but if I run it
> two times in a row I run into problems. I've created a workable example to
> explain what I mean and why I would do this.
>
> In my dataframe there are areas where I need to add two rows of NAs (b/c I
> need to have 3 animal x year combos and for cat in year 2 I only have one)
> so I thought that I'd just run my code twice using the function in the code
> below. Everything works great when I run it the first time, but when I run
> it again it says that the value returned to the list 'x' is of length 0. I
> don't understand why the function works the first time around and adds an
> NA to the 'animalMass' column, but won't do it again. I've used
> print(str(dataframe)) to see if there is a change in class or type when
> the function runs through the original dataframe, and there is for
> 'animalYears', but I just convert it back before rerunning the function a
> second time.
>
> Any thoughts on this would be greatly appreciated b/c the actual
> dataframe I have to input into WinBUGS is 14000x12, so it's not a trivial
> thing to just add in an NA here or there.
>
>>comAn
>    animals animalYears animalMass
> 1     bird           1         29
> 2     bird           1         48
> 3     bird           1         36
> 4     bird           2         20
> 5     bird           2         34
> 6     bird           2         34
> 7      dog           1         21
> 8      dog           1         28
> 9      dog           1         25
> 10     dog           2         35
> 11     dog           2         18
> 12     dog           2         11
> 13     cat           1         46
> 14     cat           1         33
> 15     cat           1         48
> 16     cat           2         21
>
> So every animal has 3 measurements per year, except for the cat in year two
> which has only 1. I run the code below and get:
>
> #combs defines the different combinations of
> #animals and animalYears
> combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> #counts defines how long the different combinations are
> counts<-ave(1:nrow(comAn),combs,FUN=length)
> #missing defines the combs that have fewer than two rows and puts them in
> #the data frame missing
> missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
>
> genRows<-function(dat){
> vals<-strsplit(dat[1],':')[[1]]
> #not sure why dat[2] is being converted to a string
> newRows<-2-as.numeric(dat[2])
> newDf<-data.frame(animals=rep(vals[1],newRows),
> animalYears=rep(vals[2],newRows),
> animalMass=rep(NA,newRows))
> return(newDf)
> }
>
>
> x<-apply(missing,1,genRows)
> comAn=rbind(comAn,
> do.call(rbind,x))
>
>> comAn
>    animals animalYears animalMass
> 1     bird           1         29
> 2     bird           1         48
> 3     bird           1         36
> 4     bird           2         20
> 5     bird           2         34
> 6     bird           2         34
> 7      dog           1         21
> 8      dog           1         28
> 9      dog           1         25
> 10     dog           2         35
> 11     dog           2         18
> 12     dog           2         11
> 13     cat           1         46
> 14     cat           1         33
> 15     cat           1         48
> 16     cat           2         21
> 17     cat           2       <NA>
>
> So far so good, but then I adjust the code so that it reads (**notice the
> change in the specification in 'missing' to counts<3**):
>
> #combs defines the different combinations of
> #animals and animalYears
> combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> #counts defines how long the different combinations are
> counts<-ave(1:nrow(comAn),combs,FUN=length)
> #missing defines the combs that have fewer than three rows and puts them in
> #the data frame missing
> missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
>
> genRows<-function(dat){
> vals<-strsplit(dat[1],':')[[1]]
> #not sure why dat[2] is being converted to a string
> newRows<-2-as.numeric(dat[2])
> newDf<-data.frame(animals=rep(vals[1],newRows),
> animalYears=rep(vals[2],newRows),
> animalMass=rep(NA,newRows))
> return(newDf)
> }
>
>
> x<-apply(missing,1,genRows)
> comAn=rbind(comAn,
> do.call(rbind,x))
>
> The result for 'x' then reads:
>
>> x
> [[1]]
> [1] animals animalYears animalMass
> <0 rows> (or 0-length row.names)
>
> Any thoughts on why it might be doing this instead of adding an additional
> row to get the result:
>
>> comAn
>    animals animalYears animalMass
> 1     bird           1         29
> 2     bird           1         48
> 3     bird           1         36
> 4     bird           2         20
> 5     bird           2         34
> 6     bird           2         34
> 7      dog           1         21
> 8      dog           1         28
> 9      dog           1         25
> 10     dog           2         35
> 11     dog           2         18
> 12     dog           2         11
> 13     cat           1         46
> 14     cat           1         33
> 15     cat           1         48
> 16     cat           2         21
> 17     cat           2       <NA>
> 18     cat           2       <NA>
>
> Thanks
> --
> Curtis Burkhalter