[R] reshaping a dataset

Wed Sep 13 19:06:38 CEST 2006

Thank you Gabor,

I'll need to explore the reshape package a bit to see what benefits I
get compared with the basic "reshape" function, but I'm glad you made
me aware of it.

And your solution for fixing NAs in only the columns I choose is exactly
what I wanted.
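
For anyone finding this thread in the archive, here is a small sketch of how I understand that fix, written in base R and selecting the columns by name instead of by position. The toy data frame `wide` and the name `prey.cols` are invented for the illustration; only the "mass." prefix comes from what base reshape() produces with v.names="mass".

```r
# Toy stand-in for the wide table; all names and values here are made up.
wide <- data.frame(
  tagno    = c(11, 12, 13),
  depth    = c(100, NA, 250),   # this NA must be left alone
  mass.187 = c(1.2, NA, 0.4),
  mass.438 = c(NA, 3.1, NA)
)

# Pick the columns to fix by name rather than by position
# (hypothetical helper name; the pattern matches the "mass.<prey>" columns).
prey.cols <- grep("^mass\\.", names(wide), value = TRUE)

# Same idea as in Gabor's message: take the sub-data-frame,
# set its NAs to zero, and write it back.
tmp <- wide[, prey.cols, drop = FALSE]
tmp[is.na(tmp)] <- 0
wide[, prey.cols] <- tmp

wide
```

Selecting by name should keep working if the number of identifier columns changes, whereas a positional index such as 1:8 has to be adjusted by hand.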

Many thanks,

Denis
On 06-09-13 at 00:55, Gabor Grothendieck wrote:

> I missed your second question, which was how to set the NAs to zero
> for some of the columns.  Suppose we want to replace the NAs
> in the columns indexed by ic and, for the sake of example, suppose
> ic specifies columns 1 to 8:
>
> library(reshape)
> testm <- melt(test, id = 1:6)
> out <- cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)
>
> # fix up NAs
> ic <- 1:8
> out2 <- out[,ic]
> out2[is.na(out2)] <- 0
> out[,ic] <- out2
>
> On 9/13/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> If I understand this correctly we want to sum the mass over each
>> combination of the first 6 variables and display the result with the
>> 6th, prey, along the top and the others along the side.
>>
>> library(reshape)
>> testm <- melt(test, id = 1:6)
>> cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)
>>
>> Now fix up the NAs.
>>
>> On 9/12/06, Denis Chabot <chabotd at globetrotter.net> wrote:
>> > Hi,
>> >
>> > I'm trying to move to R the last few data handling routines I was
>> > performing in SAS.
>> >
>> > I'm working on stomach content data. In the simplified example I
>> > provide below, there are variables describing the origin of each prey
>> > item (nbpc is a ship number, each ship may have been used on
>> > different trips, each trip has stations, and individual fish (tagno)
>> > can be caught at each station).
>> >
>> > For each stomach the number of lines corresponds to the number of
>> > prey items. Thus a variable identifies prey type, and others (here
>> > only one, mass) provide information on prey abundance or size or
>> > digestion level.
>> >
>> > Finally, there can be accompanying variables that are not used but
>> > that I need to keep for later analyses (e.g. depth in the example
>> > below).
>> >
>> > At some point I need to transform such a dataset into another format
>> > where each stomach occupies a single line, and there are columns for
>> > each prey item.
>> >
>> > The "reshape" function works really well, my program is in fact
>> > simpler than the SAS equivalent (not shown, don't want to bore you,
>> > but available on request), except that I need zeros when prey types
>> > are absent from a stomach instead of NAs, a problem for which I only
>> > have a shaky solution at the moment:
>> >
>> > 1) creation of a dummy dataset:
>> > #######
>> > nbpc <- rep(c(20,34), c(110,90))
>> > trip <- c(rep(1:3, c(40, 40, 30)), rep(1:2, c(60,30)))
>> > set <- c(rep(1:4, c(10, 8, 7, 15)), rep(c(10,12), c(25,15)),
>> >          rep(1:3, rep(10,3)),
>> >          rep(10:12, c(20, 10, 30)), rep(7:8, rep(15,2)))
>> > depth <- c(rep(c(100, 150, 200, 250), c(10, 8, 7, 15)),
>> >            rep(c(100,120), c(25,15)), rep(c(75, 50, 200), rep(10,3)),
>> >            rep(c(200, 150, 50), c(20, 10, 30)),
>> >            rep(c(100, 250), rep(15,2)))
>> > tagno <- rep(round(runif(42,1,200)),
>> >              c(7,3, 4,4, 2,2,3, 5,5,5,  4,6,4,3,5,3, 7,8, 4,6, 5,5, 7,3,
>> >                6,6,4,4, 4,6, 3,3,4,5,5,6,4, 5,5,5, 8,7))
>> > prey.codes <- c(187, 438, 792, 811)
>> > prey <- sample(prey.codes, 200, replace=TRUE)
>> > mass <- runif(200, 0, 10)
>> >
>> > test <- data.frame(nbpc, trip, set, depth, tagno, prey, mass)
>> > ########
>> >
>> > Because there are often multiple occurrences of the same prey in a
>> > single stomach, I need to sum them for each stomach before using
>> > "reshape". Here I use summaryBy (from the doBy package) because my
>> > understanding of the many variants of "apply" is not very good:
>> >
>> > ########
>> > library(doBy)
>> > test2 <- summaryBy(mass~nbpc+trip+set+tagno+prey, data=test, FUN=sum,
>> >                    keep.names=TRUE, id=~depth)
>> >
>> > #this messes up sorting order, I fix it
>> > k <- order(test2$nbpc, test2$trip, test2$set, test2$tagno)
>> > test3 <- test2[k,]
>> > result <- reshape(test3, v.names="mass", idvar=c("nbpc", "trip",
>> > "set", "tagno"),
>> >                 timevar="prey", direction="wide")
>> > #########
>> >
>> > I'm quite happy with this, although you may know of better ways of
>> > doing it.
>> > But my problem is with prey that are absent from a stomach. In later
>> > analyses, I need them to have zero abundance instead of NA.
>> > My shaky solution is:
>> > #########
>> > empties <- is.na(result)
>> > result[empties] <- 0
>> > #########
>> >
>> > which did the job in this example, but it won't always. For instance
>> > there could have been NAs for "depth", which I do not want to become
>> > zero.
>> >
>> > Is there a way to transform NAs into zeros for multiple columns of a
>> > dataframe in one step, while ignoring some columns?
>> >
>> > Or maybe there is another way to achieve this that would have put
>> > zeros where I need them (i.e. something other than "reshape")?
>> >
>> > Thanking you in advance,
>> >
>> > Denis Chabot