[R] reshaping a dataset

Gabor Grothendieck ggrothendieck at gmail.com
Wed Sep 13 06:55:43 CEST 2006


I missed your second question, which was how to set the NAs to zero
in only some of the columns.  Suppose we want to replace the NAs
in the columns indexed by ic and, for the sake of example, suppose ic
specifies columns 1 to 8:

library(reshape)
testm <- melt(test, id = 1:6)
out <- cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)

# replace NAs by 0, but only in the columns listed in ic
ic <- 1:8
out2 <- out[,ic]
out2[is.na(out2)] <- 0
out[,ic] <- out2
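The same three-step idiom can be sketched in base R without hard-coding
column positions: choose the columns to fix by excluding the identifier
and covariate columns by name, so that a genuine NA in depth survives.
This is only an illustration, not code from the thread; the small data
frame `wide` and its column names (mass.187, mass.438, modelled on what
reshape(direction = "wide") produces) are made up.

```r
# Hypothetical wide data frame: one row per stomach, one column per prey code.
wide <- data.frame(
  nbpc     = c(20, 20, 34),
  trip     = c(1, 1, 2),
  set      = c(1, 2, 7),
  tagno    = c(15, 88, 120),
  depth    = c(100, NA, 250),   # a real missing value that must stay NA
  mass.187 = c(2.5, NA, 1.0),
  mass.438 = c(NA, 4.2, NA)
)

# Columns that must never be zero-filled (identifiers and kept covariates).
keep.as.is <- c("nbpc", "trip", "set", "tagno", "depth")
ic <- setdiff(names(wide), keep.as.is)   # here: the two prey columns

# Same idiom as above, but with ic chosen by name instead of by position.
sub <- wide[, ic]
sub[is.na(sub)] <- 0
wide[, ic] <- sub

wide
```

Selecting by name keeps working if prey columns are added or reordered,
which positional indices such as 1:8 do not.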

On 9/13/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> If I understand this correctly, we want to sum the mass over each combination
> of the first 6 variables and display the result with the 6th, prey,
> along the top and the others along the side.
>
> library(reshape)
> testm <- melt(test, id = 1:6)
> cast(testm, nbpc + trip + set + tagno + depth ~ prey)
>
> Now fix up the NAs.
>
> On 9/12/06, Denis Chabot <chabotd at globetrotter.net> wrote:
> > Hi,
> >
> > I'm trying to move to R the last few data handling routines I was
> > performing in SAS.
> >
> > I'm working on stomach content data. In the simplified example I
> > provide below, there are variables describing the origin of each prey
> > item (nbpc is a ship number; each ship may have been used on
> > different trips, each trip has stations, and individual fish (tagno)
> > can be caught at each station).
> >
> > For each stomach, the number of lines corresponds to the number of
> > prey items. Thus one variable identifies the prey type, and others
> > (here only one, mass) provide information on prey abundance, size, or
> > digestion level.
> >
> > Finally, there can be accompanying variables that are not used but
> > that I need to keep for later analyses (e.g. depth in the example
> > below).
> >
> > At some point I need to transform such a dataset into another format
> > where each stomach occupies a single line, and there are columns for
> > each prey item.
> >
> > The "reshape" function works really well; my program is in fact
> > simpler than the SAS equivalent (not shown, don't want to bore you,
> > but available on request), except that I need zeros instead of NAs
> > when prey types are absent from a stomach, a problem for which I only
> > have a shaky solution at the moment:
> >
> > 1) creation of a dummy dataset:
> > #######
> > nbpc <- rep(c(20,34), c(110,90))
> > trip <- c(rep(1:3, c(40, 40, 30)), rep(1:2, c(60,30)))
> > set <- c(rep(1:4, c(10, 8, 7, 15)), rep(c(10,12), c(25,15)),
> >          rep(1:3, rep(10,3)),
> >          rep(10:12, c(20, 10, 30)), rep(7:8, rep(15,2)))
> > depth <- c(rep(c(100, 150, 200, 250), c(10, 8, 7, 15)),
> >            rep(c(100,120), c(25,15)), rep(c(75, 50, 200), rep(10,3)),
> >            rep(c(200, 150, 50), c(20, 10, 30)), rep(c(100, 250), rep(15,2)))
> > tagno <- rep(round(runif(42,1,200)),
> >              c(7,3, 4,4, 2,2,3, 5,5,5,  4,6,4,3,5,3, 7,8, 4,6, 5,5, 7,3,
> >                6,6,4,4, 4,6, 3,3,4,5,5,6,4, 5,5,5, 8,7))
> > prey.codes <- c(187, 438, 792, 811)
> > prey <- sample(prey.codes, 200, replace=TRUE)
> > mass <- runif(200, 0, 10)
> >
> > test <- data.frame(nbpc, trip, set, depth, tagno, prey, mass)
> > ########
> >
> > Because there are often multiple occurrences of the same prey in a
> > single stomach, I need to sum them for each stomach before using
> > "reshape". Here I use summaryBy (from the doBy package) because my
> > understanding of the many variants of "apply" is not very good:
> >
> > ########
> > library(doBy)
> > test2 <- summaryBy(mass~nbpc+trip+set+tagno+prey, data=test, FUN=sum,
> >                    keep.names=TRUE, id=~depth)
> >
> > # this messes up the sort order, so I fix it
> > k <- order(test2$nbpc, test2$trip, test2$set, test2$tagno)
> > test3 <- test2[k,]
> > result <- reshape(test3, v.names="mass",
> >                   idvar=c("nbpc", "trip", "set", "tagno"),
> >                   timevar="prey", direction="wide")
> > #########
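For readers without doBy, the summing step can be sketched with base R's
aggregate() instead of summaryBy(); this is a swapped-in technique, not
what the post uses. depth is simply added to the grouping because it is
constant within a stomach, so it is carried along. The tiny `toy` data
frame is made up and only mimics the structure of the dummy data above.

```r
# Hypothetical long data: one row per prey item, as in the dummy data set.
toy <- data.frame(
  nbpc  = c(20, 20, 20, 20, 34),
  trip  = c(1, 1, 1, 1, 2),
  set   = c(1, 1, 1, 2, 7),
  depth = c(100, 100, 100, 150, 250),
  tagno = c(15, 15, 15, 88, 120),
  prey  = c(187, 187, 438, 792, 187),
  mass  = c(1.5, 2.0, 3.0, 4.0, 0.5)
)

# Sum mass within each stomach x prey combination (depth rides along).
toy2 <- aggregate(mass ~ nbpc + trip + set + tagno + depth + prey,
                  data = toy, FUN = sum)

# Put rows back in stomach order before reshaping.
toy3 <- toy2[order(toy2$nbpc, toy2$trip, toy2$set, toy2$tagno), ]

# Long to wide: one row per stomach, one mass.<prey> column per prey code.
result <- reshape(toy3, v.names = "mass",
                  idvar = c("nbpc", "trip", "set", "tagno"),
                  timevar = "prey", direction = "wide")
result
```

Absent prey still show up as NA here; reshape() itself does not zero-fill.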
> >
> > I'm quite happy with this, although you may know of better ways of
> > doing it.
> > But my problem is with prey types that are absent from a stomach. In
> > later analyses, I need them to have zero abundance instead of NA.
> > My shaky solution is:
> > #########
> > empties <- is.na(result)
> > result[empties] <- 0
> > #########
> >
> > which did the job in this example, but it won't always. For instance
> > there could have been NAs for "depth", which I do not want to become
> > zero.
> >
> > Is there a way to transform NAs into zeros for multiple columns of a
> > data frame in one step, while ignoring some columns?
> >
> > Or maybe there is another way to achieve this that would have put
> > zeros where I need them (i.e. something other than "reshape")?
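One base R route that yields zeros directly is xtabs(): with a numeric
left-hand side it sums that variable over every cell of the cross
classification, and cells with no observations get 0 rather than NA. This
is a sketch of an alternative, not something proposed in the thread. For
brevity, tagno alone stands in for the stomach identifier; in the real data
nbpc, trip, set, and tagno together identify a stomach, and depth would have
to be merged back afterwards. The `toy` data are made up.

```r
# Hypothetical long data: one row per prey item.
toy <- data.frame(
  tagno = c(15, 15, 15, 88, 120),
  prey  = c(187, 187, 438, 792, 187),
  mass  = c(1.5, 2.0, 3.0, 4.0, 0.5)
)

# Sum mass over every tagno x prey cell; empty cells become 0, not NA.
tab <- xtabs(mass ~ tagno + prey, data = toy)
tab

# Convert the table to an ordinary data frame, one row per stomach.
per.stomach <- as.data.frame.matrix(tab)
per.stomach$tagno <- as.numeric(rownames(per.stomach))
per.stomach
```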
> >
> > Thanking you in advance,
> >
> > Denis Chabot
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>


